About

Areas of specialism

Audio scene classification; Compression of CNNs; Signal processing; Machine learning

Publications

ARSHDEEP SINGH, Raju Arvind, Padmanabhan Rajan (2022) HEALTH MONITORING OF INDUSTRIAL MACHINES USING SCENE-AWARE THRESHOLD SELECTION

This paper presents an autoencoder-based unsupervised approach to identifying anomalies in an industrial machine from the sounds it produces. The proposed framework is trained using log-mel spectrogram representations of the sound signal. For classification, our hypothesis is that the reconstruction error computed for an abnormal machine is larger than that for a normal machine, since only normal machine sounds are used to train the autoencoder. A threshold is chosen to discriminate between normal and abnormal machines. However, the appropriate threshold changes as the surrounding conditions vary. To select an appropriate threshold irrespective of the surroundings, we propose a scene classification framework that classifies the underlying surroundings, so that the threshold can be selected adaptively. The experimental evaluation is performed on the MIMII dataset for four industrial machines: fan, pump, valve and slide rail. Our experimental analysis shows that using an adaptive threshold improves performance significantly compared with using a fixed threshold computed for a given surrounding only.
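The scene-aware thresholding idea can be illustrated with a minimal sketch. The abstract does not say how each threshold is computed, so the rule used here (mean plus k standard deviations of the reconstruction errors on normal clips) is an assumption, as are the function names, the scene labels and the error values. The autoencoder and the scene classifier themselves are not modelled; their outputs are taken as given.

```python
from statistics import mean, pstdev


def fit_thresholds(normal_errors_by_scene, k=3.0):
    """Return one threshold per scene.

    Assumed rule (not from the paper): mean + k * std of the
    reconstruction errors measured on normal clips in that scene.
    """
    return {
        scene: mean(errors) + k * pstdev(errors)
        for scene, errors in normal_errors_by_scene.items()
    }


def is_abnormal(recon_error, scene, thresholds):
    """Flag a clip as abnormal using the threshold of its predicted scene."""
    return recon_error > thresholds[scene]


# Hypothetical reconstruction errors of normal clips in two surroundings.
normal_errors = {
    "quiet_floor": [0.10, 0.12, 0.11, 0.09],
    "noisy_floor": [0.30, 0.34, 0.32, 0.28],
}
thresholds = fit_thresholds(normal_errors)

# The same error value is judged differently depending on the scene.
quiet_decision = is_abnormal(0.25, "quiet_floor", thresholds)
noisy_decision = is_abnormal(0.25, "noisy_floor", thresholds)
print(quiet_decision, noisy_decision)
```

In this toy setting an error of 0.25 exceeds the threshold fitted for the quiet scene but not the one fitted for the noisy scene, which is the situation a single fixed threshold cannot handle.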

Arshdeep Singh, Mark D. Plumbley (2022) Low-complexity CNNs for acoustic scene classification

This paper presents a low-complexity framework for acoustic scene classification (ASC). Most frameworks designed for ASC use convolutional neural networks (CNNs) due to their learning ability and improved performance compared to hand-engineered features. However, CNNs are resource hungry due to their large size and high computational complexity, which makes them difficult to deploy on resource-constrained devices. This paper addresses the problem of reducing the computational complexity and memory requirement of CNNs. We propose a low-complexity CNN architecture, and apply pruning and quantization to further reduce the parameters and memory. We then propose an ensemble framework that combines various low-complexity CNNs to improve the overall performance. An experimental evaluation of the proposed framework is performed on the publicly available DCASE 2022 Task 1, which focuses on ASC. The proposed ensemble framework has approximately 60K parameters, requires 19M multiply-accumulate operations and improves the performance by approximately 2-4 percentage points compared to the DCASE 2022 Task 1 baseline network.
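Parameter and multiply-accumulate (MAC) budgets such as those quoted above are usually obtained by summing per-layer costs. The sketch below shows the standard count for a 2-D convolutional layer. The layer shapes are hypothetical and do not describe the architecture proposed in the paper.

```python
def conv2d_cost(c_in, c_out, kernel, out_h, out_w, bias=True):
    """Return (parameters, MACs) for one standard 2-D convolution layer.

    Each output value needs k_h * k_w * c_in multiply-accumulates, and
    there are c_out * out_h * out_w output values.
    """
    k_h, k_w = kernel
    weights = k_h * k_w * c_in * c_out
    params = weights + (c_out if bias else 0)
    macs = weights * out_h * out_w
    return params, macs


# Hypothetical two-layer stack: (c_in, c_out, kernel, out_h, out_w).
layers = [
    (1, 16, (3, 3), 64, 32),
    (16, 32, (3, 3), 32, 16),
]

costs = [conv2d_cost(*layer) for layer in layers]
total_params = sum(p for p, _ in costs)
total_macs = sum(m for _, m in costs)
print(f"parameters: {total_params}, MACs: {total_macs}")
```

Pruning filters lowers `c_out` of one layer and `c_in` of the next, so it reduces both totals, whereas quantization leaves the counts unchanged and reduces the number of bits stored per parameter.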

Yang Xiao, XUBO LIU, JAMES ALEXANDER KING, ARSHDEEP SINGH, Eng Siong Chng, Mark D. PLUMBLEY, WENWU WANG (2022) CONTINUAL LEARNING FOR ON-DEVICE ENVIRONMENTAL SOUND CLASSIFICATION

Continuously learning new classes without catastrophic forgetting is a challenging problem for on-device environmental sound classification, given the restrictions on computation resources (e.g., model size, running memory). To address this issue, we propose a simple and efficient continual learning method. Our method selects the historical data for training by measuring the per-sample classification uncertainty. Specifically, we measure the uncertainty by observing how the classification probability of the data fluctuates against parallel perturbations added to the classifier embedding. In this way, the computation cost can be significantly reduced compared with adding perturbations to the raw data. Experimental results on the DCASE 2019 Task 1 and ESC-50 datasets show that our proposed method outperforms baseline continual learning methods on classification accuracy and computational efficiency, indicating that our method can efficiently and incrementally learn new classes without catastrophic forgetting for on-device environmental sound classification.
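One way to picture the uncertainty measure is the sketch below, which perturbs an embedding with Gaussian noise and records how much the probability of the labelled class fluctuates. Several choices here are assumptions rather than details from the abstract: the linear classifier head, the Gaussian noise, the use of variance as the fluctuation measure, and all names and values. The perturbations are also drawn in a simple loop rather than in parallel.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(embedding, weights):
    """Hypothetical linear classifier head: one weight row per class."""
    logits = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    return softmax(logits)


def uncertainty(embedding, label, weights, n_perturb=20, sigma=0.1, seed=0):
    """Variance of the labelled-class probability under embedding noise."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_perturb):
        noisy = [x + rng.gauss(0.0, sigma) for x in embedding]
        probs.append(classify(noisy, weights)[label])
    mu = sum(probs) / len(probs)
    return sum((p - mu) ** 2 for p in probs) / len(probs)


# Two-class toy head and two hypothetical embeddings of class 0.
weights = [[1.0, 0.0], [0.0, 1.0]]
u_confident = uncertainty([5.0, 0.0], 0, weights)
u_borderline = uncertainty([0.1, 0.0], 0, weights)
print(u_confident, u_borderline)
```

Because only the classifier head is re-evaluated for each perturbation, the feature extractor runs once per sample, which is the source of the cost saving the abstract describes. Which end of the uncertainty ranking is kept as historical data is not stated in the abstract and is left open here.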

We present a method to develop low-complexity convolutional neural networks (CNNs) for acoustic scene classification (ASC). The large size and high computational complexity of typical CNNs are a bottleneck for their deployment on resource-constrained devices. We propose a passive filter pruning framework, where a few convolutional filters from the CNNs are eliminated to yield compressed CNNs. Our hypothesis is that similar filters produce similar responses and give redundant information, allowing such filters to be eliminated from the network. To identify similar filters, a cosine distance based greedy algorithm is proposed. A fine-tuning process is then performed to regain much of the performance lost due to filter elimination. To perform efficient fine-tuning, we analyze how the performance varies as the number of fine-tuning training examples changes. An experimental evaluation of the proposed framework is performed on the publicly available DCASE 2021 Task 1A baseline network trained for ASC. The proposed method is simple and reduces computations per inference by 27%, with 25% fewer parameters and less than a 1% drop in accuracy.
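The abstract does not spell out the greedy algorithm, so the following is only a sketch of one plausible reading: repeatedly find the closest pair of remaining filters by cosine distance and mark one filter of that pair as redundant. The function names, the choice of which filter in a pair to drop, and the toy filters are all assumptions. Filters are represented as flattened, non-zero weight vectors.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_redundant_filters(filters, n_prune):
    """Greedily pick indices of filters to prune.

    At each step the closest remaining pair is found and the
    higher-indexed filter of that pair is marked for removal
    (an arbitrary tie-breaking assumption).
    """
    keep = list(range(len(filters)))
    pruned = []
    while len(pruned) < n_prune and len(keep) > 1:
        best = None  # (distance, i, j) of the closest pair so far
        for pos, i in enumerate(keep):
            for j in keep[pos + 1:]:
                d = cosine_distance(filters[i], filters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, _, drop = best
        keep.remove(drop)
        pruned.append(drop)
    return pruned


# Four hypothetical flattened filters; filters 0 and 1 are nearly parallel.
filters = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
to_prune = select_redundant_filters(filters, n_prune=1)
print(to_prune)
```

The selection uses only the filter weights and no training data, which matches the passive nature of the pruning described above. The pruned network would then be fine-tuned to recover accuracy.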