Placeholder image for staff profiles

Turab Iqbal

Postgraduate Research Student
+44 (0)1483 684742
09 BB 01

My publications


Iqbal Turab, Wang Wenwu (2018) Approximate Message Passing for Underdetermined Audio Source Separation, Proceedings of The IET 3rd International Conference on Intelligent Signal Processing (ISP 2017) Institution of Engineering and Technology (IET)
Approximate message passing (AMP) algorithms have shown great promise in sparse signal reconstruction due to their low computational requirements and fast convergence to an exact solution. Moreover, they provide a probabilistic framework that is often more intuitive than alternatives such as convex optimisation. In this paper, AMP is used for audio source separation from underdetermined instantaneous mixtures. In the time-frequency domain, it is typical to assume a priori that the sources are sparse, so we solve the corresponding sparse linear inverse problem using AMP. We present a block-based approach that uses AMP to process multiple time-frequency points simultaneously. Two algorithms known as AMP and vector AMP (VAMP) are evaluated in particular. Results show that they are promising in terms of artefact suppression.
Iqbal Turab, Xu Yong, Kong Qiuqiang, Wang Wenwu (2018) Capsule Routing for Sound Event Detection, Proceedings of 2018 26th European Signal Processing Conference (EUSIPCO) pp. 2255-2259 IEEE
The detection of acoustic scenes is a challenging
problem in which environmental sound events must be detected
from a given audio signal. This includes classifying the events as
well as estimating their onset and offset times. We approach this
problem with a neural network architecture that uses the recentlyproposed
capsule routing mechanism. A capsule is a group of
activation units representing a set of properties for an entity of
interest, and the purpose of routing is to identify part-whole
relationships between capsules. That is, a capsule in one layer is
assumed to belong to a capsule in the layer above in terms of the
entity being represented. Using capsule routing, we wish to train
a network that can learn global coherence implicitly, thereby
improving generalization performance. Our proposed method is
evaluated on Task 4 of the DCASE 2017 challenge. Results show
that classification performance is state-of-the-art, achieving an Fscore
of 58.6%. In addition, overfitting is reduced considerably
compared to other architectures.
Kong Qiuqiang, Iqbal Turab, Xu Yong, Wang Wenwu, Plumbley Mark D (2018) DCASE 2018 Challenge Surrey Cross-task convolutional neural network baseline, DCASE2018 Workshop
The Detection and Classi?cation of Acoustic Scenes and Events (DCASE) consists of ?ve audio classi?cation and sound event detectiontasks: 1)Acousticsceneclassi?cation,2)General-purposeaudio tagging of Freesound, 3) Bird audio detection, 4) Weakly-labeled semi-supervised sound event detection and 5) Multi-channel audio classi?cation. In this paper, we create a cross-task baseline system for all ?ve tasks based on a convlutional neural network (CNN): a ?CNN Baseline? system. We implemented CNNs with 4 layers and 8 layers originating from AlexNet and VGG from computer vision. We investigated how the performance varies from task to task with the same con?guration of neural networks. Experiments show that deeper CNN with 8 layers performs better than CNN with 4 layers on all tasks except Task 1. Using CNN with 8 layers, we achieve an accuracy of 0.680 on Task 1, an accuracy of 0.895 and a mean average precision (MAP) of 0.928 on Task 2, an accuracy of 0.751 andanareaunderthecurve(AUC)of0.854onTask3,asoundevent detectionF1scoreof20.8%onTask4,andanF1scoreof87.75%on Task 5. We released the Python source code of the baseline systems under the MIT license for further research.
Iqbal Turab, Kong Qiuqiang, Plumbley Mark D, Wang Wenwu (2018) General-purpose audio tagging from noisy labels using convolutional neural networks, Proceedings of the Detection and Classification of Acoustic Scenes and Events 2018 Workshop (DCASE2018 pp. 212-216 Tampere University of Technology
General-purpose audio tagging refers to classifying sounds that are
of a diverse nature, and is relevant in many applications where
domain-specific information cannot be exploited. The DCASE 2018
challenge introduces Task 2 for this very problem. In this task, there
are a large number of classes and the audio clips vary in duration.
Moreover, a subset of the labels are noisy. In this paper, we propose
a system to address these challenges. The basis of our system is
an ensemble of convolutional neural networks trained on log-scaled
mel spectrograms. We use preprocessing and data augmentation
methods to improve the performance further. To reduce the effects
of label noise, two techniques are proposed: loss function weighting
and pseudo-labeling. Experiments on the private test set of this task
show that our system achieves state-of-the-art performance with a
mean average precision score of 0.951
Kong Qiuqiang, Xu Yong, Iqbal Turab, Cao Yin, Wang Wenwu, Plumbley Mark D. (2019) Acoustic scene generation with conditional SampleRNN, Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2019) Institute of Electrical and Electronics Engineers (IEEE)
Acoustic scene generation (ASG) is a task to generate waveforms
for acoustic scenes. ASG can be used to generate audio
scenes for movies and computer games. Recently, neural networks
such as SampleRNN have been used for speech and
music generation. However, ASG is more challenging due to
its wide variety. In addition, evaluating a generative model is
also difficult. In this paper, we propose to use a conditional
SampleRNN model to generate acoustic scenes conditioned on
the input classes. We also propose objective criteria to evaluate
the quality and diversity of the generated samples based on
classification accuracy. The experiments on the DCASE 2016
Task 1 acoustic scene data show that with the generated audio
samples, a classification accuracy of 65:5% can be achieved
compared to samples generated by a random model of 6:7%
and samples from real recording of 83:1%. The performance
of a classifier trained only on generated samples achieves an
accuracy of 51:3%, as opposed to an accuracy of 6:7% with
samples generated by a random model.
Kong Qiuqiang, Yu Changsong, Xu Yong, Iqbal Turab, Wang Wenwu, Plumbley Mark D. (2019) Weakly Labelled AudioSet Tagging With Attention Neural Networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (11) pp. 1791-1802 IEEE
Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio tagging focused on relatively small datasets limited to recognising a small number of sound classes. We investigate audio tagging on AudioSet, which is a dataset consisting of over 2 million audio clips and 527 classes. AudioSet is weakly labelled, in that only the presence or absence of sound classes is known for each clip, while the onset and offset times are unknown. To address the weakly-labelled audio tagging problem, we propose attention neural networks as a way to attend the most salient parts of an audio clip. We bridge the connection between attention neural networks and multiple instance learning (MIL) methods, and propose decision-level and feature-level attention neural networks for audio tagging. We investigate attention neural networks modelled by different functions, depths and widths. Experiments on AudioSet show that the feature-level attention neural network achieves a state-of-the-art mean average precision (mAP) of 0.369, outperforming the best multiple instance learning (MIL) method of 0.317 and Google?s deep neural network baseline of 0.314. In addition, we discover that the audio
tagging performance on AudioSet embedding features has a weak correlation with the number of training examples and the quality of labels of each sound class.