# Professor Mark Plumbley

## Professor of Signal Processing

Email: m.plumbley@surrey.ac.uk

Phone (work): 01483 68 9843

Room no: 03 BB 01

## Biography

I was awarded my PhD in neural networks by Cambridge University Engineering Department in 1991, and then became a Lecturer at King's College London.

I moved to Queen Mary University of London in 2002, later becoming Professor of Machine Learning and Signal Processing, and Director of the Centre for Digital Music. I joined the University of Surrey in January 2015 to become Professor of Signal Processing in the Centre for Vision, Speech and Signal Processing (CVSSP).

## Research Interests

My research concerns the analysis and processing of audio and music, using a wide range of signal processing techniques, including independent component analysis (ICA) and sparse representations.

As part of an EPSRC Leadership Fellowship on Machine Listening using Sparse Representations (2009-2014), I have been extending this work to include analysis of real-world sounds, including birdsong and environmental audio. Through the EPSRC-funded project SoundSoftware.ac.uk, I have been promoting sustainable research software in audio and music research, including training researchers to follow the principles of reproducible research.

My new projects at CVSSP include an EPSRC project on "Musical Audio Repurposing using Source Separation", and I am coordinating two EU-funded Marie Curie Initial/Innovative Training Networks (ITNs) in Sparse Representations/Compressed Sensing and Machine Sensing.

## Publications

### Preprints

For preprints of recent papers, see Surrey Research Insight (Plumbley, M.).

For preprints of earlier papers, see my earlier publications page.

### Journal articles

- 'Wideband spectrum sensing on real-time signals at sub-Nyquist sampling rates in single and cooperative multiple nodes'.
IEEE Transactions on Signal Processing, 64 (12), pp. 3106-3117.
**Full text** is available at: http://epubs.surrey.ac.uk/810680/

#### Abstract

This paper presents two new algorithms for wideband spectrum sensing at sub-Nyquist sampling rates, for both single nodes and cooperative multiple nodes. In single-node spectrum sensing, a two-phase spectrum sensing algorithm based on compressive sensing is proposed to reduce the computational complexity and improve the robustness at secondary users (SUs). In the cooperative multiple nodes case, the signals received at SUs exhibit a sparsity property that yields a low-rank matrix of compressed measurements at the fusion center. This therefore leads to a two-phase cooperative spectrum sensing algorithm for cooperative multiple SUs based on low-rank matrix completion. In addition, the two proposed spectrum sensing algorithms are evaluated on the TV white space (TVWS), in which pioneering work aimed at enabling dynamic spectrum access into practice has been promoted by both the Federal Communications Commission and the U.K. Office of Communications. The proposed algorithms are tested on the real-time signals after they have been validated by the simulated signals in TVWS. The numerical results show that our proposed algorithms are more robust to channel noise and have lower computational complexity than the state-of-the-art algorithms.

(2016)

- 'Non-Negative Group Sparsity with Subspace Note Modelling for Polyphonic Transcription'.
IEEE/ACM Transactions on Audio, Speech and Language Processing, 24 (3), pp. 530-542.
**Full text** is available at: http://epubs.surrey.ac.uk/810679/

#### Abstract

Automatic music transcription (AMT) can be performed by deriving a pitch-time representation through decomposition of a spectrogram with a dictionary of pitch-labelled atoms. Typically, non-negative matrix factorisation (NMF) methods are used to decompose magnitude spectrograms. One atom is often used to represent each note. However, the spectrum of a note may change over time. Previous research considered this variability using different atoms to model specific parts of a note, or large dictionaries comprised of datapoints from the spectrograms of full notes. In this paper, the use of subspace modelling of note spectra is explored, with group sparsity employed as a means of coupling activations of related atoms into a pitched subspace. Stepwise and gradient-based methods for non-negative group sparse decompositions are proposed. Finally, a group sparse NMF approach is used to tune a generic harmonic subspace dictionary, leading to improved NMF-based AMT results.

(2016)

- 'Analysis SimCO Algorithms for Sparse Analysis Model Based Dictionary Learning'.
IEEE Transactions on Signal Processing, 64 (2), pp. 417-431.
**Full text** is available at: http://epubs.surrey.ac.uk/809038/

#### Abstract

In this paper, we consider the dictionary learning problem for the sparse analysis model. A novel algorithm is proposed by adapting the simultaneous codeword optimization (SimCO) algorithm, based on the sparse synthesis model, to the sparse analysis model. This algorithm assumes that the analysis dictionary contains unit l2-norm atoms and learns the dictionary by optimization on manifolds. This framework allows multiple dictionary atoms to be updated simultaneously in each iteration. However, similar to several existing analysis dictionary learning algorithms, dictionaries learned by the proposed algorithm may contain similar atoms, leading to a degenerate (coherent) dictionary. To address this problem, we also consider restricting the coherence of the learned dictionary and propose Incoherent Analysis SimCO by introducing an atom decorrelation step following the update of the dictionary. We demonstrate the competitive performance of the proposed algorithms using experiments with synthetic data and image denoising as compared with existing algorithms.

(2016)

- 'Detection and Classification of Acoustic Scenes and Events'.
IEEE Transactions on Multimedia, 17 (10), pp. 1733-1746.
**Full text** is available at: http://epubs.surrey.ac.uk/809542/

#### Abstract

For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public research challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the art in automatically classifying audio scenes, and automatically detecting and classifying audio events. We survey prior work as well as the state of the art represented by the submissions to the challenge from various research groups. We also provide detail on the organization of the challenge, so that our experience as challenge hosts may be useful to those organizing challenges in similar domains. We created new audio datasets and baseline systems for the challenge; these, as well as some submitted systems, are publicly available under open licenses, to serve as benchmarks for further research in general-purpose machine listening.

(2015)

- 'Event-based Multitrack Alignment using a Probabilistic Framework'.
Journal of New Music Research, 44 (2), pp. 71-82.
**Full text** is available at: http://epubs.surrey.ac.uk/809246/

#### Abstract

© 2015 Taylor & Francis. This paper presents a Bayesian probabilistic framework for real-time alignment of a recording or score with a live performance using an event-based approach. Multitrack audio files are processed using existing onset detection and harmonic analysis algorithms to create a representation of a musical performance as a sequence of time-stamped events. We propose the use of distributions for the position and relative speed which are sequentially updated in real-time according to Bayes’ theorem. We develop the methodology for this approach by describing its application in the case of matching a single MIDI track and then extend this to the case of multitrack recordings. An evaluation is presented that contrasts our multitrack alignment method with state-of-the-art alignment techniques.

(2015)

- 'Learning incoherent subspaces: Classification via incoherent dictionary learning'.
Journal of Signal Processing Systems, 79 (2), pp. 189-199.
#### Abstract

In this article we present the supervised iterative projections and rotations (S-IPR) algorithm, a method for learning discriminative incoherent subspaces from data. We derive S-IPR as a supervised extension of our previously proposed iterative projections and rotations (IPR) algorithm for incoherent dictionary learning, and we employ it to learn incoherent subspaces that model signals belonging to different classes. We test our method as a feature transform for supervised classification, first by visualising transformed features from a synthetic dataset and from the ‘iris’ dataset, then by using the resulting features in a classification experiment.

(2015)

- 'Acoustic Scene Classification: Classifying environments from the sounds they produce'.
IEEE Signal Processing Magazine, 32 (3), pp. 16-34.
**Full text** is available at: http://epubs.surrey.ac.uk/807420/

#### Abstract

In this article, we present an account of the state of the art in acoustic scene classification (ASC), the task of classifying environments from the sounds they produce. Starting from a historical review of previous research in this area, we define a general framework for ASC and present different implementations of its components. We then describe a range of different algorithms submitted for a data challenge that was held to provide a general and fair benchmark for ASC techniques. The data set recorded for this purpose is presented along with the performance metrics that are used to evaluate the algorithms and statistical significance tests to compare the submitted methods.

(2015)

- 'Multichannel high-resolution NMF for modeling convolutive mixtures of non-stationary signals in the Time-Frequency domain'.
IEEE/ACM Transactions on Audio, Speech and Language Processing, 22 (11), pp. 1670-1680.
**Full text** is available at: http://epubs.surrey.ac.uk/807421/

#### Abstract

Several probabilistic models involving latent components have been proposed for modeling time-frequency (TF) representations of audio signals such as spectrograms, notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high-resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. The new model can represent a variety of stationary and non-stationary signals, including autoregressive moving average (ARMA) processes and mixtures of damped sinusoids. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to piano signals, and proves capable of accurately modeling reverberation, restoring missing observations, and separating pure tones with close frequencies.

(2014)

- 'Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning'.
PeerJ, 2. doi: 10.7717/peerj.488
**Full text** is available at: http://epubs.surrey.ac.uk/807401/

#### Abstract

Automatic species classification of birds from their sound is a computational tool of increasing importance in ecology, conservation monitoring and vocal communication studies. To make classification useful in practice, it is crucial to improve its accuracy while ensuring that it can run at big data scales. Many approaches use acoustic measures based on spectrogram-type data, such as the Mel-frequency cepstral coefficient (MFCC) features which represent a manually-designed summary of spectral information. However, recent work in machine learning has demonstrated that features learnt automatically from data can often outperform manually-designed feature transforms. Feature learning can be performed at large scale and "unsupervised", meaning it requires no manual data labelling, yet it can improve performance on "supervised" tasks such as classification. In this work we introduce a technique for feature learning from large volumes of bird sound recordings, inspired by techniques that have proven useful in other domains. We experimentally compare twelve different feature representations derived from the Mel spectrum (of which six use this technique), using four large and diverse databases of bird vocalisations, classified using a random forest classifier. We demonstrate that in our classification tasks, MFCCs can often lead to worse performance than the raw Mel spectral data from which they are derived. Conversely, we demonstrate that unsupervised feature learning provides a substantial boost over MFCCs and Mel spectra without adding computational complexity after the model has been trained. The boost is particularly notable for single-label classification tasks at large scale. The spectro-temporal activations learned through our procedure resemble spectro-temporal receptive fields calculated from avian primary auditory forebrain. However, for one of our datasets, which contains substantial audio data but few annotations, increased performance is not discernible. 
We study the interaction between dataset characteristics and choice of feature representation through further empirical analysis.

(2014)

- 'Large-scale analysis of frequency modulation in birdsong data bases'.
Methods in Ecology and Evolution, 5 (9), pp. 901-912.
**Full text** is available at: http://epubs.surrey.ac.uk/807482/

#### Abstract

* Birdsong often contains large amounts of rapid frequency modulation (FM). It is believed that the use or otherwise of FM is adaptive to the acoustic environment and also that there are specific social uses of FM such as trills in aggressive territorial encounters. Yet temporal fine detail of FM is often absent or obscured in standard audio signal analysis methods such as Fourier analysis or linear prediction. Hence, it is important to consider high-resolution signal processing techniques for analysis of FM in bird vocalizations. If such methods can be applied at big data scales, this offers a further advantage as large data sets become available.
* We introduce methods from the signal processing literature which can go beyond spectrogram representations to analyse the fine modulations present in a signal at very short time-scales. Focusing primarily on the genus Phylloscopus, we investigate which of a set of four analysis methods most strongly captures the species signal encoded in birdsong. We evaluate this through a feature selection technique and an automatic classification experiment. In order to find tools useful in practical analysis of large data bases, we also study the computational time taken by the methods, and their robustness to additive noise and MP3 compression.
* We find three methods which can robustly represent species-correlated FM attributes and can be applied to large data sets, and that the simplest method tested also appears to perform the best. We find that features representing the extremes of FM encode species identity supplementary to that captured in frequency features, whereas bandwidth features do not encode additional information.
* FM analysis can extract information useful for bioacoustic studies, in addition to measures more commonly used to characterize vocalizations. Further, it can be applied efficiently across very large data sets and archives.

(2014)

- 'Score-Informed Source Separation for Musical Audio Recordings [An overview]'.
IEEE Signal Processing Magazine, 31 (3), pp. 116-124.
**Full text** is available at: http://epubs.surrey.ac.uk/807474/
(2014)

- 'Best practices for scientific computing'.
PLoS Biol, United States: 12 (1).
**Full text** is available at: http://epubs.surrey.ac.uk/807424/
(2014)

- 'Segregating Event Streams and Noise with a Markov Renewal Process Model'.
Journal of Machine Learning Research, 14, pp. 2213-2238.
#### Abstract

We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise. The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. We illustrate the technique via synthetic experiments as well as an experiment to track a mixture of singing birds. Source code is available.

(2013)

- 'Hearing the shape of a room'. Proc Natl Acad Sci U S A, United States: 110 (30), pp. 12162-12163. (2013)
- 'Synchronizing sequencing software to a live drummer'.
Computer Music Journal, 37 (2), pp. 46-60. doi: 10.1162/COMJ-a-00178
#### Abstract

This article presents a method of adjusting the tempo of a music software sequencer so that it remains synchronized with a drummer's musical pulse. This allows music sequencer technology to be integrated into a band scenario without the compromise of using click tracks or triggering loops with a fixed tempo. Our design implements real-time mechanisms for both underlying tempo and phase adjustment using adaptable parameters that control its behavior. The aim is to create a system that responds to timing variations in the drummer's playing but is also stable during passages of syncopation and fills. We present an evaluation of the system using a stochastic drum machine that incorporates a level of noise in the underlying tempo and phase of the beat. We measure synchronization error between the output of the system and the underlying pulse of the drum machine and contrast this with other real-time beat trackers. The software, B-Keeper, has been released as a Max for Live device, available online at www.b-keeper.org. © 2013 Massachusetts Institute of Technology.

(2013)

- 'On Theorem 10 in "On Polar Polytopes and the Recovery of Sparse Representations" (vol 50, pg 2231, 2004)'. IEEE Transactions on Information Theory, 59 (8), pp. 5206-5209. (2013)
- 'The serendiptichord: Reflections on the collaborative design process between artist and researcher'.
Leonardo, 46 (1), pp. 86-87. doi: 10.1162/LEON_a_00494
#### Abstract

The Serendiptichord is a wearable instrument, resulting from a collaboration crossing fashion, technology, music and dance. This paper reflects on the collaborative process and how defining both creative and research roles for each party led to a successful creative partnership built on mutual respect and open communication. After a brief snapshot of the instrument in performance, the instrument is considered within the context of dance-driven interactive music systems followed by a discussion on the nature of the collaboration and its impact upon the design process and final piece. © 2013 ISAST.

(2013)

- 'Learning Incoherent Dictionaries for Sparse Approximation Using Iterative Projections and Rotations'. IEEE Transactions on Signal Processing, 61 (8), pp. 2055-2065. (2013)
- 'Audio inpainting'.
IEEE Transactions on Audio, Speech and Language Processing, 20 (3), pp. 922-932.
**Full text** is available at: http://epubs.surrey.ac.uk/810156/

#### Abstract

We propose the audio inpainting framework that recovers portions of audio data distorted due to impairments such as impulsive noise, clipping, and packet loss. In this framework, the distorted data are treated as missing and their location is assumed to be known. The signal is decomposed into overlapping time-domain frames and the restoration problem is then formulated as an inverse problem per audio frame. Sparse representation modeling is employed per frame, and each inverse problem is solved using the Orthogonal Matching Pursuit algorithm together with a discrete cosine or a Gabor dictionary. The Signal-to-Noise Ratio performance of this algorithm is shown to be comparable or better than state-of-the-art methods when blocks of samples of variable durations are missing. We also demonstrate that the size of the block of missing samples, rather than the overall number of missing samples, is a crucial parameter for high quality signal restoration. We further introduce a constrained Matching Pursuit approach for the special case of audio declipping that exploits the sign pattern of clipped audio samples and their maximal absolute value, as well as allowing the user to specify the maximum amplitude of the signal. This approach is shown to outperform state-of-the-art and commercially available methods for audio declipping in terms of Signal-to-Noise Ratio. © 2006 IEEE.

(2012)

- 'A measure of statistical complexity based on predictive information with application to finite spin systems'.
Physics Letters, Section A: General, Atomic and Solid State Physics, 376 (4), pp. 275-281.
#### Abstract

We propose the binding information as an information theoretic measure of complexity between multiple random variables, such as those found in the Ising or Potts models of interacting spins, and compare it with several previously proposed measures of statistical complexity, including excess entropy, Bialek et al.'s predictive information, and the multi-information. We discuss and prove some of the properties of binding information, particularly in relation to multi-information and entropy, and show that, in the case of binary random variables, the processes which maximise binding information are the 'parity' processes. The computation of binding information is demonstrated on Ising models of finite spin systems, showing that various upper and lower bounds are respected and also that there is a strong relationship between the introduction of high-order interactions and an increase of binding-information. Finally we discuss some of the implications this has for the use of the binding information as a measure of complexity. © 2011 Elsevier B.V. All rights reserved.

(2012)

- 'Performance following: Real-time prediction of musical sequences without a score'.
IEEE Transactions on Audio, Speech and Language Processing, 20 (1), pp. 178-187.
#### Abstract

This paper introduces a technique for predicting harmonic sequences in a musical performance for which no score is available, using real-time audio signals. Recent short-term information is aligned with longer term information, contextualizing the present within the past, allowing predictions about the future of the performance to be made. Using a mid-level representation in the form of beat-synchronous harmonic sequences, we reduce the size of the information needed to represent the performance. This allows the implementation of real-time performance following in live performance situations. We conduct an objective evaluation on a database of rock, pop, and folk music. Our results show that we are able to predict a large majority of repeated harmonic content with no prior knowledge in the form of a score. © 2011 IEEE.

(2012)

- 'Reliability-informed beat tracking of musical signals'.
IEEE Transactions on Audio, Speech and Language Processing, 20 (1), pp. 278-289.
#### Abstract

A new probabilistic framework for beat tracking of musical audio is presented. The method estimates the time between consecutive beat events and exploits both beat and non-beat information by explicitly modeling non-beat states. In addition to the beat times, a measure of the expected accuracy of the estimated beats is provided. The quality of the observations used for beat tracking is measured and the reliability of the beats is automatically calculated. A k-nearest neighbor regression algorithm is proposed to predict the accuracy of the beat estimates. The performance of the beat tracking system is statistically evaluated using a database of 222 musical signals of various genres. We show that modeling non-beat states leads to a significant increase in performance. In addition, a large experiment where the parameters of the model are automatically learned has been completed. Results show that simple approximations for the parameters of the model can be used. Furthermore, the performance of the system is compared with existing algorithms. Finally, a new perspective for beat tracking evaluation is presented. We show how reliability information can be successfully used to increase the mean performance of the proposed algorithm and discuss how far automatic beat tracking is from human tapping. © 2011 IEEE.

(2012)

- 'Learning Timbre Analogies from Unlabelled Data by Multivariate Tree Regression'.
Journal of New Music Research, 40 (4), pp. 325-336.
#### Abstract

Applications such as concatenative synthesis (audio mosaicing) and query-by-example require the ability to search a database using a sound which is qualitatively different from the actual desired result, for example when using vocal queries to retrieve nonvocal sound. Standard query techniques such as nearest neighbours do not account for this difference between source and target; they perform retrieval but do not learn to make timbral analogies. This paper addresses this issue by considering timbral query as a multivariate regression problem from one timbre distribution onto another. We develop a novel variant of multivariate tree regression: given only a set of unlabelled and unpaired samples from two distributions on the same space, the regression learns a cross-associative mapping which assumes general similarities in structure of the two distributions, yet can accommodate differences in shape at various scales. We demonstrate the technique with a synthetic example and with a concatenative synthesizer. © 2011 Copyright Taylor and Francis Group, LLC.

(2011)

- 'Fast Dictionary Learning for Sparse Representations of Speech Signals'. J. Sel. Topics Signal Processing, 5 (5), pp. 1025-1031. (2011)
- 'Measuring the Performance of Beat Tracking Algorithms Using a Beat Error Histogram'. IEEE Signal Process. Lett., 18 (3), pp. 157-160. (2011)
- 'Onset Event Decoding Exploiting the Rhythmic Structure of Polyphonic Music'. J. Sel. Topics Signal Processing, 5 (6), pp. 1228-1239. (2011)
- 'Delayed decision-making in real-time beatbox percussion classification'.
Journal of New Music Research, 39 (3), pp. 203-213.
#### Abstract

Real-time classification applied to a vocal percussion signal holds potential as an interface for live musical control. In this article we propose a novel approach to resolving the tension between the needs for low-latency reaction and reliable classification, by deferring the final classification decision until after a response has been initiated. We introduce a new dataset of annotated human beatbox recordings, and use it to study the optimal delay for classification accuracy. We then investigate the effect of such delayed decision-making on the quality of the audio output of a typical reactive system, via a MUSHRA-type listening test. Our results show that the effect depends on the output audio type: for popular dance/pop drum sounds the acceptable delay is on the order of 12-35 ms. © 2010 Taylor & Francis.

(2010)

- 'Sparse Representations in Audio and Music: From Coding to Source Separation'. Proceedings of the IEEE, 98 (6), pp. 995-1005. (2010)
- 'Fast Multidimensional Entropy Estimation by k-d Partitioning'. IEEE Signal Process. Lett., 16 (6), pp. 537-540. (2009)
- 'Information dynamics: patterns of expectation and surprise in the perception of music'. Connect. Sci., 21 (2&3), pp. 89-117. (2009)
- 'Evaluation of live human-computer music-making: Quantitative and qualitative approaches'. Int. J. Hum.-Comput. Stud., 67 (11), pp. 960-975. (2009)
- 'Theorems on positive data: on the uniqueness of NMF'.
Comput Intell Neurosci, United States. doi: 10.1155/2008/764206
#### Abstract

We investigate the conditions for which nonnegative matrix factorization (NMF) is unique and introduce several theorems which can determine whether the decomposition is in fact unique or not. The theorems are illustrated by several examples showing the use of the theorems and their limitations. We have shown that corruption of a unique NMF matrix by additive noise leads to a noisy estimation of the noise-free unique solution. Finally, we use a stochastic view of NMF to analyze which characterization of the underlying model will result in an NMF with small estimation errors.

(2008)

- 'Efficient Bayesian inference for harmonic models via adaptive posterior factorization'. Neurocomputing, 72 (1-3), pp. 79-87. (2008)
- 'An adaptive stereo basis method for convolutive blind audio source separation'. Neurocomputing, 71 (10-12), pp. 2087-2097. (2008)
- 'Context-Dependent Beat Tracking of Musical Audio'. IEEE Transactions on Audio, Speech & Language Processing, 15 (3), pp. 1009-1020. (2007)
- 'Low Bit-Rate Object Coding of Musical Audio Using Bayesian Harmonic Models'. IEEE Transactions on Audio, Speech & Language Processing, 15 (4), pp. 1273-1282. (2007)
- 'Oracle estimators for the benchmarking of source separation algorithms'. Signal Processing, 87 (8), pp. 1933-1950. (2007)
- 'Audio source separation with a signal-adaptive local cosine transform'. Signal Processing, 87 (8), pp. 1848-1858. (2007)
- 'Unsupervised analysis of polyphonic music by sparse coding'.
IEEE Trans Neural Netw, United States: 17 (1), pp. 179-196.
#### Abstract

We investigate a data-driven approach to the analysis and transcription of polyphonic music, using a probabilistic model which is able to find sparse linear decompositions of a sequence of short-term Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small number of "atomic" spectra chosen from a larger dictionary; this dictionary is, in turn, learned from the data in such a way as to represent the given training set in an (information theoretically) efficient way. When exposed to examples of polyphonic music, most of the dictionary elements take on the spectral characteristics of individual notes in the music, so that the sparse decomposition can be used to identify the notes in a polyphonic mixture. Our approach differs from other methods of polyphonic analysis based on spectral decomposition by combining all of the following: (a) a formulation in terms of an explicitly given probabilistic model, in which the process estimating which notes are present corresponds naturally with the inference of latent variables in the model; (b) a particularly simple generative model, motivated by very general considerations about efficient coding, that makes very few assumptions about the musical origins of the signals being processed; and (c) the ability to learn a dictionary of atomic spectra (most of which converge to harmonic spectral profiles associated with specific notes) from polyphonic examples alone; no separate training on monophonic examples is required.

(2006)

- 'Sparse representations of polyphonic music'. Signal Processing, 86 (3), pp. 417-431. (2006)
- 'On polar polytopes and the recovery of sparse representations'.
IEEE Transactions on Information Theory, 53 (9), pp. 3188-3195.
#### Abstract

Suppose we have a signal $y$ which we wish to represent using a linear combination of a number of basis atoms $a_i$, $y = \sum_i x_i a_i = Ax$. The problem of finding the minimum $\ell_0$ norm representation for $y$ is a hard problem. The basis pursuit (BP) approach proposes to find the minimum $\ell_1$ norm representation instead, which corresponds to a linear program (LP) that can be solved using modern LP techniques, and several recent authors have given conditions for the BP (minimum $\ell_1$ norm) and sparse (minimum $\ell_0$ norm) representations to be identical. In this paper, we explore this sparse representation problem using the geometry of convex polytopes, as recently introduced into the field by Donoho. By considering the dual LP we find that the so-called polar polytope $P^*$ of the centrally symmetric polytope $P$ whose vertices are the atom pairs $\pm a_i$ is particularly helpful in providing us with geometrical insight into optimality conditions given by Fuchs and Tropp for non-unit-norm atom sets. In exploring this geometry, we are able to tighten some of these earlier results, showing for example that a condition due to Fuchs is both necessary and sufficient for $\ell_1$-unique-optimality, and there are cases where orthogonal matching pursuit (OMP) can eventually find all $\ell_1$-unique-optimal solutions with $m$ nonzeros even if the exact recovery condition (ERC) fails for $m$. © 2007 IEEE.

(2005)
- 'Geometrical methods for non-negative ICA: Manifolds, Lie groups and toral subalgebras'. NEUROCOMPUTING, 67, pp. 161-197. (2005)
- 'Blind separation of positive sources by globally convergent gradient search'.
Neural Comput, United States: 16 (9), pp. 1811-1825.
#### Abstract

The instantaneous noise-free linear mixing model in independent component analysis is largely a solved problem under the usual assumption of independent nongaussian sources and full column rank mixing matrix. However, with some prior information on the sources, like positivity, new analysis and perhaps simplified solution methods may yet become possible. In this letter, we consider the task of independent component analysis when the independent sources are known to be nonnegative and well grounded, which means that they have a nonzero pdf in the region of zero. It can be shown that in this case, the solution method is basically very simple: an orthogonal rotation of the whitened observation vector into nonnegative outputs will give a positive permutation of the original sources. We propose a cost function whose minimum coincides with nonnegativity and derive the gradient algorithm under the whitening constraint, under which the separating matrix is orthogonal. We further prove that in the Stiefel manifold of orthogonal matrices, the cost function is a Lyapunov function for the matrix gradient flow, implying global convergence. Thus, this algorithm is guaranteed to find the nonnegative well-grounded independent sources. The analysis is complemented by a numerical simulation, which illustrates the algorithm.

(2004)
- 'A "nonnegative PCA" algorithm for independent component analysis'.
IEEE Trans Neural Netw, United States: 15 (1), pp. 66-76.
#### Abstract

We consider the task of independent component analysis when the independent sources are known to be nonnegative and well-grounded, so that they have a nonzero probability density function (pdf) in the region of zero. We propose the use of a "nonnegative principal component analysis (nonnegative PCA)" algorithm, which is a special case of the nonlinear PCA algorithm, but with a rectification nonlinearity, and we conjecture that this algorithm will find such nonnegative well-grounded independent sources, under reasonable initial conditions. While the algorithm has proved difficult to analyze in the general case, we give some analytical results that are consistent with this conjecture and some numerical simulations that illustrate its operation.

(2004)
- 'Algorithms for nonnegative independent component analysis'.
IEEE Trans Neural Netw, United States: 14 (3), pp. 534-543.
#### Abstract

We consider the task of solving the independent component analysis (ICA) problem x=As given observations x, with a constraint of nonnegativity of the source random vector s. We refer to this as nonnegative independent component analysis and we consider methods for solving this task. For independent sources with nonzero probability density function (pdf) p(s) down to s=0 it is sufficient to find the orthonormal rotation y=Wz of prewhitened sources z=Vx, which minimizes the mean squared error of the reconstruction of z from the rectified version y⁺ of y. We suggest some algorithms which perform this, both based on a nonlinear principal component analysis (PCA) approach and on a geodesic search method driven by differential geometry considerations. We demonstrate the operation of these algorithms on an image separation problem, which shows in particular the fast convergence of the rotation and geodesic methods, and apply the approach to a musical audio analysis task.

(2003)
- 'Conditions for nonnegative independent component analysis'. IEEE SIGNAL PROCESSING LETTERS, 9 (6), pp. 177-180. (2002)
- 'Do cortical maps adapt to optimize information density?'.
Network: Computation in Neural Systems, 10 (1), pp. 41-58.
#### Abstract

Topographic maps are found in many biological and artificial neural systems. In biological systems, some parts of these can form a significantly expanded representation of their sensory input, such as the representation of the fovea in the visual cortex. We propose that a cortical feature map should be organized to optimize the efficiency of information transmission through it. This leads to a principle of uniform cortical information density across the map as the desired optimum. An expanded representation in the cortex for a particular sensory area (i.e. a high magnification factor) means that a greater information density is concentrated in that sensory area, leading to finer discrimination thresholds. Improvement may ultimately be limited by the construction of the sensors themselves. This approach gives a good fit to threshold versus cortical area data of Recanzone et al on owl monkeys trained on a tactile frequency-discrimination task.

(1999)
- 'The roles of neural and evolutionary computing in intelligent software systems'. BT TECHNOLOGY JOURNAL, 14 (4), pp. 46-54. (1996)
- 'Unsupervised neural network learning procedures for feature extraction and classification'.
APPLIED INTELLIGENCE, 6 (3), pp. 185-203. doi: 10.1007/BF00126625
(1996)
- 'Lyapunov functions for convergence of principal component algorithms'. NEURAL NETWORKS, 8 (1), pp. 11-23. (1995)
- 'Efficient information-transfer and anti-Hebbian neural networks'. NEURAL NETWORKS, 6 (6), pp. 823-833. (1993)
- 'Generation and Adaptation of Neural Networks by Evolutionary Techniques (GANNET)'.
Neural Computing and Applications, 1 (1), pp. 23-31. doi: 10.1007/BF01411372
(1993)

### Conference papers

- 'Combining Mask Estimates for Single Channel Audio Source Separation using Deep Neural Networks'. ISCA
San Francisco, California: Interspeech 2016

[ Status: Accepted ]
#### Abstract

Deep neural networks (DNNs) are usually used for single channel source separation to predict either soft or binary time frequency masks. The masks are used to separate the sources from the mixed signal. Binary masks produce separated sources with more distortion and less interference than soft masks. In this paper, we propose to use another DNN to combine the estimates of binary and soft masks to achieve the advantages and avoid the disadvantages of using each mask individually. We aim to achieve separated sources with low distortion and low interference between each other. Our experimental results show that combining the estimates of binary and soft masks using DNN achieves lower distortion than using each estimate individually and achieves as low interference as the binary mask.

(2016)
- 'Remixing musical audio on the web using source separation'.
Proceedings of the 2nd Web Audio Conference (WAC-2016), Atlanta, Georgia: 2nd Web Audio Conference (WAC)
#### Abstract

Research in audio source separation has progressed a long way, producing systems that are able to approximate the component signals of sound mixtures. In recent years, many efforts have focused on learning time-frequency masks that can be used to filter a monophonic signal in the frequency domain. Using current web audio technologies, time-frequency masking can be implemented in a web browser in real time. This allows applying source separation techniques to arbitrary audio streams, such as internet radios, depending on cross-domain security configurations. While producing good quality separated audio from monophonic music mixtures is still challenging, current methods can be applied to remixing scenarios, where part of the signal is emphasized or deemphasized. This paper describes a system for remixing musical audio on the web by applying time-frequency masks estimated using deep neural networks. Our example prototype, implemented in client-side Javascript, provides reasonable quality results for small modifications.

(2016)
- 'Detection of overlapping acoustic events using a temporally-constrained probabilistic model'. IEEE
Shanghai, China: ICASSP 2016

[ Status: Accepted ]
#### Abstract

In this paper, a system for overlapping acoustic event detection is proposed, which models the temporal evolution of sound events. The system is based on probabilistic latent component analysis, supporting the use of a sound event dictionary where each exemplar consists of a succession of spectral templates. The temporal succession of the templates is controlled through event class-wise Hidden Markov Models (HMMs). As input time/frequency representation, the Equivalent Rectangular Bandwidth (ERB) spectrogram is used. Experiments are carried out on polyphonic datasets of office sounds generated using an acoustic scene simulator, as well as real and synthesized monophonic datasets for comparative purposes. Results show that the proposed system outperforms several state-of-the-art methods for overlapping acoustic event detection on the same task, using both frame-based and event-based metrics, and is robust to varying event density and noise levels.

(2016)
- 'Evaluation of Audio Source Separation Models Using Hypothesis-Driven Non-Parametric Statistical Methods'.
Budapest: European Signal Processing Conference (EUSIPCO) 2016

[ Status: Accepted ]
**Full text** is available at: http://epubs.surrey.ac.uk/811172/
#### Abstract

Audio source separation models are typically evaluated using objective separation quality measures, but rigorous statistical methods have yet to be applied to the problem of model comparison. As a result, it can be difficult to establish whether or not reliable progress is being made during the development of new models. In this paper, we provide a hypothesis-driven statistical analysis of the results of the recent source separation SiSEC challenge involving twelve competing models tested on separation of voice and accompaniment from fifty pieces of “professionally produced” contemporary music. Using nonparametric statistics, we establish reliable evidence for meaningful conclusions about the performance of the various models.

(2016)
- 'Single Channel Audio Source Separation using Deep Neural Network Ensembles'. AES
Paris, France: 140th Convention of the Audio Engineering Society
#### Abstract

Deep neural networks (DNNs) are often used to tackle the single channel source separation (SCSS) problem by predicting time-frequency masks. The predicted masks are then used to separate the sources from the mixed signal. Different types of masks produce separated sources with different levels of distortion and interference. Some types of masks produce separated sources with low distortion, while other masks produce low interference between the separated sources. In this paper, a combination of different DNNs’ predictions (masks) is used for SCSS to achieve better quality of the separated sources than using each DNN individually. We train four different DNNs by minimizing four different cost functions to predict four different masks. The first and second DNNs are trained to approximate reference binary and soft masks. The third DNN is trained to predict a mask from the reference sources directly. The last DNN is trained similarly to the third DNN but with an additional discriminative constraint to maximize the differences between the estimated sources. Our experimental results show that combining the predictions of different DNNs achieves separated sources with better quality than using each DNN individually.

(2016)
- 'Use of audio editors in radio production'.
Audio Engineering Society 138th Convention, Warsaw, Poland, May 7-10, 2015
**Full text** is available at: http://epubs.surrey.ac.uk/808461/
#### Abstract

Audio editing is performed at scale in the production of radio, but often the tools used are poorly targeted toward the task at hand. There are a number of audio analysis techniques that have the potential to aid radio producers, but without a detailed understanding of their process and requirements, it can be difficult to apply these methods. To aid this understanding, a study of radio production practice was conducted on three varied case studies: a news bulletin, drama, and documentary. It examined the audio/metadata workflow, the roles and motivations of the producers, and environmental factors. The study found that producers prefer to interact with higher-level representations of audio content like transcripts and enjoy working on paper. The study also identified opportunities to improve the workflow with tools that link audio to text, highlight repetitions, compare takes, and segment speakers.

(2015)
- 'Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network'. Springer International Publishing
Latent Variable Analysis and Signal Separation: 12th International Conference, LVA/ICA 2015, Liberec, Czech Republic, August 25-28, 2015, Proceedings 9237, pp. 429-436.
#### Abstract

Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNN) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of voice and non-voice in the context of musical mixtures. Here, we trained a convolutional DNN (of around a billion parameters) to provide probabilistic estimates of the ideal binary mask for separation of vocal sounds from real-world musical mixtures. We contrast our DNN results with more traditional linear methods. Our approach may be useful for automatic removal of vocal sounds from musical mixtures for 'karaoke' type applications.

(2015)
- 'Non-negative Matrix Factorisation incorporating greedy Hellinger sparse coding applied to polyphonic music transcription'.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), Brisbane: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), pp. 2214-2218.
#### Abstract

Non-negative Matrix Factorisation (NMF) is a commonly used tool in many musical signal processing tasks, including Automatic Music Transcription (AMT). However, unsupervised NMF is seen to be problematic in this context, and harmonically constrained variants of NMF have been proposed. While useful, the harmonic constraints may be constrictive in mixed signals. We have previously observed that recovery of overlapping signal elements using NMF is improved through introduction of a sparse coding step, and propose here the incorporation of a sparse coding step using the Hellinger distance into an NMF algorithm. Improved AMT results for unsupervised NMF are reported.

(2015)
- 'A dynamic programming variant of non-negative matrix deconvolution for the transcription of struck string instruments'.
Brisbane, Australia: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), pp. 569-573.
**Full text** is available at: http://epubs.surrey.ac.uk/807616/
#### Abstract

Given a musical audio recording, the goal of music transcription is to determine a score-like representation of the piece underlying the recording. Most current transcription methods employ variants of non-negative matrix factorization (NMF), which often fails to robustly model instruments producing non-stationary sounds. Using entire time-frequency patterns to represent sounds, non-negative matrix deconvolution (NMD) can capture certain types of nonstationary behavior but is only applicable if all sounds have the same length. In this paper, we present a novel method that combines the non-stationarity modeling capabilities available with NMD with the variable note lengths possible with NMF. Identifying frames in NMD patterns with states in a dynamical system, our method iteratively generates sound-object candidates separately for each pitch, which are then combined in a global optimization. We demonstrate the transcription capabilities of our method using piano pieces assuming the availability of single note recordings as training data.

(2015)
- 'CHiME-Home: A dataset for sound source recognition in a domestic environment'. IEEE Proc 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 18-21 Oct. 2015. (2015)
- 'Efficient compressive spectrum sensing algorithm for M2M devices'. IEEE
Atlanta, Georgia: Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 1170-1174.
**Full text** is available at: http://epubs.surrey.ac.uk/807458/
#### Abstract

Spectrum used for Machine-to-Machine (M2M) communications should be as cheap as possible or even free in order to connect billions of devices. Recently, both UK and US regulators have conducted trials and pilots to release the UHF TV spectrum for secondary licence-exempt applications. However, it is a very challenging task to implement wideband spectrum sensing in compact and low power M2M devices as high sampling rates are very expensive and difficult to achieve. In recent years, the compressive sensing (CS) technique has made fast wideband spectrum sensing possible by taking samples at sub-Nyquist sampling rates. In this paper, we propose a two-step CS based spectrum sensing algorithm. In the first step, the CS is implemented in an SU and only part of the spectrum of interest is supposed to be sensed by an SU in each sensing period to reduce the complexity in the signal recovery process. In the second step, a denoising algorithm is proposed to improve the detection performance of spectrum sensing. The proposed two-step CS based spectrum sensing is compared with the traditional scheme and the theoretical curves.

(2014)
- 'Visualising chord progressions in music collections: a big data approach'.
Berlin, Germany: Proceedings of the 9th Conference on Interdisciplinary Musicology - CIM14
#### Abstract

In the Digital Music Lab project we work on the automatic analysis of large audio databases that results in rich annotations for large corpora of music. The musicological interpretation of this data from thousands of pieces is a challenging task that can benefit greatly from specifically designed interactive visualisation. Most existing big music data visualisation focuses on cultural attributes, mood, or listener behaviour. In this ongoing work we explore chord sequence patterns extracted by sequential pattern mining of more than one million tracks from the I Like Music commercial music collection. We present here several new visual representations that summarise chord patterns according to chord types, chroma, pattern structure and support, enabling musicologists to develop and answer questions about chord patterns in music collections. Our visualisations represent root movement and chord qualities mostly in a geometrical way and use colour to represent pattern support. We use two individually configurable views in parallel to encourage comparisons, either between different representations of one corpus, highlighting complementary musical aspects, or between different datasets, here representing different genres. We adapt several visualisation techniques to chord pattern sets using some novel layouts to support musicologists with their exploration and interpretation of the corpora. We found that differences between chord patterns of different genres, e.g. Rock & Roll vs. Jazz, are visible and can be used to generate hypotheses for the study of individual pieces, further statistical investigations or new data processing and visualisation. Our designs will be adapted as user needs are established through ongoing work. Means of aggregating, focusing and filtering by selected characteristics (such as key, melodic patterns etc.) will be added as we develop our design for the visualisation of chord patterns in close collaboration with musicologists.
The visualisations are available as a web application at http://dml.city.ac.uk/csvd/

(2014)
- 'Big chord data extraction and mining'.
Berlin, Germany: Proceedings of the 9th Conference on Interdisciplinary Musicology - CIM14
#### Abstract

Harmonic progression is one of the cornerstones of tonal music composition and is thereby essential to many musical styles and traditions. Previous studies have shown that musical genres and composers could be discriminated based on chord progressions modeled as chord n-grams. These studies were however conducted on small-scale datasets and using symbolic music transcriptions. In this work, we apply pattern mining techniques to over 200,000 chord progression sequences out of 1,000,000 extracted from the I Like Music (ILM) commercial music audio collection. The ILM collection spans 37 musical genres and includes pieces released between 1907 and 2013. We developed a single program multiple data parallel computing approach whereby audio feature extraction tasks are split up and run simultaneously on multiple cores. An audio-based chord recognition model (Vamp plugin Chordino) was used to extract the chord progressions from the ILM set. To keep low-weight feature sets, the chord data were stored using a compact binary format. We used the CM-SPADE algorithm, which performs a vertical mining of sequential patterns using co-occurrence information, and which is fast and efficient enough to be applied to big data collections like the ILM set. In order to derive key-independent frequent patterns, the transitions between chords are modeled by changes of qualities (e.g. major, minor, etc.) and root keys (e.g. fourth, fifth, etc.). The resulting key-independent chord progression patterns vary in length (from 2 to 16) and frequency (from 2 to 19,820) across genres. As illustrated by graphs generated to represent frequent 4-chord progressions, some patterns like circle-of-fifths movements are well represented in most genres but in varying degrees. These large-scale results offer the opportunity to uncover similarities and discrepancies between sets of musical pieces and therefore to build classifiers for search and recommendation.
They also support the empirical testing of music theory. It is however more difficult to derive new hypotheses from such a dataset due to its size. This can be addressed by using pattern detection algorithms or suitable visualisation which we present in a companion study.

(2014)
- 'Visualising chord progressions in music collections: a big data approach'.
15th International Society for Music Information Retrieval Conference (ISMIR), Taipei, Taiwan, 27-31 Oct 2014 [Late Breaking/Demo paper]
**Full text** is available at: http://epubs.surrey.ac.uk/807469/
#### Abstract

The analysis of large datasets of music audio and other representations entails the need for techniques that support musicologists and other users in interpreting extracted data. We explore and develop visualisation techniques of chord sequence patterns mined from a corpus of over one million tracks. The visualisations use different representations of root movements and chord qualities with geometrical representations, and mostly colour mappings for pattern support. The presented visualisations are being developed in close collaboration with musicologists and can help gain insights into the differences of musical genres and styles as well as support the development of new classification methods.

(2014)
- 'Big Data for Musicology'. London, UK:
Proceedings of the 1st International Workshop on Digital Libraries for Musicology (DLfM 2014)
#### Abstract

Digital music libraries and collections are growing quickly and are increasingly made available for research. We argue that the use of large data collections will enable a better understanding of music performance and music in general, which will benefit areas such as music search and recommendation, music archiving and indexing, music production and education. However, to achieve these goals it is necessary to develop new musicological research methods, to create and adapt the necessary technological infrastructure, and to find ways of working with legal limitations. Most of the necessary basic technologies exist, but they need to be brought together and applied to musicology. We aim to address these challenges in the Digital Music Lab project, and we feel that with suitable methods and technology Big Music Data can provide new opportunities to musicology.

(2014)
- 'Audio-only bird classification using unsupervised feature learning'.
Working Notes for CLEF 2014 Conference, Sheffield, UK, September 15-18, 2014, pp. 673-684.
**Full text** is available at: http://epubs.surrey.ac.uk/807467/
#### Abstract

We describe our method for automatic bird species classification, which uses raw audio without segmentation and without using any auxiliary metadata. It successfully classifies among 501 bird categories, and was by far the highest scoring audio-only bird recognition algorithm submitted to BirdCLEF 2014. Our method uses unsupervised feature learning, a technique which learns regularities in spectro-temporal content without reference to the training labels, which helps a classifier to generalise to further content of the same type. Our strongest submission uses two layers of feature learning to capture regularities at two different time scales.

(2014)
- 'Digital Music Lab: A Framework for Analysing Big Music Data'. Second European Conference on Data Analysis (ECDA 2014), July 2-4, 2014. (2014)
- 'Robust bird species recognition: making it work for dawn chorus audio archives'.
Paris, France: Ecology and acoustics: emergent properties from community to landscape, 16-18 Jun 2014 Paris, France, pp. 94-94.
#### Abstract

The recent (2013) bird species recognition challenges organised by the SABIOD project attracted some strong performances from automatic classifiers applied to short audio excerpts from passive acoustic monitoring stations. Can such strong results be achieved for dawn chorus field recordings in audio archives? The question is important because archives (such as the British Library Sound Archive) hold thousands of such recordings, covering many decades and many countries, but they are mostly unlabelled. Automatic labelling holds the potential to unlock their value to ecological studies. Audio in such archives is quite different from passive acoustic monitoring data: importantly, the recording conditions vary randomly (and are usually unknown), making the scenario a “cross-condition” rather than “single-condition” train/test task. Dawn chorus recordings are generally long, and the annotations often indicate which birds are in a 20-minute recording but not within which 5-second segments they are active. Further, the amount of annotation available is very small. We report on experiments to evaluate a variety of classifier configurations for automatic multilabel species annotation in dawn chorus archive recordings. The audio data is an order of magnitude larger than the SABIOD challenges, but the ground-truth data is an order of magnitude smaller. We report some surprising findings, including clear variation in the benefits of some analysis choices (audio features, pooling techniques, noise-robustness techniques) as we move to handle the specific multi-condition case relevant for audio archives.

(2014)
- 'Harmonic Motion: A Toolkit for Processing Gestural Data for Interactive Sound'. London, United Kingdom:
Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), pp. 213-216.
#### Abstract

We introduce Harmonic Motion, a free open source toolkit for artists, musicians and designers working with gestural data. Extracting musically useful features from captured gesture data can be challenging, with projects often requiring bespoke processing techniques developed through iterations of tweaking equations involving a number of constant values – sometimes referred to as ‘magic numbers’. Harmonic Motion provides a robust interface for rapid prototyping of patches to process gestural data and a framework through which approaches may be encapsulated, reused and shared with others. In addition, we describe our design process in which both personal experience and a survey of potential users informed a set of specific goals for the software.

(2014)
- 'Accounting for phase cancellations in non-negative matrix factorization using weighted distances'.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 649-653.
**Full text** is available at: http://epubs.surrey.ac.uk/807423/
#### Abstract

Techniques based on non-negative matrix factorization (NMF) have been successfully used to decompose a spectrogram of a music recording into a dictionary of templates and activations. While advanced NMF variants often yield robust signal models, there are usually some inaccuracies in the factorization since the underlying methods are not prepared for phase cancellations that occur when sounds with similar frequency are mixed. In this paper, we present a novel method that takes phase cancellations into account to refine dictionaries learned by NMF-based methods. Our approach exploits the fact that advanced NMF methods are often robust enough to provide information about how sound sources interact in a spectrogram, where they overlap, and thus where phase cancellations could occur. Using this information, the distances used in NMF are weighted entry-wise to attenuate the influence of regions with phase cancellations. Experiments on full-length, polyphonic piano recordings indicate that our method can be successfully used to refine NMF-based dictionaries.

(2014)
- 'Polyphonic piano transcription using non-negative Matrix Factorisation with group sparsity'.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3112-3116.
**Full text** is available at: http://epubs.surrey.ac.uk/807426/
#### Abstract

Non-negative Matrix Factorisation (NMF) is a popular tool in musical signal processing. However, problems using this methodology in the context of Automatic Music Transcription (AMT) have been noted resulting in the proposal of supervised and constrained variants of NMF for this purpose. Group sparsity has previously been seen to be effective for AMT when used with stepwise methods. In this paper group sparsity is introduced to supervised NMF decompositions and a dictionary tuning approach to AMT is proposed based upon group sparse NMF using the β-divergence. Experimental results are given showing improved AMT results over the state-of-the-art NMF-based AMT system.

(2014)
- 'Improving instrument recognition in polyphonic music through system integration'.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5222-5226.
**Full text** is available at: http://epubs.surrey.ac.uk/807425/
#### Abstract

A method is proposed for instrument recognition in polyphonic music which combines two independent detector systems: a polyphonic musical instrument recognition system using a missing feature approach, and an automatic music transcription system based on shift invariant probabilistic latent component analysis that includes instrument assignment. We propose a method to integrate the two systems by fusing the instrument contributions estimated by the first system onto the transcription system in the form of Dirichlet priors. Both systems, as well as the integrated system, are evaluated using a dataset of continuous polyphonic music recordings. Detailed results that highlight a clear improvement in the performance of the integrated system are reported for different training conditions.

(2014)
- 'An Open Dataset for Research on Audio Field Recording Archives: freefield1010'.
Proceedings of the AES 53rd International Conference: Semantic Audio, pp. 80-86.
**Full text** is available at: http://epubs.surrey.ac.uk/807464/
#### Abstract

We introduce a free and open dataset of 7690 audio clips sampled from the field-recording tag in the Freesound audio archive. The dataset is designed for use in research related to data mining in audio archives of field recordings / soundscapes. Audio is standardised, and audio and metadata are Creative Commons licensed. We describe the data preparation process, characterise the dataset descriptively, and illustrate its use through an auto-tagging experiment.

(2014)
- 'Phase-based harmonic/percussive separation'.
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 1628-1632.
**Full text** is available at: http://epubs.surrey.ac.uk/807419/
#### Abstract

In this paper, a method for separation of harmonic and percussive elements in music recordings is presented. The proposed method is based on a simple spectral peak detection step followed by a phase expectation analysis that discriminates between harmonic and percussive components. The proposed method was tested on a database of 10 audio tracks and has shown superior results to the reference state-of-the-art approach.

(2014)
- 'Evidence that phrase-level tempo variation may be represented using a limited dictionary'. Presented at: 13th International Conference for Music Perception and Cognition (ICMPC13-APSCOM5), Seoul, South Korea, 4-8 August 2014.
(2014)
- 'Behavior of greedy sparse representation algorithms on nested supports'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 5710-5714.
#### Abstract

In this work, we study the links between the recovery properties of sparse signals for Orthogonal Matching Pursuit (OMP) and the whole General MP class over nested supports. We show that the optimality of those algorithms is not locally nested: there is a dictionary and supports I and J with J included in I such that OMP will recover all signals of support I, but not all signals of support J. We also show that the optimality of OMP is globally nested: if OMP can recover all s-sparse signals, then it can recover all s′-sparse signals with s′ smaller than s. We also provide a tighter version of Donoho and Elad's spark theorem, which allows us to complete Tropp's proof that sparse representation algorithms can only be optimal for all s-sparse signals if s is strictly lower than half the spark of the dictionary. © 2013 IEEE.

(2013)
- 'Score informed audio source separation using constrained nonnegative matrix factorization and score synthesis'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 888-891.
#### Abstract

In this paper we present a new method for musical audio source separation, using the information from the musical score to supervise the decomposition process. An original framework using nonnegative matrix factorization (NMF) is presented, where the components are initially learnt on synthetic signals with temporal and harmonic constraints. A new dataset of multitrack recordings with manually aligned MIDI scores is created (TRIOS), and we compare our separation results with other methods from the literature using the BSS EVAL and PEASS evaluation toolboxes. The results show a general improvement of the BSS EVAL metrics for the various instrumental configurations used. © 2013 IEEE.

(2013)
- 'Recognition of harmonic sounds in polyphonic audio using a missing feature approach'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 8658-8662.
#### Abstract

A method based on local spectral features and missing feature techniques is proposed for the recognition of harmonic sounds in mixture signals. A mask estimation algorithm is proposed for identifying spectral regions that contain reliable information for each sound source and then bounded marginalization is employed to treat the feature vector elements that are determined as unreliable. The proposed method is tested on musical instrument sounds due to the extensive availability of data, but it can be applied to other sounds (e.g. animal sounds, environmental sounds) whenever these are harmonic. In simulations the proposed method clearly outperformed a baseline method for mixture signals. © 2013 IEEE.

(2013)
- 'Automatic Music Transcription using row weighted decompositions'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 16-20.
#### Abstract

Automatic Music Transcription (AMT) seeks to understand a musical piece in terms of note activities. Matrix decomposition methods are often used for AMT, seeking to decompose a spectrogram over a dictionary matrix of note-specific template vectors. The performance of these methods can suffer due to the large harmonic overlap found in tonal musical spectra. We propose a row weighting scheme that transforms each spectrogram frame and the dictionary, with the weighting determined by the effective correlations in the decomposition. Experiments show improved AMT performance. © 2013 IEEE.

(2013)
- 'Software techniques for good practice in audio and music research'.
134th Audio Engineering Society Convention 2013, pp. 273-280.
#### Abstract

In this paper we discuss how software development can be improved in the audio and music research community by implementing tighter and more effective development feedback loops. We suggest first that researchers in an academic environment can benefit from the straightforward application of peer code review, even for ad-hoc research software; and second, that researchers should adopt automated software unit testing from the start of research projects. We discuss and illustrate how to adopt both code reviews and unit testing in a research environment. Finally, we observe that the use of a software version control system provides support for the foundations of both code reviews and automated unit tests. We therefore also propose that researchers should use version control with all their projects from the earliest stage.

(2013)
- 'Probabilistic time-frequency source-filter decomposition of non-stationary signals'.
Proceedings of the 21st European Signal Processing Conference 2013, Marrakech: European Signal Processing Conference 2013
#### Abstract

Probabilistic modelling of non-stationary signals in the time-frequency (TF) domain has been an active research topic recently. Various models have been proposed, notably in the nonnegative matrix factorization (NMF) literature. In this paper, we propose a new TF probabilistic model that can represent a variety of stationary and non-stationary signals, such as autoregressive moving average (ARMA) processes, uncorrelated noise, damped sinusoids, and transient signals. This model also generalizes and improves both the Itakura-Saito (IS)-NMF and high resolution (HR)-NMF models. © 2013 EURASIP.

(2013)
- 'Eye Tracking as Interface for Parametric Design'. Paris, France:
CHI 2013 Workshop on "Gaze Interaction in the Post-WIMP World".
#### Abstract

This research investigates the potential of eye tracking as an interface to parameter search in visual design. We outline our experimental framework where a user's gaze acts as a guiding feedback mechanism in an exploration of the state space of parametric designs. A small scale pilot study was carried out where participants influence the evolution of generative patterns by looking at a screen while having their eyes tracked. Preliminary findings suggest that although our eye tracking system can be used to effectively navigate small areas of a parametric design's state-space, there are challenges to overcome before such a system is practical in a design context. Finally we outline future directions of this research.

(2013)
- 'Improved multiple birdsong tracking with distribution derivative method and Markov renewal process clustering'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 468-472.
#### Abstract

Segregating an audio mixture containing multiple simultaneous bird sounds is a challenging task. However, birdsong often contains rapid pitch modulations, and these modulations carry information which may be of use in automatic recognition. In this paper we demonstrate that an improved spectrogram representation, based on the distribution derivative method, leads to improved performance of a segregation algorithm which uses a Markov renewal process model to track vocalisation patterns consisting of singing and silences. © 2013 IEEE.

(2013)
- 'How Predictable Do We Like Our Music? Eliciting Aesthetic Preferences With The Melody Triangle Mobile App'. Logos Verlag Berlin
Stockholm, Sweden: Proceedings of the Sound and Music Computing Conference 2013, SMC 2013, pp. 80-85.
#### Abstract

The Melody Triangle is a smartphone application for Android that lets users easily create musical patterns and textures. The user creates melodies by specifying positions within a triangle, and these positions correspond to the information theoretic properties of generated musical sequences. A model of human expectation and surprise in the perception of music, information dynamics, is used to 'map out' a musical generative system's parameter space, in this case Markov chains. This enables a user to explore the possibilities afforded by Markov chains, not by directly selecting their parameters, but by specifying the subjective predictability of the output sequence. As users of the app find melodies and patterns they like, they are encouraged to press a 'like' button, where their settings are uploaded to our servers for analysis. Collecting the 'liked' settings of many users worldwide will allow us to elicit trends and commonalities in aesthetic preferences across users of the app, and to investigate how these might relate to the information-dynamic model of human expectation and surprise. We outline some of the relevant ideas from information dynamics and how the Melody Triangle is defined in terms of these. We then describe the Melody Triangle mobile application, how it is being used to collect research data and how the collected data will be evaluated.

(2013)
- 'Dictionary learning via projected maximal exploration'.
2013 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2013 - Proceedings.
#### Abstract

This work presents a geometrical analysis of the Large Step Gradient Descent (LGD) dictionary learning algorithm. LGD updates the atoms of the dictionary using a gradient step with a step size equal to twice the optimal step size. We show that the large step gradient descent can be understood as a maximal exploration step where one goes as far away as possible without increasing the error. We also show that the LGD iteration is monotonic when the algorithm used for the sparse approximation step is close enough to orthogonal. © 2013 IEEE.

(2013)
- 'Learning overcomplete dictionaries with ℓ0-sparse Non-negative Matrix Factorisation'.
2013 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2013 - Proceedings, pp. 977-980.
#### Abstract

Non-negative Matrix Factorisation (NMF) is a popular tool in which a 'parts-based' representation of a non-negative matrix is sought. NMF tends to produce sparse decompositions. This sparsity is a desirable property in many applications, and Sparse NMF (S-NMF) methods have been proposed to enhance this feature. Typically these enforce sparsity through use of a penalty term, and an ℓ1 norm penalty term is often used. However, an ℓ1 penalty term may not be appropriate in a non-negative framework. In this paper the use of an ℓ0 norm penalty for NMF is proposed, approximated using backwards elimination from an initial NNLS decomposition. Dictionary recovery experiments using overcomplete dictionaries show that this method outperforms both NMF and a state-of-the-art S-NMF method, in particular when the dictionary to be learnt is dense. © 2013 IEEE.

(2013)
- 'Using Oracle Analysis for Decomposition-Based Automatic Music Transcription'. Springer Berlin Heidelberg
From Sounds to Music and Emotions: 9th International Symposium, CMMR 2012, London, UK, June 19-22, 2012, Revised Selected Papers 7900, pp. 353-365.
#### Abstract

One approach to Automatic Music Transcription (AMT) is to decompose a spectrogram with a dictionary matrix that contains a pitch-labelled note spectrum atom in each column. AMT performance is typically measured using frame-based comparison, while an alternative perspective is to use an event-based analysis. We have previously proposed an AMT system, based on the use of structured sparse representations. The method is described and experimental results are given, which are seen to be promising. An inspection of the graphical AMT output known as a piano roll may lead one to think that the performance may be slightly better than is suggested by the AMT metrics used. This leads us to perform an oracle analysis of the AMT system, with some interesting outcomes which may have implications for decomposition based AMT in general.

(2013)
- 'Predictive Information in Gaussian Processes with Application to Music Analysis'. Springer GSI, 8085, pp. 650-657.
(2013)
- 'Multichannel HR-NMF for modelling convolutive mixtures of non-stationary signals in the time-frequency domain'.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
#### Abstract

Several probabilistic models involving latent components have been proposed for modelling time-frequency (TF) representations of audio signals (such as spectrograms), notably in the nonnegative matrix factorization (NMF) literature. Among them, the recent high resolution NMF (HR-NMF) model is able to take both phases and local correlations in each frequency band into account, and its potential has been illustrated in applications such as source separation and audio inpainting. In this paper, HR-NMF is extended to multichannel signals and to convolutive mixtures. A fast variational expectation-maximization (EM) algorithm is proposed to estimate the enhanced model. This algorithm is applied to a stereophonic piano signal, and proves capable of accurately modelling reverberation and restoring missing observations. © 2013 IEEE.

(2013)
- 'Recovery of nested supports by greedy sparse representation algorithms over non-normalized dictionaries'.
Lausanne, Switzerland: In Proceedings of the Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS 2013), 8-11 July 2013. [Abstract only]
#### Abstract

We prove that if Orthogonal Matching Pursuit (OMP) recovers all s-sparse signals for a given dictionary, then it also recovers all s′-sparse signals on the same dictionary for any s′ < s. We also extend Tropp’s Exact Recovery Condition (ERC) to dictionaries with non-normalized atoms. Our result does not contradict an earlier result stating that there are dictionaries and cardinalities s′ < s such that all s-size supports satisfy Tropp’s ERC but not all s′-size supports do: that result was proved using non-normalized dictionaries and in that case Tropp’s ERC is not linked to the recovery by OMP.

(2013)
- 'Structured sparsity using backwards elimination for Automatic Music Transcription'.
IEEE International Workshop on Machine Learning for Signal Processing, MLSP.
#### Abstract

Musical signals can be thought of as being sparse and structured, with few elements active at a given instant and temporal continuity of active elements observed. Greedy algorithms such as Orthogonal Matching Pursuit (OMP), and structured variants, have previously been proposed for Automatic Music Transcription (AMT); however, some problems have been noted. Hence, we propose the use of a backwards elimination strategy in order to perform sparse decompositions for AMT, in particular with a proposed alternative sparse cost function. However, the main advantage of this approach is the ease with which structure can be incorporated. The use of group sparsity is shown to give increased AMT performance, while a molecular method incorporating onset information is seen to provide further improvements with little computational effort. © 2013 IEEE.

(2013)
- 'Low-rank matrix completion based malicious user detection in cooperative spectrum sensing'.
2013 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2013 - Proceedings, pp. 1186-1189.
#### Abstract

In a cognitive radio (CR) system, cooperative spectrum sensing (CSS) is the key to improving sensing performance in deep fading channels. In CSS networks, signals received at the secondary users (SUs) are sent to a fusion center to make a final decision of the spectrum occupancy. In this process, the presence of malicious users sending false sensing samples can severely degrade the performance of the CSS network. In this paper, with the compressive sensing (CS) technique being implemented at each SU, we build a CSS network with double sparsity property. A new malicious user detection scheme is proposed by utilizing the adaptive outlier pursuit (AOP) based low-rank matrix completion in the CSS network. In the proposed scheme, the malicious users are removed in the process of signal recovery at the fusion center. The numerical analysis of the proposed scheme is carried out and compared with an existing malicious user detection algorithm. © 2013 IEEE.

(2013)
- 'Learning incoherent subspaces for classification via supervised iterative projections and rotations'.
IEEE International Workshop on Machine Learning for Signal Processing, MLSP.
#### Abstract

In this paper we present the supervised iterative projections and rotations (S-IPR) algorithm, a method to optimise a set of discriminative subspaces for supervised classification. We show how the proposed technique is based on our previous unsupervised iterative projections and rotations (IPR) algorithm for incoherent dictionary learning, and how projecting the features onto the learned subspaces can be employed as a feature transform algorithm in the context of classification. Numerical experiments on the FISHERIRIS and on the USPS datasets, and a comparison with the PCA and LDA methods for feature transform, demonstrate the value of the proposed technique and its potential as a tool for machine learning. © 2013 IEEE.

(2013)
- 'Detection and classification of acoustic scenes and events: An IEEE AASP challenge'.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
#### Abstract

This paper describes a newly-launched public evaluation challenge on acoustic scene classification and detection of sound events within a scene. Systems dealing with such tasks are far from exhibiting human-like performance and robustness. Undermining factors are numerous: the extreme variability of sources of interest possibly interfering, the presence of complex background noise as well as room effects like reverberation. The proposed challenge is an attempt to help the research community move forward in defining and studying the aforementioned tasks. Apart from the challenge description, this paper provides an overview of systems submitted to the challenge as well as a detailed evaluation of the results achieved by those systems. © 2013 IEEE.

(2013)
- 'A database and challenge for acoustic scene classification and event detection'. IEEE
2013 Proceedings of the 21st European Signal Processing Conference (EUSIPCO), Marrakesh, Morocco: 21st European Signal Processing Conference (EUSIPCO)
#### Abstract

An increasing number of researchers work in computational auditory scene analysis (CASA). However, a set of tasks, each with a well-defined evaluation framework and commonly used datasets, does not yet exist. Thus, it is difficult for results and algorithms to be compared fairly, which hinders research in the field. In this paper we will introduce a newly-launched public evaluation challenge dealing with two closely related tasks of the field: acoustic scene classification and event detection. We give an overview of the tasks involved; describe the processes of creating the dataset; and define the evaluation metrics. Finally, illustrations on results for both tasks using baseline methods applied on this dataset are presented, accompanied by open-source code. © 2013 EURASIP

(2013)
- 'A comparison of two different methods for score-informed source separation'.
Edinburgh, Scotland, UK: Proc. 5th International Workshop on Machine Learning and Music (MML 2012), pp. 11-12.
#### Abstract

We present a new method for score-informed source separation, combining ideas from two previous approaches: one based on parametric modeling of the score which constrains the NMF updating process, the other based on PLCA that uses synthesized scores as prior probability distributions. We experimentally show improved separation results using the BSS EVAL and PEASS toolkits, and discuss strengths and weaknesses compared with the previous PLCA-based approach.

(2012)
- 'Group non-negative basis pursuit for automatic music transcription'. Edinburgh, Scotland, UK:
Proc. 5th International Workshop on Machine Learning and Music (MML 2012), pp. 15-16.
#### Abstract

Automatic Music Transcription is often performed by decomposing a spectrogram over a dictionary of note specific atoms. Several note template atoms may be used to represent one note, and a group structure may be imposed on the dictionary. We propose a group sparse algorithm based on a multiplicative update and thresholding and show transcription results on a challenging dataset.

(2012)
- 'Multi-target pitch tracking of vibrato sources in noise using the GM-PHD filter'.
Edinburgh, Scotland, UK: Proc. 5th International Workshop on Machine Learning and Music (MML 2012), pp. 27-28.
#### Abstract

Probabilistic approaches to tracking often use single-source Bayesian models; applying these to multi-source tasks is problematic. We apply a principled multi-object tracking implementation, the Gaussian mixture probability hypothesis density filter, to track multiple sources having fixed pitch plus vibrato. We demonstrate high-quality filtering in a synthetic experiment, and find improved tracking using a richer feature set which captures underlying dynamics. Our implementation is available as open-source Python code.

(2012)
- 'Analysis-based sparse reconstruction with synthesis-based solvers'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 5401-5404.
#### Abstract

Analysis based reconstruction has recently been introduced as an alternative to the well-known synthesis sparsity model used in a variety of signal processing areas. In this paper we convert the analysis exact-sparse reconstruction problem to an equivalent synthesis recovery problem with a set of additional constraints. We are therefore able to use existing synthesis-based algorithms for analysis-based exact-sparse recovery. We call this the Analysis-By-Synthesis (ABS) approach. We evaluate our proposed approach by comparing it against the recent Greedy Analysis Pursuit (GAP) analysis-based recovery algorithm. The results show that our approach is a viable option for analysis-based reconstruction, while at the same time allowing many algorithms that have been developed for synthesis reconstruction to be directly applied for analysis reconstruction as well. © 2012 IEEE.

(2012)
- 'Eye Tracking as Interface for the Design of Generative Visual Forms and Patterns'.
Edinburgh, UK: Predicting Perceptions: Proceedings of the 3rd International Conference on Appearance, pp. 117-121.
#### Abstract

When working with generative systems, designers enter into a loop of discrete steps; external evaluations of the output feed back into the system, and new outputs are subsequently reevaluated. In such systems, interacting low level elements can engender a difficult to predict emergence of macro-level characteristics. Furthermore, the state space of some systems can be vast. Consequently, designers generally rely on trial-and-error, experience or intuition in selecting parameter values to develop the aesthetic aspects of their designs. We investigate an alternative means of exploring the state spaces of generative visual systems by using a gaze-contingent display. A user's gaze continuously controls and directs an evolution of visual forms and patterns on screen. As time progresses and the viewer and system remain coupled in this evolution, a population of generative artefacts tends towards an area of their state space that is 'of interest', as defined by the eye tracking data. The evaluation-feedback loop is continuous and uninterrupted, gaze the guiding feedback mechanism in the exploration of state space.

(2012)
- 'Group Polytope Faces Pursuit for Recovery of Block-Sparse Signals'. Springer LVA/ICA, 7191, pp. 255-262.
(2012)
- 'INK-SVD: Learning incoherent dictionaries for sparse representations'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 3573-3576.
#### Abstract

This work considers the problem of learning an incoherent dictionary that is both adapted to a set of training data and incoherent so that existing sparse approximation algorithms can recover the sparsest representation. A new decorrelation method is presented that computes a fixed coherence dictionary close to a given dictionary. That step iterates pairwise decorrelations of atoms in the dictionary. Dictionary learning is then performed by adding this decorrelation method as an intermediate step in the K-SVD learning algorithm. The proposed algorithm INK-SVD is tested on musical data and compared to another existing decorrelation method. INK-SVD can compute a dictionary that approximates the training data as well as K-SVD while decreasing the coherence from 0.6 to 0.2. © 2012 IEEE.

(2012)
- 'Framewise heterodyne chirp analysis of birdsong'.
European Signal Processing Conference, pp. 2694-2698.
#### Abstract

Harmonic birdsong is often highly nonstationary, which suggests that standard FFT representations may be of limited suitability. Wavelet and chirplet techniques exist in the literature, but are not often applied to signals such as bird vocalisations, perhaps due to analysis complexity. In this paper we develop a single-scale chirp analysis (computationally accelerated using FFT) which can be treated as an ordinary time-series. We then study a sinusoidal representation simply derived from the peak bins of this time-series. We show that it can lead to improved species classification from birdsong. © 2012 EURASIP.

(2012)
- 'The Melody Triangle: Exploring Pattern and Predictability in Music'.
Musical Metacreation: Papers from the 2012 AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment. AAAI Technical Report WS-12-16, pp. 35-42.
#### Abstract

The Melody Triangle is an interface for the discovery of melodic materials, where the input (positions within a triangle) directly map to information theoretic properties of the output. A model of human expectation and surprise in the perception of music, information dynamics, is used to 'map out' a musical generative system's parameter space. This enables a user to explore the possibilities afforded by a generative algorithm, in this case Markov chains, not by directly selecting parameters, but by specifying the subjective predictability of the output sequence. We describe some of the relevant ideas from information dynamics and how the Melody Triangle is defined in terms of these. We describe its incarnation as a screen based performance tool and compositional aid for the generation of musical textures; the users control at the abstract level of randomness and predictability, and some pilot studies carried out with it. We also briefly outline a multi-user installation, where collaboration in a performative setting provides a playful yet informative way to explore expectation and surprise in music, and a forthcoming mobile phone version of the Melody Triangle.

(2012)
- 'Dictionary Learning with Large Step Gradient Descent for Sparse Representations'. Springer LVA/ICA, 7191, pp. 231-238.
(2012)
- 'An alternating descent algorithm for the off-grid DOA estimation problem with sparsity constraints'. IEEE Computer Soc
2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), Bucharest, Romania: 20th European Signal Processing Conference (EUSIPCO), pp. 874-878.
(2012)
- 'A robust method for S1/S2 heart sounds detection without ECG reference based on music beat tracking'.
2012 10th International Symposium on Electronics and Telecommunications, ISETC 2012 - Conference Proceedings, pp. 307-310.
#### Abstract

We present a robust method for the detection of the first and second heart sounds (s1 and s2), without ECG reference, based on a music beat tracking algorithm. An intermediate representation of the input signal is first calculated by using an onset detection function based on complex spectral difference. A music beat tracking algorithm is then used to determine the location of the first heart sound. The beat tracker works in two steps: it first calculates the beat period and then finds the temporal beat alignment. Once the first sound is detected, inverse Gaussian weights are applied to the onset function on the detected positions and the algorithm is run again to find the second heart sound. At the last step, s1 and s2 labels are attributed to the detected sounds. The algorithm was evaluated in terms of location accuracy as well as sensitivity and specificity, and it performed well even in the presence of murmurs or noisy signals. © 2012 IEEE.

(2012)
- 'Choosing analysis or synthesis recovery for sparse reconstruction'.
European Signal Processing Conference, pp. 869-873.
#### Abstract

The analysis sparsity model is a recently introduced alternative to the standard synthesis sparsity model frequently used in signal processing. However, the exact conditions when analysis-based recovery is better than synthesis recovery are still not known. This paper constitutes an initial investigation into determining when one model is better than the other, under similar conditions. We perform separate analysis and synthesis recovery on a large number of randomly generated signals that are simultaneously sparse in both models and we compare the average reconstruction errors with both recovery methods. The results show that analysis-based recovery is the better option for a large number of signals, but it is less robust with signals that are only approximately sparse or when fewer measurements are available. © 2012 EURASIP.

(2012)
- 'Cognitive music modelling: An information dynamics approach'.
2012 3rd International Workshop on Cognitive Information Processing, CIP 2012.
#### Abstract

We describe an information-theoretic approach to the analysis of music and other sequential data, which emphasises the predictive aspects of perception, and the dynamic process of forming and modifying expectations about an unfolding stream of data, characterising these using the tools of information theory: entropies, mutual informations, and related quantities. After reviewing the theoretical foundations, we discuss a few emerging areas of application, including musicological analysis, real-time beat-tracking analysis, and the generation of musical materials as a cognitively-informed compositional aid. © 2012 IEEE.

(2012)
- 'Structured sparsity for automatic music transcription'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 441-444.
#### Abstract

Sparse representations have previously been applied to the automatic music transcription (AMT) problem. Structured sparsity, such as group and molecular sparsity allows the introduction of prior knowledge to sparse representations. Molecular sparsity has previously been proposed for AMT, however the use of greedy group sparsity has not previously been proposed for this problem. We propose a greedy sparse pursuit based on nearest subspace classification for groups with coherent blocks, based in a non-negative framework, and apply this to AMT. Further to this, we propose an enhanced molecular variant of this group sparse algorithm and demonstrate the effectiveness of this approach. © 2012 IEEE.

.
(2012) - 'Sound software: Towards software reuse in audio and music research'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 2745-2748.
#### Abstract

Although researchers are increasingly aware of the need to publish and maintain software code alongside their results, practical barriers prevent this from happening in many cases. We examine these barriers, propose an incremental approach to overcoming some of them, and describe the Sound Software project, an effort to support software development practice in the UK audio and music research community. Finally we make some recommendations for research groups seeking to improve their own researchers' software practice. © 2012 IEEE.

.
(2012) - 'Instrumentation-based music similarity using sparse representations'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 433-436.
#### Abstract

This paper describes a novel music similarity calculation method that is based on the instrumentation of music pieces. The approach taken here is based on the idea that sparse representations of musical audio signals are a rich source of information regarding the elements that constitute the observed spectra. We propose a method to extract feature vectors based on sparse representations and use these to calculate a similarity measure between songs. To train a dictionary for sparse representations from a large amount of training data, a novel dictionary-initialization method based on agglomerative clustering is proposed. An objective evaluation shows that the new features improve the performance of similarity calculation compared to the standard mel-frequency cepstral coefficients features. © 2012 IEEE.

.
(2012) - 'Oracle analysis of sparse automatic music transcription'. London, UK :
Proc. 9th International Symposium on Computer Music Modeling and Retrieval (CMMR 2012): Music and Emotions, 19-22 June 2012, pp. 591-598.
**Full text** is available at: http://epubs.surrey.ac.uk/807558/
#### Abstract

We have previously proposed a structured sparse approach to piano transcription with promising results recorded on a challenging dataset. The approach taken was measured in terms of both frame-based and onset-based metrics. Close inspection of the results revealed problems in capturing frames displaying low energy of a given note, for example in sustained notes. Further problems were also noticed in the onset detection, where for many notes seen to be active in the output transcription an onset was not detected. A brief description of the approach is given here, and further analysis of the system is given by considering an oracle transcription, derived from the ground truth piano roll and the given dictionary of spectral template atoms, which gives a clearer indication of the problems which need to be overcome in order to improve the proposed approach.

.
(2012) - 'Denoising and segmentation of the second heart sound using matching pursuit'.
Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 3440-3443.
#### Abstract

We propose a denoising and segmentation technique for the second heart sound (S2). To denoise, Matching Pursuit (MP) was applied using a set of non-linear chirp signals as atoms. We show that the proposed method can be used to segment the phonocardiogram of the second heart sound into its two clinically meaningful components: the aortic (A2) and pulmonary (P2) components. © 2012 IEEE.
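
As an illustration of the greedy step behind Matching Pursuit, the following is a minimal generic sketch, not the method of the paper. The chirp atoms are not reproduced here: `dictionary` is a placeholder for any matrix whose columns are unit-norm atoms, and the function name and `n_iter` parameter are assumptions made for this example.

```python
import numpy as np

def matching_pursuit(signal, dictionary, n_iter=10):
    """Generic Matching Pursuit over a dictionary with unit-norm columns.

    Returns the coefficient vector and the final residual.
    The approximation of the signal is dictionary @ coeffs.
    """
    residual = np.asarray(signal, dtype=float).copy()
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_iter):
        # Correlate every atom with the current residual
        corr = dictionary.T @ residual
        # Pick the atom with the largest absolute correlation
        k = int(np.argmax(np.abs(corr)))
        # Accumulate its coefficient and subtract its contribution
        coeffs[k] += corr[k]
        residual -= corr[k] * dictionary[:, k]
    return coeffs, residual
```

Under these assumptions, keeping only a few iterations retains the components that correlate strongly with the atoms and leaves the rest in the residual.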

.
(2012) - 'Causal prediction of continuous-valued music features'.
Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011, pp. 501-506.
#### Abstract

This paper investigates techniques for predicting sequences of continuous-valued feature vectors extracted from musical audio. In particular, we consider prediction of beat-synchronous Mel-frequency cepstral coefficients and chroma features in a causal setting, where features are predicted as they unfold in time. The methods studied comprise autoregressive models, N-gram models incorporating a smoothing scheme, and a novel technique based on repetition detection using a self-distance matrix. Furthermore, we propose a method for combining predictors, which relies on a running estimate of the error variance of the predictors to inform a linear weighting of the predictor outputs. Results indicate that incorporating information on long-term structure improves the prediction performance for continuous-valued, sequential musical data. For the Beatles data set, combining the proposed self-distance based predictor with both N-gram and autoregressive methods results in an average of 13% improvement compared to a linear predictive baseline. © 2011 International Society for Music Information Retrieval.
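
One way to picture the predictor combination is the sketch below. The abstract does not give the weighting formula, so the inverse-variance weights, the exponential forgetting factor `forget`, the scalar-valued features and the class name are all assumptions made for illustration.

```python
import numpy as np

class VarianceWeightedCombiner:
    """Combine scalar predictions with weights based on running error variance.

    Assumption: weights are proportional to 1 / variance; the paper may
    use a different linear weighting.
    """

    def __init__(self, n_predictors, forget=0.9, eps=1e-8):
        self.var = np.ones(n_predictors)  # running squared-error estimates
        self.forget = forget              # hypothetical forgetting factor
        self.eps = eps                    # guards against division by zero

    def combine(self, predictions):
        # Predictors with a smaller recent error receive a larger weight
        w = 1.0 / (self.var + self.eps)
        w /= w.sum()
        return float(np.dot(w, predictions))

    def update(self, predictions, target):
        # Causal update: called only after the true value has been observed
        err2 = (np.asarray(predictions, dtype=float) - target) ** 2
        self.var = self.forget * self.var + (1.0 - self.forget) * err2
```

In use, `combine` would be called before each new feature value arrives and `update` once it has been observed, so the weights depend only on past errors.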

.
(2011) - 'Sequential minimal eigenvalues - An approach to analysis dictionary learning'.
European Signal Processing Conference, pp. 1465-1469.
#### Abstract

Over the past decade there has been a great interest in a synthesis-based model for signals, based on sparse and redundant representations. Such a model assumes that the signal of interest can be decomposed as a linear combination of few columns from a given matrix (the dictionary). An alternative, analysis-based, model can be envisioned, where an analysis operator multiplies the signal, leading to a sparse outcome. In this paper we propose a simple but effective analysis operator learning algorithm, where analysis "atoms" are learned sequentially by identifying directions that are orthogonal to a subset of the training data. We demonstrate the effectiveness of the algorithm in three experiments, treating synthetic data and real images, showing a successful and meaningful recovery of the analysis operator. © 2011 EURASIP.

.
(2011) - 'Real-time Visual Beat Tracking using a Comb Filter Matrix.'. Michigan Publishing ICMC, . (2011)
- 'Dictionary learning of convolved signals.'. IEEE ICASSP, , pp. 5812-5815. . (2011)
- 'Structure-aware dictionary learning with harmonic atoms'.
European Signal Processing Conference, pp. 1761-1765.
#### Abstract

Non-negative blind signal decomposition methods are widely used for musical signal processing tasks, such as automatic transcription and source separation. A spectrogram can be decomposed into a dictionary of full spectrum basis atoms and their corresponding time activation vectors using methods such as Non-negative Matrix Factorisation (NMF) and Non-negative K-SVD (NN-K-SVD). These methods are constrained by their learning order and problems posed by overlapping sources in the time and frequency domains of the source spectrogram. We consider that it may be possible to improve on current results by providing prior knowledge on the number of sources in a given spectrogram and on the individual structure of the basis atoms, an approach we refer to as structure-aware dictionary learning. In this work we consider dictionary recoverability of harmonic atoms, as harmonicity is a common structure in music signals. We present results showing improvements in recoverability using structure-aware decomposition methods, based on NN-K-SVD and NMF. Finally we propose an alternative structure-aware dictionary learning algorithm incorporating the advantages of NMF and NN-K-SVD. © EURASIP, 2011.

.
(2011) - 'A constrained matching pursuit approach to audio declipping.'. IEEE ICASSP, , pp. 329-332. . (2011)
- 'The medium is the message: Composing instruments and performing mappings'.
Proceedings of the International Conference on New Interfaces for Musical Expression (NIME 2011), pp. 56-59.
#### Abstract

Many performers of novel musical instruments find it difficult to engage audiences beyond those in the field. Previous research points to a failure to balance complexity with usability, and a loss of transparency due to the detachment of the controller and sound generator. The issue is often exacerbated by an audience's lack of prior exposure to the instrument and its workings. However, we argue that there is a conflict underlying many novel musical instruments in that they are intended to be both a tool for creative expression and a creative work of art in themselves, resulting in incompatible requirements. By considering the instrument, the composition and the performance together as a whole with careful consideration of the rate of learning demanded of the audience, we propose that a lack of transparency can become an asset rather than a hindrance. Our approach calls for not only controller and sound generator to be designed in sympathy with each other, but composition, performance and physical form too. Identifying three design principles, we illustrate this approach with the Serendiptichord, a wearable instrument for dancers created by the authors.

.
(2011) - 'Blind source separation of periodic sources from sequentially recorded instantaneous mixtures'.
ISPA 2011 - 7th International Symposium on Image and Signal Processing and Analysis, pp. 540-545.
#### Abstract

We consider the separation of sources when only one movable sensor is available to record a set of mixtures at distinct locations. A single mixture signal is acquired, which is firstly segmented. Then, based on the assumption that the underlying sources are temporally periodic, we align the resulting signals and form a measurement vector on which source separation can be performed. We demonstrate that this approach can successfully recover the original sources both when working with simulated data, and for a real problem of heart sound separation. © 2011 University of Zagreb.

.
(2011) - 'Separating sources from sequentially acquired mixtures of heart signals.'. IEEE ICASSP, , pp. 653-656. . (2011)
- 'On the disjointess of sources in music using different time-frequency representations.'. IEEE WASPAA, , pp. 261-264. . (2011)
- 'Cross-associating unlabelled timbre distributions to create expressive musical mappings.'. JMLR.org WAPA, 11, pp. 28-35. . (2010)
- 'SMALLbox - An Evaluation Framework for Sparse Representations and Dictionary Learning Algorithms.'. Springer LVA/ICA, 6365, pp. 418-425. . (2010)
- 'Gradient Polytope Faces Pursuit for large scale sparse recovery problems.'. IEEE ICASSP, , pp. 2030-2033. . (2010)
- 'A doubly sparse greedy adaptive dictionary learning algorithm for music and large-scale data'.
128th Audio Engineering Society Convention 2010, 2, pp. 940-945.
#### Abstract

We consider the extension of the greedy adaptive dictionary learning algorithm that we introduced previously, to applications other than speech signals. The algorithm learns a dictionary of sparse atoms, while yielding a sparse representation for the speech signals. We investigate its behavior in the analysis of music signals, and propose a different dictionary learning approach that can be applied to large data sets. This facilitates the application of the algorithm to problems that generate large amounts of data, such as multimedia or multi-channel application areas.

.
(2010) - 'The Serendiptichord: A wearable instrument for contemporary dance performance'.
128th Audio Engineering Society Convention 2010, 3, pp. 1547-1554.
#### Abstract

We describe a novel musical instrument designed for use in contemporary dance performance. This instrument, the Serendiptichord, takes the form of a headpiece plus associated pods which sense movements of the dancer, together with associated audio processing software driven by the sensors. Movements such as translating the pods or shaking the trunk of the headpiece cause selection and modification of sampled sounds. We discuss how we have closely integrated physical form, sensor choice and positioning and software to avoid issues which otherwise arise with disconnection of the innate physical link between action and sound, leading to an instrument that non-musicians (in this case, dancers) are able to enjoy using immediately.

.
(2010) - 'Performance following: Tracking a performance without a score.'. IEEE ICASSP, , pp. 2482-2485. . (2010)
- 'Note onset detection using rhythmic structure.'. IEEE ICASSP, , pp. 5526-5529. . (2010)
- 'A Multichannel Spatial Compressed Sensing Approach for Direction of Arrival Estimation.'. Springer LVA/ICA, 6365, pp. 458-465. . (2010)
- 'Timbre remapping through a regression-tree technique'.
Proceedings of the 7th Sound and Music Computing Conference, SMC 2010.
#### Abstract

We consider the task of inferring associations between two differently-distributed and unlabelled sets of timbre data. This arises in applications such as concatenative synthesis/audio mosaicing in which one audio recording is used to control sound synthesis through concatenating fragments of an unrelated source recording. Timbre is a multidimensional attribute with interactions between dimensions, so it is non-trivial to design a search process which makes best use of the timbral variety available in the source recording. We must be able to map from control signals whose timbre features have different distributions from the source material, yet labelling large collections of timbral sounds is often impractical, so we seek an unsupervised technique which can infer relationships between distributions. We present a regression tree technique which learns associations between two unlabelled multidimensional distributions, and apply the technique to a simple timbral concatenative synthesis system. We demonstrate numerically that the mapping makes better use of the source material than a nearest-neighbour search. © 2010 Dan Stowell et al.

.
(2010) - 'Improving the performance of pitch estimators'.
128th Audio Engineering Society Convention 2010, 2, pp. 1319-1332.
#### Abstract

We are looking to use pitch estimators to provide an accurate high-resolution pitch track for resynthesis of musical audio. We found that current evaluation measures such as gross error rate (GER) are not suitable for algorithm selection. In this paper we examine the issues relating to evaluating pitch estimators and use these insights to improve performance of existing algorithms such as the well-known YIN pitch estimation algorithm.

.
(2010) - 'An L1 criterion for dictionary learning by subspace identification.'. IEEE ICASSP, , pp. 5482-5485. . (2010)
- 'Towards a musical beat emphasis function.'. IEEE WASPAA, , pp. 61-64. . (2009)
- 'Sparse reconstruction for compressed sensing using stagewise polytope faces pursuit'.
DSP 2009: 16th International Conference on Digital Signal Processing, Proceedings.
#### Abstract

Compressed sensing, also known as compressive sampling, is an approach to the measurement of signals which have a sparse representation, that can reduce the number of measurements that are needed to reconstruct the signal. The signal reconstruction part requires efficient methods to perform sparse reconstruction, such as those based on linear programming. In this paper we present a method for sparse reconstruction which is an extension of our earlier Polytope Faces Pursuit algorithm, based on the polytope geometry of the dual linear program. The new algorithm adds several basis vectors at each stage, in a similar way to the recent Stagewise Orthogonal Matching Pursuit (StOMP) algorithm. We demonstrate the application of the algorithm to some standard compressed sensing problems. © 2009 IEEE.

.
(2009) - 'Using phase linearity in frequency-domain ICA to tackle the permutation problem.'. IEEE ICASSP, , pp. 3165-3168. . (2009)
- 'Speech denoising based on a greedy adaptive dictionary algorithm'.
European Signal Processing Conference, pp. 1423-1426.
#### Abstract

In this paper we consider the problem of speech denoising based on a greedy adaptive dictionary (GAD) algorithm. The transform is orthogonal by construction, and is found to give a sparse representation of the data being analysed, and to be robust to additive Gaussian noise. The performance of the algorithm is compared to that of the principal component analysis (PCA) method, for a speech denoising application. It is found that the GAD algorithm offers a sparser solution than PCA, while having a similar performance in the presence of noise. © EURASIP, 2009.

.
(2009) - 'Information dynamics and the perception of temporal structure'. Connectionist Models of Behavior and Cognition II: Proceedings of the 11th Neural Computation and Psychology Workshop, Oxford, UK, 16-18 July 2008 18, pp. 179-190. . (2009)
- 'Real-time beat-synchronous analysis of musical audio'.
Proceedings of the 12th International Conference on Digital Audio Effects, DAFx 2009, pp. 299-304.
#### Abstract

In this paper we present a model for beat-synchronous analysis of musical audio signals. Introducing a real-time beat tracking model with performance comparable to offline techniques, we discuss its application to the analysis of musical performances segmented by beat. We discuss the various design choices for beat-synchronous analysis and their implications for real-time implementations before presenting some beat-synchronous harmonic analysis examples. We make available our beat tracker and beat-synchronous analysis techniques as externals for Max/MSP.

.
(2009) - 'Benchmarking flexible adaptive time-frequency transforms for underdetermined audio source separation'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, pp. 37-40.
#### Abstract

We have implemented several fast and flexible adaptive lapped orthogonal transform (LOT) schemes for underdetermined audio source separation. This is generally addressed by time-frequency masking, requiring the sources to be disjoint in the time-frequency domain. We have already shown that disjointness can be increased via adaptive dyadic LOTs. By taking inspiration from the windowing schemes used in many audio coding frameworks, we improve on earlier results in two ways. Firstly, we consider non-dyadic LOTs which match the time-varying signal structures better. Secondly, we allow for a greater range of overlapping window profiles to decrease window boundary artifacts. This new scheme is benchmarked through oracle evaluations, and is shown to decrease computation time by over an order of magnitude compared to using very general schemes, whilst maintaining high separation performance and flexible signal adaptivity. As the results demonstrate, this work may find practical applications in high fidelity audio source separation. ©2009 IEEE.

.
(2009) - 'Real-time chord recognition for live performance'.
Proceedings of the 2009 International Computer Music Conference, ICMC 2009, pp. 85-88.
#### Abstract

This paper describes work aimed at creating an efficient, real-time, robust and high performance chord recognition system for use on a single instrument in a live performance context. An improved chroma calculation method is combined with a classification technique based on masking out expected note positions in the chromagram and minimising the residual energy. We demonstrate that our approach can be used to classify a wide range of chords, in real-time, on a frame by frame basis. We present these analysis techniques as externals for Max/MSP. © July 2009- All copyright remains with the individual authors.
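
The masking-and-residual idea can be sketched as follows. This is only an illustration under stated assumptions: the chord vocabulary is restricted to major and minor triads, the names `CHORD_TYPES` and `classify_chord` are made up for this example, and the improved chroma calculation from the paper is not reproduced (a 12-bin chroma vector is taken as given).

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
# Illustrative chord vocabulary: semitone intervals above the root
CHORD_TYPES = {"maj": (0, 4, 7), "min": (0, 3, 7)}

def classify_chord(chroma):
    """Return (label, residual) for the chord whose masked-out notes
    leave the least energy in a 12-bin chroma vector."""
    chroma = np.asarray(chroma, dtype=float)
    best_label, best_residual = None, np.inf
    for root in range(12):
        for name, intervals in CHORD_TYPES.items():
            mask = np.ones(12)
            # Zero the bins where this chord expects notes
            mask[[(root + i) % 12 for i in intervals]] = 0.0
            # Energy remaining outside the expected note positions
            residual = float(np.sum((mask * chroma) ** 2))
            if residual < best_residual:
                best_label, best_residual = NOTE_NAMES[root] + name, residual
    return best_label, best_residual
```

Because every chord in this toy vocabulary has three notes, the residuals are directly comparable; a vocabulary mixing chords of different sizes would presumably need some normalisation, which is not attempted here.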

.
(2009) - 'Extension of Sparse, Adaptive Signal Decompositions to Semi-blind Audio Source Separation.'. Springer ICA, 5441, pp. 605-612. . (2009)
- 'Rendering audio using expressive MIDI'.
127th Audio Engineering Society Convention 2009, 1, pp. 176-184.
#### Abstract

MIDI renderings of audio are traditionally regarded as lifeless and unnatural - lacking in expression. However, MIDI is simply a protocol for controlling a synthesizer. Lack of expression is caused by either an expressionless synthesizer or by the difficulty in setting the MIDI parameters to provide expressive output. We have developed a system to construct an expressive MIDI representation of an audio signal, i.e. an audio representation which uses tailored pitch variations in addition to the note base pitch parameters which audio-to-MIDI systems usually attempt. A pitch envelope is estimated from the original audio, and a genetic algorithm is then used to estimate pitch modulator parameters from that envelope. These pitch modulations are encoded in a MIDI file and rerendered using a sampler. We present some initial comparisons between the final output audio and the estimated pitch envelopes.

.
(2009) - 'Estimating parameters from audio for an EG+LFO model of pitch envelopes'.
Proceedings of the 12th International Conference on Digital Audio Effects, DAFx 2009, pp. 451-455.
#### Abstract

Envelope generator (EG) and Low Frequency Oscillator (LFO) parameters give a compact representation of audio pitch envelopes. By estimating these parameters from audio per-note, they could be used as part of an audio coding scheme. Recordings of various instruments and articulations were examined, and pitch envelopes found. Using an evolutionary algorithm, EG and LFO parameters for the envelopes were estimated. The resulting estimated envelopes are compared to both the original envelope, and to a fixed-pitch estimate. Envelopes estimated using EG+LFO can closely represent the envelope from the original audio and provide a more accurate estimate than the mean pitch.

.
(2009) - 'Estimating Phase Linearity in the Frequency-Domain ICA Demixing Matrix.'. Springer ICA, 5441, pp. 362-370. . (2009)
- 'Post-processing fiddle~: A real-time multi-pitch tracking technique using harmonic partial subtraction for use within live performance systems'.
Proceedings of the 2009 International Computer Music Conference, ICMC 2009, pp. 227-230.
#### Abstract

We present a method for real-time pitch-tracking which generates an estimation of the relative amplitudes of the partials relative to the fundamental for each detected note. We then employ a subtraction method, whereby lower fundamentals in the spectrum are accounted for when looking at higher fundamental notes. By tracking notes which are playing, we look for note off events and continually update our expected partial weightings for each note. The resulting algorithm makes use of these relative partial weightings within its decision process. We have evaluated the system against a data set and compared it with specialised offline pitch-trackers. © July 2009- All copyright remains with the individual authors.

.
(2009) - 'A Turing Test for B-Keeper: Evaluating an Interactive Real-Time Beat-Tracker'. Proceedings of the 8th International Conference on New Interfaces for Musical Expression (NIME 2008), , pp. 319-324. . (2008)
- 'Robustness and independence of voice timbre features under live performance acoustic degradations'.
Proceedings - 11th International Conference on Digital Audio Effects, DAFx 2008, pp. 325-332.
#### Abstract

Live performance situations can lead to degradations in the vocal signal from a typical microphone, such as ambient noise or echoes due to feedback. We investigate the robustness of continuous-valued timbre features measured on vocal signals (speech, singing, beatboxing) under simulated degradations. We also consider nonparametric dependencies between features, using information theoretic measures and a feature-selection algorithm. We discuss how robustness and independence issues reflect on the choice of acoustic features for use in constructing a continuous-valued vocal timbre space. While some measures (notably spectral crest factors) emerge as good candidates for such a task, others are poor, and some features such as ZCR exhibit an interaction with the type of voice signal being analysed.

.
(2008) - 'Separation of stereo speech signals based on a sparse dictionary algorithm'.
European Signal Processing Conference.
#### Abstract

We address the problem of source separation in echoic and anechoic environments, with a new algorithm which adaptively learns a set of sparse stereo dictionary elements, which are then clustered to identify the original sources. The atom pairs learned by the algorithm are found to capture information about the direction of arrival of the source signals, which allows the clusters to be determined. A similar approach is also used here to extend the dictionary learning K singular value decomposition (K-SVD) algorithm, to address the source separation problem, and results from the two methods are compared. Computer simulations indicate that the proposed adaptive sparse stereo dictionary (ASSD) algorithm yields good performance in both anechoic and echoic environments. copyright by EURASIP.

.
(2008) - 'Discourse Analysis Evaluation Method for Expressive Musical Interfaces.'. nime.org NIME, , pp. 81-86. . (2008)
- 'An adaptive orthogonal sparsifying transform for speech signals'.
2008 3rd International Symposium on Communications, Control, and Signal Processing, ISCCSP 2008, pp. 786-790.
#### Abstract

In this paper we consider the problem of representing a speech signal with an adaptive transform that captures the main features of the data. The transform is orthogonal by construction, and is found to give a sparse representation of the data being analysed. The orthogonality property implies that evaluation of both the forward and inverse transform involve a simple matrix multiplication. The proposed dictionary learning algorithm is compared to the K singular value decomposition (K-SVD) method, which is found to yield very sparse representations, at the cost of a high approximation error. The proposed algorithm is shown to have a much lower computational complexity than K-SVD, while the resulting signal representation remains relatively sparse. ©2008 IEEE.

.
(2008) - 'Exploring the effect of rhythmic style classification on automatic tempo estimation'.
European Signal Processing Conference.
#### Abstract

Within ballroom dance music, tempo and rhythmic style are strongly related. In this paper we explore this relationship, by using knowledge of rhythmic style to improve tempo estimation in musical audio signals. We demonstrate how the use of a simple 1-NN classification method, able to determine rhythmic style with 75% accuracy, can lead to an 8 percentage point improvement over existing tempo estimation algorithms, with further gains possible through the use of more sophisticated classification techniques.

.
(2008) - 'Natural Conjugate Gradient on Complex Flag Manifolds for Complex Independent Subspace Analysis.'. Springer ICANN (1), 5163, pp. 165-174. . (2008)
- 'Rhythmic analysis for real-time audio effects'.
International Computer Music Conference, ICMC 2008.
#### Abstract

We outline a set of audio effects that use rhythmic analysis, in particular the extraction of beat and tempo information, to automatically synchronise temporal parameters to the input signal. We demonstrate that this analysis, known as beat-tracking, can be used to create adaptive parameters that adjust themselves according to changes in the properties of the input signal. We present common audio effects such as delay, tremolo and auto-wah augmented in this fashion and discuss their real-time implementation as Audio Unit plug-ins and objects for Max/MSP.

.
(2008) - 'Real Time Gesture Learning and Recognition: Towards Automatic Categorization.'. nime.org NIME, , pp. 215-218. . (2008)
- 'Oracle estimation of adaptive cosine packet transforms for underdetermined audio source separation.'. IEEE ICASSP, , pp. 41-44. . (2008)
- 'Speech separation using an adaptive sparse dictionary algorithm'.
2008 Hands-free Speech Communication and Microphone Arrays, Proceedings, HSCMA 2008, pp. 25-28.
#### Abstract

We present a greedy adaptive algorithm that builds a sparse orthogonal dictionary from the observed data. In this paper, the algorithm is used to separate stereo speech signals, and the phase information that is inherent to the extracted atom pairs is used for clustering and identification of the original sources. The performance of the algorithm is compared to that of the adaptive stereo basis algorithm, when the sources are mixed in echoic and anechoic environments. We find that the algorithm correctly separates the sources, and can do this even with a relatively small number of atoms. ©2008 IEEE.

.
(2008) - 'Automated rhythmic transformation of musical audio'.
Proceedings - 11th International Conference on Digital Audio Effects, DAFx 2008, pp. 177-180.
#### Abstract

Time-scale transformations of audio signals have traditionally relied exclusively upon manipulations of tempo. We present a novel technique for automatic mixing and synchronization between two musical signals. In this transformation, the original signal assumes the tempo, meter, and rhythmic structure of the model signal, while the extracted downbeats and salient intra-measure infrastructure of the original are maintained.

.
(2008) - 'Object-Coding for Resolution-Free Musical Audio'.
Proceedings of the AES International Conference.
#### Abstract

Object-based coding of audio represents the signal as a parameter stream for a set of sound-producing objects. Encoding in this manner can provide a resolution-free representation of an audio signal. Given a robust estimation of the object-parameters and a multi-resolution synthesis engine, the signal can be "intelligently" upsampled, extending the bandwidth and getting best use out of a high-resolution signal-chain. We present some initial findings on extending bandwidth using harmonic models.

.
(2007) - 'Fast factorization-based inference for bayesian harmonic models'.
Proceedings of the 2006 16th IEEE Signal Processing Society Workshop on Machine Learning for Signal Processing, MLSP 2006, pp. 117-122.
#### Abstract

Harmonic sinusoidal models are a fundamental tool for audio signal analysis. Bayesian harmonic models guarantee a good resynthesis quality and allow joint use of learnt parameter priors and auditory motivated distortion measures. However, inference algorithms based on Monte Carlo sampling are rather slow for realistic data. In this paper, we investigate fast inference algorithms based on approximate factorization of the joint posterior into a product of independent distributions on small subsets of parameters. We discuss the conditions under which these approximations hold true and evaluate their performance experimentally. We suggest how they could be used together with Monte Carlo algorithms for a faster sampling-based inference. © 2006 IEEE.

.
(2007) - 'Real-time beat-synchronous audio effects'.
Proceedings of the 7th International Conference on New Interfaces for Musical Expression, NIME '07, pp. 344-345.
#### Abstract

We present a new group of audio effects that use beat tracking, the detection of beats in an audio signal, to relate effect parameters to the beats in an input signal. Conventional audio effects are augmented so that their operation is related to the output of a beat tracking system. We present a tempo-synchronous delay effect and a set of beat-synchronous low frequency oscillator effects including tremolo, vibrato and auto-wah. All effects are implemented in real-time as VST plug-ins to allow for their use in live performance.

.
(2007) - 'Convolutive blind source separation of speech signals in the low frequency bands'.
Audio Engineering Society - 123rd Audio Engineering Society Convention 2007, 3, pp. 1195-1198.
#### Abstract

Sub-band methods are often used to address the problem of convolutive blind speech separation, as they offer the computational advantage of approximating convolutions by multiplications. The computational load, however, often remains quite high, because separation is performed on several sub-bands. In this paper, we exploit the well known fact that the high frequency content of speech signals typically conveys little information, since most of the speech power is found in frequencies up to 4kHz, and consider separation only in frequency bands below a certain threshold. We investigate the effect of changing the threshold, and find that separation performed only in the low frequencies can lead to the recovered signals being similar in quality to those extracted from all frequencies.

(2007)

- 'Adaptive whitening for improved real-time audio onset detection'.
International Computer Music Conference, ICMC 2007, pp. 312-319.
#### Abstract

We describe a new method for preprocessing STFT phase vocoder frames for improved performance in real-time onset detection, which we term "adaptive whitening". The procedure involves normalising the magnitude of each bin according to a recent maximum value for that bin, with the aim of allowing each bin to achieve a similar dynamic range over time, which helps to mitigate the influence of spectral roll-off and strongly-varying dynamics. Adaptive whitening requires no training, is relatively lightweight to compute, and can run in real-time. Yet it can improve onset detector performance by more than ten percentage points (peak F-measure) in some cases, and improves the performance of most of the onset detectors tested. We present results demonstrating that adaptive whitening can significantly improve the performance of various STFT-based onset detection functions, including functions based on the power, spectral flux, phase deviation, and complex deviation measures. Our results find the process to be especially beneficial for certain types of audio signal (e.g. complex mixtures such as pop music).
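
As a rough illustration of the per-bin normalisation described in this abstract, the following Python sketch divides each bin's magnitude by a decaying running peak for that bin. The function name `adaptive_whiten`, the `decay` and `floor` parameters, and their default values are assumptions made for this example and are not taken from the paper.

```python
def adaptive_whiten(frames, decay=0.9, floor=1e-3):
    """Normalise each bin's magnitude by a decaying running peak for that bin.

    frames: list of STFT magnitude frames, each a list of non-negative floats.
    decay and floor are hypothetical tuning parameters for this sketch.
    """
    if not frames:
        return []
    peaks = [floor] * len(frames[0])
    whitened = []
    for mags in frames:
        # Recent maximum per bin: the larger of the current magnitude,
        # the floor, and the decayed peak from the previous frame.
        peaks = [max(m, floor, decay * p) for m, p in zip(mags, peaks)]
        whitened.append([m / p for m, p in zip(mags, peaks)])
    return whitened


# Two bins with very different levels end up in the same 0..1 range.
frames = [
    [1.0, 0.01],
    [0.5, 0.02],
    [2.0, 0.01],
]
for row in adaptive_whiten(frames):
    print(row)
```

Because the peak only decays gradually, a quiet bin is scaled relative to its own recent history rather than to the loudest bin in the frame, which is one way to read the "similar dynamic range over time" aim stated above.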

(2007)

- 'Geometry and manifolds for independent component analysis'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 4
#### Abstract

In the last few years, there has been a great interest in the use of geometrical methods for independent component analysis (ICA), both to gain insight into the optimization process and to develop more efficient optimization algorithms. Much of this work involves concepts from differential geometry, such as Lie groups, Stiefel manifolds, or tangent planes that may be unfamiliar to signal processing researchers. The purpose of this tutorial paper is to introduce some of these geometry concepts to signal processing and ICA researchers, without assuming any existing background in differential geometry. The emphasis of the paper is on making the important concepts in this field accessible, rather than mathematical rigour. © 2007 IEEE.

(2007)

- 'Dictionary Learning for L1-Exact Sparse Coding'. Springer ICA, 4666, pp. 406-413.
(2007)

- 'Audio effects for real-time performance using beat tracking'.
Audio Engineering Society - 122nd Audio Engineering Society Convention 2007, 2, pp. 866-872.
#### Abstract

We present a new class of digital audio effects which can automatically relate parameter values to the tempo of a musical input in real-time. Using a beat tracking system as the front end, we demonstrate a tempo-dependent delay effect and a set of beat-synchronous low frequency oscillator (LFO) effects including auto-wah, tremolo and vibrato. The effects show better performance than might be expected as they are blind to certain beat tracker errors. All effects are implemented as VST plug-ins which operate in real-time, enabling their use both in live musical performance and the off-line modification of studio recordings.
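
The link between tempo and an effect parameter can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming beat times are already available from a beat tracker; the function name `delay_samples`, the use of the median inter-beat interval, and the default sample rate are choices made for this example and are not taken from the paper.

```python
from statistics import median


def delay_samples(beat_times, sample_rate=44100, beats=1.0):
    """Delay length in samples spanning `beats` beats at the tracked tempo.

    beat_times: beat positions in seconds, as a beat tracker might report them.
    """
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    if not intervals:
        raise ValueError("need at least two beat times")
    period = median(intervals)  # seconds per beat
    return round(period * beats * sample_rate)


beat_times = [0.0, 0.5, 1.0, 1.5]  # beats every 0.5 s, i.e. 120 BPM
print(delay_samples(beat_times))             # one-beat delay
print(delay_samples(beat_times, beats=0.5))  # half-beat delay
```

Taking the median interval makes the delay length less sensitive to a single mistimed beat, which is a design choice of this sketch.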

(2007)

- 'B-Keeper: A beat-tracker for live performance'.
Proceedings of the 7th International Conference on New Interfaces for Musical Expression, NIME '07, pp. 234-237.
#### Abstract

This paper describes the development of B-Keeper, a real-time beat tracking system implemented in Java and Max/MSP, which is capable of maintaining synchronisation between an electronic sequencer and a drummer. This enables musicians to interact with electronic parts which are triggered automatically by the computer from performance information. We describe an implementation which functions with the sequencer Ableton Live.

(2007)

- 'On the use of entropy for beat tracking evaluation'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 4
#### Abstract

Despite continued attention toward the problem of automatic beat detection in musical audio, the issue of how to evaluate beat tracking systems remains pertinent and controversial. As yet no consistent evaluation metric has been adopted by the research community. To this aim, we propose a new method for beat tracking evaluation by measuring beat accuracy in terms of the entropy of a beat error histogram. We demonstrate the ability of our approach to address several shortcomings of existing methods. © 2007 IEEE.

(2007)

- 'The Role of High Frequencies in Convolutive Blind Source Separation of Speech Signals'. Springer ICA, 4666, pp. 488-494.
(2007)

- 'Flag manifolds for subspace ICA problems'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 4
#### Abstract

We investigate the use of the Riemannian optimization method over the flag manifold in subspace ICA problems such as independent subspace analysis (ISA) and complex ICA. In the ISA experiment, we use the Riemannian approach over the flag manifold together with an MCMC method to overcome the problem of local minima of the ISA cost function. Experiments demonstrate the effectiveness of both Riemannian methods - simple geodesic gradient descent and hybrid geodesic gradient descent, compared with the ordinary gradient method. © 2007 IEEE.

(2007)

- 'Musical audio analysis using sparse representations'. Physica-Verlag HD
Compstat 2006 - Proceedings in Computational Statistics: 17th Symposium Held in Rome, Italy, 2006, pp. 105-117.
#### Abstract

Sparse representations are becoming an increasingly useful tool in the analysis of musical audio signals. In this paper we give an overview of work by ourselves and others in this area, to give a flavour of the work being undertaken, and to give some pointers for further information about this interesting and challenging research topic.

(2006)

- 'Riemannian Optimization Method on the Flag Manifold for Independent Subspace Analysis'. Springer
ICA, 3889, pp. 295-302. doi: 10.1007/11679363_37
(2006)

- 'Riemannian optimization method on generalized flag manifolds for complex and subspace ICA'.
AIP Conference Proceedings, 872, pp. 89-96. doi: 10.1063/1.2423264
#### Abstract

In this paper we introduce a new class of manifolds, generalized flag manifolds, for the complex and subspace ICA problems. A generalized flag manifold is a manifold consisting of subspaces which are orthogonal to each other. The class of generalized flag manifolds includes the class of Grassmann manifolds. We extend the Riemannian optimization method to include this new class of manifolds by deriving the formulas for the natural gradient and geodesics on these manifolds. We show how the complex and subspace ICA problems can be solved by optimization of cost functions on a generalized flag manifold. Computer simulations demonstrate that our algorithm gives good performance compared with the ordinary gradient descent method. © 2006 American Institute of Physics.

(2006)

- 'Source extraction from two-channel mixtures by joint cosine packet analysis'.
European Signal Processing Conference.
#### Abstract

This paper describes novel, computationally efficient approaches to source separation of underdetermined instantaneous two-channel mixtures. A best basis algorithm is applied to trees of local cosine bases to determine a sparse transform. We assume that the mixing parameters are known and focus on demixing sources by binary time-frequency masking. We describe a method for deriving a best local cosine basis from the mixtures by minimising an l1 norm cost function. This basis is adapted to the input of the masking process. Then, we investigate how to increase sparsity by adapting local cosine bases to the expected output of a single source instead of to the input mixtures. The heuristically derived cost function maximises the energy of the transform coefficients associated with a particular direction. Experiments on a mixture of four musical instruments are performed, and results are compared. It is shown that local cosine bases can give better results than fixed-basis representations.

(2006)

- 'A spectral difference approach to downbeat extraction in musical audio'.
European Signal Processing Conference.
#### Abstract

We introduce a method for detecting downbeats in musical audio given a sequence of beat times. Using musical knowledge that lower frequency bands are perceptually more important, we find the spectral difference between band-limited beat synchronous analysis frames as a robust downbeat indicator. Initial results are encouraging for this type of system.

(2006)

- 'Sparse Coding for Convolutive Blind Audio Source Separation'. Springer
ICA, 3889, pp. 132-139. doi: 10.1007/11679363_17
(2006)

- 'Single-Channel Mixture Decomposition Using Bayesian Harmonic Models'. Springer
ICA, 3889, pp. 722-730. doi: 10.1007/11679363_90
(2006)

- 'Recovery of Sparse Representations by Polytope Faces Pursuit'. Springer
ICA, 3889, pp. 206-213. doi: 10.1007/11679363_26
(2006)

- 'Beat tracking towards automatic musical accompaniment'.
Audio Engineering Society - 118th Convention Spring Preprints 2005, 2, pp. 751-757.
#### Abstract

In this paper we address the issue of causal rhythmic analysis, primarily towards predicting the locations of musical beats such that they are consistent with a musical audio input. This will be a key component required for a system capable of automatic accompaniment with a live musician. We are implementing our approach as part of the aubio real-time audio library. While performance for this causal system is reduced in comparison to our previous non-causal system, it is still suitable for our intended purpose.

(2005)

- 'A prototype system for object coding of musical audio'.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 239-242.
#### Abstract

This article deals with low bitrate object coding of musical audio, and more precisely with the extraction of pitched sound objects in polyphonic music. After a brief review of existing methods, we discuss the potential benefits of recasting this problem in a Bayesian framework. We define pitched objects by a set of probabilistic priors and derive efficient algorithms to infer active objects and their parameters. Preliminary experiments suggest that the proposed method results in a better sound quality than simple sinusoidal coding while achieving a lower bitrate. © 2005 IEEE.

(2005)

- 'Beat tracking with a two state model [music applications]'. IEEE ICASSP (3), pp. 241-244.
(2005)

- 'Application of geometric dependency analysis to the separation of convolved mixtures'.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3195, pp. 540-547.
#### Abstract

We investigate a generalisation of the structure of frequency domain ICA as applied to the separation of convolved mixtures, and show how a geometric representation of residual dependency can be used both as an aid to visualisation and intuition, and as a tool for clustering components into independent subspaces, thus providing a solution to the source separation problem. © Springer-Verlag 2004.

(2004)

- 'Lie group methods for optimization with orthogonality constraints'.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3195, pp. 1245-1252.
#### Abstract

Optimization of a cost function J(W) under an orthogonality constraint WWᵀ = I is a common requirement for ICA methods. In this paper, we will review the use of Lie group methods to perform this constrained optimization. Instead of searching in the space of n × n matrices W, we will introduce the concept of the Lie group SO(n) of orthogonal matrices, and the corresponding Lie algebra so(n). Using so(n) for our coordinates, we can multiplicatively update W by a rotation matrix R so that W′ = RW always remains orthogonal. Steepest descent and conjugate gradient algorithms can be used in this framework.
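
The multiplicative update W′ = RW can be illustrated with a short pure-Python sketch. It assumes that R is obtained as the matrix exponential of a skew-symmetric matrix B (an element of so(n)), which is the standard way of mapping the Lie algebra to the group. The helper names `expm_series` and `rotate`, the example step `b`, and the truncated power series used in place of a library matrix exponential are all choices made for this illustration, not details from the paper.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm_series(b, terms=30):
    """Matrix exponential by truncated power series (adequate for small steps)."""
    result = identity(len(b))
    term = identity(len(b))
    for k in range(1, terms):
        # term becomes b^k / k!
        term = [[x / k for x in row] for row in matmul(term, b)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result


def rotate(w, b):
    """Multiplicative update W' = exp(B) W for a skew-symmetric B in so(n)."""
    return matmul(expm_series(b), w)


# A hypothetical skew-symmetric step (b equals minus its own transpose).
b = [[0.0, 0.3, -0.1],
     [-0.3, 0.0, 0.2],
     [0.1, -0.2, 0.0]]
w = identity(3)
w_new = rotate(w, b)

# W' W'^T should stay (numerically) equal to the identity.
gram = matmul(w_new, transpose(w_new))
for row in gram:
    print([round(x, 12) for x in row])
```

In an actual descent algorithm the step B would be derived from the gradient of J(W); here it is fixed by hand only to show that the update never leaves the set of orthogonal matrices.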

(2004)

- 'Optimization using Fourier expansion over a geodesic for non-negative ICA'.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3195, pp. 49-56.
#### Abstract

We propose a new algorithm for the non-negative ICA problem, based on the rotational nature of optimization over a set of square orthogonal (orthonormal) matrices W, i.e. where WᵀW = WWᵀ = Iₙ. Using a truncated Fourier expansion of J(t), we obtain a Newton-like update step along the steepest-descent geodesic, which automatically approximates to a usual (Taylor expansion) Newton update step near to a minimum. Experiments confirm that this algorithm is effective, and it compares favourably with existing non-negative ICA algorithms. We suggest that this approach could be modified for other algorithms, such as the normal ICA task. © Springer-Verlag 2004.

(2004)

- 'Polyphonic transcription by non-negative sparse coding of power spectra'. ISMIR.
(2004)

- 'Causal Tempo Tracking of Audio'. ISMIR.
(2004)

- 'Fast labelling of notes in music signals'. ISMIR.
(2004)

- 'Matching live sources with physical models'. Proc. of the 6th Int. Conference on Digital Audio Effects (DAFx-03), London, UK, September 8-11, 2003, pp. 305-307.
(2003)

- 'Automatic music transcription and audio source separation'. Taylor & Francis Inc, Cybernetics and Systems, Paisley, Scotland: International Conference on Soft Computing and Intelligent Systems for Industry (SOCO/ISFI), 33 (6), pp. 603-627.
(2002)

- 'Identification of dental bacteria using statistical and neural approaches'.
Proceedings of the 9th International Conference on Neural Information Processing (ICONIP '02) 2, pp. 606-610.
#### Abstract

This paper is devoted to enhancing rapid decision-making and identification of lactobacilli from dental plaque using statistical and neural network methods. Current techniques of identification such as clustering and principal component analysis are discussed with respect to the field of bacterial taxonomy. Decision-making using multilayer perceptron neural network and Kohonen self-organizing feature map is highlighted. Simulation work and corresponding results are presented with main emphasis on neural network convergence and identification capability using resubstitution, leave-one-out and cross validation techniques. Rapid analyses on two separate sets of bacterial data from dental plaque revealed accuracy of more than 90% in the identification process. The risk of misdiagnosis was estimated at 14% worst case. Tests with unknown strains yield close correlation to cluster dendrograms. The use of the AXEON VindAX simulator indicated close correlations of the results. The paper concludes that artificial neural networks are suitable for use in the rapid identification of dental bacteria.

(2002)

- 'On-Line Connectionist Q-Learning Produces Unreliable Performance with A Synonym Finding Task'. IJCNN (3), pp. 451-458.
(2000)

- 'Maximizing information about a noisy signal with a single non-linear neuron'. Inst Electrical Engineers Inspec Inc, Ninth International Conference on Artificial Neural Networks (ICANN99), Vols 1 and 2, Univ Edinburgh, Edinburgh, Scotland: 9th International Conference on Artificial Neural Networks (ICANN99) (470), pp. 581-586.
(1999)

- 'Communications and neural networks: Theory and practice'.
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1, pp. 135-138.
#### Abstract

In this paper we shall see that neural networks and communications are interlinked in a number of ways, towards the goal of efficient communication of information. One concrete example of this is the use of neural networks to ensure efficient use of communication channels, through connection admission control in ATM networks. In addition, however, efficient communication is also important within a decision making system such as a neural network. Finally we examine what type of neural network solutions are suggested by this approach.

(1997)

- 'Information processing in negative feedback neural networks'. IOP Publishing Ltd, Network-Computation in Neural Systems, Univ Stirling, Stirling, Scotland: Workshop on Information Theory and the Brain, 7 (2), pp. 301-305.
(1996)

- 'Information theory and neural network learning algorithms'. Institute of Physics Publishing
Neural Computing Research and Applications, Proceedings of the Second Irish Neural Networks Conference, Queen's University, Belfast, Northern Ireland, 25-26 June 1992, pp. 145-155.
#### Abstract

There have been a number of recent papers on information theory and neural networks, especially in a perceptual system such as vision. Some of these approaches are examined, and their implications for neural network learning algorithms are considered. Existing supervised learning algorithms such as Back Propagation to minimize mean squared error can be viewed as attempting to minimize an upper bound on information loss. By making an assumption of noise either at the input or the output to the system, unsupervised learning algorithms such as those based on Hebbian (principal component analysing) or anti-Hebbian (decorrelating) approaches can also be viewed in a similar light. The optimization of information by the use of interneurons to decorrelate output units suggests a role for inhibitory interneurons and cortical loops in biological sensory systems.

(1993)

- 'Hebbian/anti-Hebbian network which optimizes information capacity by orthonormalizing the principal subspace'.
IEE Conference Publication, (372), pp. 86-90.
#### Abstract

A number of recent papers have used the approach of maximizing information capacity or mutual information (MI) to examine unsupervised neural networks. In this paper we extend this work to develop an algorithm for the case of both input and output noise, with an output power constraint. We find that it is possible to simplify the obvious algorithm obtained by concatenating the two previous solutions.

(1993)

- 'Direct Approaches to Improving the Robustness of Multilayer Neural Networks'. Amsterdam: North-Holland
Artificial Neural Networks, 2: Proceedings of the 1992 International Conference on Artificial Neural Networks (ICANN–92), Brighton, United Kingdom, 4–7 September, 1992, pp. 1063-1066.
#### Abstract

Multilayer neural networks trained with backpropagation are in general not robust against the loss of a hidden neuron. In this paper we define a form of robustness called 1-node robustness and propose methods to improve it. One approach is based on a modification of the error function by the addition of a "robustness error". It leads to more robust networks but at the cost of a reduced accuracy. A second approach, "pruning-and-duplication", consists of duplicating the neurons whose loss is the most damaging for the network. Pruned neurons are used for the duplication. This procedure leads to robust and accurate networks at low computational cost. It may also prove beneficial for generalisation. Both methods are evaluated on the XOR function.

(1992)

- 'The effect of receptor signal-to-noise levels on optimal filtering in a sensory system'.
Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, 4, pp. 2321-2324.
#### Abstract

Consideration is given to image filtering (temporal and spatial) in a neural system for transmitting images through a limited capacity channel, in the case of a noisy image at the receptors. The authors use an extension of Shannon's formula for the capacity of a Gaussian channel to determine the optimum filter to be used. For realistic image statistics, they show that the bandwidth of this filter is self-limiting, and it has a high frequency boost that disappears at low signal levels. This behavior is mirrored in biological retinas.

(1991)

- 'Sensory adaptation: An information-theoretic viewpoint'.
IJCNN Int Jt Conf Neural Network.
#### Abstract

Summary form only given. The authors examine the goals of early stages of a perceptual system, before the signal reaches the cortex, and describe its operation in information-theoretic terms. The effects of receptor adaptation, lateral inhibition, and decorrelation can all be seen as part of an optimization of information throughput, given that available resources such as average power and maximum firing rates are limited. The authors suggest a modification to Gabor functions which improves their performance as band-pass filters.

(1989)

### Books

- (2007) Independent Component Analysis and Signal Separation, 7th International Conference, ICA 2007, London, UK, September 9-12, 2007. Springer, 4666.

### Book chapters

- 'Sound Source Separation'. in Zölzer U (ed.)
*DAFX: Digital Audio Effects*. Chichester, UK: John Wiley & Sons, Ltd. Article number 14, pp. 551-588.
(2011)

- 'Probabilistic modeling paradigms for audio source separation'. in *Machine Audition: Principles, Algorithms and Systems*, pp. 162-185.
#### Abstract

Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of the following two general paradigms: linear modeling or variance modeling. They compare the merits of either paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems. © 2011, IGI Global.

(2010)

- 'Audio source separation using sparse representations'. in *Machine Audition: Principles, Algorithms and Systems*, pp. 246-264.
#### Abstract

The authors address the problem of audio source separation, namely, the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods which adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is highly related to the windowing methods used in the MPEG audio coding framework. In considering the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used based on orthogonal basis functions that are learned from the observed data, instead of being selected from a predetermined library of bases. This is found to encode the signal characteristics, by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research. © 2011, IGI Global.

(2010)

- 'Non-negative mixtures'. in *Handbook of Blind Source Separation*, pp. 515-547.
#### Abstract

This chapter discusses some algorithms for the use of non-negativity constraints in unmixing problems, including positive matrix factorization, non-negative matrix factorization (NMF), and their combination with other unmixing methods such as non-negative independent component analysis and sparse non-negative matrix factorization. The 2D models can be naturally extended to multiway array (tensor) decompositions, especially non-negative tensor factorization (NTF) and non-negative Tucker decomposition (NTD). The standard NMF model has been extended in various ways, including semi-NMF, multilayer NMF, tri-NMF, orthogonal NMF, nonsmooth NMF, and convolutive NMF. While gradient descent is a simple procedure, convergence can be slow, and the convergence can be sensitive to the step size. This can be overcome by applying multiplicative update rules, which have proved particularly popular in NMF. These multiplicative update rules have proved to be attractive since they are simple and do not need the selection of an update parameter, and since their multiplicative nature and the non-negative terms on the RHS ensure that the elements cannot become negative. © 2010 Elsevier Ltd. All rights reserved.
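
To make the remark about multiplicative updates concrete, here is a minimal pure-Python sketch of the widely used Euclidean-cost multiplicative rules for NMF (often attributed to Lee and Seung). The chapter itself may present other variants; the function names, the small `eps` guard in the denominators, the random initialisation, and the toy matrix are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Factorise non-negative V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb))


v = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [1.0, 1.0, 1.0]]
w, h = nmf(v, rank=2)
print(frob_error(v, w, h))
```

Every factor in each update is non-negative, so W and H stay non-negative without any projection step or step-size parameter, which is the property the abstract highlights.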

(2010)

- 'Information Theory and Neural Networks'. 51, pp. 307-340.
(1993)

### Reports

- Predictive Information Rate in Discrete-time Gaussian Processes.
#### Abstract

We derive expressions for the predictive information rate (PIR) for the class of autoregressive Gaussian processes AR(N), both in terms of the prediction coefficients and in terms of the power spectral density. The latter result suggests a duality between the PIR and the multi-information rate for processes with mutually inverse power spectra (i.e. with poles and zeros of the transfer function exchanged). We investigate the behaviour of the PIR in relation to the multi-information rate for some simple examples, which suggest, somewhat counter-intuitively, that the PIR is maximised for very 'smooth' AR processes whose power spectra have multiple poles at zero frequency. We also obtain results for moving average Gaussian processes which are consistent with the duality conjectured earlier. One consequence of this is that the PIR is unbounded for MA(N) processes.

(2012)

- A measure of statistical complexity based on predictive information.
#### Abstract

We introduce an information theoretic measure of statistical structure, called 'binding information', for sets of random variables, and compare it with several previously proposed measures including excess entropy, Bialek et al.'s predictive information, and the multi-information. We derive some of the properties of the binding information, particularly in relation to the multi-information, and show that, for finite sets of binary random variables, the processes which maximise binding information are the 'parity' processes. Finally we discuss some of the implications this has for the use of the binding information as a measure of complexity.

(2010)

- Polar Polytopes and Recovery of Sparse Representations.
(2005)