Matthew Vowels

Associate Lecturer in Electroacoustics and PhD Researcher in AI
M.S., MSc, BMus (Hons) Tonmeister


My research project


Research interests


Matthew Vowels, Necati Cihan Camgöz, Richard Bowden (2020) Nested VAE: Isolating Common Factors via Weak Supervision, In: 15th IEEE International Conference on Automatic Face and Gesture Recognition

Fair and unbiased machine learning is an important and active field of research, as decision processes are increasingly driven by models that learn from data. Unfortunately, any biases present in the data may be learned by the model, thereby inappropriately transferring that bias into the decision-making process. We identify the connection between the task of bias reduction and that of isolating factors common between domains whilst encouraging domain-specific invariance. To isolate the common factors we combine the theory of deep latent variable models with information bottleneck theory for scenarios whereby data may be naturally paired across domains and no additional supervision is required. The result is the Nested Variational AutoEncoder (NestedVAE). Two outer VAEs with shared weights attempt to reconstruct the input and infer a latent space, whilst a nested VAE attempts to reconstruct the latent representation of one image from the latent representation of its paired image. In so doing, the nested VAE isolates the common latent factors/causes and becomes invariant to unwanted factors that are not shared between paired images. We also propose a new metric to provide a balanced method of evaluating consistency and classifier performance across domains, which we refer to as the Adjusted Parity metric. An evaluation of NestedVAE on both domain and attribute invariance, change detection, and learning common factors for the prediction of biological sex demonstrates that NestedVAE significantly outperforms alternative methods.

Matthew Vowels, Necati Cihan Camgöz, Richard Bowden (2020) Gated Variational AutoEncoders: Incorporating Weak Supervision to Encourage Disentanglement, In: 15th IEEE International Conference on Automatic Face and Gesture Recognition

Variational AutoEncoders (VAEs) provide a means to generate representational latent embeddings. Previous research has highlighted the benefits of achieving representations that are disentangled, particularly for downstream tasks. However, there is some debate about how to encourage disentanglement with VAEs, and evidence indicates that existing implementations do not achieve disentanglement consistently. How well a VAE’s latent space has been disentangled is often evaluated against our subjective expectations of which attributes should be disentangled for a given problem. Therefore, by definition, we already have domain knowledge of what should be achieved and yet we use unsupervised approaches to achieve it. We propose a weakly supervised approach that incorporates any available domain knowledge into the training process to form a Gated-VAE. The process involves partitioning the representational embedding and gating backpropagation. All partitions are utilised on the forward pass but gradients are backpropagated through different partitions according to selected image/target pairings. The approach can be used to modify existing VAE models such as beta-VAE, InfoVAE and DIP-VAE-II. Experiments demonstrate that, using gated backpropagation, latent factors are represented in their intended partition. The approach is applied to images of faces for the purpose of disentangling head-pose from facial expression. Quantitative metrics show that using Gated-VAE improves average disentanglement, completeness and informativeness, as compared with un-gated implementations. Qualitative assessment of latent traversals demonstrates its disentanglement of head-pose from expression, even when only weak/noisy supervision is available.
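The gating idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `gate_gradients`, the six-dimensional latent, the partition names and the gradient values are all hypothetical, and a real model would apply the gate inside an automatic-differentiation framework rather than to a plain list. The sketch only shows the core step, in which every partition contributes to the forward pass but the gradient reaches only the partition matched to the current image/target pairing.

```python
def gate_gradients(grad_z, partitions, active):
    """Keep the gradient for the active latent partition and zero the rest.

    grad_z     : list of floats, gradient of the loss w.r.t. the latent vector
    partitions : dict mapping a partition name to a (start, stop) index range
    active     : name of the partition selected by the image/target pairing
    """
    gated = [0.0] * len(grad_z)
    start, stop = partitions[active]
    gated[start:stop] = grad_z[start:stop]
    return gated


# Hypothetical 6-dimensional latent split into two partitions (assumed names).
partitions = {"pose": (0, 3), "expression": (3, 6)}

# Hypothetical gradient values, as if produced by a backward pass.
grad_z = [0.5, -1.0, 0.25, 2.0, -0.5, 1.5]

# A pairing that shares head-pose updates only the "pose" partition,
# and a pairing that shares expression updates only "expression".
pose_step = gate_gradients(grad_z, partitions, "pose")
expr_step = gate_gradients(grad_z, partitions, "expression")
```

Under these assumptions, `pose_step` carries gradient only in the first three latent dimensions and `expr_step` only in the last three, so each pairing type can shape only its intended partition.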

Matthew Vowels, Necati Cihan Camgöz, Richard Bowden (2021) Targeted VAE: Variational and Targeted Learning for Causal Inference

Undertaking causal inference with observational data is incredibly useful across a wide range of tasks including the development of medical treatments, advertisements and marketing, and policy making. There are two significant challenges associated with undertaking causal inference using observational data: treatment assignment heterogeneity (i.e., differences between the treated and untreated groups), and an absence of counterfactual data (i.e., not knowing what would have happened if an individual who did get treatment had instead not been treated). We address these two challenges by combining structured inference and targeted learning. In terms of structure, we factorize the joint distribution into risk, confounding, instrumental, and miscellaneous factors, and in terms of targeted learning, we apply a regularizer derived from the influence curve in order to reduce residual bias. An ablation study is undertaken, and an evaluation on benchmark datasets demonstrates that TVAE has competitive and state-of-the-art performance.

Disentangled representations support a range of downstream tasks including causal reasoning, generative modeling, and fair machine learning. Unfortunately, disentanglement has been shown to be impossible without the incorporation of supervision or inductive bias. Given that supervision is often expensive or infeasible to acquire, we choose to incorporate structural inductive bias and present an unsupervised, deep State-Space-Model for Video Disentanglement (VDSM). The model disentangles latent time-varying and dynamic factors via the incorporation of hierarchical structure with a dynamic prior and a Mixture of Experts decoder. VDSM learns separate disentangled representations for the identity of the object or person in the video, and for the action being performed. We evaluate VDSM across a range of qualitative and quantitative tasks including identity and dynamics transfer, sequence generation, Fréchet Inception Distance, and factor classification. VDSM achieves state-of-the-art performance and exceeds adversarial methods, even when the methods use additional supervision.

Matthew Vowels, Russell Mason (2020) Comparison of pairwise dissimilarity and projective mapping tasks with auditory stimuli, In: Journal of the Audio Engineering Society, Audio Engineering Society

Two methods for undertaking subjective evaluation were compared: a pairwise dissimilarity task (PDT) and a projective mapping task (PMT). For a set of unambiguous, synthetic, auditory stimuli the aim was to determine: whether the PMT limits the recovered dimensionality to two dimensions; how subjects respond using PMT’s two-dimensional response format; the relative time required for PDT and PMT; and hence whether PMT is an appropriate alternative to PDT for experiments involving auditory stimuli. The results of both Multi-Dimensional Scaling (MDS) analyses and Multiple Factor Analyses (MFA) indicate that, with multiple participants, PMT allows for the recovery of three meaningful dimensions. The results from the MDS and MFA analyses of the PDT data, on the other hand, were ambiguous and did not enable recovery of more than two meaningful dimensions. This result was unexpected given that PDT is generally considered not to limit the dimensionality that can be recovered. Participants took less time to complete the experiment using PMT compared to PDT (a median ratio of approximately 1:4), and employed a range of strategies to express three perceptual dimensions using PMT’s two-dimensional response format. PMT may provide a viable and efficient means to elicit up to three-dimensional responses from listeners.

Additional publications