Academic and research departments: Department of Music and Media; Centre for Vision, Speech and Signal Processing (CVSSP).
Statistics, machine learning, digital signal processing, deep learning, artificial intelligence, music and audio, psychology, mental health, engineering, computer vision.
Two methods for undertaking subjective evaluation were compared: a pairwise dissimilarity task (PDT) and a projective mapping task (PMT). For a set of unambiguous, synthetic auditory stimuli, the aims were to determine: whether the PMT limits the recovered dimensionality to two dimensions; how subjects respond using PMT’s two-dimensional response format; the relative time required for PDT and PMT; and hence whether PMT is an appropriate alternative to PDT for experiments involving auditory stimuli. The results of both Multi-Dimensional Scaling (MDS) and Multiple Factor Analysis (MFA) indicate that, with multiple participants, PMT allows for the recovery of three meaningful dimensions. The MDS and MFA results for the PDT data, on the other hand, were ambiguous and did not enable recovery of more than two meaningful dimensions. This result was unexpected, given that PDT is generally considered not to limit the recoverable dimensionality. Participants took less time to complete the experiment using PMT than PDT (a median ratio of approximately 1:4), and employed a range of strategies to express three perceptual dimensions using PMT’s two-dimensional response format. PMT may therefore provide a viable and efficient means to elicit up to three-dimensional responses from listeners.
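As background to how "recovered dimensionality" is assessed, classical (Torgerson) MDS makes the eigenvalue criterion explicit: for noiseless dissimilarities computed from genuinely three-dimensional stimuli, exactly three eigenvalues of the double-centred matrix are meaningfully positive. The abstract's analyses may well use non-metric MDS and MFA; this numpy sketch is illustrative only.

```python
import numpy as np

def classical_mds(D, n_dims=3):
    """Classical (Torgerson) MDS: recover coordinates from an n x n
    symmetric matrix of pairwise dissimilarities D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pos = np.clip(eigvals[:n_dims], 0, None)     # keep non-negative part
    return eigvecs[:, :n_dims] * np.sqrt(pos), eigvals

# Synthetic 3-D coordinates stand in for perceptual stimulus positions.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)   # pairwise distances
coords, eigvals = classical_mds(D, n_dims=3)
# For exactly 3-D data, only the first three eigenvalues are positive.
print(np.sum(eigvals > 1e-8))   # → 3
```

Inspecting the eigenvalue spectrum in this way is the standard check on how many dimensions a dissimilarity structure supports.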
Disentangled representations support a range of downstream tasks, including causal reasoning, generative modeling, and fair machine learning. Unfortunately, disentanglement has been shown to be impossible without the incorporation of supervision or inductive bias. Given that supervision is often expensive or infeasible to acquire, we choose to incorporate structural inductive bias and present an unsupervised, deep state-space model for Video Disentanglement (VDSM). The model disentangles latent static and dynamic factors via the incorporation of hierarchical structure with a dynamic prior and a Mixture of Experts decoder. VDSM learns separate disentangled representations for the identity of the object or person in the video, and for the action being performed. We evaluate VDSM across a range of qualitative and quantitative tasks, including identity and dynamics transfer, sequence generation, Fréchet Inception Distance, and factor classification. VDSM achieves state-of-the-art performance, exceeding adversarial methods even when those methods use additional supervision.
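The Mixture of Experts decoder idea can be sketched in a few lines: a gate computed from the static identity code blends the outputs of several expert decoders applied to the dynamic code. The shapes, linear experts, and variable names below are illustrative assumptions, not VDSM's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, z_dim, x_dim = 4, 8, 16     # hypothetical sizes

# One linear "expert" per mixture component (real experts are deep nets).
W = rng.normal(size=(n_experts, z_dim, x_dim))
gate_w = rng.normal(size=(z_dim, n_experts))

def moe_decode(z_id, z_dyn):
    """Gate on the static identity code z_id; every expert decodes the
    dynamic code z_dyn; blend expert outputs with the gate weights."""
    logits = z_id @ gate_w
    gates = np.exp(logits - logits.max())
    gates = gates / gates.sum()                    # softmax over experts
    expert_out = np.einsum('d,edx->ex', z_dyn, W)  # (n_experts, x_dim)
    return gates @ expert_out                      # convex combination

x = moe_decode(rng.normal(size=z_dim), rng.normal(size=z_dim))
print(x.shape)   # (16,)
```

Because the gate depends only on the identity code, identity selects *how* to decode while the dynamic code supplies *what* to decode, which is one way such a design encourages the static/dynamic split.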
Undertaking causal inference with observational data is incredibly useful across a wide range of tasks, including the development of medical treatments, advertising and marketing, and policy making. There are two significant challenges associated with undertaking causal inference using observational data: treatment assignment heterogeneity (i.e., differences between the treated and untreated groups), and an absence of counterfactual data (i.e., not knowing what would have happened if an individual who did receive treatment had instead not been treated). We address these two challenges by combining structured inference and targeted learning in a Targeted Variational AutoEncoder (TVAE). In terms of structure, we factorize the joint distribution into risk, confounding, instrumental, and miscellaneous factors; in terms of targeted learning, we apply a regularizer derived from the influence curve in order to reduce residual bias. An ablation study is undertaken, and an evaluation on benchmark datasets demonstrates that TVAE achieves competitive, state-of-the-art performance.
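The targeted-learning ingredient can be illustrated with the standard efficient-influence-curve (AIPW) estimator of the average treatment effect, which combines an outcome-model plug-in term with a propensity-weighted bias correction. TVAE derives a regularizer from the influence curve rather than using this estimator directly, so the sketch below is background, not the paper's method.

```python
import numpy as np

def aipw_ate(y, t, e, m1, m0):
    """Augmented IPW estimate of the average treatment effect, built from
    the efficient influence curve: plug-in term (m1 - m0) plus a bias
    correction weighted by inverse propensity scores."""
    correction = t / e * (y - m1) - (1 - t) / (1 - e) * (y - m0)
    return np.mean(m1 - m0 + correction)

# Synthetic data with a known effect of 2.0 and confounding through x.
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                 # true propensity score
t = rng.binomial(1, e)
y = 2.0 * t + x + rng.normal(size=n)     # true ATE = 2.0
m1 = 2.0 + x                             # oracle outcome models
m0 = x
print(round(aipw_ate(y, t, e, m1, m0), 2))   # close to 2.0
```

The correction term has mean zero when the nuisance models are correct, which is why estimators built on the influence curve remove residual plug-in bias.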
Variational AutoEncoders (VAEs) provide a means to generate representational latent embeddings. Previous research has highlighted the benefits of achieving representations that are disentangled, particularly for downstream tasks. However, there is some debate about how to encourage disentanglement with VAEs, and evidence indicates that existing implementations do not achieve disentanglement consistently. How well a VAE’s latent space has been disentangled is often evaluated against our subjective expectations of which attributes should be disentangled for a given problem. Therefore, by definition, we already have domain knowledge of what should be achieved, and yet we use unsupervised approaches to achieve it. We propose a weakly supervised approach that incorporates any available domain knowledge into the training process to form a Gated-VAE. The process involves partitioning the representational embedding and gating backpropagation. All partitions are utilised on the forward pass, but gradients are backpropagated through different partitions according to selected image/target pairings. The approach can be used to modify existing VAE models such as beta-VAE, InfoVAE and DIP-VAE-II. Experiments demonstrate that, using gated backpropagation, latent factors are represented in their intended partitions. The approach is applied to images of faces for the purpose of disentangling head-pose from facial expression. Quantitative metrics show that using Gated-VAE improves average disentanglement, completeness and informativeness compared with un-gated implementations. Qualitative assessment of latent traversals demonstrates disentanglement of head-pose from expression, even when only weak/noisy supervision is available.
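Gated backpropagation can be sketched with hand-computed gradients for a linear encoder: the forward pass uses every latent partition, but the backward pass masks gradients so a given image/target pairing updates only its designated partition. In practice this would be implemented with an autograd framework (e.g., stop-gradient/detach); the sizes and the pose/expression split here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                 # hypothetical input features
W = rng.normal(size=(6, 4))            # encoder weights: 6-D latent
target = rng.normal(size=6)

# Partition the 6-D latent: dims 0-2 for pose, 3-5 for expression.
gate = np.zeros(6)
gate[:3] = 1.0                         # this pairing supervises "pose" only

z = W @ x                              # forward pass uses ALL partitions
dL_dz = 2 * (z - target)               # gradient of a squared-error loss
dL_dz_gated = dL_dz * gate             # gate: block gradients to dims 3-5
dL_dW = np.outer(dL_dz_gated, x)       # only pose rows of W receive updates

print(np.abs(dL_dW[3:]).sum() == 0.0)  # True: expression rows untouched
```

The key point is that gating happens on the backward pass only, so the full latent code is always available to the decoder while supervision signals are routed to the intended partition.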
Fair and unbiased machine learning is an important and active field of research, as decision processes are increasingly driven by models that learn from data. Unfortunately, any biases present in the data may be learned by the model, thereby inappropriately transferring that bias into the decision-making process. We identify the connection between the task of bias reduction and that of isolating factors common between domains whilst encouraging domain-specific invariance. To isolate the common factors, we combine the theory of deep latent variable models with information bottleneck theory for scenarios whereby data may be naturally paired across domains and no additional supervision is required. The result is the Nested Variational AutoEncoder (NestedVAE). Two outer VAEs with shared weights attempt to reconstruct the input and infer a latent space, whilst a nested VAE attempts to reconstruct the latent representation of one image from the latent representation of its paired image. In so doing, the nested VAE isolates the common latent factors/causes and becomes invariant to unwanted factors that are not shared between paired images. We also propose a new metric, which we refer to as the Adjusted Parity metric, to provide a balanced method of evaluating consistency and classifier performance across domains. An evaluation of NestedVAE on domain and attribute invariance, change detection, and learning common factors for the prediction of biological sex demonstrates that NestedVAE significantly outperforms alternative methods.
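The Adjusted Parity metric balances classifier performance against cross-domain consistency. One plausible instantiation (hypothetical; the paper gives the exact definition) discounts mean per-domain accuracy by the spread of those accuracies:

```python
import numpy as np

def adjusted_parity(accuracies):
    """Illustrative score: reward high mean classifier accuracy across
    domains while penalising disparity between domains (hypothetical
    formula, not necessarily the paper's exact definition)."""
    acc = np.asarray(accuracies, dtype=float)
    return acc.mean() * (1.0 - 2.0 * acc.std())

# A classifier that is accurate AND consistent across domains scores highest.
print(round(adjusted_parity([0.90, 0.90]), 2))   # 0.9
print(round(adjusted_parity([1.00, 0.80]), 2))   # 0.72: same mean, penalised
```

Folding the two criteria into a single number avoids the trap of reporting high average accuracy that is driven entirely by one domain.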
MJ Vowels, KP Mark, LM Vowels, ND Wood (2018) Using spectral and cross-spectral analysis to identify patterns and synchrony in couples’ sexual desire, PLoS ONE 13(10)
M Vowels, NC Camgöz, R Bowden (2020) NestedVAE: Isolating Common Factors via Weak Supervision, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
P Hilpert, TR Brick, C Flueckiger, M Vowels, E Ceulemans, P Kuppens, L Sels (2019) What Can Be Learned From Couple Research: Examining Emotional Co-Regulation Processes in Face-to-Face Interactions, Journal of Counseling Psychology
B Ross, J Gale, K Wickrama, J Goetz, MJ Vowels (2019) The Impact of Family Economic Strain on Work-Family Conflict, Marital Support, Marital Quality, and Marital Stability During the Middle Years, Journal of Personal Finance 18(2)