
Gabriel Bibbó
Academic and research departments
Centre for Vision, Speech and Signal Processing (CVSSP), Faculty of Engineering and Physical Sciences
About
Biography
Gabriel Bibbó is an R&D Engineer with the AI for Sound group at the University of Surrey's CVSSP. He holds a BSc in Electrical Engineering (UdelaR), focused on DSP and embedded systems, and an MSc in Sound & Music Computing from UPF Barcelona, where his research on harmonic mixing was published by Springer. Gabriel's industry experience includes developing embedded Python solutions for Bang & Olufsen (B&O) audio and automation systems at Ikatu, along with roles related to Google and KPMG. This practical engineering background, combined with his experience as a DJ, informs his current work on sound event detection, embedded AI robustness, and ethical AI. He aims to develop interactive, real-time AI music generation tools.
University roles and responsibilities
- Fire warden
Research
Research interests
Audio signal processing; audio classification; compression of CNNs; signal processing; machine learning; sustainable AI (AI model compression)
Supervision
Postgraduate research supervision
MSc students (co-supervised; primary supervisor: Prof Mark D Plumbley):
- Sam Watts, dissertation title: Specialized Sound Event Detection for Real-World Environments
Sustainable development goals
My research interests are related to the following:
Publications
This paper presents a residential audio dataset to support sound event detection research for smart home applications aimed at promoting wellbeing for older adults. The dataset is constructed by deploying audio recording systems in the homes of 8 participants aged 55-80 years for a 7-day period. Acoustic characteristics are documented through detailed floor plans and construction material information to enable replication of the recording environments for AI model deployment. A novel automated speech removal pipeline is developed, using pre-trained audio neural networks to detect and remove segments containing spoken voice, while preserving segments containing other sound events. The resulting dataset consists of privacy-compliant audio recordings that accurately capture the soundscapes and activities of daily living within residential spaces. The paper details the dataset creation methodology, the speech removal pipeline utilizing cascaded model architectures, and an analysis of the vocal label distribution to validate the speech removal process. This dataset enables the development and benchmarking of sound event detection models tailored specifically for in-home applications.
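A minimal sketch of how such a per-segment speech removal step could be implemented with an off-the-shelf pre-trained audio tagger (the panns_inference package); the segment length, decision threshold, and file names are illustrative assumptions, not the cascaded pipeline described in the paper.

```python
# Sketch: per-segment speech removal with a pre-trained audio tagger (PANNs).
# Segment length, threshold, and file names are illustrative assumptions.
import numpy as np
import librosa
import soundfile as sf
from panns_inference import AudioTagging, labels

SR = 32000               # PANNs models expect 32 kHz mono audio
SEGMENT_S = 10           # analysis window length in seconds (assumed)
SPEECH_THRESHOLD = 0.2   # speech probability above which a segment is dropped (assumed)

tagger = AudioTagging(checkpoint_path=None, device='cpu')  # loads the default CNN14 checkpoint
speech_idx = labels.index('Speech')                        # AudioSet class used to flag vocal content

audio, _ = librosa.load('home_recording.wav', sr=SR, mono=True)
seg_len = SR * SEGMENT_S
kept = []

for start in range(0, len(audio) - seg_len + 1, seg_len):
    segment = audio[start:start + seg_len]
    clipwise_output, _ = tagger.inference(segment[None, :])  # (1, 527) class probabilities
    if clipwise_output[0, speech_idx] < SPEECH_THRESHOLD:
        kept.append(segment)   # keep segments without detected speech
    # segments flagged as containing speech are discarded for privacy

if kept:
    sf.write('home_recording_despeeched.wav', np.concatenate(kept), SR)
```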
Convolutional neural networks (CNNs) have exhibited state-of-the-art performance in various audio classification tasks. However, their real-time deployment remains a challenge on resource-constrained devices like embedded systems. In this paper, we present a demonstration of our standalone hardware device designed for real-time recognition of sound events, commonly known as audio tagging. Our system incorporates a real-time implementation of CNN-based pre-trained audio neural networks (PANNs) on an embedded hardware device, a Raspberry Pi. We refer to our standalone device as the "PiSoundSensing" system, which makes sense of surrounding sounds using Raspberry Pi-based hardware. Users can interact with the system through a physical button or an online web interface. The web interface allows users to remotely control the standalone device and visualize sound events detected over time. We provide a detailed description of the hardware and software used to build the PiSoundSensing device, and highlight useful observations, including how the performance of the hardware-based standalone device compares with that of a software-based implementation.
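A hedged sketch of the kind of capture-and-tag loop such a device runs: record a short clip from the microphone, pass it through a pre-trained PANNs model, and report the top-scoring sound classes. The clip length, top-k display, and use of the sounddevice and panns_inference packages are assumptions for illustration, not the authors' implementation.

```python
# Sketch: capture-and-tag loop for a Raspberry Pi audio tagging device.
# Clip length, top-k display, and library choices are assumptions.
import numpy as np
import sounddevice as sd
from panns_inference import AudioTagging, labels

SR = 32000     # PANNs sample rate
CLIP_S = 4     # seconds of audio per inference pass (assumed)
TOP_K = 5      # number of sound classes to report

tagger = AudioTagging(checkpoint_path=None, device='cpu')

while True:
    # Record a mono clip from the default microphone (blocking call).
    clip = sd.rec(int(SR * CLIP_S), samplerate=SR, channels=1, dtype='float32')
    sd.wait()
    clipwise_output, _ = tagger.inference(clip.T)   # (1, samples) -> (1, 527 classes)
    scores = clipwise_output[0]
    for idx in np.argsort(scores)[::-1][:TOP_K]:
        print(f'{labels[idx]}: {scores[idx]:.2f}')
    print('---')
```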
Convolutional neural networks (CNNs) have exhibited state-of-the-art performance in various audio classification tasks. However, their real-time deployment remains a challenge on resource-constrained devices such as embedded systems. In this paper, we analyze how the performance of large-scale pre-trained audio neural networks designed for audio pattern recognition changes when deployed on hardware such as a Raspberry Pi. We empirically study the role of CPU temperature, microphone quality and audio signal volume on performance. Our experiments reveal that continuous CPU usage raises the temperature enough to trigger an automated slowdown mechanism in the Raspberry Pi, impacting inference latency. Microphone quality, particularly with affordable devices such as the Google AIY Voice Kit, and audio signal volume also affect system performance. In the course of our investigation, we encounter substantial complications linked to library compatibility and the unique processor architecture requirements of the Raspberry Pi, making the process less straightforward than on conventional computers (PCs). Our observations, while presenting challenges, pave the way for future researchers to develop more compact machine learning models, design heat-dissipative hardware, and select appropriate microphones when AI models are deployed for real-time applications on edge devices.
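As an illustration of the kind of measurement behind these observations, the sketch below repeatedly times a PANNs inference call while logging the Raspberry Pi's SoC temperature from the standard thermal sysfs interface; the dummy clip, run count, and logging format are assumed for illustration rather than taken from the paper.

```python
# Sketch: log SoC temperature against inference latency on a Raspberry Pi.
# The dummy clip, run count, and logging format are assumptions.
import time
import numpy as np
from panns_inference import AudioTagging

SR = 32000

def read_cpu_temp_c(path='/sys/class/thermal/thermal_zone0/temp'):
    """Return the Raspberry Pi SoC temperature in degrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0

tagger = AudioTagging(checkpoint_path=None, device='cpu')
clip = np.random.randn(1, SR * 4).astype('float32')   # dummy 4-second clip

for i in range(100):                                   # sustained inference heats the CPU
    t0 = time.perf_counter()
    tagger.inference(clip)
    latency_ms = (time.perf_counter() - t0) * 1000
    print(f'run {i:3d}  temp {read_cpu_temp_c():5.1f} C  latency {latency_ms:7.1f} ms')
```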
DJ track selection can benefit from software-generated recommendations that optimise harmonic transitions. Emerging techniques, such as Tonal Interval Vectors, enable the definition of new metrics for harmonic compatibility (HC) estimation that improve the performance of existing applications. The aim of this study is therefore to provide the DJ with a new tool to improve their musical selections. We present a software package that can estimate the HC between digital music recordings, with a particular focus on modern dance music and the workflow of the DJ. The user defines a target track for which the calculation is to be made and obtains the HC values, expressed as a percentage, with respect to each track in the music collection. The system also calculates a pitch transposition interval for each candidate track that, if applied, maximizes the HC with respect to the target track. Its graphical user interface allows the user to run it easily alongside the DJ software of their choice during live performances. The system, tested with musically experienced users, generates pitch transposition suggestions that improve mixes in 73.7% of cases.
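A simplified sketch of HC estimation in this spirit: each track's averaged chroma is mapped to a Tonal Interval Vector (weighted DFT coefficients 1-6 of the chroma), candidate tracks are compared to the target by TIV distance, and the pitch transposition that maximizes compatibility is reported. The coefficient weights, the percentage mapping, and the chroma features used here are illustrative assumptions, not the published system.

```python
# Sketch: TIV-based harmonic compatibility and best pitch transposition.
# DFT-coefficient weights and the percentage mapping are assumed, not the paper's exact formulation.
import numpy as np
import librosa

TIV_WEIGHTS = np.array([2, 11, 17, 16, 19, 7], dtype=float)  # assumed weighting of DFT bins 1..6

def tonal_interval_vector(chroma_mean):
    """Map a 12-bin chroma vector to a 6-D complex Tonal Interval Vector."""
    c = chroma_mean / (chroma_mean.sum() + 1e-12)   # energy-normalised chroma
    return TIV_WEIGHTS * np.fft.fft(c)[1:7]         # keep DFT coefficients 1..6

def track_chroma(path):
    """Average chroma over the whole track."""
    y, sr = librosa.load(path, mono=True)
    return librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)

def harmonic_compatibility(tiv_a, tiv_b):
    """Distance-based score mapped to a 0-100% range (assumed mapping)."""
    dist = np.linalg.norm(tiv_a - tiv_b)
    max_dist = np.linalg.norm(tiv_a) + np.linalg.norm(tiv_b)
    return 100.0 * (1.0 - dist / (max_dist + 1e-12))

def best_transposition(target_chroma, candidate_chroma):
    """Return the (semitone shift, HC score) maximising compatibility with the target."""
    tiv_target = tonal_interval_vector(target_chroma)
    scored = []
    for shift in range(-6, 7):                       # transposing up by k semitones rotates chroma by k bins
        rotated = np.roll(candidate_chroma, shift)
        scored.append((shift, harmonic_compatibility(tiv_target, tonal_interval_vector(rotated))))
    return max(scored, key=lambda s: s[1])

shift, hc = best_transposition(track_chroma('target.mp3'), track_chroma('candidate.mp3'))
print(f'Transpose candidate by {shift:+d} semitones for ~{hc:.1f}% harmonic compatibility')
```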
Indoor soundscapes significantly impact wellbeing, yet methodologies for understanding their perception among older adults remain underdeveloped. This paper presents Soundscape Experience Mapping (SEM), combining ecological momentary assessment with participatory design methods to capture and analyse indoor acoustic environments. Through structured listening activities and audio data collection, participants document their acoustic experiences in context. Our pilot study engaged eight older adults (57+) in a town in Belgium, collecting continuous audio recordings and qualitative data over one week. Using momentary judgements, retrospective evaluations, and sound journalling, we gained insights into how older adults perceive their indoor soundscapes. The method produced findings on the personal control of sound environments, soundscape preferences, and how situational factors influence acoustic perception. Participants demonstrated agency in curating their sonic environments, while expressing frustration with uncontrollable sounds. Daily routines and domestic rhythms emerged as key contextual factors shaping soundscape experiences. This work advances AI-assisted indoor soundscape design by providing evidence-based methods to understand occupant needs, particularly for older adults who could benefit from tailored acoustic environments.
Poor workplace soundscapes can negatively impact productivity and employee satisfaction. While current regulations and physical acoustic treatments are beneficial, the potential of AI sound systems to enhance worker wellbeing remains underexplored. This paper investigates the use of AI-enabled sound technologies in workplaces, aiming to boost wellbeing and productivity through a soundscape approach while addressing user concerns. To evaluate these systems, we used scenario-based design and focus groups with knowledge workers from open-plan offices and those working remotely. Participants were presented with initial design concepts for AI sound analysis and control systems. This paper outlines user requirements and recommendations gathered from these focus groups, with a specific emphasis on soundscape personalisation and the creation of relevant datasets.