
Gabriel Bibbó
Academic and research departments
Centre for Vision, Speech and Signal Processing (CVSSP), Faculty of Engineering and Physical Sciences
About
Biography
Gabriel Bibbó is an R&D Engineer with the AI for Sound group at the University of Surrey's CVSSP. He holds a BSc in Electrical Engineering (UdelaR), focused on DSP and embedded systems, and an MSc in Sound & Music Computing from UPF Barcelona, where his research on harmonic mixing was published by Springer. Gabriel's industry experience includes developing embedded Python solutions for Bang & Olufsen (B&O) audio and automation systems at Ikatu, along with roles related to Google and KPMG. This practical engineering background, combined with his experience as a DJ, informs his current work on sound event detection, embedded AI robustness, and ethical AI. He aims to develop interactive, real-time AI music generation tools.
University roles and responsibilities
- Fire warden
Research
Research interests
Audio signal processing; audio classification; compression of CNNs; signal processing; machine learning; sustainable AI (AI model compression)
Supervision
Postgraduate research supervision
MSc students (co-supervised; primary supervisor: Prof Mark D Plumbley):
- Sam Watts, dissertation title: Specialized Sound Event Detection for Real-World Environments
Sustainable development goals
My research interests are related to the following:
Publications
This paper presents a residential audio dataset to support sound event detection research for smart home applications aimed at promoting wellbeing for older adults. The dataset is constructed by deploying audio recording systems in the homes of 8 participants aged 55-80 years for a 7-day period. Acoustic characteristics are documented through detailed floor plans and construction material information to enable replication of the recording environments for AI model deployment. A novel automated speech removal pipeline is developed, using pre-trained audio neural networks to detect and remove segments containing spoken voice, while preserving segments containing other sound events. The resulting dataset consists of privacy-compliant audio recordings that accurately capture the soundscapes and activities of daily living within residential spaces. The paper details the dataset creation methodology, the speech removal pipeline utilizing cascaded model architectures, and an analysis of the vocal label distribution to validate the speech removal process. This dataset enables the development and benchmarking of sound event detection models tailored specifically for in-home applications.
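A minimal sketch of how such a per-segment speech removal step could be implemented with an off-the-shelf pre-trained audio tagger (the panns_inference package); the segment length, decision threshold, and file names are illustrative assumptions, not the cascaded pipeline described in the paper.

```python
# Sketch: per-segment speech removal with a pre-trained audio tagger (PANNs).
# Segment length, threshold, and file names are illustrative assumptions.
import numpy as np
import librosa
import soundfile as sf
from panns_inference import AudioTagging, labels

SR = 32000               # PANNs models expect 32 kHz mono audio
SEGMENT_S = 10           # analysis window length in seconds (assumed)
SPEECH_THRESHOLD = 0.2   # speech probability above which a segment is dropped (assumed)

tagger = AudioTagging(checkpoint_path=None, device='cpu')  # loads the default CNN14 checkpoint
speech_idx = labels.index('Speech')                        # AudioSet class used to flag vocal content

audio, _ = librosa.load('home_recording.wav', sr=SR, mono=True)
seg_len = SR * SEGMENT_S
kept = []

for start in range(0, len(audio) - seg_len + 1, seg_len):
    segment = audio[start:start + seg_len]
    clipwise_output, _ = tagger.inference(segment[None, :])  # (1, 527) class probabilities
    if clipwise_output[0, speech_idx] < SPEECH_THRESHOLD:
        kept.append(segment)   # keep segments without detected speech
    # segments flagged as containing speech are discarded for privacy

if kept:
    sf.write('home_recording_despeeched.wav', np.concatenate(kept), SR)
```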
Convolutional neural networks (CNNs) have exhibited state-of-the-art performance in various audio classification tasks. However, their real-time deployment remains a challenge on resource-constrained devices like embedded systems. In this paper, we present a demonstration of our standalone hardware device designed for real-time recognition of sound events, commonly known as audio tagging. Our system incorporates a real-time implementation of CNN-based pre-trained audio neural networks (PANNs) on an embedded hardware device, a Raspberry Pi. We refer to our standalone device as the "PiSoundSensing" system, which makes sense of surrounding sounds using Raspberry Pi-based hardware. Users can interact with the system through a physical button or an online web interface. The web interface allows users to remotely control the standalone device and visualize sound events detected over time. We provide a detailed description of the hardware and software used to build the PiSoundSensing device, and highlight useful observations, including how the performance of the hardware-based standalone device compares with that of a software-based implementation.
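A hedged sketch of the kind of capture-and-tag loop such a device runs: record a short clip from the microphone, pass it through a pre-trained PANNs model, and report the top-scoring sound classes. The clip length, top-k display, and use of the sounddevice and panns_inference packages are assumptions for illustration, not the authors' implementation.

```python
# Sketch: capture-and-tag loop for a Raspberry Pi audio tagging device.
# Clip length, top-k display, and library choices are assumptions.
import numpy as np
import sounddevice as sd
from panns_inference import AudioTagging, labels

SR = 32000     # PANNs sample rate
CLIP_S = 4     # seconds of audio per inference pass (assumed)
TOP_K = 5      # number of sound classes to report

tagger = AudioTagging(checkpoint_path=None, device='cpu')

while True:
    # Record a mono clip from the default microphone (blocking call).
    clip = sd.rec(int(SR * CLIP_S), samplerate=SR, channels=1, dtype='float32')
    sd.wait()
    clipwise_output, _ = tagger.inference(clip.T)   # (1, samples) -> (1, 527 classes)
    scores = clipwise_output[0]
    for idx in np.argsort(scores)[::-1][:TOP_K]:
        print(f'{labels[idx]}: {scores[idx]:.2f}')
    print('---')
```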
Convolutional neural networks (CNNs) have exhibited state-of-the-art performance in various audio classification tasks. However, their real-time deployment remains a challenge on resource-constrained devices such as embedded systems. In this paper, we analyze how the performance of large-scale pre-trained audio neural networks designed for audio pattern recognition changes when deployed on hardware such as a Raspberry Pi. We empirically study the role of CPU temperature, microphone quality and audio signal volume on performance. Our experiments reveal that continuous CPU usage raises the temperature enough to trigger an automated slowdown mechanism in the Raspberry Pi, impacting inference latency. Microphone quality, particularly with affordable devices such as the Google AIY Voice Kit, and audio signal volume also affect system performance. In the course of our investigation, we encounter substantial complications linked to library compatibility and the unique processor architecture requirements of the Raspberry Pi, making the process less straightforward than on conventional computers (PCs). Our observations, while presenting challenges, pave the way for future researchers to develop more compact machine learning models, design heat-dissipative hardware, and select appropriate microphones when AI models are deployed for real-time applications on edge devices.
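As an illustration of the kind of measurement behind these observations, the sketch below repeatedly times a PANNs inference call while logging the Raspberry Pi's SoC temperature from the standard thermal sysfs interface; the dummy clip, run count, and logging format are assumed for illustration rather than taken from the paper.

```python
# Sketch: log SoC temperature against inference latency on a Raspberry Pi.
# The dummy clip, run count, and logging format are assumptions.
import time
import numpy as np
from panns_inference import AudioTagging

SR = 32000

def read_cpu_temp_c(path='/sys/class/thermal/thermal_zone0/temp'):
    """Return the Raspberry Pi SoC temperature in degrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0

tagger = AudioTagging(checkpoint_path=None, device='cpu')
clip = np.random.randn(1, SR * 4).astype('float32')   # dummy 4-second clip

for i in range(100):                                   # sustained inference heats the CPU
    t0 = time.perf_counter()
    tagger.inference(clip)
    latency_ms = (time.perf_counter() - t0) * 1000
    print(f'run {i:3d}  temp {read_cpu_temp_c():5.1f} C  latency {latency_ms:7.1f} ms')
```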
DJ track selection can benefit from software-generated recommendations that optimise harmonic transitions. Emerging techniques, such as Tonal Interval Vectors, enable the definition of new metrics for harmonic compatibility (HC) estimation that improve the performance of existing applications. The aim of this study is therefore to provide the DJ with a new tool to improve their musical selections. We present a software package that can estimate the HC between digital music recordings, with a particular focus on modern dance music and the workflow of the DJ. The user defines a target track for which the calculation is to be made and obtains the HC values, expressed as a percentage, with respect to each track in the music collection. The system also calculates a pitch transposition interval for each candidate track that, if applied, maximizes the HC with respect to the target track. Its graphical user interface allows the user to run it easily alongside the DJ software of their choice during live performances. The system, tested with musically experienced users, generates pitch transposition suggestions that improve mixes in 73.7% of cases.
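A simplified sketch of HC estimation in this spirit: each track's averaged chroma is mapped to a Tonal Interval Vector (weighted DFT coefficients 1-6 of the chroma), candidate tracks are compared to the target by TIV distance, and the pitch transposition that maximizes compatibility is reported. The coefficient weights, the percentage mapping, and the chroma features used here are illustrative assumptions, not the published system.

```python
# Sketch: TIV-based harmonic compatibility and best pitch transposition.
# DFT-coefficient weights and the percentage mapping are assumed, not the paper's exact formulation.
import numpy as np
import librosa

TIV_WEIGHTS = np.array([2, 11, 17, 16, 19, 7], dtype=float)  # assumed weighting of DFT bins 1..6

def tonal_interval_vector(chroma_mean):
    """Map a 12-bin chroma vector to a 6-D complex Tonal Interval Vector."""
    c = chroma_mean / (chroma_mean.sum() + 1e-12)   # energy-normalised chroma
    return TIV_WEIGHTS * np.fft.fft(c)[1:7]         # keep DFT coefficients 1..6

def track_chroma(path):
    """Average chroma over the whole track."""
    y, sr = librosa.load(path, mono=True)
    return librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)

def harmonic_compatibility(tiv_a, tiv_b):
    """Distance-based score mapped to a 0-100% range (assumed mapping)."""
    dist = np.linalg.norm(tiv_a - tiv_b)
    max_dist = np.linalg.norm(tiv_a) + np.linalg.norm(tiv_b)
    return 100.0 * (1.0 - dist / (max_dist + 1e-12))

def best_transposition(target_chroma, candidate_chroma):
    """Return the (semitone shift, HC score) maximising compatibility with the target."""
    tiv_target = tonal_interval_vector(target_chroma)
    scored = []
    for shift in range(-6, 7):                       # transposing up by k semitones rotates chroma by k bins
        rotated = np.roll(candidate_chroma, shift)
        scored.append((shift, harmonic_compatibility(tiv_target, tonal_interval_vector(rotated))))
    return max(scored, key=lambda s: s[1])

shift, hc = best_transposition(track_chroma('target.mp3'), track_chroma('candidate.mp3'))
print(f'Transpose candidate by {shift:+d} semitones for ~{hc:.1f}% harmonic compatibility')
```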
Indoor soundscapes significantly impact wellbeing, yet methodologies for understanding their perception among older adults remain underdeveloped. This paper presents Soundscape Experience Mapping (SEM), combining ecological momentary assessment with participatory design methods to capture and analyse indoor acoustic environments. Through structured listening activities and audio data collection, participants document their acoustic experiences in context. Our pilot study engaged eight older adults (57+) in a town in Belgium, collecting continuous audio recordings and qualitative data over one week. Using momentary judgements, retrospective evaluations, and sound journalling, we gained insights into how older adults perceive their indoor soundscapes. The method produced findings on the personal control of sound environments, soundscape preferences, and how situational factors influence acoustic perception. Participants demonstrated agency in curating their sonic environments, while expressing frustration with uncontrollable sounds. Daily routines and domestic rhythms emerged as key contextual factors shaping soundscape experiences. This work advances AI-assisted indoor soundscape design by providing evidence-based methods to understand occupant needs, particularly for older adults who could benefit from tailored acoustic environments.
Poor workplace soundscapes can negatively impact productivity and employee satisfaction. While current regulations and physical acoustic treatments are beneficial, the potential of AI sound systems to enhance worker wellbeing remains underexplored. This paper investigates the use of AI-enabled sound technologies in workplaces, aiming to boost wellbeing and productivity through a soundscape approach while addressing user concerns. To evaluate these systems, we used scenario-based design and focus groups with knowledge workers from open-plan offices and those working remotely. Participants were presented with initial design concepts for AI sound analysis and control systems. This paper outlines user requirements and recommendations gathered from these focus groups, with a specific emphasis on soundscape personalisation and the creation of relevant datasets.