Semantic Audio Tools for Radio Production

Chris Baume

PhD awarded 30th April 2018

Radio producers use specialist audio editing tools to browse long sound recordings and edit them into a programme. These tools predominantly use audio waveforms to help users navigate and edit the audio visually. However, audio waveforms display very limited information and so do not always fulfil the needs of radio producers. A short task-based online study showed that colourising waveforms allowed users to perform simple edits with less effort. In this thesis, we investigate whether the radio production process can be improved by developing visualisations and tools that better match the requirements of professional users.
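To make the idea of a colourised waveform concrete, the sketch below draws each block of samples with a height (its peak amplitude) and a colour derived from an audio feature. This is a hypothetical illustration, not the method used in the study: the function name `colourised_waveform` is invented here, and the zero-crossing rate is used only as a cheap stand-in for spectral brightness so that the example needs nothing beyond the standard library.

```python
import colorsys
import math
from typing import List, Tuple

RGB = Tuple[int, int, int]


def colourised_waveform(samples: List[float], block_size: int) -> List[Tuple[float, RGB]]:
    """Return one (peak amplitude, RGB colour) pair per block of samples.

    Hypothetical sketch: the column height is the block's peak absolute
    amplitude and the colour comes from the zero-crossing rate (ZCR),
    an assumed proxy for how "bright" or noisy the block sounds.
    """
    columns: List[Tuple[float, RGB]] = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        peak = max(abs(s) for s in block)
        # Count sign changes between consecutive samples.
        crossings = sum(1 for a, b in zip(block, block[1:]) if (a < 0) != (b < 0))
        zcr = crossings / max(len(block) - 1, 1)  # normalised to [0, 1]
        # Low ZCR (e.g. voiced speech) maps towards red, high ZCR
        # (e.g. sibilance or noise) towards blue: hue runs from 0.0 to 0.66.
        r, g, b = colorsys.hsv_to_rgb(0.66 * zcr, 1.0, 1.0)
        columns.append((peak, (round(r * 255), round(g * 255), round(b * 255))))
    return columns
```

With a 100 Hz tone followed by a rapidly alternating signal, the first column comes out red and the second blue, so the two sounds become distinguishable at a glance even where their waveform heights look alike.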

We conducted a brief ethnographic study of radio production techniques in three environments – news, drama and documentaries. We found that radio production is mainly speech-based and often involves written transcripts of the speech. Many producers annotate the transcripts with editorial decisions, then use audio editing software to apply those edits to the audio manually. We identified an opportunity to improve this process with a semantic audio editor that lets producers edit the audio directly through its transcript.

To investigate the impact of this approach, we developed a semantic audio editing tool based on automated speech-to-text that integrated with professional radio production systems. We used this tool to conduct a qualitative study comparing a semantic audio editing workflow to the existing process. Our study involved five professional radio producers who used our tool to create programmes that were later broadcast. We found that the automated transcripts were accurate enough to be used for audio editing, but that listening is still an important part of audio production. For long recordings, we found that semantic editing was twice as fast as the existing technique. The participants continued to use our system in the months after the study, and it has since been developed into a supported production tool.
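The core mechanism of transcript-based editing can be sketched in a few lines, assuming the speech-to-text system supplies a start and end time for every word. Deleting words in the transcript then reduces to keeping the time ranges of the remaining words, merging runs of consecutive words so that natural pauses between them survive. The `Word` class and the `kept_segments` and `render` functions are hypothetical names for illustration; this is not the implementation of the tool described above.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds, assumed to come from the speech-to-text timings
    end: float


def kept_segments(words: List[Word], deleted: Set[int]) -> List[Tuple[float, float]]:
    """Turn transcript deletions into the ordered audio segments to keep."""
    segments: List[Tuple[float, float]] = []
    previous_kept = False
    for i, word in enumerate(words):
        if i in deleted:
            previous_kept = False  # a deletion ends the current segment
            continue
        if previous_kept:
            # Extend the open segment, keeping the pause before this word.
            segments[-1] = (segments[-1][0], word.end)
        else:
            segments.append((word.start, word.end))
        previous_kept = True
    return segments


def render(samples: List[float], sample_rate: int,
           segments: List[Tuple[float, float]]) -> List[float]:
    """Concatenate the kept segments of a mono sample list."""
    out: List[float] = []
    for start, end in segments:
        out.extend(samples[round(start * sample_rate):round(end * sample_rate)])
    return out
```

For example, striking out the filler word "um" from "so um the budget rose" yields two segments, and the audio between the end of "so" and the start of "the" is cut. A production system would also need short crossfades at each cut to avoid clicks, which this sketch omits.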

This study also revealed that many producers prefer working with printed transcripts to working on screen. To investigate how paper transcripts are used, we observed five radio producers interacting with printouts of automated transcripts. We found that they use a variety of annotation methods, but underlining and strikethrough are most common. Working with a commercial partner, we developed a pen-based interface to our semantic audio editor. This enabled users to directly edit audio content on paper by making underline and strikethrough marks with a digital pen.
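One way a pen-based interface could turn marks on paper into transcript edits is sketched below, assuming the printout's layout gives a bounding box for every word and the digital pen reports each stroke as a list of (x, y) page coordinates. A stroke through the middle of the words is treated as a strikethrough and one just below them as an underline; the words it covers are those whose horizontal centre the stroke spans. The `WordBox` class, `classify_stroke` and the 25%/50% height thresholds are all hypothetical choices, not the design of the system described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordBox:
    index: int     # position of the word in the transcript
    x0: float      # left edge on the printed page
    x1: float      # right edge
    top: float     # page y-coordinates increase downwards
    bottom: float


def classify_stroke(stroke: List[Tuple[float, float]],
                    boxes: List[WordBox]) -> Tuple[str, List[int]]:
    """Classify a roughly horizontal pen stroke and list the words it covers."""
    xs = [x for x, _ in stroke]
    mean_y = sum(y for _, y in stroke) / len(stroke)
    left, right = min(xs), max(xs)
    struck: List[int] = []
    underlined: List[int] = []
    for box in boxes:
        centre = (box.x0 + box.x1) / 2
        if not left <= centre <= right:
            continue  # the stroke does not span this word
        height = box.bottom - box.top
        if box.top + 0.25 * height <= mean_y <= box.bottom - 0.25 * height:
            struck.append(box.index)  # passes through the middle half of the word
        elif box.bottom - 0.25 * height < mean_y <= box.bottom + 0.5 * height:
            underlined.append(box.index)  # sits just under the word
    if struck:
        return "strikethrough", struck
    if underlined:
        return "underline", underlined
    return "unknown", []
```

The returned word indices could then feed a transcript-based edit, for instance by adding struck-through words to a set of deleted words.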

We conducted a qualitative study with eight radio producers, comparing three approaches to audio editing: normal printed transcripts, screen-based semantic editing and pen-based semantic editing. The producers scored the normal transcripts as the most useful and usable, as they were well aligned with their existing workflow. The screen-based editor appealed to those who have migrated to a purely digital workflow, and showed potential for development into a collaborative editing tool. Producers edited fastest using the pen-based system, and it was popular with those who already use paper, but its simplicity and limited functionality meant it was scored the least useful and usable.


Funding: The work was fully funded by the British Broadcasting Corporation as part of the BBC Audio Research Partnership with the University of Surrey.

Supervisors: Professor Mark Plumbley, Dr Janko Ćalić, Professor David Frohlich



Publications:

Chapter 3: Chris Baume, Mark D. Plumbley, and Janko Ćalić (2015). “Use of audio editors in radio production”. In Proceedings of the 138th Audio Engineering Society Convention.

Chapter 5: Chris Baume, Mark D. Plumbley, Janko Ćalić, and David Frohlich (2018). “A Contextual Study of Semantic Speech Editing in Radio Production”. In International Journal of Human-Computer Studies, in press.

Chapter 6: Chris Baume, Mark D. Plumbley, David Frohlich, and Janko Ćalić (2018). “PaperClip: A Digital Pen Interface for Semantic Speech Editing in Radio Production”. In Journal of the Audio Engineering Society, in press.


Software:

Dialogger: A semantic speech editing interface (see Appendix A.1).

Vampeyer: A plugin framework for generating semantic audio visualisations (see Appendix A.2).

BeatMap: A user interface component for navigating audio in web browsers using audio visualisation bitmaps (see Appendix A.3).