Dr Saeid Safavi


Research fellow in adaptive predictive fault detection for connected autonomous systems
PhD
+44 (0)1483 684718
09 BB 01
8:00-17:00

Academic and research departments

Centre for Vision, Speech and Signal Processing (CVSSP).

Biography

Areas of specialism

Machine learning; Speech processing; Music analysis; Para-linguistic processing of speech signals

University roles and responsibilities

  • Research fellow on the H2020 Audio Commons project

My qualifications

PhD
University of Birmingham
MEng
University of Birmingham

Previous roles

01 November 2015 - 01 November 2017
Research fellow on an H2020 project titled "Objective Control of TAlker VErification (OCTAVE)".
University of Hertfordshire
2015 - 2015
Postdoctoral researcher at the University of Birmingham working on an EU-funded project.
University of Birmingham

Research

Research interests

Research projects

My publications

Publications

Safavi Saeid, Pearce Andy, Wang Wenwu, Plumbley Mark (2018) Predicting the perceived level of reverberation using machine learning, Proceedings of the 52nd Asilomar Conference on Signals, Systems and Computers (ACSSC 2018) Institute of Electrical and Electronics Engineers (IEEE)
Perceptual measures are usually considered more reliable than instrumental measures for evaluating the perceived level of reverberation. However, such measures are time consuming and expensive, and, due to variations in stimuli or assessors, the resulting data is not always statistically significant. Therefore, an (objective) measure of the perceived level of reverberation becomes desirable. In this paper, we develop a new method to predict the level of reverberation from audio signals by relating perceptual listening test results to those obtained from a machine-learned model. More specifically, we compare the use of a multiple-stimuli test for within- and between-class architectures to evaluate the perceived level of reverberation. An expert set of 16 human listeners rated the perceived level of reverberation for the same set of files from different audio source types. We then train a machine learning model using the training data gathered for the same set of files and a variety of reverberation-related features extracted from the data, such as reverberation time and direct-to-reverberation ratio. The results suggest that the machine-learned model offers an accurate prediction of the perceptual scores.
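
The pipeline the abstract describes (extract reverberation-related features such as RT60 and the direct-to-reverberation ratio, then regress them onto listener ratings) can be illustrated with a short sketch. This is a minimal illustration, not the paper's implementation: the regressor, feature set and data below are placeholder assumptions.

# A minimal sketch (not the paper's implementation) of regressing perceptual
# reverberation ratings onto signal-derived features such as RT60 and the
# direct-to-reverberation ratio (DRR). Feature values and ratings are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Hypothetical feature matrix: one row per stimulus, columns = [RT60 (s), DRR (dB)]
X = np.column_stack([
    rng.uniform(0.2, 2.5, size=100),    # RT60 in seconds
    rng.uniform(-5.0, 20.0, size=100),  # DRR in dB
])

# Synthetic "perceived reverberation" ratings (e.g. averaged listener scores)
y = 40 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 5, size=100)

# Fit a regressor with cross-validation and compare predictions to the ratings
model = RandomForestRegressor(n_estimators=200, random_state=0)
y_hat = cross_val_predict(model, X, y, cv=5)
r, _ = pearsonr(y, y_hat)
print(f"Pearson correlation between predicted and listener scores: {r:.2f}")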
Safavi Saeid, Wang Wenwu, Plumbley Mark, Choobbasti Ali Janalizadeh, Fazekas George (2018) Predicting the Perceived Level of Reverberation using Features from Nonlinear Auditory Model, Proceedings of the 23rd FRUCT conference pp. 527-531 Institute of Electrical and Electronics Engineers (IEEE)
Perceptual measurements have typically been recognized as the most reliable measurements for assessing perceived levels of reverberation. In this paper, a combination of a blind RT60 estimation method and a binaural, nonlinear auditory model is employed to derive signal-based measures (features) that are then utilized in predicting the perceived level of reverberation. Such measures avoid the considerable effort required to obtain perceptual measures, as well as the variations in stimuli or assessors that can render perceptual results statistically insignificant. As a result, the automatic extraction of objective measurements that can be applied to predict the perceived level of reverberation becomes of vital significance. Consequently, this work is aimed at deriving measurements such as clarity, reverberance, and RT60 automatically and directly from audio data. These measurements, along with labels from human listening tests, are then forwarded to a machine learning system that builds a model to autonomously estimate the perceived level of reverberation. The data has been labeled by an expert human listener for a single set of files from arbitrary audio source types. The results show that the automatically extracted features can aid in estimating the perceptual ratings.
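
For readers unfamiliar with the acoustic quantities named above, the sketch below shows how clarity (C50) and RT60 are conventionally computed from a measured room impulse response via Schroeder backward integration. Note this is only background illustration: the paper estimates such quantities blindly from the reverberant audio itself, which this sketch does not attempt.

# Illustrative only (not the paper's blind-estimation method): clarity (C50)
# and RT60 computed from a measured room impulse response using the Schroeder
# energy decay curve.
import numpy as np

def c50_db(h, fs):
    """Clarity index C50: early (0-50 ms) to late energy ratio in dB."""
    n50 = int(0.05 * fs)
    early = np.sum(h[:n50] ** 2)
    late = np.sum(h[n50:] ** 2)
    return 10.0 * np.log10(early / late)

def rt60_from_t30(h, fs):
    """RT60 extrapolated from the -5 to -35 dB slope of the Schroeder decay (T30)."""
    edc = np.cumsum(h[::-1] ** 2)[::-1]        # Schroeder energy decay curve
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(len(h)) / fs
    mask = (edc_db <= -5) & (edc_db >= -35)    # fit over the -5..-35 dB range
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope                       # time to decay by 60 dB

# Toy exponentially decaying noise as a stand-in for a measured impulse response
fs = 16000
t = np.arange(int(1.0 * fs)) / fs
h = np.random.default_rng(0).normal(size=t.size) * np.exp(-t / 0.1)
print(f"C50  = {c50_db(h, fs):.1f} dB")
print(f"RT60 = {rt60_from_t30(h, fs):.2f} s")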
Vaheb Amir, Choobbasti Ali Janalizadeh, Najafabadi S. H. E. Mortazavi, Safavi Saeid (2018) Investigating Language Variability on the Performance of Speaker Verification Systems, Proceedings of the 20th International Conference on Speech and Computer (SPECOM 2018). Lecture Notes in Computer Science 11096 pp. 718-727 Springer Nature Switzerland

In recent years, speaker verification technologies have received an extensive amount of attention. Designing and developing machines that can communicate with humans is believed to be one of the primary motivations behind such developments. Speaker verification technologies are applied in numerous fields such as security, biometrics, and forensics.

In this paper, the authors study the effects of different languages on the performance of an automatic speaker verification (ASV) system. The MirasVoice speech corpus (MVSC), a bilingual English and Farsi speech corpus, is used in this study. This study collects results from both an i-vector based ASV system and a GMM-UBM based ASV system. The experimental results show that a mismatch between the enrollment data used for training and the verification data can lead to a significant decrease in overall system efficiency. This study shows that it is best to use an i-vector based framework with English-language data in the enrollment phase to improve the robustness of ASV systems. The results achieved in this study indicate that this can narrow the degradation gap caused by language mismatch.
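
The GMM-UBM baseline mentioned above scores a verification trial as the log-likelihood ratio between a speaker-specific Gaussian mixture model and a universal background model (UBM). The sketch below illustrates only that scoring idea, with synthetic features and a simplified enrollment step (re-fitting from the UBM means) in place of full MAP adaptation; it is not the paper's system.

# Minimal GMM-UBM scoring sketch with synthetic "MFCC" features. Enrollment
# here simply re-fits a GMM initialised from the UBM means, a simplification
# of the usual MAP adaptation.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
dim = 20  # feature dimensionality (e.g. MFCCs)

# Background data pooled from many speakers, plus one target speaker's data
background = rng.normal(0.0, 1.0, size=(5000, dim))
target_enroll = rng.normal(0.5, 1.0, size=(500, dim))
target_test = rng.normal(0.5, 1.0, size=(200, dim))
impostor_test = rng.normal(0.0, 1.0, size=(200, dim))

# Train the UBM on the background data
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(background)

# Simplified "enrollment": initialise from the UBM means and refit on target data
spk = GaussianMixture(n_components=8, covariance_type="diag",
                      means_init=ubm.means_, random_state=0)
spk.fit(target_enroll)

def llr(features):
    """Average per-frame log-likelihood ratio: speaker model vs UBM."""
    return spk.score(features) - ubm.score(features)

print(f"Target trial LLR:   {llr(target_test):.2f}")    # expected higher
print(f"Impostor trial LLR: {llr(impostor_test):.2f}")  # expected lower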

Choobbasti Ali Janalizadeh, Gholamian Mohammad Erfan, Vaheb Amir, Safavi Saeid (2018) JSpeech: a multi-lingual conversational speech corpus, Proceedings of the 2018 IEEE Spoken Language Technology Workshop (SLT 2018) Institute of Electrical and Electronics Engineers (IEEE)
Speech processing and automatic speech and speaker recognition are major areas of interest in the field of computational linguistics. Research and development in human-computer interaction, forensic technologies and dialogue systems have been the motivating factors behind this interest. In this paper, JSpeech, a multilingual corpus, is introduced. The corpus contains 1332 hours of conversational speech from 47 different languages, collected from 106 public chat groups. It can be used in a variety of studies, such as investigating the effect of language variability on the performance of speaker recognition systems and automatic language detection. To this end, we include speaker verification results obtained for this corpus using a state-of-the-art method based on a 3D convolutional neural network.
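
The 3D convolutional neural network referred to above operates on volumes built by stacking several utterances of the same speaker. The toy model below illustrates only that input/embedding shape, with assumed layer sizes; it is not the cited architecture.

# Toy sketch of a 3D-CNN speaker embedding model (illustrative shapes only).
import torch
import torch.nn as nn

class Speaker3DCNN(nn.Module):
    def __init__(self, embedding_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # global pooling to a fixed-size vector
        )
        self.embed = nn.Linear(32, embedding_dim)

    def forward(self, x):
        # x: (batch, 1, utterances, time, freq)
        h = self.features(x).flatten(1)
        return self.embed(h)

# Example: a batch of 4 speakers, each with 20 utterances of 80 frames x 40 bins
model = Speaker3DCNN()
x = torch.randn(4, 1, 20, 80, 40)
embeddings = model(x)
print(embeddings.shape)  # torch.Size([4, 128])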