Dynamic Speech Emotion Recognition with State-Space Models

Abstract : Automatic emotion recognition from speech has focused mainly on identifying categorical or static affect states, but the spectrum of human emotion is continuous and time-varying. In this paper, we present a recognition system for dynamic speech emotion based on state-space models (SSMs). The prediction of the unknown emotion trajectory in the affect space spanned by the Arousal, Valence, and Dominance (A-V-D) descriptors is cast as a time series filtering task. The state-space models we investigated include a standard linear model (Kalman filter) as well as novel non-linear, non-parametric Gaussian Process (GP) based SSMs. We use the AVEC 2014 database for evaluation; it provides ground-truth A-V-D labels, which allow the state and measurement functions to be learned separately, simplifying model training. For filtering with the GP SSM, we used two approximation methods: a recently proposed analytic method and a particle filter. All models were evaluated in terms of average Pearson correlation R and root mean square error (RMSE). The results show that, using the same feature vectors, the GP SSMs achieve twice the correlation and half the RMSE of a Kalman filter.
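
As an illustration of the filtering formulation and the two evaluation metrics named in the abstract, the following is a minimal sketch of a linear Kalman filter tracking a three-dimensional state (standing in for A-V-D) from noisy observations, scored with average Pearson R and RMSE. It is not the authors' implementation: the data are synthetic rather than AVEC 2014, the matrices A, C, Q, and R are hand-picked hypothetical values rather than learned from labels, and all function names are assumptions made for this example. The GP-based models are not covered.

```python
import numpy as np


def kalman_filter(y, A, C, Q, R, x0, P0):
    """Linear Kalman filter; y has shape (T, d_obs). Returns filtered means (T, d_state)."""
    x, P = x0, P0
    means = []
    for y_t in y:
        # Predict: propagate the state estimate through the linear dynamics.
        x = A @ x
        P = A @ P @ A.T + Q
        # Update: correct the prediction with the current observation.
        S = C @ P @ C.T + R              # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)   # Kalman gain
        x = x + K @ (y_t - C @ x)
        P = (np.eye(len(x)) - K @ C) @ P
        means.append(x)
    return np.array(means)


def pearson_r(a, b):
    """Pearson correlation between two 1-D sequences."""
    return float(np.corrcoef(a, b)[0, 1])


def rmse(a, b):
    """Root mean square error between two 1-D sequences."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Hypothetical model parameters (assumed for illustration, not taken from the paper).
A = 0.95 * np.eye(3)                          # slowly varying 3-D state
C = np.vstack([np.eye(3), np.ones((2, 3))])   # 5-D observation ("feature") map
Q = 0.05 * np.eye(3)                          # process noise covariance
R = 0.10 * np.eye(5)                          # measurement noise covariance

# Simulate a synthetic trajectory and its noisy observations.
rng = np.random.default_rng(0)
T = 200
x_true = np.zeros((T, 3))
x = np.zeros(3)
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(3), Q)
    x_true[t] = x
y = x_true @ C.T + rng.multivariate_normal(np.zeros(5), R, size=T)

x_est = kalman_filter(y, A, C, Q, R, x0=np.zeros(3), P0=np.eye(3))

# Average the metrics over the three state dimensions.
avg_r = float(np.mean([pearson_r(x_true[:, d], x_est[:, d]) for d in range(3)]))
avg_rmse = float(np.mean([rmse(x_true[:, d], x_est[:, d]) for d in range(3)]))
print(f"average Pearson R: {avg_r:.3f}, average RMSE: {avg_rmse:.3f}")
```

In this sketch the filter is given the same parameters that generated the data, so it only illustrates the mechanics of prediction, update, and scoring; it says nothing about the results reported in the paper.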
Document type :
Conference papers

https://hal-imt.archives-ouvertes.fr/hal-01198424
Contributor : François Septier
Submitted on : Saturday, September 12, 2015 - 3:59:57 PM
Last modification on : Thursday, October 17, 2019 - 12:35:47 PM

Identifiers

  • HAL Id : hal-01198424, version 1

Citation

Konstantin Markov, Tomoko Matsui, François Septier, Gareth W. Peters. Dynamic Speech Emotion Recognition with State-Space Models. 23rd European Signal Processing Conference (EUSIPCO), Aug 2015, Nice, France. ⟨hal-01198424⟩
