Dynamic Speech Emotion Recognition with State-Space Models

Abstract: Automatic emotion recognition from speech has mainly focused on identifying categorical or static affect states, but the spectrum of human emotion is continuous and time-varying. In this paper, we present a dynamic speech emotion recognition system based on state-space models (SSMs). Prediction of the unknown emotion trajectory in the affect space spanned by the Arousal, Valence, and Dominance (A-V-D) descriptors is cast as a time-series filtering task. The state-space models we investigated include a standard linear model (Kalman filter) as well as a novel non-linear, non-parametric Gaussian Process (GP) based SSM. We use the AVEC 2014 database for evaluation, whose ground-truth A-V-D labels allow the state and measurement functions to be learned separately, simplifying model training. For filtering with the GP SSM, we used two approximation methods: a recently proposed analytic method and a particle filter. All models were evaluated in terms of average Pearson correlation R and root mean square error (RMSE). The results show that, using the same feature vectors, the GP SSMs achieve roughly twice the correlation and half the RMSE of the Kalman filter.
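The linear baseline described in the abstract is the standard Kalman filter. As a minimal sketch of the filtering setup (not the authors' implementation), the following assumes a hypothetical 3-dimensional A-V-D state with linear-Gaussian dynamics; the matrices `A`, `C`, `Q`, `R` and the dimensions are illustrative placeholders, not values from the paper:

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, x0, P0):
    """Standard Kalman filter over observations y (T x d_obs).

    A: state transition, C: measurement matrix,
    Q/R: process/measurement noise covariances,
    x0/P0: initial state mean and covariance.
    Returns the filtered state means (T x d_state).
    """
    x, P = x0, P0
    means = []
    for t in range(y.shape[0]):
        # Predict step: propagate the state estimate through the dynamics
        x = A @ x
        P = A @ P @ A.T + Q
        # Update step: correct with the innovation y[t] - C x
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ (y[t] - C @ x)
        P = P - K @ C @ P
        means.append(x.copy())
    return np.array(means)

# Illustrative usage: a constant 3-D "A-V-D" target observed in noise
T = 20
y = np.ones((T, 3))
means = kalman_filter(y, np.eye(3), np.eye(3),
                      0.01 * np.eye(3), 0.1 * np.eye(3),
                      np.zeros(3), np.eye(3))
```

The GP SSM replaces the linear maps `A` and `C` with non-parametric Gaussian Process regressors, which is why the paper resorts to analytic approximation or particle filtering instead of this closed-form recursion.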
Document type: Conference paper
23rd European Signal Processing Conference (EUSIPCO), Aug 2015, Nice, France. 2015

Contributor: François Septier <>
Submitted on: Saturday, September 12, 2015 - 15:59:57
Last modified on: Wednesday, April 25, 2018 - 15:44:03


  • HAL Id: hal-01198424, version 1


Konstantin Markov, Tomoko Matsui, François Septier, Gareth W. Peters. Dynamic Speech Emotion Recognition with State-Space Models. 23rd European Signal Processing Conference (EUSIPCO), Aug 2015, Nice, France. 2015. 〈hal-01198424〉


