Theses

Toward universal speech synthesis : harnessing linguistic and stylistic embeddings for expertise-free and flexible systems

Abstract : Text-to-speech synthesis (TTS) turns written text into an audio speech signal. Many commercial systems rely on human linguistic expertise, yet are limited to synthesizing speech for a single speaker's voice and speaking style. For speech synthesis to become universal in its usage and abilities, it must be easily customizable while being able to produce widely varied speech. The goal of this thesis is two-fold: 1) to study whether it is possible to alleviate the need for human linguistic expertise to build or modify a TTS system; 2) to study whether it is possible to produce speech corresponding to different speakers, with their respective tone and regional accent. This manuscript presents three contributions. First, we show that the embedding property of neural networks can be used to lower the amount of expertise required in unit selection speech synthesis. Second, we show that character embeddings can remove the need for any linguistic expertise in end-to-end systems. Finally, we attempt to explicitly model speaker and accent characteristics in order to build a multi-speaker, multi-accent end-to-end speech synthesis system.
Document type :
Theses

https://tel.archives-ouvertes.fr/tel-03343065
Contributor : Abes Star
Submitted on : Monday, September 13, 2021 - 6:59:09 PM
Last modification on : Wednesday, September 15, 2021 - 3:28:39 AM

File

2021ISAR0004_PERQUIN_Antoine_T...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03343065, version 1

Citation

Antoine Perquin. Toward universal speech synthesis : harnessing linguistic and stylistic embeddings for expertise-free and flexible systems. Computation and Language [cs.CL]. INSA de Rennes, 2021. English. ⟨NNT : 2021ISAR0004⟩. ⟨tel-03343065⟩
