For some years, the state of the art in speech synthesis and processing has been dominated by data-driven methods and deep neural networks. The use of ever larger amounts of data allows the exploitation of ever more parameters, leading to ever better results. Unfortunately, the increasing computational complexity hinders the widespread application of these models.
In the first part of the talk, we will present our research into data-efficient and computationally efficient voice transformation with deep neural networks. We will introduce the Multi-band Excited WaveNet, a deep neural network that integrates a WaveNet into a classical source-filter model. The discussion will motivate the model structure and the training losses. We will describe the deficiencies of the proposed model and briefly reflect on perspectives given the rapidly evolving state of the art in neural vocoding.
The second part will then demonstrate ongoing research into applications of the neural vocoder, combining it with dedicated models for intensity, pitch, expressivity or identity transformation.
Bio: Axel Roebel is director of research at IRCAM and head of the Analysis/Synthesis team. His research activities center around voice and music synthesis and transformation, with a strong focus on artistic and industrial applications. After many years of research into various signal processing algorithms, he has now shifted his focus towards data-driven methods.
Speech production is a complex motor process involving several physiological phenomena, such as the neural, nervous and muscular activities that drive our respiratory, laryngeal and articulatory movements. Modeling speech production, in par
20 October 2022, 1 h 01 min
For more than half a century, source-filter theory has remained at the heart of the modeling, analysis and synthesis of the human voice and its expressions, such as speech and singing. In this presentation, we will revisit this
20 October 2022, 1 h 05 min
20 October 2022, 26 min
The talk addresses the prediction of the geometric shape of the vocal tract from a sequence of phonemes. It will begin by presenting the various approaches that have been used in the past, in particular those based on the
20 October 2022, 1 h 09 min
Institut de Recherche et de Coordination Acoustique/Musique
Copyright © 2022 Ircam. All rights reserved.