Műegyetemi Digitális Archívum

A Hybrid Algorithm for Robust Pitch Estimation in Emotional Speech Synthesis

Zineb, Hammadi
Al-Radhi, Mohammed Salah
2025-02-20T13:52:05Z
2025

Abstract

Emotional intelligence in synthetic speech remains a critical challenge in human-machine interaction, despite significant advances in speech synthesis naturalness and intelligibility. Current systems struggle to accurately capture the nuanced emotional expressions characteristic of human speech, including rapid pitch transitions, wide frequency variations, and irregular vibrato patterns. While pitch estimation algorithms like PESTO and FCPE have proven effective for standard speech, their performance on emotional content remains largely unexplored. We present ESCAPE (Emotion Self-Supervised Context-Aware Pitch Estimation), a novel algorithm specifically designed for emotional speech processing. ESCAPE synthesizes PESTO's precise frequency variation handling with FCPE's context-aware processing through a hybrid architecture that achieves robust pitch tracking in expressive vocal content. Our approach maintains computational efficiency while excelling at capturing complex acoustic patterns unique to emotional utterances. This paper provides the first comprehensive evaluation of PESTO and FCPE on emotional speech datasets and introduces ESCAPE as a transformative solution for pitch estimation in emotionally expressive speech synthesis. Our results demonstrate significant progress toward bridging the gap between human-like emotional expression and machine-generated speech, marking an important advancement in emotional speech synthesis technology.

http://hdl.handle.net/10890/58920
en
Book chapter
Open access
Author
2025-02-03
Budapest
3rd Workshop on Intelligent Infocommunication Networks, Systems and Services
2025-02-20
978-963-421-982-8
Budapest University of Technology and Economics
Post print
Faculty of Electrical Engineering and Informatics
81
10.3311/WINS2025-014
86
Synthetic speech
Human-machine interaction
Pitch transitions
Frequency variations
Conference paper

Files

Original bundle

Name: WINS_14_d.pdf
Size: 403.48 KB
Format: Adobe Portable Document Format
Description: WINS_14_d.pdf