Regeneration of Ultrasound Tongue Images using Tongue Position Values towards Articulatory-to-Acoustic Mapping
Subject (OSZKAR)
DeepLabCut
articulatory-to-acoustic mapping
regeneration
- https://doi.org/10.3311/WINS2025-006
Abstract
Silent Speech Interfaces (SSIs) aim to help people communicate by using articulatory data, supporting those who cannot speak aloud or who need to communicate in noisy or silent environments. Ultrasound Tongue Imaging (UTI) is a widely used, safe, and cost-effective method for studying tongue movements during speech. UTI provides real-time views of tongue shapes and dynamics, making it useful for Articulation-to-Speech (ATS) synthesis. However, challenges such as probe misalignment, incomplete tongue images, and differences between speakers can affect data quality. Recent advances, including neural vocoders such as WaveGlow and adaptations of Tacotron2, have improved ATS synthesis, producing more natural and accurate speech. UTI has also been used to enhance text-to-speech (TTS) systems, showing potential for applications such as speech training. To address these limitations, studies have explored techniques such as dynamic time warping (DTW) and data augmentation, aiming to mitigate alignment issues and enrich training datasets. This study investigates the optimal representation of tongue shape in UTI, using DeepLabCut for pose estimation and regenerating tongue images from tongue position values, as a step towards articulatory-to-acoustic mapping (AAM). The findings aim to refine UTI-based speech synthesis systems and expand their applicability.
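As an illustration of the dynamic time warping mentioned in the abstract, the following is a minimal sketch of the textbook DTW recurrence applied to two sequences of feature vectors. It is not the implementation used in the cited studies or in this work; the function name `dtw_align` and the toy one-dimensional "frames" are hypothetical, chosen only to show how frames from two recordings of different lengths could be aligned.

```python
import math


def dtw_align(a, b):
    """Textbook DTW between two sequences of equal-dimension feature vectors.

    Returns the accumulated Euclidean cost and the warping path as a list
    of (index_in_a, index_in_b) pairs. Hypothetical helper for illustration.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame distance
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # match both frames
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy example: the second sequence repeats its first frame.
seq_a = [(0.0,), (1.0,), (2.0,)]
seq_b = [(0.0,), (0.0,), (1.0,), (2.0,)]
total_cost, warping_path = dtw_align(seq_a, seq_b)
print(total_cost, warping_path)
```

In practice the frames would be higher-dimensional, for example flattened tongue position values per ultrasound frame on one side and acoustic features on the other; that pairing is an assumption here, not a detail stated in the abstract.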