Műegyetemi Digitális Archívum

Regeneration of Ultrasound Tongue Images using Tongue Position Values towards Articulatory-to-Acoustic Mapping

Date

Type

Book chapter

Language

en

Reading access rights:

Open access

Rights Holder

Author

Conference Date

2025-02-03

Conference Place

Budapest

Conference Title

3rd Workshop on Intelligent Infocommunication Networks, Systems and Services

ISBN, e-ISBN

978-963-421-982-8

Container Title

3rd Workshop on Intelligent Infocommunication Networks, Systems and Services

Version

Post print

Faculty

Faculty of Electrical Engineering and Informatics

First Page

33

Subject (OSZKAR)

ultrasound tongue imaging
DeepLabCut
articulatory-to-acoustic mapping
regeneration

Genre

Conference paper

University

Budapest University of Technology and Economics

OOC works

Abstract

Silent Speech Interfaces (SSI) aim to help people communicate using articulatory data, supporting those who cannot speak aloud or who need to communicate in noisy environments or where silence is required. Ultrasound Tongue Imaging (UTI) is a widely used, safe, and cost-effective method for studying tongue movements during speech. UTI provides real-time views of tongue shapes and dynamics, making it useful for Articulation-to-Speech (ATS) synthesis. However, challenges such as probe misalignment, incomplete tongue images, and differences between speakers can affect data quality. Recent advances, including neural vocoders such as WaveGlow and adaptations of Tacotron2, have improved ATS synthesis, producing more natural and accurate speech. UTI has also been used to enhance text-to-speech (TTS) systems, showing potential for applications such as speech training. To address these limitations, studies have explored techniques such as dynamic time warping (DTW) and data augmentation, aiming to mitigate alignment issues and enrich training datasets. This study investigates the optimal representation of tongue shape in UTI, employing DeepLabCut for pose estimation and regenerating tongue images from tongue position values as a step towards articulatory-to-acoustic mapping (AAM). The findings aim to refine UTI-based speech synthesis systems and expand their applicability.
