A Comparison of Data Augmentation Methods on Ultrasound Tongue Images for Articulatory-to-Acoustic Mapping towards Silent Speech Interfaces
| Ibrahimov, Ibrahim | ||
| Gosztolya, Gábor | ||
| Csapó, Tamás Gábor | ||
| 2023-03-13T16:07:08Z | ||
| 2023-03-13T16:07:08Z | ||
| 2023 | ||
Abstract: Silent Speech Interfaces (SSI), a subfield of speech technology, overcome the limitations of automatic speech recognition when acoustic signals cannot be produced or clearly captured. SSI focuses on the articulation process of speech production in order to map articulatory data to acoustics. Ultrasound tongue imaging (UTI), a non-invasive, clinically safe technique for viewing the shape, position, and movements of the tongue, has recently become popular for collecting articulatory data on tongue movement. It has already been shown that data augmentation can help mitigate overfitting and improve the generalization ability of deep neural networks. In this paper, we discuss the preliminary implementation and comparison of data augmentation methods on Azerbaijani ultrasound and speech recordings that we recorded ourselves. These methods include consecutive and intermittent time masking, sinusoidal noise injection, and random scaling. We explore the generation of new data samples by applying these methods to the dataset. We use the mean-squared error validation loss as the evaluation metric to compare the performance of all of the above data augmentation methods. | ||
| http://hdl.handle.net/10890/40712 | ||
| en | ||
| A Comparison of Data Augmentation Methods on Ultrasound Tongue Images for Articulatory-to-Acoustic Mapping towards Silent Speech Interfaces | ||
| Conference publication | ||
| Open access | ||
| Author | ||
| 2023.02.07 | ||
| Budapest | ||
| 1st Workshop on Intelligent Infocommunication Networks, Systems and Services (WI2NS2) | ||
| 2023.02.07 | ||
| 978-963-421-902-6 | ||
| Budapest University of Technology and Economics | ||
| Online | ||
| 1st Workshop on Intelligent Infocommunication Networks, Systems and Services | ||
| Post print | ||
| Faculty of Electrical Engineering and Informatics | ||
| 59 | ||
| 10.3311/WINS2023-011 | ||
| 64 | ||
| data augmentation | ||
| silent speech interfaces | ||
| ultrasound tongue imaging | ||
| speech technology | ||
| Conference paper | ||
| Budapest University of Technology and Economics |
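The four augmentation strategies and the evaluation metric named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the array layout (time, height, width), the choice to mask by zeroing whole frames, the form of the sinusoid (one value per frame, varying over time), and the interpretation of random scaling as a single intensity factor are all assumptions made for illustration.

```python
import numpy as np


def consecutive_time_mask(frames, mask_len, rng):
    """Zero out one contiguous run of `mask_len` frames (assumed masking scheme)."""
    out = frames.copy()
    # Upper bound is exclusive, so the run always fits inside the sequence.
    start = rng.integers(0, frames.shape[0] - mask_len + 1)
    out[start:start + mask_len] = 0.0
    return out


def intermittent_time_mask(frames, n_masked, rng):
    """Zero out `n_masked` distinct frames at random, not necessarily adjacent."""
    out = frames.copy()
    idx = rng.choice(frames.shape[0], size=n_masked, replace=False)
    out[idx] = 0.0
    return out


def sinusoidal_noise(frames, amplitude, cycles, rng):
    """Add a sinusoid over the time axis with a random phase (assumed noise form)."""
    n_frames = frames.shape[0]
    t = np.arange(n_frames)
    phase = rng.uniform(0.0, 2.0 * np.pi)
    wave = amplitude * np.sin(2.0 * np.pi * cycles * t / n_frames + phase)
    # Broadcast the per-frame offset over every pixel of that frame.
    return frames + wave[:, None, None]


def random_scale(frames, low, high, rng):
    """Multiply all pixel intensities by one factor drawn from [low, high)."""
    return frames * rng.uniform(low, high)


def mse(pred, target):
    """Mean-squared error between two arrays of the same shape."""
    return float(np.mean((pred - target) ** 2))


# Hypothetical input: 20 frames of 64 x 128 pixels with intensities in [0, 1).
rng = np.random.default_rng(0)
frames = rng.random((20, 64, 128))
augmented = [
    consecutive_time_mask(frames, 5, rng),
    intermittent_time_mask(frames, 5, rng),
    sinusoidal_noise(frames, 0.05, 3, rng),
    random_scale(frames, 0.9, 1.1, rng),
]
```

Each function returns a new array and leaves its input untouched, so several augmented copies can be generated from one recording. In the setting described by the abstract, the mean-squared error would be computed between predicted and target acoustic features on a validation set, not between images as the helper alone might suggest.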
Files
Original bundle
- Name:
- WINS2023-011.pdf
- Size:
- 2.64 MB
- Format:
- Adobe Portable Document Format
- Description:
- WINS2023-011.pdf