Műegyetemi Digitális Archívum

Improving Naturalness of Neural-based TTS System Trained with Arabic Limited Data

Date

Type

Conference paper

Language

en

Reading access rights:

Open access

Rights Holder

Author

Conference Date

2023.02.07

Conference Place

Budapest

Conference Title

1st Workshop on Intelligent Infocommunication Networks, Systems and Services (WI2NS2)

ISBN, e-ISBN

978-963-421-902-6

Container Title

1st Workshop on Intelligent Infocommunication Networks, Systems and Services

Version

Post print

Faculty

Faculty of Electrical Engineering and Informatics

First Page

71

Subject (OSZKAR)

Text-to-speech
TTS
Machine Learning
Deep Learning
Deep Neural Networks
Speech Synthesis

Genre

Conference article

University

Budapest University of Technology and Economics

OOC works

Abstract

In this paper, we investigate two approaches: a neural network speech synthesis system and a non-autoregressive text-to-speech (TTS) model. For neural network speech synthesis, we show how a baseline system based on Merlin is used for TTS synthesis to produce the most human-like voice; typically, it is implemented only with a front-end text processor and a WORLD vocoder. Here, we first adapt the Continuous and Ahocoder vocoders, and then investigate how effective each vocoder’s techniques are at producing the highest-quality speech. For the non-autoregressive TTS model, we implement the state-of-the-art FastSpeech 2 system, which provides high-quality speech synthesis in a timely manner without controllability and robustness problems. Here, we focus on integrating a different language with limited data while maintaining the high quality of the synthesized speech. Through objective and subjective evaluations, we verify that our method can outperform the baseline system with full data.

Description

Keywords