Műegyetemi Digitális Archívum

Implementing a Text-to-Speech synthesis model on a Raspberry Pi for Industrial Applications

Date

Type

Conference paper

Language

en

Reading access rights:

Open access

Rights Holder

Author

Conference Date

2023.02.07

Conference Place

Budapest

Conference Title

1st Workshop on Intelligent Infocommunication Networks, Systems and Services (WI2NS2)

ISBN, e-ISBN

978-963-421-902-6

Container Title

1st Workshop on Intelligent Infocommunication Networks, Systems and Services

Version

Post print

Faculty

Faculty of Electrical Engineering and Informatics

First Page

77

Subject (OSZKAR)

Real-time system
speech synthesis
FastSpeech

Genre

Conference article

University

Budapest University of Technology and Economics

OOC works

Abstract

Text-to-Speech (TTS) synthesis produces human-like speech from input text. It has recently gained prominence through the application of deep neural networks. End-to-end TTS models now produce highly natural synthesized speech but require extremely large computational resources. Deploying such high-quality TTS models in real-time environments remains challenging because of the limited resources of embedded systems and mobile phones. This paper demonstrates the implementation of an end-to-end TTS model (FastSpeech 2) on an embedded device (Raspberry Pi4 B+). Objective experimental results show that the model runs on the Raspberry Pi, producing high-quality synthesized speech at an acceptable processing speed. Combined with a caching mechanism, the proposed model could be used in many real-life applications, such as railway announcements and industrial settings.

Description

Keywords