Model-centric data selection: Refining end-to-end speech recognition
Date
Type
Conference paper
Language
en
Reading access rights:
Open access
Rights Holder
Author
Conference Date
2024-02-05
Conference Place
Budapest
Conference Title
2nd Workshop on Intelligent Infocommunication Networks, Systems and Services (WI2NS2)
ISBN, e-ISBN
978-963-421-944-6
Container Title
2nd Workshop on Intelligent Infocommunication Networks, Systems and Services
Version
Post print
Faculty
Faculty of Electrical Engineering and Informatics
First Page
1
Subject (OSZKAR)
Data selection
Dataset Optimization
Machine learning
Automatic Speech Recognition
Genre
Conference article
University
Budapest University of Technology and Economics
- Cite this item
- https://doi.org/10.3311/WINS2024-001
OOC works
Abstract
Data selection can be an important step in pre-processing datasets for Automatic Speech Recognition (ASR), yet it is not widely applied. To handle potential labeling errors and other anomalies in the dataset, we introduce a simple model-centric speech data selection strategy. It discards the samples in the dataset that are difficult for the model to recognize, and uses the resulting reduced dataset to retrain the model. This technique improved the recognition accuracy of Hungarian ASR on both the BEA-Base and Common Voice (CV) datasets with the Conformer model architecture. The proposed approach achieved a consistent relative improvement in both Character Error Rate (CER) and Word Error Rate (WER), of up to 3% and 2.5%, respectively.
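The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how difficulty is measured or where the cut-off lies, so the per-utterance CER criterion, the `max_cer` threshold, and the names `select_samples` and `transcribe` are assumptions made only for illustration and should not be read as the authors' implementation.

```python
from typing import Callable, List, Tuple


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate of a hypothesis against its reference."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def select_samples(
    samples: List[Tuple[str, str]],
    transcribe: Callable[[str], str],
    max_cer: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, str]]:
    """Keep the (utterance_id, reference) pairs the model recognizes well.

    `transcribe` is a hypothetical callable that maps an utterance id to
    the hypothesis produced by an already trained ASR model.
    """
    kept = []
    for utt_id, ref in samples:
        if cer(ref, transcribe(utt_id)) <= max_cer:
            kept.append((utt_id, ref))
    return kept
```

Under these assumptions, the model would first be trained on the full dataset, `select_samples` would then filter the training set using that model's own hypotheses, and the model would be retrained on the returned subset.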