RNN-Transducer-based Losses for Speech Recognition on Noisy Targets

Bataev, Vladimir

arXiv:2504.06963 (eess)

[Submitted on 9 Apr 2025]

Abstract: Training speech recognition systems on noisy transcripts is a significant challenge in industrial pipelines, where datasets are enormous and ensuring accurate transcription for every instance is difficult. In this work, we introduce novel loss functions to mitigate the impact of transcription errors in RNN-Transducer models. Our Star-Transducer loss addresses deletion errors by incorporating "skip frame" transitions in the loss lattice, restoring over 90% of the system's performance compared to models trained with accurate transcripts. The Bypass-Transducer loss uses "skip token" transitions to tackle insertion errors, recovering more than 60% of the quality. Finally, the Target-Robust Transducer loss merges these approaches, offering robust performance against arbitrary errors. Experimental results demonstrate that the Target-Robust Transducer loss significantly improves RNN-T performance on noisy data by restoring over 70% of the quality compared to well-transcribed data.
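
To make the lattice idea in the abstract concrete, here is a minimal sketch of the standard RNN-T forward recursion with two optional extra transitions: a "skip frame" arc that runs parallel to the blank arc, and a "skip token" arc that runs parallel to the label arc. The abstract does not specify how these transitions are weighted or parameterized, so the function name, the argument layout, and the constant log-penalties `skip_frame_penalty` and `skip_token_penalty` are all assumptions made for illustration, not the paper's formulation. Once the extra arcs are enabled, the returned value is a path score and no longer a normalized log-probability.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def transducer_score(log_blank, log_label,
                     skip_frame_penalty=None, skip_token_penalty=None):
    """Forward score over an RNN-T lattice (illustrative sketch only).

    log_blank[t][u]: log-prob of blank at frame t after u target labels
                     (T rows, U + 1 columns).
    log_label[t][u]: log-prob of emitting target label u + 1 at frame t
                     (T rows, U columns).
    skip_frame_penalty / skip_token_penalty: hypothetical constant
    log-weights for the extra arcs; None disables the arc, which gives
    the standard RNN-T log-likelihood.
    """
    T = len(log_blank)
    U = len(log_blank[0]) - 1
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0

    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            terms = []
            if t > 0:
                # Standard blank arc: advance one frame, emit nothing.
                terms.append(alpha[t - 1][u] + log_blank[t - 1][u])
                if skip_frame_penalty is not None:
                    # Assumed "skip frame" arc, parallel to blank.
                    terms.append(alpha[t - 1][u] + skip_frame_penalty)
            if u > 0:
                # Standard label arc: emit the next target token.
                terms.append(alpha[t][u - 1] + log_label[t][u - 1])
                if skip_token_penalty is not None:
                    # Assumed "skip token" arc, parallel to the label arc.
                    terms.append(alpha[t][u - 1] + skip_token_penalty)
            alpha[t][u] = logsumexp(*terms)

    # Leave the lattice from the last node via blank (or a skipped frame).
    final = [log_blank[T - 1][U]]
    if skip_frame_penalty is not None:
        final.append(skip_frame_penalty)
    return alpha[T - 1][U] + logsumexp(*final)
```

Calling `transducer_score(log_blank, log_label)` with both penalties left as `None` reduces to the usual RNN-T forward computation. Passing only `skip_frame_penalty` mimics the skip-frame idea, passing only `skip_token_penalty` mimics the skip-token idea, and passing both combines them. In a real training setup the loss would be the negative of this score, computed with a differentiable framework rather than plain Python lists.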

Comments:	Final Project Report, Bachelor's Degree in Computer Science, University of London, March 2024
Subjects:	Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:2504.06963 [eess.AS]
	(or arXiv:2504.06963v1 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.2504.06963

Submission history

From: Vladimir Bataev
[v1] Wed, 9 Apr 2025 15:18:29 UTC (1,977 KB)
