Error Signatures to identify Errors in ASR in an unsupervised fashion
by Dominic Telaar, Jochen Weiner, Tanja Schultz
Abstract:
Large-scale ASR systems are trained on thousands of hours of speech. Usually, much of this training data was automatically transcribed by another ASR system, due to a lack of manual transcriptions and a lack of resources to produce them. Systems trained in such a fashion are biased towards the transcription system. In the past, confidence models have been investigated to exclude data from training. We propose to investigate areas of low confidence by extending our previous work. For this purpose, we aggregate potential errors of ASR systems by ascribing a list of attributes to each potential error and finding the sets of attributes which best describe the errors encountered on an automatically transcribed set. We call these characteristic sets of attributes Error Signatures. Examples of attributes are word identity, phonemes, acoustic models, word context, speaker ID, and language ID. For each Error Signature, an error ratio is computed, giving the probability that the signature properly describes the error. Error ratios and occurrence frequencies are used to sort the signatures and present them to an expert, who can then fix the shortcomings of the ASR system underlying the Error Signatures.
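The core computation described in the abstract (aggregating attribute sets over potential errors, computing per-signature error ratios, and ranking by ratio and frequency) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation; the toy tokens, attribute names, and the choice of subset size are all hypothetical assumptions for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each token carries attribute key/value pairs and a
# flag marking whether it lies in a low-confidence (potential-error) region.
tokens = [
    ({"word": "the", "speaker": "s1"}, False),
    ({"word": "zealot", "speaker": "s1"}, True),
    ({"word": "zealot", "speaker": "s2"}, True),
    ({"word": "the", "speaker": "s2"}, False),
    ({"word": "zealot", "speaker": "s1"}, True),
]

def signatures(attrs, max_size=2):
    """Enumerate candidate attribute subsets (signatures) for one token."""
    items = sorted(attrs.items())
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            yield combo

# Count, for every signature, its total occurrences and how often it
# co-occurs with a potential error.
total = Counter()
erroneous = Counter()
for attrs, is_error in tokens:
    for sig in signatures(attrs):
        total[sig] += 1
        if is_error:
            erroneous[sig] += 1

# Error ratio: fraction of a signature's occurrences that are potential
# errors. Rank signatures by error ratio, then by occurrence frequency,
# so the most characteristic signatures surface first for expert review.
ranked = sorted(
    total,
    key=lambda sig: (erroneous[sig] / total[sig], total[sig]),
    reverse=True,
)
for sig in ranked[:3]:
    print(sig, erroneous[sig] / total[sig], total[sig])
```

In this toy set, the single-attribute signature for the word "zealot" ranks first: every occurrence falls in a potential-error region, and it occurs more often than any larger attribute set with the same ratio.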
Reference:
Error Signatures to identify Errors in ASR in an unsupervised fashion (Dominic Telaar, Jochen Weiner, Tanja Schultz), In Proceedings of the Errare Workshop (ERRARE 2015), 2015.
Bibtex Entry:
@inproceedings{telaar2015error,
  title={{Error Signatures to identify Errors in ASR in an unsupervised fashion}},
  author={Dominic Telaar and Jochen Weiner and Tanja Schultz},
  year={2015},
  booktitle={Proceedings of the Errare Workshop (ERRARE 2015)},
  abstract={Large-scale ASR systems are trained on thousands of hours of speech. Usually, much of this training data was automatically transcribed by another ASR system, due to a lack of manual transcriptions and a lack of resources to produce them. Systems trained in such a fashion are biased towards the transcription system. In the past, confidence models have been investigated to exclude data from training. We propose to investigate areas of low confidence by extending our previous work. For this purpose, we aggregate potential errors of ASR systems by ascribing a list of attributes to each potential error and finding the sets of attributes which best describe the errors encountered on an automatically transcribed set. We call these characteristic sets of attributes Error Signatures. Examples of attributes are word identity, phonemes, acoustic models, word context, speaker ID, and language ID. For each Error Signature, an error ratio is computed, giving the probability that the signature properly describes the error. Error ratios and occurrence frequencies are used to sort the signatures and present them to an expert, who can then fix the shortcomings of the ASR system underlying the Error Signatures.},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/TelaarEtAl_Errare2015.pdf}
}