Integration Of Language Identification Into A Recognition System For Spoken Conversations Containing Code-Switches
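by Jochen Weiner, Ngoc Thang Vu, Dominic Telaar, Florian Metze, Tanja Schultz, Dau-Cheng Lyu, Eng-Siong Chng, Haizhou Li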
Abstract:
This paper describes the integration of language identification (LID) into a multilingual automatic speech recognition (ASR) system for spoken conversations containing code-switches between Mandarin and English. We apply a multistream approach to combine the acoustic model score and the language information at frame level, where the latter is provided by an LID component. Furthermore, we extend this multistream approach with a new method called “Language Lookahead”, in which the language information of subsequent frames is used to improve accuracy. Both methods are evaluated using a set of controlled LID results with varying frame accuracies. Our results show that both approaches improve ASR performance by at least 4% relative if the LID achieves a frame accuracy of at least 85%.
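The abstract does not give the combination formula, so the following is only a minimal Python sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes a weighted log-linear combination of the acoustic model log-probability and an LID log-probability for the hypothesis language, an LID stream that emits one hard language label per frame smoothed with a fixed confidence, and a "lookahead" that averages the LID probability over the current frame and the next few frames. The language labels, the weights `w_am`/`w_lid`, the `confidence` value, the averaging rule, and all function names are hypothetical. `controlled_lid` mimics the idea of controlled LID results with a chosen frame accuracy by flipping a random fraction of reference labels.

```python
import math
import random

LANGS = ("MAN", "ENG")  # hypothetical labels for Mandarin and English


def controlled_lid(reference, frame_accuracy, seed=0):
    """Simulate LID output with a target frame accuracy by flipping the
    reference language label of a randomly chosen fraction of frames."""
    rng = random.Random(seed)
    n_wrong = round(len(reference) * (1.0 - frame_accuracy))
    wrong = set(rng.sample(range(len(reference)), n_wrong))
    return [
        (LANGS[1] if lang == LANGS[0] else LANGS[0]) if t in wrong else lang
        for t, lang in enumerate(reference)
    ]


def lid_log_prob(hyp_lang, frame_lang, confidence=0.9):
    """Log-probability the LID stream assigns to hyp_lang for one frame,
    assuming a hard decision smoothed with a fixed (hypothetical) confidence."""
    return math.log(confidence if hyp_lang == frame_lang else 1.0 - confidence)


def lookahead_log_prob(hyp_lang, lid_frames, t, window):
    """Hypothetical Language Lookahead: average the LID probability for
    hyp_lang over frame t and up to `window` subsequent frames."""
    span = lid_frames[t:t + window + 1]
    probs = [math.exp(lid_log_prob(hyp_lang, lang)) for lang in span]
    return math.log(sum(probs) / len(probs))


def multistream_score(am_log_prob, hyp_lang, lid_frames, t,
                      w_am=1.0, w_lid=0.5, window=0):
    """Weighted log-linear combination of the acoustic model score and the
    LID stream for frame t; window=0 uses only the current frame."""
    lid_term = lookahead_log_prob(hyp_lang, lid_frames, t, window)
    return w_am * am_log_prob + w_lid * lid_term


if __name__ == "__main__":
    reference = ["MAN"] * 50 + ["ENG"] * 50
    lid = controlled_lid(reference, frame_accuracy=0.85)
    acc = sum(a == b for a, b in zip(lid, reference)) / len(reference)
    print(f"simulated LID frame accuracy: {acc:.2f}")
    # Compare scores for an English hypothesis at frame 60 with and without lookahead.
    for w in (0, 5):
        print(w, multistream_score(-4.2, "ENG", lid, 60, window=w))
```

With `window=0` a single misclassified frame fully penalizes the hypothesis language; averaging over later frames softens isolated LID errors, which is one plausible reading of why looking ahead could help near code-switch points.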
Reference:
Integration Of Language Identification Into A Recognition System For Spoken Conversations Containing Code-Switches (Jochen Weiner, Ngoc Thang Vu, Dominic Telaar, Florian Metze, Tanja Schultz, Dau-Cheng Lyu, Eng-Siong Chng, Haizhou Li), In The Third International Workshop on Spoken Language Technologies for Under-resourced Languages, 2012. (SLTU'12)
Bibtex Entry:
@inproceedings{weiner2012integration,
  title={Integration Of Language Identification Into A Recognition System For Spoken Conversations Containing Code-Switches},
  year={2012},
  note={SLTU'12},
  booktitle={The Third International Workshop on Spoken Language Technologies for Under-resourced Languages},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/weiner_vu_sltu2012.pdf},
  author={Weiner, Jochen and Vu, Ngoc Thang and Telaar, Dominic and Metze, Florian and Schultz, Tanja and Lyu, Dau-Cheng and Chng, Eng-Siong and Li, Haizhou},
  abstract={This paper describes the integration of language identification (LID) into a multilingual automatic speech recognition (ASR) system for spoken conversations containing code-switches between Mandarin and English. We apply a multistream approach to combine at frame level the acoustic model score and the language information, where the latter is provided by an LID component. Furthermore, we advance this multistream approach by a new method called “Language Lookahead”, in which the language information of subsequent frames is used to improve accuracy. Both methods are evaluated using a set of controlled LID results with varying frame accuracies. Our results show that both approaches improve the ASR performance by at least 4% relative if the LID achieves a minimum frame accuracy of 85%.},
}