A First Speech Recognition System For Mandarin-English Code-Switch Conversational Speech
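by Ngoc Thang Vu, Dau-Cheng Lyu, Jochen Weiner, Dominic Telaar, Tim Schlippe, Fabian Blaicher, Eng-Siong Chng, Tanja Schultz, Haizhou Li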
Abstract:
This paper presents first steps toward a large vocabulary continuous speech recognition (LVCSR) system for conversational Mandarin-English code-switching (CS) speech. We applied state-of-the-art techniques such as speaker adaptive and discriminative training to build the first baseline system on the SEAME (South East Asia Mandarin-English) corpus [1]. For acoustic modeling, we applied different phone merging approaches based on the International Phonetic Alphabet (IPA) and the Bhattacharyya distance, in combination with discriminative training, to improve accuracy. At the language model level, we investigated statistical machine translation (SMT)-based text generation approaches for building code-switching language models. Furthermore, we integrated the information provided by a language identification (LID) system into the decoding process using a multi-stream approach. Our best 2-pass system achieves a Mixed Error Rate (MER) of 36.6% on the SEAME development set.
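The abstract reports a Mixed Error Rate but does not define it here. For code-switching speech, MER is usually understood as an edit-distance error rate in which Mandarin is scored per character and English per word. The following is a minimal sketch under that assumption, not the paper's scoring tool. The function names, the CJK character range, and the whitespace tokenization are illustrative choices.

```python
import re

# Assumption: characters in the basic CJK Unified Ideographs block count as
# one Mandarin token each; everything else is split on whitespace into words.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def mixed_tokens(text):
    """Split text into Mandarin characters and non-Mandarin words."""
    tokens = []
    for chunk in text.split():
        word = ""
        for ch in chunk:
            if _CJK.match(ch):
                if word:  # flush a pending non-Mandarin word
                    tokens.append(word)
                    word = ""
                tokens.append(ch)
            else:
                word += ch
        if word:
            tokens.append(word)
    return tokens


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(ref_text, hyp_text):
    """Edit errors divided by the number of reference tokens."""
    ref = mixed_tokens(ref_text)
    hyp = mixed_tokens(hyp_text)
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)
```

For example, scoring the hypothesis "我们 去 shop 好" against the reference "我们 去 shopping 好吗" gives six reference tokens, one substitution, and one deletion, so the sketch returns 2/6.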
Reference:
A First Speech Recognition System For Mandarin-English Code-Switch Conversational Speech (Ngoc Thang Vu, Dau-Cheng Lyu, Jochen Weiner, Dominic Telaar, Tim Schlippe, Fabian Blaicher, Eng-Siong Chng, Tanja Schultz, Haizhou Li), In 37th International Conference on Acoustics, Speech, and Signal Processing, Kyoto, Japan, 2012. (ICASSP 2012)
Bibtex Entry:
@inproceedings{vu2012a,
  title={A First Speech Recognition System For Mandarin-English Code-Switch Conversational Speech},
  year={2012},
  note={ICASSP 2012},
  booktitle={37th International Conference on Acoustics, Speech, and Signal Processing, Kyoto, Japan},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/ICASSP2012-Vu_CodeSwitch.pdf},
  author={Vu, Ngoc Thang and Lyu, Dau-Cheng and Weiner, Jochen and Telaar, Dominic and Schlippe, Tim and Blaicher, Fabian and Chng, Eng-Siong and Schultz, Tanja and Li, Haizhou},
  abstract={This paper presents first steps toward a large vocabulary continuous speech recognition (LVCSR) system for conversational Mandarin-English code-switching (CS) speech. We applied state-of-the-art techniques such as speaker adaptive and discriminative training to build the first baseline system on the SEAME (South East Asia Mandarin-English) corpus [1]. For acoustic modeling, we applied different phone merging approaches based on the International Phonetic Alphabet (IPA) and the Bhattacharyya distance, in combination with discriminative training, to improve accuracy. At the language model level, we investigated statistical machine translation (SMT)-based text generation approaches for building code-switching language models. Furthermore, we integrated the information provided by a language identification (LID) system into the decoding process using a multi-stream approach. Our best 2-pass system achieves a Mixed Error Rate (MER) of 36.6% on the SEAME development set.},
}