Investigating the Learning Effect of Multilingual Bottle-Neck Features for ASR
by Ngoc Thang Vu, Jochen Weiner, Tanja Schultz
Abstract:
Deep neural networks (DNNs) have become state-of-the-art techniques of automatic speech recognition in the last few years. They can be used at the preprocessing level (Tandem or Bottle-Neck features) or at the acoustic model level (hybrid Hidden Markov Model/DNN). Moreover, they allow exploiting multilingual data to improve monolingual systems. This paper presents our investigation of the learning effect of neural networks in the context of multilingual Bottle-Neck features. For this, we perform a visual analysis of the output of the Bottle-Neck layer of a neural network using t-Distributed Stochastic Neighbor Embedding. Our results show that multilingual Bottle-Neck features seem to learn phoneme characteristics, such as the F1 and F2 formants which characterize different vowels, and other articulatory features, such as fricatives and nasals which characterize consonants. Furthermore, they seem to normalize language dependent variations and transfer the learned representation to unseen languages.
Reference:
Investigating the Learning Effect of Multilingual Bottle-Neck Features for ASR (Ngoc Thang Vu, Jochen Weiner, Tanja Schultz), In The 15th Annual Conference of the International Speech Communication Association, Singapore, 2014. (Interspeech 2014)
Bibtex Entry:
@inproceedings{vu2014investigating,
  year={2014},
  title={Investigating the Learning Effect of Multilingual Bottle-Neck Features for ASR},
  note={Interspeech 2014},
  booktitle={The 15th Annual Conference of the International Speech Communication Association, Singapore},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/354_Paper.pdf},
  author={Vu, Ngoc Thang and Weiner, Jochen and Schultz, Tanja},
  abstract={Deep neural networks (DNNs) have become state-of-the-art techniques of automatic speech recognition in the last few years. They can be used at the preprocessing level (Tandem or Bottle-Neck features) or at the acoustic model level (hybrid Hidden Markov Model/DNN). Moreover, they allow exploiting multilingual data to improve monolingual systems. This paper presents our investigation of the learning effect of neural networks in the context of multilingual Bottle-Neck features. For this, we perform a visual analysis of the output of the Bottle-Neck layer of a neural network using t-Distributed Stochastic Neighbor Embedding. Our results show that multilingual Bottle-Neck features seem to learn phoneme characteristics, such as the F1 and F2 formants which characterize different vowels, and other articulatory features, such as fricatives and nasals which characterize consonants. Furthermore, they seem to normalize language dependent variations and transfer the learned representation to unseen languages.}
}