Decision-tree based Analysis of Speaking Mode Discrepancies in EMG-based Speech Recognition
by Michael Wand, Matthias Janke, Tanja Schultz
Abstract:
This study examines the impact of speaking mode variability on speech recognition by surface electromyography (EMG). In EMG-based speech recognition, the electric potentials of the human articulatory muscles are captured with surface electrodes, so that the resulting signal can be used for speech processing. This enables the user to communicate silently, without uttering any sound. Previous studies have shown that processing silent speech poses a new challenge, namely that the EMG signals of audible and silent speech are quite distinct. In this study we consider EMG signals of three speaking modes: audibly spoken speech, whispered speech, and silently mouthed speech. We present an approach to quantifying the differences between these speaking modes by means of phonetic decision trees and show that this measure correlates strongly with differences in recognizer performance across the speaking modes. We furthermore reinvestigate the spectral mapping algorithm, which reduces the discrepancy between speaking modes, and evaluate its effectiveness.
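The decision-tree measure can be pictured as follows: if phonetic decision trees grown on pooled data keep splitting the data by speaking mode, the modes' EMG signals differ; if mode-based splits gain nothing, the modes are similar. The following Python sketch is a simplified, hypothetical analogue of this idea, not the paper's exact algorithm: it scores how well a plain decision tree separates mode-labeled EMG feature frames, where mode_discrepancy and all parameters are illustrative assumptions.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def mode_discrepancy(frames_mode_a, frames_mode_b, max_depth=5):
    """Cross-validated separability of two speaking modes.

    frames_mode_*: 2-D arrays of EMG feature frames (frames x features).
    Returns a score in [0, 1]: ~0.5 means the tree cannot tell the
    modes apart (similar signals), ~1.0 means fully distinct signals.
    """
    X = np.vstack([frames_mode_a, frames_mode_b])
    y = np.concatenate([np.zeros(len(frames_mode_a)),
                        np.ones(len(frames_mode_b))])
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    return cross_val_score(tree, X, y, cv=5).mean()

The spectral mapping algorithm mentioned in the abstract adapts the spectral content of silent-speech EMG toward that of audible-speech EMG. The sketch below assumes a simple bin-wise formulation: each STFT frequency bin of the silent signal is scaled by the ratio of the modes' average magnitude spectra. The sampling rate and window length are illustrative placeholders, not values from the paper.

import numpy as np
from scipy.signal import istft, stft

def spectral_mapping(silent_emg, audible_emg, fs=600, nperseg=64):
    """Scale each frequency bin of the silent EMG signal by the ratio
    of average audible to average silent magnitude (an assumption)."""
    _, _, Z_sil = stft(silent_emg, fs=fs, nperseg=nperseg)
    _, _, Z_aud = stft(audible_emg, fs=fs, nperseg=nperseg)
    mean_sil = np.abs(Z_sil).mean(axis=1)   # average silent spectrum
    mean_aud = np.abs(Z_aud).mean(axis=1)   # average audible spectrum
    ratio = mean_aud / np.maximum(mean_sil, 1e-12)  # avoid divide-by-zero
    Z_mapped = Z_sil * ratio[:, np.newaxis]  # bin-wise rescaling
    _, mapped = istft(Z_mapped, fs=fs, nperseg=nperseg)
    return mapped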
Reference:
Decision-tree based Analysis of Speaking Mode Discrepancies in EMG-based Speech Recognition (Michael Wand, Matthias Janke, Tanja Schultz), In International Conference on Bio-inspired Systems and Signal Processing, 2012. (BIOSIGNALS 2012)
Bibtex Entry:
@inproceedings{wand2012decision,
  year={2012},
  title={Decision-tree based Analysis of Speaking Mode Discrepancies in EMG-based Speech Recognition},
  note={BIOSIGNALS 2012},
  booktitle={International Conference on Bio-inspired Systems and Signal Processing},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/WandJankeSchultz_BS2012_DecisionTreeAnalysis.pdf},
  author={Wand, Michael and Janke, Matthias and Schultz, Tanja}
}