Codebook Clustering for Unit Selection Based EMG-to-Speech Conversion

Diener, Lorenz; Janke, Matthias; Schultz, Tanja

Abstract:

This paper reports on our recent advances in using Unit Selection to directly synthesize speech from facial surface electromyographic (EMG) signals generated by movement of the articulatory muscles during speech production. We achieve a robust Unit Selection mapping by using a more sophisticated unit codebook. This codebook is generated from a set of base units using a two stage unit clustering process. The units are first clustered based on the audio and afterwards on the EMG feature vectors they cover, and a new codebook is generated using these cluster assignments. We evaluate different cluster counts for both stages and revisit our evaluation of unit sizes in light of this clustering approach. Our final system achieves a significantly better Mel-Cepstral distortion score than the Unit Selection based EMG-to-Speech conversion system from our previous work while, due to the reduced codebook size, taking less time to perform the conversion.
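
The two-stage clustering described in the abstract could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not name the clustering algorithm or say how new codebook entries are formed, so the use of plain k-means, the averaging of cluster members into new units, and all names (`kmeans`, `cluster_codebook`, `emg_units`, `audio_units`, `n_audio`, `n_emg`) are assumptions made for illustration. Each base unit is assumed to be flattened into one EMG feature vector and one audio feature vector.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed here); returns one label per row of x."""
    rng = np.random.default_rng(seed)
    k = min(k, len(x))  # never ask for more clusters than points
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


def cluster_codebook(emg_units, audio_units, n_audio, n_emg):
    """Stage 1: cluster units on audio. Stage 2: cluster each audio cluster
    on EMG. One new unit per (audio, EMG) cluster pair, built by averaging
    the member units (the averaging step is an assumption)."""
    audio_labels = kmeans(audio_units, n_audio)
    new_emg, new_audio = [], []
    for a in np.unique(audio_labels):
        idx = np.flatnonzero(audio_labels == a)
        emg_labels = kmeans(emg_units[idx], n_emg)
        for e in np.unique(emg_labels):
            members = idx[emg_labels == e]
            new_emg.append(emg_units[members].mean(axis=0))
            new_audio.append(audio_units[members].mean(axis=0))
    return np.array(new_emg), np.array(new_audio)


# Toy data standing in for real base units: 200 units, 10-dim EMG, 5-dim audio.
rng = np.random.default_rng(1)
emg_units = rng.normal(size=(200, 10))
audio_units = rng.normal(size=(200, 5))
new_emg, new_audio = cluster_codebook(emg_units, audio_units, n_audio=4, n_emg=3)
```

With these toy settings the clustered codebook holds at most 4 x 3 = 12 units instead of 200, which illustrates why a smaller codebook would shorten the search at conversion time.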

PDF URL: https://www.csl.uni-bremen.de/cms/images/documents/publications/Codebook_Clustering_for_Unit_Selection Based_EMG-to-Speech_Conversion.pdf

Reference:

Codebook Clustering for Unit Selection Based EMG-to-Speech Conversion (Lorenz Diener, Matthias Janke, Tanja Schultz), In Sixteenth Annual Conference of the International Speech Communication Association, 2015. (Interspeech 2015)

Bibtex Entry:

@inproceedings{diener2015codebook,
  title={Codebook Clustering for Unit Selection Based EMG-to-Speech Conversion},
  author={Diener, Lorenz and Janke, Matthias and Schultz, Tanja},
  note={Interspeech 2015},
  booktitle={Sixteenth Annual Conference of the International Speech Communication Association},
  pages={2420--2424},
  abstract={This paper reports on our recent advances in using Unit Selection to directly synthesize speech from facial surface electromyographic (EMG) signals generated by movement of the articulatory muscles during speech production. We achieve a robust Unit Selection mapping by using a more sophisticated unit codebook. This codebook is generated from a set of base units using a two stage unit clustering process. The units are first clustered based on the audio and afterwards on the EMG feature vectors they cover, and a new codebook is generated using these cluster assignments. We evaluate different cluster counts for both stages and revisit our evaluation of unit sizes in light of this clustering approach. Our final system achieves a significantly better Mel-Cepstral distortion score than the Unit Selection based EMG-to-Speech conversion system from our previous work while, due to the reduced codebook size, taking less time to perform the conversion.},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/Codebook_Clustering_for_Unit_Selection Based_EMG-to-Speech_Conversion.pdf},
  keywords={electromyography, silent speech interface, unit selection},
  year={2015}
}