Conversion from Facial Myoelectric Signals to Speech: A Unit Selection Approach
by Marlene Zahner, Matthias Janke, Michael Wand, Tanja Schultz
Abstract:
This paper reports on our recent research on surface electromyographic (EMG) speech synthesis: a direct conversion of the EMG signals of the articulatory muscle movements to the acoustic speech signal. In this work we introduce a unit selection approach which compares segments of the input EMG signal to a database of simultaneously recorded EMG/audio unit pairs and selects the best matching audio units based on target and concatenation cost; these units are then concatenated to synthesize an acoustic speech output. We show that this approach is able to generate proper speech output from the input EMG signal. We evaluate different properties of the units and investigate what amount of data is necessary for an initial transformation. Prior work on EMG-to-speech conversion used a frame-based approach from the voice conversion domain, which struggles with the generation of a natural $F_0$ contour. This problem may also be tackled by our unit selection approach.
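The selection step described above can be framed as a shortest-path search over candidate units: each input EMG segment incurs a target cost against every database unit, each pair of consecutive units incurs a concatenation cost at their audio boundary, and a Viterbi-style dynamic program picks the minimum-cost sequence. The sketch below illustrates this idea; it is not the paper's implementation, and the data layout (one feature vector per unit, plus boundary vectors for the audio), the Euclidean distances, and the weight `wc` are all illustrative assumptions.

```python
import numpy as np

def unit_selection(input_units, database, wc=1.0):
    """Pick one database unit per input segment via Viterbi search,
    minimizing target cost + wc * concatenation cost.

    input_units: list of EMG feature vectors, one per input segment
    database:    list of (emg_features, audio_start, audio_end) triples
                 (hypothetical structure, for illustration only)
    """
    T, K = len(input_units), len(database)

    # Target cost: distance between input EMG segment and database EMG unit.
    tc = np.array([[np.linalg.norm(u - db[0]) for db in database]
                   for u in input_units])

    # Concatenation cost: mismatch between the end of unit j's audio
    # and the start of unit k's audio.
    cc = np.array([[np.linalg.norm(database[j][2] - database[k][1])
                    for k in range(K)] for j in range(K)])

    # Viterbi forward pass over unit candidates.
    cost = tc[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + wc * cc        # (prev unit, next unit)
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(K)] + tc[t]

    # Backtrack the cheapest unit sequence.
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 1 - 1, -1):
        if t == 0:
            break
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

Lowering `wc` favors units that individually match the input well; raising it favors smooth joins between consecutive audio units, which is the usual trade-off in concatenative synthesis.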
Reference:
Conversion from Facial Myoelectric Signals to Speech: A Unit Selection Approach (Marlene Zahner, Matthias Janke, Michael Wand, Tanja Schultz), In The 15th Annual Conference of the International Speech Communication Association, Singapore, 2014. (Interspeech 2014)
Bibtex Entry:
@inproceedings{zahner2014conversion,
  note={Interspeech 2014},
  year={2014},
  title={Conversion from Facial Myoelectric Signals to Speech: A Unit Selection Approach},
  booktitle={The 15th Annual Conference of the International Speech Communication Association, Singapore},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/ZahnerJankeWandSchultz_IS14_MyoelectricSignalsUnitSelection.pdf},
  abstract={This paper reports on our recent research on surface electromyographic (EMG) speech synthesis: a direct conversion of the EMG signals of the articulatory muscle movements to the acoustic speech signal. In this work we introduce a unit selection approach which compares segments of the input EMG signal to a database of simultaneously recorded EMG/audio unit pairs and selects the best matching audio units based on target and concatenation cost; these units are then concatenated to synthesize an acoustic speech output. We show that this approach is able to generate proper speech output from the input EMG signal. We evaluate different properties of the units and investigate what amount of data is necessary for an initial transformation. Prior work on EMG-to-speech conversion used a frame-based approach from the voice conversion domain, which struggles with the generation of a natural $F_0$ contour. This problem may also be tackled by our unit selection approach.},
  author={Zahner, Marlene and Janke, Matthias and Wand, Michael and Schultz, Tanja}
}