Direct Conversion from Facial Myoelectric Signals to Speech using Deep Neural Networks
by Lorenz Diener, Matthias Janke, Tanja Schultz
Abstract:
This paper presents our first results using Deep Neural Networks for surface electromyographic (EMG) speech synthesis. The proposed approach enables a direct mapping from EMG signals captured from the articulatory muscle movements to the acoustic speech signal. Features are processed from multiple EMG channels and are fed into a feed-forward neural network to achieve a mapping to the target acoustic speech output. We show that this approach is feasible for generating speech output from the input EMG signal, and compare the results to a prior mapping technique based on Gaussian mixture models. The comparison is conducted via objective Mel-Cepstral distortion scores and subjective listening test evaluations. It shows that the proposed Deep Neural Network approach gives substantial improvements for both evaluation criteria.
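The mapping described above, frame-level EMG features regressed onto frame-level acoustic features by a feed-forward network, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, hidden-layer size, and tanh nonlinearity are assumptions chosen for the example, and the weights are random rather than trained.

```python
import numpy as np

# Hypothetical dimensions, not taken from the paper: stacked features
# from multiple EMG channels in, mel-cepstral coefficients per frame out.
EMG_FEAT_DIM = 30
MCEP_DIM = 25
HIDDEN = 64

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    return rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out)

W1, b1 = init_layer(EMG_FEAT_DIM, HIDDEN)
W2, b2 = init_layer(HIDDEN, MCEP_DIM)

def forward(emg_frames):
    """Map a batch of EMG feature frames to acoustic feature frames."""
    h = np.tanh(emg_frames @ W1 + b1)   # hidden layer with tanh nonlinearity
    return h @ W2 + b2                  # linear output layer: regression target

# One utterance of 100 EMG feature frames mapped to 100 acoustic frames.
emg = rng.standard_normal((100, EMG_FEAT_DIM))
mcep = forward(emg)
print(mcep.shape)  # (100, 25)
```

In a trained system the output frames would be passed to a vocoder to synthesize the waveform; here the point is only the frame-by-frame regression structure that distinguishes this direct-mapping approach from recognition-then-synthesis pipelines.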
Reference:
Direct Conversion from Facial Myoelectric Signals to Speech using Deep Neural Networks (Lorenz Diener, Matthias Janke, Tanja Schultz), In International Joint Conference on Neural Networks (IJCNN), 2015.
Bibtex Entry:
@inproceedings{diener2015direct,
  title={Direct Conversion from Facial Myoelectric Signals to Speech using Deep Neural Networks},
  author={Diener, Lorenz and Janke, Matthias and Schultz, Tanja},
  note={IJCNN 2015},
  booktitle={International Joint Conference on Neural Networks},
  url={https://www.csl.uni-bremen.de/cms/images/documents/publications/Direct_Conversion_from_Facial_Myoelectric_Signals_to_Speech_using_Deep_Neural_Networks.pdf},
  abstract={This paper presents our first results using Deep Neural Networks for surface electromyographic (EMG) speech synthesis. The proposed approach enables a direct mapping from EMG signals captured from the articulatory muscle movements to the acoustic speech signal. Features are processed from multiple EMG channels and are fed into a feed forward neural network to achieve a mapping to the target acoustic speech output. We show that this approach is feasible to generate speech output from the input EMG signal and compare the results to a prior mapping technique based on Gaussian mixture models. The comparison is conducted via objective Mel-Cepstral distortion scores and subjective listening test evaluations. It shows that the proposed Deep Neural Network approach gives substantial improvements for both evaluation criteria.},
  keywords={electromyography, silent speech interface, deep neural networks},
  pages={1--7},
  doi={10.1109/IJCNN.2015.7280404},
  year={2015},
}