OPT 12 Speech recognition and natural language processing. ** DEADLINE FEB 14, 2020 **
Please select a paper and summarize it. You will write a reading report of at most 2 pages of text, IN ENGLISH, NOT including figures and references (which may take up to 2 additional pages).
Submit your report no later than February 14, 2020.
The report will contain:
- A highlight and explanation of the main contribution(s) of the article.
- An analysis of the advantages and disadvantages of the proposed method.
- Your personal point of view on the contribution, interest, relevance, and soundness of the article, and what could be improved.
- Did the article have an impact? Was it subsequently taken up or extended by the authors or by others?
- If the article lends itself to it, you are encouraged to implement part of the article to deepen your understanding (a Python program in the form of a commented notebook can be submitted in addition).

You will receive a grade out of 20 points, with 5 points for each of the following criteria:
- Understanding of the article
- Depth and accuracy of remarks
- Creativity of improvement suggestions and effort to reproduce the results
- Clarity and presentation

(1) Abdel-Hamid, O., Mohamed, A., Jiang, H., Deng, L., Penn, G., Yu, D., 2014. Convolutional Neural Networks for Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, 1533–1545. https://doi.org/10.1109/TASLP.2014.2339736

(2) Bellegarda, J.R., Monz, C., 2016. State of the art in statistical methods for language and speech processing. Computer Speech & Language 35, 163–184. https://doi.org/10.1016/j.csl.2015.07.001

(3) Besacier, L., Barnard, E., Karpov, A., Schultz, T., 2014. Automatic speech recognition for under-resourced languages: A survey. Speech Communication 56, 85–100. https://doi.org/10.1016/j.specom.2013.07.008

(4) Dahl, G.E., Yu, D., Deng, L., Acero, A., 2012. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE Transactions on Audio, Speech, and Language Processing 20, 30–42. https://doi.org/10.1109/TASL.2011.2134090

(5) De Mulder, W., Bethard, S., Moens, M.-F., 2015. A survey on the application of recurrent neural networks to statistical language modeling. Computer Speech & Language 30, 61–98. https://doi.org/10.1016/j.csl.2014.09.005

(6) Deng, L., Li, X., 2013. Machine Learning Paradigms for Speech Recognition: An Overview. IEEE Transactions on Audio, Speech, and Language Processing 21, 1060–1089. https://doi.org/10.1109/TASL.2013.2244083

Kempton, T., Moore, R.K., 2014. Discovering the phoneme inventory of an unwritten language: A machine-assisted approach. Speech Communication 56, 152–166. https://doi.org/10.1016/j.specom.2013.02.006

Kujala, J.V., 2013. A probabilistic approach to pronunciation by analogy. Computer Speech & Language 27, 1049–1067. https://doi.org/10.1016/j.csl.2012.12.004

(7) Maas, A.L., Qi, P., Xie, Z., Hannun, A.Y., Lengerich, C.T., Jurafsky, D., Ng, A.Y., 2017. Building DNN acoustic models for large vocabulary speech recognition. Computer Speech & Language 41, 195–213. https://doi.org/10.1016/j.csl.2016.06.007

(8) Marxer, R., Barker, J., Alghamdi, N., Maddock, S., 2018. The impact of the Lombard effect on audio and visual speech recognition systems. Speech Communication 100, 58–68. https://doi.org/10.1016/j.specom.2018.04.006

(9) Palaz, D., Magimai-Doss, M., Collobert, R., 2019. End-to-end acoustic modeling using convolutional neural networks for HMM-based automatic speech recognition. Speech Communication 108, 15–32. https://doi.org/10.1016/j.specom.2019.01.004

(10) Razavi, M., Rasipuram, R., Magimai-Doss, M., 2016. Acoustic data-driven grapheme-to-phoneme conversion in the probabilistic lexical modeling framework. Speech Communication 80, 1–21. https://doi.org/10.1016/j.specom.2016.03.003

(11) Sundermeyer, M., Ney, H., Schlüter, R., 2015. From Feedforward to Recurrent LSTM Neural Networks for Language Modeling. IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, 517–529. https://doi.org/10.1109/TASLP.2015.2400218

(12) Gonzalez, S., Brookes, M., 2014. PEFAC – A Pitch Estimation Algorithm Robust to High Levels of Noise. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22, 518–530.

(13) Wang, D., Yu, C., Hansen, J.H.L., 2017. Robust Harmonic Features for Classification-Based Pitch Estimation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, 952–964.

(14) Klapuri, A., 2008. Multipitch Analysis of Polyphonic Music and Speech Signals Using an Auditory Model. IEEE Transactions on Audio, Speech, and Language Processing 16, 255–266.

(15) Munezero, M., Montero, C.S., Sutinen, E., Pajunen, J., 2014. Are They Different? Affect, Feeling, Emotion, Sentiment, and Opinion Detection in Text. IEEE Transactions on Affective Computing 5(2).

(16) Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C., 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, 1631–1642.

(17) Vasilescu, I., Hernandez, N., Vieru, B., Lamel, L., 2018. Exploring Temporal Reduction in Dialectal Spanish: A Large-scale Study of Lenition of Voiced Stops and Coda-s. In INTERSPEECH 2018, Hyderabad, India, 2728–2732. DOI: 10.21437/Interspeech

(18) Michel, J.-B., Shen, Y., Aiden, A., Veres, A., Gray, M., Pickett, J., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M., Aiden, E., 2011. Quantitative Analysis of Culture Using Millions of Digitized Books. Science 331, 176–182. https://doi.org/10.1126/science.1199644

(19) Adda-Decker, M., Lamel, L., 2018. Discovering speech reduction across speaking styles and languages. In Cangemi, F. et al. (Eds.), Rethinking Reduction: Interdisciplinary Perspectives on Conditions, Mechanisms, and Domains for Phonetic Variation. Berlin: De Gruyter Mouton.

(20) Kitamura, D., et al., 2016. Determined Blind Source Separation Unifying Independent Vector Analysis and Nonnegative Matrix Factorization. IEEE/ACM Transactions on Audio, Speech, and Language Processing 24(9).

(21) Luo, Y., Mesgarani, N., 2019. Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27(8).

(22) Li, X., Girin, L., Gannot, S., Horaud, R., 2018. Multichannel Speech Separation and Enhancement Using the Convolutive Transfer Function. arXiv:1711.07911

(23) Cho, J., et al., 2019. Deep Neural Networks for Emotion Recognition Combining Audio and Transcripts. arXiv:1911.00432

(24) Yoon, S., et al., 2019. Speech Emotion Recognition Using Multi-hop Attention Mechanism. In ICASSP 2019.

(25) Etienne, C., et al., 2018. CNN+LSTM Architecture for Speech Emotion Recognition with Data Augmentation. In Workshop on Speech, Music and Mind 2018.

(26) Belinkov, Y., Glass, J., 2017. Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems. In NIPS 2017.

Family name *
First name *
Email *
Number of chosen article *
URL to a PDF of your report *