Document Type: Articles

Authors

1 Faculty of Literature and Humanities, Laboratory of Linguistics, University of Tehran

2 Faculty of New Sciences and Technologies, University of Tehran

Abstract

This paper investigates how Mosha'ere, a Persian spoken poetry game, can be computerized using a Persian automatic speech recognition (ASR) system trained on read speech. To this end, the texts and recitations of poems by the great poets Hafez and Sa'di were gathered, and a spoken poetry rhyming game called Chakame was developed. Chakame uses context-dependent triphone HMM acoustic models, trained on Persian read speech at a normal speaking rate, to recognize beyts, i.e., lines of verse, spoken by a human user. It was evaluated on two kinds of recitation speech: 100 beyts recited formally at a normal rate and another 100 beyts recited emotionally, in a hyperarticulated style, at a slow rate. A difference of about 23% in word error rate (WER) between the two sets shows the impact of the intrinsic features of emotional verse recitation on recognition accuracy. Nevertheless, Chakame achieved an overall beyt recognition rate of 98.5%.
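The WER figure reported above is conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that standard definition, not the evaluation code used for Chakame; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace. A real system would
    normalize the text (e.g. orthographic variants) before scoring.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens stand in for the words of a beyt.
print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

Under this definition, scoring each of the two 100-beyt test sets separately and comparing the results would yield a WER difference of the kind the abstract reports.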
