Received: 30-12-2019
Accepted: 26-09-2020
Speech Recognition of Vietnamese Alphabet using Deep Boltzmann Machines
Keywords
Artificial intelligence, machine learning, neural network, Boltzmann machine, deep learning
Abstract
Speech recognition has recently attracted many researchers in the field of artificial intelligence. One example is the problem of implementing a program that allows robots to recognize human speech so that they can understand, learn from, and talk with humans. In this study, 37 students from Vietnam National University of Agriculture took part in recording speech data for the 29 letters of the Vietnamese alphabet. The data were preprocessed to extract featured voice chunks for classification. We then used a deep Boltzmann machine (DBM), a deep network with stacked hidden layers, to classify them. To evaluate the proposed method, we compared the learning performance of the DBM with that of a neural network (NN) with the same network structure. The results showed that the DBM performed better, with accuracies of 68% on the training dataset and 51% on the test dataset, while the respective figures for the NN were 61% and 48%.
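The abstract does not describe how the stacked hidden layers are trained. As a rough illustration of the general idea only, the sketch below stacks two Bernoulli restricted Boltzmann machines and trains each with one-step contrastive divergence (CD-1), a common way to initialize Boltzmann-type deep networks. It is not the authors' procedure: the layer sizes, learning rate, epoch count, and the random binary data standing in for voice features are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr, rng):
    """One CD-1 update for a Bernoulli RBM on a batch v0 of shape (n, n_vis)."""
    # Positive phase: hidden probabilities given the data, then a binary sample.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
    # Negative phase: one Gibbs step back to the visibles and up again.
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    n = v0.shape[0]
    # Gradient estimate: data statistics minus reconstruction statistics.
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * (v0 - v1_prob).mean(axis=0)
    b_hid = b_hid + lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid

def train_rbm(data, n_hid, epochs, lr, rng):
    """Train one RBM layer on `data` and return its parameters."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hid))
    b_vis = np.zeros(n_vis)
    b_hid = np.zeros(n_hid)
    for _ in range(epochs):
        W, b_vis, b_hid = cd1_step(data, W, b_vis, b_hid, lr, rng)
    return W, b_vis, b_hid

rng = np.random.default_rng(0)

# Hypothetical sizes and synthetic binary data (placeholders, not the study's features).
n_samples, n_vis, n_hid1, n_hid2 = 64, 20, 12, 8
data = (rng.random((n_samples, n_vis)) < 0.5).astype(np.float64)

# Greedy layer-wise stacking: layer 2 is trained on layer 1's hidden activations.
W1, bv1, bh1 = train_rbm(data, n_hid1, epochs=10, lr=0.1, rng=rng)
feats1 = sigmoid(data @ W1 + bh1)
W2, bv2, bh2 = train_rbm(feats1, n_hid2, epochs=10, lr=0.1, rng=rng)
feats2 = sigmoid(feats1 @ W2 + bh2)

print(feats2.shape)
```

In a classification setting such as the 29-letter task, the top-layer activations (`feats2` here) would typically be passed to a classifier, or the stacked weights used to initialize a network that is then fine-tuned with labels. The abstract does not state which approach was taken.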