Investigation of deep learning techniques in speech recognition for under-resourced languages: the case of Afaan Oromo

Yadeta Gonfa; Kula Kekeba

doi:10.20372/hjet.v1i2.79

Authors

Yadeta Gonfa Department of Software Engineering, College of Mechanical and Electrical Engineering, Addis Ababa Science and Technology University, Addis Ababa, Addis Ababa, Ethiopia
Kula Kekeba Department of Information Technology, School of Technology and Informatics, Ambo University, Ambo, Ethiopia

DOI:

https://doi.org/10.20372/hjet.v1i2.79

Keywords:

Deep learning, convolutional neural network, speech recognition, under-resourced languages, Afaan Oromo

Abstract

Human-machine interaction is increasingly part of day-to-day activities. Automatic Speech Recognition (ASR) is an active research area that aims to build machines that can understand spoken human language and respond to it. Many researchers have shown that speech recognition systems can help people communicate with machines such as computers. ASR work started in the mid-20th century, and successive improvements have come from applying various tools and techniques. The literature indicates that deep learning is currently the state-of-the-art approach to speech recognition. However, little research has applied deep learning to under-resourced languages, in part because the large datasets these approaches require are hard to obtain for such languages. Deep learning for Ethiopian languages in general, and for Afaan Oromo in particular, has received little attention. Therefore, the main objective of this study is to investigate deep learning techniques in speech recognition for Afaan Oromo. The experiment used a dataset of 2953 utterances and a Convolutional Neural Network (CNN) model. The dataset was partitioned into training, validation, and test sets. The best test accuracy, 51.27%, was obtained with a batch size of 32, 40 epochs, and a learning rate of 0.001. This result is encouraging when compared with the result obtained using the Hidden Markov Model (HMM). We therefore conclude that deep learning techniques, implemented here with a CNN model, are feasible for Afaan Oromo speech recognition. Further work could experiment with other deep learning algorithms and techniques to improve the accuracy.
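To make the reported experimental setup concrete, the following minimal sketch partitions 2953 utterance indices into training, validation, and test sets and works out how many parameter updates the reported batch size and epoch count would imply. The dataset size, batch size, number of epochs, and learning rate come from the abstract; the 80/10/10 split ratio, the random seed, and the helper name `split_indices` are assumptions made only for illustration, since the abstract does not state how the data were divided.

```python
import math
import random

# Values reported in the abstract.
TOTAL_UTTERANCES = 2953
BATCH_SIZE = 32
EPOCHS = 40
LEARNING_RATE = 0.001  # listed for completeness; not used in this sketch


def split_indices(n, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle indices 0..n-1 and split them into train/val/test lists.

    The fractions and seed are hypothetical; the abstract only says the
    data were partitioned into training, validation, and test sets.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_indices(TOTAL_UTTERANCES)

# Number of mini-batches per epoch (the last batch may be smaller).
steps_per_epoch = math.ceil(len(train) / BATCH_SIZE)
total_updates = steps_per_epoch * EPOCHS

print(len(train), len(val), len(test), steps_per_epoch, total_updates)
```

Under the assumed ratio, the training set is small enough that each epoch consists of only a few dozen mini-batches, which illustrates why data scarcity is a central concern for under-resourced languages.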
