Automatic Speaker Identification System for Urdu Speech

Fatima Yousaf, Min Peng, Samia Razaq, Agha Ali Raza, Suleman Mazhar, Qianqian Xie, Dong Li

Abstract


Speaker recognition is the process of recognizing a speaker from a spoken phrase. Such systems generally operate in one of two modes: identifying a speaker or verifying a speaker's claimed identity. The available research literature shows that considerable effort has been devoted to Automatic Speaker Identification (ASI) for East Asian, English, and European languages. Unfortunately, the languages of South Asia, especially Urdu, have received far less attention. This paper describes a new feature set for ASI in Urdu speech that achieves better performance than baseline systems. Classifiers such as a neural network, Naïve Bayes, and K-nearest neighbor (K-NN) are used for modeling. Results are reported on a dataset of 40 speakers, with 82% correct identification. Finally, the paper also reports how system performance improves as the number of recordings per speaker is changed.
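
The two operating modes named above, identification and verification, can be illustrated with a minimal sketch. It is not the paper's feature set or pipeline: the 2-D feature vectors are made-up numbers, the speaker labels `spk_a` and `spk_b` are hypothetical, and the centroid-plus-threshold rule used for verification is an assumption chosen only for simplicity. Identification is shown with a plain K-NN majority vote, one of the classifier families the abstract mentions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(train, query, k=3):
    """Closed-set identification: majority vote among the k nearest vectors."""
    nearest = sorted(train, key=lambda item: euclidean(item[1], query))[:k]
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]


def verify(train, query, claimed, threshold):
    """Verification: accept the claim if the query lies within `threshold`
    of the claimed speaker's mean feature vector (an assumed, simplistic rule)."""
    vectors = [vec for label, vec in train if label == claimed]
    if not vectors:
        return False  # unknown speaker: the claim cannot be accepted
    centroid = [sum(col) / len(vectors) for col in zip(*vectors)]
    return euclidean(centroid, query) <= threshold


# Toy training data: (speaker label, feature vector). Invented for illustration.
TRAIN = [
    ("spk_a", [1.0, 1.0]), ("spk_a", [1.2, 0.9]), ("spk_a", [0.9, 1.1]),
    ("spk_b", [5.0, 5.0]), ("spk_b", [5.1, 4.8]), ("spk_b", [4.9, 5.2]),
]

if __name__ == "__main__":
    sample = [1.1, 1.0]
    print(identify(TRAIN, sample))               # which enrolled speaker is this?
    print(verify(TRAIN, sample, "spk_b", 0.5))   # is this really spk_b?
```

Identification answers a one-of-N question over all enrolled speakers, whereas verification answers a yes/no question about a single claimed identity, which is why the second function needs a decision threshold and the first does not.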






DOI: http://dx.doi.org/10.22555/pjets.v8i1.1973


