Journal of Shandong University (Natural Science), 2017, Vol. 52, Issue (11): 29-36. doi: 10.6040/j.issn.1671-9352.0.2017.253

Speech recognition of Russian short instructions based on DTW

WANG Tong, MA Yan-zhou, YI Mian-zhu*

  1. Language Engineering Department, PLA University of Foreign Languages, Luoyang 471000, Henan, China
  • Received: 2017-05-20 Online: 2017-11-20 Published: 2017-11-17
  • Corresponding author: YI Mian-zhu (born 1964), male, Ph.D., professor; research interest: computational linguistics. E-mail: 13373781261@163.com
  • About the author: WANG Tong (born 1993), female, master's student; research interest: language information processing. E-mail: 463906155@qq.com
  • Supported by:
    Major Program of the National Natural Science Foundation of China (11590771)

Abstract: For speech recognition tasks with limited training corpora, this paper recognizes Russian speech using the dynamic time warping (DTW) algorithm. First, taking a speech corpus with cross-language annotations as the resource base, we study a speech recognition method that integrates sound-to-character conversion and machine translation. Second, drawing on the characteristics of Russian speech, we set a dynamic threshold centered on vowels to achieve syllable-level endpoint detection, which increased recognition speed by 34.4% and accuracy by 14%. Third, combining time-domain and frequency-domain analysis, we extract parameter templates that reflect both the static features and the dynamic changes of speech. In addition, we improve the DTW algorithm with a global constraint and an early-abandoning strategy, which avoids pathological matching and reduces the computation scale, increasing speed by 19.7% and accuracy by 4.8%. In five-fold cross-validation on a Russian short-instruction speech set, recognition accuracy reached 74.9%.

Key words: endpoint detection, Russian speech recognition, cross-language speech recognition, DTW algorithm
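
The abstract names two modifications to DTW, a global constraint and early abandoning, without giving their details. The sketch below illustrates the general idea only and is not the authors' implementation. It makes several assumptions: the global constraint is a Sakoe-Chiba band of half-width `window`, the features are 1-D scalars rather than the multi-dimensional parameter templates the paper extracts, and the local cost is an absolute difference. The names `dtw_distance`, `classify`, `window`, and `best_so_far` are hypothetical.

```python
import math


def dtw_distance(query, template, window, best_so_far=math.inf):
    """DTW distance between two 1-D sequences.

    Uses a Sakoe-Chiba band (assumed global constraint) and abandons
    early once no path can beat `best_so_far`. Returns math.inf when
    the computation is abandoned.
    """
    n, m = len(query), len(template)
    # The band must be at least |n - m| wide, or the end cell is unreachable.
    window = max(window, abs(n - m))
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            cost = abs(query[i - 1] - template[j - 1])
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        # Costs are non-negative and every path crosses this row, so the
        # row minimum is a lower bound on the final distance.
        if min(curr[lo:hi + 1]) >= best_so_far:
            return math.inf
        prev = curr
    return prev[m]


def classify(query, templates, window):
    """Return (label, distance) of the nearest template under banded DTW."""
    best_label, best = None, math.inf
    for label, template in templates:
        d = dtw_distance(query, template, window, best)
        if d < best:
            best_label, best = label, d
    return best_label, best
```

In this sketch the band limits each row of the cost matrix to at most `2 * window + 1` cells, which is where the reduction in computation comes from, and it also rules out alignments that stray far from the diagonal. Passing the best distance found so far into each later comparison lets hopeless templates be discarded after only a few rows.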

CLC Number:

  • TP391