Journal of Shandong University (Natural Science), 2017, Vol. 52, Issue (11): 29-36. doi: 10.6040/j.issn.1671-9352.0.2017.253

Speech recognition of Russian short instructions based on DTW

WANG Tong, MA Yan-zhou, YI Mian-zhu*

  1. Language Engineering Department, PLA University of Foreign Languages, Luoyang 471000, Henan, China
  • Received: 2017-05-20 Online: 2017-11-20 Published: 2017-11-17
  • Corresponding author: YI Mian-zhu (1964- ), male, Ph.D., professor; research interest: computational linguistics. E-mail: 13373781261@163.com E-mail: 463906155@qq.com
  • About the author: WANG Tong (1993- ), female, master's student; research interest: language information processing. E-mail: 463906155@qq.com
  • Supported by:
    Major Program of the National Natural Science Foundation of China (11590771)

Abstract: For speech recognition tasks with limited training corpora, this paper recognizes Russian speech using the dynamic time warping (DTW) algorithm. First, building on a speech corpus with cross-language annotation, we study a speech recognition method that integrates phoneme-to-grapheme conversion with machine translation. Second, drawing on the characteristics of Russian speech, we set dynamic thresholds centered on vowels to achieve endpoint detection accurate to the syllable, which increases recognition speed by 34.4% and accuracy by 14%. Third, combining time-domain and frequency-domain analysis, we extract parameter templates that reflect both the static features and the dynamic changes of speech. In addition, we improve the DTW algorithm with a global constraint and an early-abandoning strategy, which avoids pathological matching and reduces the computation scale, increasing speed by 19.7% and accuracy by 4.8%. Five-fold cross-validation on a Russian short-instruction speech set yields a recognition accuracy of 74.9%.

Key words: Russian speech recognition, endpoint detection, DTW algorithm, cross language speech recognition
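
The abstract names two DTW refinements, a global constraint and early abandoning, without giving their details. The following is a minimal sketch of how such refinements are commonly realized, not the authors' implementation. It assumes a Sakoe-Chiba band as the global constraint, scalar features with absolute difference as the local cost (real feature frames would be vectors), and abandoning a comparison once a whole row of the cost matrix is no better than the best match found so far. The names `dtw_distance`, `classify`, `window`, and `best_so_far` are hypothetical.

```python
import math


def dtw_distance(a, b, window, best_so_far=math.inf):
    """DTW distance between two 1-D sequences under a Sakoe-Chiba band.

    Returns math.inf when the comparison is abandoned early because it
    can no longer beat best_so_far.
    """
    n, m = len(a), len(b)
    # Assumption: widen the band to at least |n - m| so a path can
    # always reach the final cell (n, m).
    window = max(window, abs(n - m))
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        lo, hi = max(1, i - window), min(m, i + window)
        for j in range(lo, hi + 1):
            cost = abs(a[i - 1] - b[j - 1])
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        # Early abandoning: local costs are non-negative, so the final
        # distance cannot be smaller than the cheapest cell in this row.
        if min(curr[lo:hi + 1]) >= best_so_far:
            return math.inf
        prev = curr
    return prev[m]


def classify(query, templates, window):
    """Return (label, distance) of the nearest template.

    templates is a list of (label, sequence) pairs.
    """
    best_label, best = None, math.inf
    for label, seq in templates:
        d = dtw_distance(query, seq, window, best)
        if d < best:
            best_label, best = label, d
    return best_label, best
```

The band keeps the warping path near the diagonal, which rules out pathological alignments where one frame is stretched across most of the other sequence, and it cuts the number of cells evaluated per row from `m` to at most `2 * window + 1`. Passing the running best distance into each comparison lets poorly matching templates be rejected after only a few rows.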

CLC number: TP391