Automatic Recognition of Continuously Pronounced Speech for the Main Dialects of Vietnamese According to Pronunciation Modality


MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Phạm Ngọc Hưng

AUTOMATIC RECOGNITION OF CONTINUOUSLY PRONOUNCED SPEECH FOR THE MAIN DIALECTS OF VIETNAMESE ACCORDING TO PRONUNCIATION MODALITY

Major: Information Systems
Code: 62480104

DOCTORAL THESIS IN INFORMATION SYSTEMS

SCIENTIFIC SUPERVISORS:
Assoc. Prof. Dr. Trịnh Văn Loan
Dr. Nguyễn Hồng Quang

Hanoi - 2017

DECLARATION

I hereby declare that the entire content of the thesis "Automatic Recognition of Continuously Pronounced Speech for the Main Dialects of Vietnamese According to Pronunciation Modality" is my own research work. The data and results in the thesis are truthful and have not been published in any other work apart from the publications listed in the thesis. Every source consulted is cited and listed in the references as required.

ON BEHALF OF THE SUPERVISORS: Assoc. Prof. Dr. Trịnh Văn Loan
AUTHOR OF THE THESIS: Phạm Ngọc Hưng

ACKNOWLEDGMENTS

I would like to express my gratitude to Hanoi University of Science and Technology, the School of Information and Communication Technology, the Department of Computer Engineering, and the Department of Information Systems for providing favorable conditions throughout my studies at the University.

I would like to give special thanks to my supervisors, Assoc. Prof. Dr. Trịnh Văn Loan and Dr. Nguyễn Hồng Quang. They always helped me wholeheartedly and offered valuable advice and scientific guidance so that I could carry out and complete this research.

I sincerely thank the teachers and colleagues at the Department of Information Systems and the Department of Computer Engineering, School of Information and Communication Technology, Hanoi University of Science and Technology, where I studied and carried out the research, for their enthusiastic help and encouragement throughout the work.

I thank the Faculty of Information Technology, Hung Yen University of Technology and Education, where I work, for supporting me throughout the research and the completion of the thesis.

My gratitude also goes to the teachers, scientists, colleagues, and friends who encouraged and helped me during the research.

Finally, I would like to express my deep gratitude to my family, who raised me and who are the source of motivation that helped me overcome every obstacle and difficulty to complete this thesis.

Phạm Ngọc Hưng

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF SYMBOLS AND ABBREVIATIONS
LIST OF TABLES
LIST OF
FIGURES AND GRAPHS
INTRODUCTION
CHAPTER 1. OVERVIEW OF SPEECH RECOGNITION AND DIALECT IDENTIFICATION
1.1 Speech recognition
1.1.1 Overview of speech recognition
1.1.2 History and progress of speech recognition research
1.1.3 Challenges in automatic speech recognition
1.1.4 Classification of automatic speech recognition systems
1.2 Dialect identification
1.2.1 Dialect identification models
1.2.2 Dialect identification from different perspectives
1.3 Research on Vietnamese speech recognition and Vietnamese dialect identification
1.4 Some recognition models
1.4.1 The GMM model
1.4.2 The SVM classifier
1.4.3 Artificial neural networks
1.5 Chapter conclusion
CHAPTER 2. BUILDING A CORPUS FOR RESEARCH ON VIETNAMESE DIALECT IDENTIFICATION
2.1 Overview of Vietnamese dialects
2.1.1 Dialects and the dialect regions of Vietnamese
2.1.2 Phonetic characteristics of the three Vietnamese dialect regions
2.1.3 Lexical and semantic differences among the three Vietnamese dialect regions
2.2 Syllable and phoneme structure in Vietnamese dialects
2.2.1 Vietnamese syllables and phonemes
2.2.2 Medial sounds and how they combine in the dialects
2.3 Initial consonants in Vietnamese dialects
2.3.1 The initial consonant system
2.3.2 Comparison of the initial consonant systems of the Northern, Central, and Southern dialects
2.4 Tone systems and their variants in Vietnamese dialects
2.4.1 The Hanoi tone system
2.4.2 The Nghệ-Tĩnh and Huế tone systems
2.4.3 The Đà Nẵng and Ho Chi Minh City tone systems
2.4.4 Some remarks on the tone systems of the dialects
2.5 Influence of dialects on speech recognition
2.6 Dialect corpora around the world and building a corpus for Vietnamese dialect identification
2.6.1 Method for building the Vietnamese dialect corpus
2.6.2 Text preparation and normalization
2.6.3 Recording
2.6.4 Recording results and characteristics of VDSPEC
2.7 Analysis of some features of Vietnamese dialects on the VDSPEC corpus
2.7.1 Variation of the fundamental frequency F0 by tone in the three dialects
2.7.2 Statistical analysis of the F0 distribution of the tones
2.7.3 Data analysis using LDA
2.8 Chapter conclusion
CHAPTER 3. VIETNAMESE DIALECT IDENTIFICATION
3.1 Dialect identification for
Vietnamese with GMM
3.1.1 ALIZE, the tool used for the dialect identification experiments
3.1.2 Choosing the number of MFCC coefficients
3.1.3 Vietnamese dialect identification experiments combining MFCC with F0 parameters
3.1.4 Vietnamese dialect identification experiments combining formants and their corresponding bandwidths with F0 parameters
3.1.5 Effect of the number of Gaussian components on Vietnamese dialect identification performance
3.2 SVM for Vietnamese dialect identification
3.2.1 The SMO classifier
3.2.2 Vietnamese dialect identification experiments using SMO
3.3 IBk for Vietnamese dialect identification
3.3.1 The IBk classifier
3.3.2 Results of Vietnamese dialect identification using IBk
3.4 Vietnamese dialect identification with the MultilayerPerceptron classifier
3.4.1 The MultilayerPerceptron classifier in Weka
3.4.2 MultilayerPerceptron for Vietnamese dialect identification
3.5 JRip for Vietnamese dialect identification
3.5.1 The JRip classifier
3.5.2 Vietnamese dialect identification with JRip
3.6 Vietnamese dialect identification with PART
3.6.1 The PART classifier
3.6.2 Results of using PART for Vietnamese dialect identification
3.7 Chapter conclusion
CHAPTER 4. IMPROVING VIETNAMESE SPEECH RECOGNITION PERFORMANCE WITH DIALECT INFORMATION
4.1 HMM for spoken Vietnamese recognition
4.1.1 The HMM model
4.1.2 HMM for recognizing Vietnamese spoken in the three dialects
4.2 Improving spoken Vietnamese recognition performance by using dialect information
4.2.1 A model for recognizing spoken Vietnamese that uses dialect information
4.2.2 Recognizing spoken Vietnamese when dialect information is available
4.3 Chapter conclusion
CONCLUSIONS AND RECOMMENDATIONS
REFERENCES
LIST OF PUBLICATIONS OF THE THESIS

LIST OF SYMBOLS AND ABBREVIATIONS

Abbreviation: full form, with an explanation where one is given
AANN: Auto-Associative Neural Network
AM: Acoustic Model
ANN: Artificial Neural Network
ARFF: Attribute-Relation File Format, the feature-parameter file format used by Weka
ASR: Automatic Speech Recognition
BKSPEC: Bach Khoa SPEech Corpus, a speech corpus
developed at the Department of Computer Engineering, School of Information and Communication Technology, Hanoi University of Science and Technology
BKTC: Bach Khoa Text Code
BMMI: Boosted Maximum Mutual Information
CD: Concept Description
CFG: Context-Free Grammar
CMS: Cepstral Mean Subtraction
CMU SLM: Carnegie Mellon University Statistical Language Modeling Toolkit
DCF: Detection Cost Function
DET: Detection Error Tradeoff
DL: Description Length
DNN: Deep Neural Networks
ELRA: European Language Resources Association
EM: Expectation Maximization
ERM: Empirical Risk Minimization
F0: Fundamental frequency
fMLLR: feature-space MLLR
fMMI: feature-space MMI
fMPE: feature-space Minimum Phone Error
FST: Finite-State Transducer
GMM: Gaussian Mixture Model
HLDA: Heteroscedastic Linear Discriminant Analysis
HMM: Hidden Markov Model
HTK: Hidden Markov Model Toolkit
IBk: Instance Based k, the name of the k-nearest-neighbor classifier in Weka
IBL: Instance Based Learning
IREP: Incremental Reduced Error Pruning
JRip
KKT: Karush–Kuhn–Tucker (the Karush–Kuhn–Tucker conditions)
k-NN: k-Nearest Neighbour
LDA: Linear Discriminant Analysis
LDC: Linguistic Data Consortium
LLR: Log Likelihood Ratio
LPC: Linear Prediction Coding
MAP: Maximum a Posteriori
MFCC: Mel Frequency Cepstral Coefficients
MHAH: Mô hình âm học (acoustic model)
MHNN: Mô hình ngôn ngữ (language model)
MLLR: Maximum Likelihood Linear Regression
MLLT: Maximum Likelihood Linear Transforms
MMI: Maximum Mutual Information
MPE: Minimum Phone Error
NIST: National Institute of
Standards and Technology, United States
NLP: Natural Language Processing
NN: Neural Networks
PART
PCA: Principal Component Analysis
PLP: Perceptual Linear Prediction
PNB: Phương ngữ Bắc (Northern dialect)
PNN: Phương ngữ Nam (Southern dialect)
PNT: Phương ngữ Trung (Central dialect)
PPR: Parallel Phone Recognition
PPRLM: Parallel Phone Recognition followed by Language Modeling
PRLM: Phone Recognition followed by Language Modeling
QP: Quadratic Programming
RBF: Radial Basis Function
RIPPER: Repeated Incremental Pruning to Produce Error Reduction
RM: Risk Minimization
SAT: Speaker Adaptive Training
SBS: Sequential Backward Selection
SFS: Sequential Forward Selection
SMO: Sequential Minimal Optimization
SRILM: Stanford Research Institute Language Modeling, the SRI language modeling toolkit
SRM: Structural Risk Minimization
SVM: Support Vector Machines
TTS: Text-to-Speech
VDSPEC: Vietnamese Dialect Speech Corpus
VTLN: Vocal Tract Length Normalization
WER: Word Error Rate

LIST OF TABLES

Table 2.1: Differences between the dialects in vocabulary and word usage
Table 2.2: Structure of the Vietnamese syllable
Table 2.3: System of consonants serving as initials
Table 2.4: Table of simple vowels
Table 2.5: Orthographic representation of the vowels
Table 2.6: Position of the phonemes in the system of final sounds
Table 2.7: Initial consonant system of Northern Vietnam
Table 2.8: Comparison of the initial consonant systems of PNB, PNT, and PNN
Table 2.9: Traditional classification of the tones
Table 2.10: Characteristics of the texts by topic
Table 2.11: Data storage organization of the VDSPEC corpus
Table 2.12: Recording duration statistics of VDSPEC by dialect
Table 2.13: Recording duration statistics of VDSPEC by topic
Table 2.14:
Context of the words chosen for the tone survey
Table 3.1: Recognition results using GMM with MFCC parameters, F0, and values normalized from F0
Table 3.2: Confusion matrix for gender-independent dialect identification using MFCC coefficients combined with F0 parameters
Table 3.3: Results of Vietnamese dialect identification experiments combining formants and their corresponding bandwidths with F0 parameters
Table 3.4: Average recognition rate for different numbers of Gaussian components
Table 3.5: SMO classifier, recognition results with 384 parameters
Table 3.6: SMO classifier, confusion matrix with 384 parameters
Table 3.7: SMO classifier, recognition results without information directly related to F0
Table 3.8: SMO classifier, confusion matrix without information directly related to F0
Table 3.9: SMO classifier, experimental results using parameters directly related to F0
Table 3.10: SMO classifier, confusion matrix using only parameters directly related to F0
Table 3.11: SMO classifier, recognition results using parameters directly related to MFCC
Table 3.12: SMO classifier, confusion matrix using parameters directly related to MFCC
Table 3.13: The IB1 algorithm (CD: Concept Description) [8]

... the dialect before recognizing the content of the speech. The study was carried out in two cases. In the first case, the content of spoken Vietnamese was recognized on a corpus containing several dialects without using dialect information. In the second case, the content was recognized on the same kind of corpus after the dialect information was known. The results show that when dialect information is available, the word error rate decreases by 27.9% in relative terms, which corresponds to a considerable increase in recognition accuracy. Determining the dialect therefore helps improve the accuracy of content recognition. This is the first time the HMM model has been used for automatic recognition of spoken Vietnamese on a corpus containing several dialects.

Summarizing the research results, the thesis proposes a model for recognizing spoken Vietnamese: because the language has diverse dialects, dialect identification should be carried out before content recognition in order to genuinely improve the performance of recognition systems for spoken
Vietnamese.

Limitations: The thesis studies only the common Northern, Central, and Southern dialects. Each dialect is represented by one corresponding accent: Hanoi, Huế, and Ho Chi Minh City, respectively. The thesis does not study or handle real-time recognition. It has not yet deployed a complete pipeline from dialect identification to recognition of the content of spoken Vietnamese.

Future directions: Based on the results obtained and the limitations above, the thesis makes the following recommendations to extend the current research. Add other Vietnamese dialects to the VDSPEC corpus. Study the pronunciation-modality features of the added dialects. Develop toward real-time processing. Build increasingly complete Vietnamese recognition models that suit the dialectal diversity of Vietnamese. Implement a complete system from dialect identification to recognition of the content of spoken Vietnamese.

REFERENCES

VIETNAMESE
[1] Hoàng Phê (1963). Một số ý kiến vấn đề thống tiêu chuẩn hóa tiếng Việt. Văn học số.
[2] Hoàng Thị Châu (2009). Phương ngữ học tiếng Việt. NXB Đại học Quốc gia Hà Nội.
[3] Mai Ngọc Chừ, Vũ Đức Nghiệu, Hoàng Trọng Phiến (2008). Cơ sở ngôn ngữ học và tiếng Việt. NXB Giáo Dục.
[4] Nguyễn Hồng Quang, Trịnh Văn Loan (2004). Nhận dạng tiếng nói tiếng Việt phát âm liên tục. Kỷ yếu Hội thảo khoa học Quốc gia lần thứ hai nghiên cứu, phát triển ứng dụng Công nghệ Thông tin truyền thông ICT.rda, Hà Nội, pp. 243-250.
[5] Nguyễn Kim Thản, Nguyễn Trọng Báu, Nguyễn Văn Tu (2002). Tiếng Việt trên đường phát triển. NXB Khoa học Xã hội.

ENGLISH
[6] A. Stolcke, E. Brill, and M. Weintraub (1997). Explicit word error minimization in N-Best list rescoring. Proceedings of EuroSpeech, vol. 97, pp. 163-166.
[7] Adena (2010). Theatre Supplies and Services. [Online] http://adena.co.nz/theatre/products/sound/microphones-wired/shure/smseries/shure-sm48.htm
[8] Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-Based Learning Algorithms. Machine Learning, vol. 6, no. 1, pp. 37-66.
[9] Anastasakos, T., J. McDonough, and J. Makhoul (1997). Speaker adaptive training: A maximum likelihood approach to
speaker normalization. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, Germany, pp. 1043–1046.
[10] Aubert, X., & Ney, H. (1995). Large vocabulary continuous speech recognition using word graphs. In Acoustics, Speech, and Signal Processing, pp. 49-52.
[11] B. T. Lowerre (1976). The Harpy Speech Recognition System. Carnegie Mellon.
[12] Baker, J. (1975). Stochastic modeling for automatic speech recognition, in D. R. Reddy, Speech Recognition. New York: Academic Press.
[13] Baker, J. (1975). The DRAGON system - An overview. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 23, no. 1, pp. 24-29.
[14] Baum, L. (1972). An inequality and associated maximization technique occurring in statistical estimation for probabilistic functions of a Markov process. Inequalities, ch. III, pp. 1-8.
[15] Baum, L. E., & Eagon, J. A. (1967). An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology. Bulletin of American Mathematical Society, vol. 73, pp. 360–363.
[16] Becker, Timo, Michael Jessen, and Catalin Grigoras (2008). Forensic speaker verification using formant features and Gaussian mixture models. Interspeech, pp. 1505-1508.
[17] Bernd Kortmann (2005). A comparative grammar of British English dialects: agreement, gender, relative. Walter de Gruyter, vol.
[18] Biadsy, F. (2011). Automatic dialect and accent recognition and its application to speech recognition (Doctoral dissertation, Columbia University).
[19] Biadsy, Fadi, Julia Hirschberg, and Daniel P. W. Ellis (2011). Dialect and Accent Recognition Using Phonetic-Segmentation Supervectors. INTERSPEECH, pp. 745-748.
[20] Biadsy, Fadi, Julia Hirschberg, and Nizar Habash (2009). Spoken Arabic dialect identification using phonotactic modeling. Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages, pp. 53-61.
[21] Boser, Bernhard E., Isabelle M. Guyon, and Vladimir N. Vapnik (1992). A training algorithm for
optimal margin classifiers. Proceedings of the fifth annual workshop on Computational learning theory, ACM, pp. 144-152.
[22] Bouckaert, Remco R., Eibe Frank, Mark Hall, Richard Kirkby, Peter Reutemann, Alex Seewald, and David Scuse (2016). WEKA Manual for Version 3-8-0. Hamilton, New Zealand.
[23] Brian Hayes (2013). First Links in the Markov Chain. American Scientist, vol. 101, pp. 92-97.
[24] C. Gaida, P. Lange, R. Petrick, P. Proba, A. Malatawy, and D. Suendermann-Oeft (2014). Comparing Open-Source Speech Recognition Toolkits. Organisation of Alberta Students in Speech, Alberta.
[25] Campbell, W. M., Singer, E., Torres-Carrasquillo, P. A., and Reynolds, D. A. (2004). Language Recognition with Support Vector Machines. Odyssey: The Speaker and Language Recognition Workshop, Toledo, Spain, ISCA, pp. 41-44.
[26] Carlson, Rolf, Gunnar Fant, and Björn Granström (1974). Two-formant models, pitch, and vowel perception. Acta Acustica united with Acustica, vol. 31, no. 6, pp. 360-362.
[27] Chelba, Ciprian, and Frederick Jelinek (2000). Structured language modeling. Computer Speech and Language, ch. 14, pp. 283–332.
[28] Chen, N. F., Shen, W., Campbell, J. P., & Torres-Carrasquillo, P. A. (2011). Informative dialect recognition using context-dependent pronunciation modeling. Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference, pp. 4396-4399.
[29] Chen, Too, Chao Huang, Eric Chang, and Jingehan Wang (2001). Automatic accent identification using Gaussian mixture models. Automatic Speech Recognition and Understanding, ASRU'01 IEEE Workshop, pp. 343-346.
[30] Cohen, William W. (1995). Fast effective rule induction. In Proceedings of the twelfth international conference on machine learning, pp. 115-123.
[31] Đặng Ngọc Đức, John-Paul Hosom, Lương Chi Mai (2003). HMM/ANN System for Vietnamese Continuous Digit Recognition. Proceeding IEA/AIE'2003, Proceedings of the 16th international conference on Developments in applied artificial intelligence, pp. 481-486.
[32] Daniel Povey (2003). "Minimum Phone Error -
Better than MMI," talk given at IBM.
[33] Daniel, Jurafsky, and H. M. James (2009). Speech and Language Processing - An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition.
[34] Davis, Lawrence M., and Charles L. Houck (1992). Is there a Midland dialect area? - Again. American Speech, pp. 61-70.
[35] Davis, S. and P. Mermelstein (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 28(4), pp. 357–366.
[36] Dempster, A., N. Laird, and D. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, vol. 39, no. 1, pp. 1–21.
[37] Deng, L. and D. O'Shaughnessy (2003). Speech Processing - A Dynamic and Optimization-Oriented Approach. New York, CRC Press.
[38] Deng, L., M. Aksmanovic, D. Sun, and J. Wu (1994). Speech recognition using hidden Markov models with polynomial regression functions as nonstationary states. IEEE Transactions on Speech and Audio Processing, vol. 2(4), pp. 507–520.
[39] Deng, Li (1993). A stochastic model of speech incorporating hierarchical nonstationarity. IEEE Transactions on Speech and Audio Processing, vol. 1(4), pp. 471–475.
[40] Deng, Li, Dong Yu, and Alex Acero (2006). Structured speech modeling. IEEE Transactions on Audio, Speech, and Language Processing (Special Issue on Rich Transcription), vol. 14(5), pp. 1492–1504.
[41] Do Dat, TRAN., Castelli, E., Hung, L. X., Serignat, J. F., & Van Loan, TRINH (2006). Linear F0 contour model for Vietnamese tones and Vietnamese syllable synthesis with TD-PSOLA. In Second International Symposium on Tonal Aspects of Languages, pp. 115-119.
[42] Eide, Ellen, and Herbert Gish (1996). A parametric approach to vocal tract length normalization. Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, IEEE, Atlanta, GA, pp. 346–349.
[43] Evermann, G., & Woodland, P. C. (2000). Large
vocabulary decoding and confidence estimation using word posterior probabilities. In Acoustics, Speech, and Signal Processing, 2000 IEEE International Conference, vol. 3, pp. 1655-1658.
[44] Evermann, G., & Woodland, P. C. (2000). Posterior probability decoding, confidence estimation and system combination. In Proc. Speech Transcription, vol. 27, p. 78.
[45] Evermann, G., Chan, H. Y., Gales, M. J., Hain, T., Liu, X., Mrva, D., & Woodland, P. C. (2004). Development of the 2003 CU-HTK conversational telephone speech transcription system. In Acoustics, Speech, and Signal Processing, 2004 Proceedings (ICASSP'04), IEEE International Conference, vol. 1, pp. I-249.
[46] Eyben, Florian, Martin Wöllmer, and Björn Schuller (2010). Opensmile: the munich versatile and fast open-source audio feature extractor. Proceedings of the 18th ACM international conference on Multimedia, pp. 1459-1462.
[47] F. Jelinek (1985). A discrete utterance recogniser. Proceedings of IEEE, vol. 73, no. 11, pp. 1616-1624.
[48] F. Jelinek (1997). Statistical Methods for Speech Recognition. Cambridge: MIT Press.
[49] F. Pérez-Cruz and O. Bousquet (2004). Kernel Methods and Their Potential Use in Signal Processing. IEEE Signal Processing Magazine, vol. 21, no. 3, pp. 57–65.
[50] Fadi Biadsy, Julia Hirschberg (2009). Using Prosody and Phonotactics in Arabic Dialect Identification. Interspeech, vol. 1, pp. 208-211.
[51] Faria, Arlo (2005). Accent classification for speech recognition. In International Workshop on Machine Learning for Multimodal Interaction, pp. 285-293.
[52] Fiscus, J. G. (1997). A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER). In Automatic Speech Recognition and Understanding, IEEE Workshop, pp. 347-354.
[53] Fletcher, Roger (2013). Practical methods of optimization. John Wiley & Sons.
[54] Fox, Robert Allen, and Ewa Jacewicz (2009). Cross-dialectal variation in formant dynamics of American English vowels. The Journal of the Acoustical Society of America, vol. 126, no. 5, pp. 2603-2618.
[55] Frederick Jelinek
(1997). Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA.
[56] Fridland, V., Kendall, T., & Farrington, C. (2014). Durational and spectral differences in American English vowels: Dialect variation within and across regions. The Journal of the Acoustical Society of America, vol. 136, no. 1, pp. 341-349.
[57] Furui, Sadaoki (2001). Digital Speech Processing, Synthesis and Recognition, 2nd ed. New York, Marcel Dekker Inc.
[58] Garner, Philip N., and Wendy J. Holmes (1998). On the robust incorporation of formant features into hidden Markov models for automatic speech recognition. Acoustics, Speech and Signal Processing, 1998, Proceedings of the 1998 IEEE International Conference, vol. 1, pp. 1-4.
[59] Gelfer, Marylou Pausewang, and Victoria A. Mikos (2005). The relative contributions of speaking fundamental frequency and formant frequencies to gender identification based on isolated vowels. Journal of Voice, vol. 19, no. 4, pp. 544-554.
[60] Glass, James R. (2003). A probabilistic framework for segment-based speech recognition. New Computational Paradigms for Acoustic Modeling in Speech Recognition, Computer, Speech and Language, vol. 17, no. (2–3), pp. 137–152.
[61] Godfrey, J. J., Holliman, E. C., & McDaniel, J. (1992). SWITCHBOARD: Telephone speech corpus for research and development. In Acoustics, Speech, and Signal Processing, IEEE, vol. 1, pp. 517-520.
[62] Goel, V., Kumar, S., & Byrne, W. (2000). Segmental minimum Bayes-risk ASR voting strategies. INTERSPEECH, pp. 139-142.
[63] Gold, B. and N. Morgan (2000). Speech and Audio Signal Processing. New York, John Wiley & Sons.
[64] Graves, A., Mohamed, A. R., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), IEEE International Conference, pp. 6645-6649.
[65] H. Tang and A. A. Ghorbani (2003). Accent classification using Support Vector Machine and Hidden Markov Models. Proceedings 16th Canadian conference on Artificial Intelligence AI'03, pp. 629-631.
[66] Hagiwara, Robert (1997).
Dialect variation and formant frequency: The American English vowels revisited. The Journal of the Acoustical Society of America, vol. 102, no. 1, pp. 655-658.
[67] Hakkani-Tür, D., Béchet, F., Riccardi, G., & Tur, G. (2006). Beyond ASR 1-best: Using word confusion networks in spoken language understanding. Computer Speech & Language, vol. 20, no. 4, pp. 495-514.
[68] Hanani, Abualsoud, Martin J. Russell, and Michael J. Carey (2013). Human and computer recognition of regional accents and ethnic groups from British English speech. Computer Speech & Language, vol. 27, no. 1, pp. 59-74.
[69] Haykin, Simon S. (2001). Neural networks: a comprehensive foundation, 2nd ed. Tsinghua University Press.
[70] Hermansky, H. (1990). Perceptual linear predictive analysis of speech. Journal of the Acoustical Society of America, vol. 87(4), pp. 1738–1752.
[71] Hillenbrand, J., Getty, L. A., Clark, M. J., & Wheeler, K. (1995). Acoustic characteristics of American English vowels. The Journal of the Acoustical Society of America, vol. 97, no. 5, pp. 3099-3111.
[72] Hillenbrand, James M., and Michael J. Clark (2009). The role of f0 and formant frequencies in distinguishing the voices of men and women. Attention, Perception, & Psychophysics, vol. 71, no. 5, pp. 1150-1166.
[73] Hirayama, N., Yoshino, K., Itoyama, K., Mori, S., Okuno, H. G. (2015). Automatic Speech Recognition for Mixed Dialect Utterances by Mixing Dialect Language Models. Audio, Speech, and Language Processing, IEEE/ACM Transactions, vol. 23, no. 2, pp. 373-382.
[74] Huang, X. D. and K.-F. Lee (1993). On speaker-independent, speaker-dependent and speaker adaptive speech recognition. IEEE Transactions on Speech and Audio Processing, vol. 1(2), pp. 150–157.
[75] Huang, X. D., A. Acero, and H. Hon (2001). Spoken Language Processing - A Guide to Theory, Algorithms, and System Development. Prentice Hall, Upper Saddle River, NJ.
[76] J. K. Baker (1974). Stochastic Modeling as a Means of Automatic Speech Recognition. Ph.D. dissertation, Carnegie-Mellon Univ.
[77] J. K. Chambers and P. Trudgill (1998).
Dialectology, chapter one, 2nd ed. Cambridge University Press.
[78] J. Li, T. F. Zheng, W. Byrne, and D. Jurafsky (2006). A dialectal Chinese speech recognition framework. Journal of Computer Science and Technology, vol. 21, no. 1, pp. 106-115.
[79] Jacewicz, Ewa, and Robert Allen Fox (2015). The effects of dialect variation on speech intelligibility in a multitalker background. Applied Psycholinguistics, vol. 36, no. 3, pp. 729-746.
[80] Jean-François Bonastre, Frédéric Wils (2005). ALIZE, a Free Toolkit for Speaker Recognition. IEEE International Conference, pp. I 737 - I 740.
[81] Jean-Luc Rouas (2007). Automatic prosodic variations modelling for language and dialect discrimination. IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 6, pp. 1904-1911.
[82] Jelinek, F. (1976). Continuous speech recognition by statistical methods. Proceedings of the IEEE, vol. 64(4), pp. 532–557.
[83] Jelinek, Frederick (1969). A fast sequential decoding algorithm using a stack. IBM Journal of Research and Development, vol. 13, no. 6, pp. 675–685.
[84] Jing, Y. P., Zheng, J., & Hu, W. X. (2014). Belongingness of Chinese dialect speech recognition based on deep neural network. Journal of East China Normal University (Natural Science), vol. 1, p. 008.
[85] John C. Platt (1998). Microsoft Research, jplatt@microsoft.com, Technical Report MSR-TR-98-14, April 21, 1998.
[86] Juang, B. H. (1984). On the hidden Markov model and dynamic time warping for speech recognition - A unified view. Bell Labs Technical Journal, vol. 63, no. 7, pp. 1213-1243.
[87] Juang, B. H. (1985). Maximum-likelihood estimation for mixture multivariate stochastic observations of Markov chains. AT&T Technical Journal, vol. 64, no. 6, pp. 1235-1249.
[88] Juang, B. H., Levinson, S., & Sondhi, M. (1986). Maximum likelihood estimation for multivariate mixture observations of Markov chains (corresp.).
IEEE Transactions on Information Theory, vol. 32, no. 2, pp. 307-309.
[89] Kingsbury, N. G., & Rayner, P. J. (1971). Digital Filtering Using Logarithmic Arithmetic. Electronics Letters, vol. 7, no. 2, pp. 56-58.
[90] Kumar, N. and A. Andreou (1998). Heteroscedastic analysis and reduced rank HMMs for improved speech recognition. Speech Communication, vol. 26(4), pp. 283–297.
[91] L. Mangu, E. Brill, and A. Stolcke (2000). Finding consensus among words: Lattice-based word error minimisation. Computer Speech and Language, vol. 14, no. 4, pp. 373–400.
[92] L. R. Rabiner, B.-H. Juang, S. E. Levinson, and M. M. Sondhi (1985). Recognition of isolated digits using HMMs with continuous mixture densities. AT&T Technical Journal, vol. 64, no. 6, pp. 1211-1233.
[93] L. E. Baum, T. Petrie (1966). Statistical Inference for Probabilistic Functions of Finite State Markov Chains. Annals of Math. Statistics, vol. 37, pp. 1554-1563.
[94] Lee, Chin-Hui, Frank K. Soong, and Kuldip Paliwal, eds. (2012). Automatic speech and speaker recognition: advanced topics. Springer Science & Business Media, vol. 355.
[95] Lee, Kai-Fu (1988). Automatic Speech Recognition: The Development of the Sphinx Recognition System. Berlin, Germany, Springer Science & Business Media, vol. 62.
[96] Leggetter, C. and P. Woodland (1995). Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language, ch. 9, pp. 171–185.
[97] Levinson, S. E., Rabiner, L. R., & Sondhi, M. M. (1983). An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. The Bell System Technical Journal, vol. 62, no. 4, pp. 1035-1074.
[98] Liu, Gang A., and John H. L. Hansen (2011). A systematic strategy for robust automatic dialect identification. 19th European Signal Processing Conference, pp. 138-2141.
[99] Lopez-Moreno, I., Gonzalez-Dominguez, J., Plchot, O., Martinez, D., Gonzalez-Rodriguez, J., & Moreno, P. (2014). Automatic language identification using deep neural networks. In
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference, pp. 5337-5341.
[100] Luo, X., & Jelinek, F. (1999). Probabilistic classification of HMM states for large vocabulary continuous speech recognition. In Acoustics, Speech, and Signal Processing, 1999, Proceedings, 1999 IEEE International Conference on, pp. 353-356.
[101] Luong Chi Mai (2015). Activities on Speech and Machine Translation in Vietnam. Institute of Information Technology, Vietnam Academy of Science and Technology, pp (http://ieeexplore.ieee.org/ielx7/7911896/7918968/07919033.pdf?tp=&arnumber=7919033&isnumber=7918968).
[102] M. Gales and S. Young (2007). The Application of Hidden Markov Models in Speech Recognition. Foundations and Trends in Signal Processing, vol. 1, no. 3, pp. 195-304.
[103] Ma, Bin, Donglai Zhu, and Rong Tong (2006). Chinese Dialect Identification Using Tone Features Based On Pitch. 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 1, pp. I-I.
[104] Mannepalli, Kasiprasad, P. Nrahari Sastry, and V. Rajesh (2015). Accent detection of Telugu speech using prosodic and formant features. Signal Processing And Communication Engineering Systems (SPACES), 2015 International Conference on, IEEE, pp. 318-322.
[105] Martin, Alvin, et al. (1997). The DET curve in assessment of detection task performance. National Inst. of Standards and Technology, Gaithersburg, MD.
[106] Martin, S., Liermann, J., & Ney, H. (1998). Algorithms for bigram and trigram word clustering. Speech Communication, vol. 24, no. 1, pp. 19-37.
[107] Matsoukas, S., Gauvain, J. L., Adda, G., Colthurst, T., Kao, C. L., Kimball, O., & Nguyen, L. (2006). Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system. IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 5, pp. 1541-1556.
[108] McCowan, I. A., Moore, D., Dines, J., Gatica-Perez, D., Flynn, M., Wellner, P., & Bourlard, H. (2004). On the use of information retrieval
measures for speech recognition evaluation. No. EPFL-REPORT-83156.
[109] Mehrabani, M., Bořil, H., & Hansen, J. H. (2010). Dialect distance assessment method based on comparison of pitch pattern statistical models. Acoustics Speech and Signal Processing (ICASSP), IEEE International Conference, pp. 5158-5161.
[110] Mohamed Belgacem, Georges Antoniadis, Laurent Besacier (2010). Automatic Identification of Arabic Dialects. International Conference on Language Resources and Evaluation (LREC), Malta, pp. 17-23.
[111] Mohri, M., Pereira, F., & Riley, M. (2002). Weighted finite-state transducers in speech recognition. Computer Speech & Language, vol. 16, no. 1, pp. 69-88.
[112] Morgan, N., Q. Zhu, A. Stolcke, K. Sonmez, S. Sivadas, T. Shinozaki, M. Ostendorf, P. Jain, H. Hermansky, D. Ellis, G. Doddington, B. Chen, O. Cetin, H. Bourlard, and M. Athineos (2005). Pushing the envelope - Aside. IEEE Signal Processing Magazine, pp. 22, 81–88.
[113] Nagy, N., Zhang, X., Nagy, G., & Schneider, E. W. (2006). Clustering dialects automatically: A mutual information approach. University of Pennsylvania Working Papers in Linguistics, vol. 12, no. 2, p. 12.
[114] Navia-Vázquez, A., Pérez-Cruz, F., Artes-Rodriguez, A., & Figueiras-Vidal, A. R. (2001). Weighted least squares training of support vector classifiers leading to compact and adaptive schemes. IEEE Transactions on Neural Networks, vol. 12, no. 5, pp. 1047-1059.
[115] Ney, Hermann (1984). The use of a one-stage dynamic programming algorithm for connected word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32(2), pp. 263–271.
[116] Nguyen Hong Quang, P. Nocera, E. Castelli, Trinh Van Loan (2008). A Novel Approach in Continuous Speech Recognition for Vietnamese, an Isolating Tonal Language. Proceedings of the INTERSPEECH, Brisbane, Australia, pp. 1149-1152.
[117] Nguyen Hong Quang, P. Nocera, E. Castelli, Trinh Van Loan (2008). Large Vocabulary Continuous Speech Recognition for Vietnamese, a Under-resourced Language. Proceedings of the 1st International Workshop
on Spoken Languages Technologies for Under-resourced Languages (SLTU-2008), Hanoi, Vietnam, pp 23-26 [118] Nguyen Hong Quang, P Nocera, E Castelli, Trinh Van Loan (2008) Tone recognition of Vietnamese continuous speech using hidden Markov model Proceedings of the 2nd International Conference on Communication and Electronics, Hoi An, Vietnam, pp 235-238 [119] Nguyen Hong Quang, P Nocera, E Castelli,Trinh Van Loan (2008) Reconnaissance de la parole continue grand vocabulaire en vietnamien, une langue syllabique tonale Actes des XXVIIes Journée d’Etude sur la Parole, Avignon, France, pp 281-284 [120] Nguyen Hong Quang, Pascal Nocera and Eric Castelli (2008) Tone Recognition of Vietnamese Continuous Speech Using Hidden Markov Model Communications and Electronics, 2008 ICCE 2008 Second International Conference on IEEE, pp 235-239 [121] Nguyễn Phú Bình, Trịnh Văn Loan (2006) Vietnamese Speech Recognition using Subword Models and Test Experiments for Comparing Some Methods of Vietnamese Recognition Proceedings of the 3rd National Symposium on Research, Developpment and Application of Information and Communication Technology (ICT.rda’06), Hanoi-Vietnam, pp 187-196 [122] Nguyễn Phú Bình, Trịnh Văn Loan, E Castelli (2003) Real-time system for Vietnamese isolated word recognition Kỷ yếu Hội thảo khoa học Quốc gia lần thứ nghiên cứu, phát triển ứng dụng Công nghệ Thông tin truyền thông ICT.rda, Hà Nội, pp 310-316 [123] Nguyen Quoc Cuong, Pham Thi Ngoc and Castelli, E (2001) Shape vector characterization of Vietnamese tones and application to automatic recognition Automatic Speech Recognition and Understanding – ASRU'01 IEEE Workshop on, Italy, pp 437-440 [124] Odell, J J., Valtchev, V., Woodland, P C., & Young, S J (1994) A one pass decoder design for large vocabulary recognition In Proceedings of the workshop on Human Language Technology, pp 405-410 [125] Ondřej Plátek (2014) Speech recognition using KALDI MASTER THESIS, Charles University in Prague Faculty of Mathematics and 
Physics [126] Ortmanns, S., Ney, H., & Aubert, X (1997) A word graph algorithm for large vocabulary continuous speech recognition Computer Speech & Language, vol 11, no 1, pp 43-72 [127] Osuna, E., Freund, R., Girosi, F (1997) An Improved Training Algorithm for Support Vector Machines IEEE NNSP '97, pp 276-285 [128] Pallett, D., Fiscuss, J., Garofolo, J., Martin, A., & Przybocki, M (1999) 1998 broadcast news benchmark test results: English and non-English word error rate 135 performance measures In Proc DARPA Broadcast News Workshop, pp 5-12 [129] Paul, D B (1991) Algorithms for an optimal A* search and linearizing the search in the stack decoder In Acoustics, Speech, and Signal Processing, pp 693-696 [130] Peterson, G E., & Barney, H L (1952) Control methods used in a study of the vowels The Journal of the acoustical society of America, vol 24, no 2, pp 175184 [131] Platt, John C (1999) Fast Training of Support Vector Machines Advances in kernel methods, pp 185-208 [132] Povey, B., Kingsbury, L Mangu, G Saon, H Soltau, and G Zweig (2005) FMPE: Discriminatively trained features for speech recognition Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, PA, pp 961-964 [133] Povey, D., Kanevsky, D., Kingsbury, B., Ramabhadran, B., Saon, G., & Visweswariah, K (2008) Boosted MMI for model and feature-space discriminative training 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp 4057-4060 [134] Quinlan, J R (1993) C4.5: Programs for Machine Learning Morgan Kaufmann Publishers [135] Rabiner, L and B Juang (1993) Fundamentals of Speech Recognition Prentice Hall, Englewood Cliffs, NJ [136] Rao, K S (2011) Role of neural network models for developing speech systems Sadhana, vol 36, no 5, pp 783-836 [137] Rao, K S., & Koolagudi, S G (2011) Identification of Hindi dialects and emotions using spectral and prosodic features of speech IJSCI: International Journal of Systemics, Cybernetics and 
Informatics, vol 9, no 4, pp 24-33 [138] Richardson, F., Ostendorf, M., & Rohlicek, J R (1995) Lattice-based search strategies for large vocabulary speech recognition In Acoustics, Speech, and Signal Processing ICASSP-95., 1995 International Conference, pp 576-579 [139] Rosenberg, A., C H Lee, and F K Soong (1994) Cepstral channel normalization techniques for HMMbased speaker verification Proceedings of the International Conference on Acoustics Speech, and Signal Processing, Adelaide, SA, pp 1835– 1838 [140] S Furui (1986) Speaker independent isolated word recognition using dynamic features of IEEE Transactions ASSP, vol 34, pp 52–59 [141] S J Young and L L Chase (1998) Speech recognition evaluation: A review of the US CSR and LVCSR programmes Computer Speech and Language, vol 12, no 4, pp 263-279 [142] Sak, H., Senior, A W., & Beaufays, F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling In Interspeech, pp 338-342 [143] Sakoe, Hiroaki, and Seibi Chiba (1971) A dynamic programming approach to continuous speech recognition Proceedings of the 7th International Congress on Acoustics, vol 3, Budapest, Hungary, pp 65–69 [144] Saon, G., & Povey, D (2008) Penalty function maximization for large margin HMM training INTERSPEECH, pp 920-923 [145] Shen, W., Chen, N F., & Reynolds, D A (2008) Dialect recognition using adapted phonetic models In Interspeech , pp 763-766 136 [146] Shweta Sinha (2015) Analysis and Recognition of Dialects of Hindi Speech International Journal of Scientific Research in Multidisciplinary Studies, vol 1, no 1, pp 26-33 [147] Shweta Sinha, Aruna Jain, S S Agrawal (2015) Acoustic-Phonetic Feature Based Dialect Identification in Hindi Speech International Journal on Smart Sensing & Intelligent Systems, vol 8, no 1, pp 235-254 [148] Simon Haykin (2005) Neuron Networks A Comprehensive Foundation, 2nd ed McMaster University Hamilton [149] Sinha, S., Jain, A., & Agrawal, S S (2014) Speech Processing for 
Hindi Dialect Recognition Advances in Signal Processing and Intelligent Recognition Systems Springer International Publishing., pp 161-169 [150] Sittichok Aunkaew, Montri Karnjanadecha, Chai Wutiwiwatchai (2013) Development of a Corpus for Southern Thai Dialect Speech Recognition: Design and Text Preparation The 10th International Symposium on Natural Language Processing, Phuket, Thailand [151] Solera-Ureña, R., Padrell-Sendra, J., Martín-Iglesias, D., Gallardo-Antolín, A., Peláez-Moreno, C., & Díaz-de-María, F (2007) SVMs for Automatic Speech Recognition: A Survey Progress in nonlinear speech processing, pp 190-216 [152] Soltau, H., Kingsbury, B., Mangu, L., Povey, D., Saon, G., & Zweig, G (2005) The IBM 2004 conversational telephony system for rich transcription In Acoustics, Speech, and Signal Processing, 2005 Proceedings.(ICASSP'05) IEEE International, Philadelphia, PA, pp I-205 [153] Song, Y., Cui, R., Hong, X., Mcloughlin, I., Shi, J., & Dai, L (2015) Improved language identification using deep bottleneck network In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference, pp 4200-4204 [154] Stantic, Dejan, and Jun Jo (2012) Accent Identification by Clustering and Scoring Formants World Academy of Science, Engineering and Technology, International Journal of Computer, Electrical, Automation, Control and Information Engineering, vol 6, no 3, pp 379-384 [155] Stolcke, A., Zheng, J., Wang, W., & Abrash, V (2011) SRILM at sixteen: Update and outlook IEEE Automatic Speech Recognition and Understanding Workshop, vol [156] T.T Vu, D.T Nguyen, M.C Luong, and J-P Hosom (2005) Vietnamese large vocabulary continuous speech recognition INTERSPEECH 2005, Lisbon, Portugal [157] Thang Tat Vu, Dung Tien Nguyen, Mai Chi Luong and John-Paul Hosom (2006) Vietnamese Large Vocabulary Continuous Speech Recognition Proceedings of Eurospeech, Lisboa [158] Thompson, Henry (1990) Best-first enumeration of paths through a lattice - An active chart parsing 
solution Computer Speech & Language, vol 4, no 3, pp 263274 [159] Tommie Gannert (2007) A Speaker Verification System under the Scope: Alize Stockholm, Sweden School of Computer Science and Engineering [160] Torres-Carrasquillo, P A., Gleason, T P., and Reynolds, D A (2004) Dialect Identification Using Gaussian Mixture Models Odyssey: The Speaker and Language Recognition Workshop, pp 297-300 137 [161] Torres-Carrasquillo, P A., Singer, E., Kohler, M A., Greene, R J., Reynolds, D A., and Deller Jr., J R (2002) Approaches to Language Identification Using Gaussian Mixture Models and Shifted Delta Cepstral Features International Conference on Spoken Language Processing, Denver, CO, ISCA, pp 33-36, 82-92 [162] Trần Đỗ Đạt, Eric Castelli, Trịnh Văn Loan, Lê Việt Bắc (2004) Xây dựng sở liệu lớn tiếng nói cho tiếng Việt Tạp chí Khoa học Cơng nghệ trường đại học kỹ thuật, vol 46+47, pp 13-17 [163] Trần Thị Ngọc Lang (1995) Phương ngữ Nam Bộ Những khác biệt từ vựng ngữ nghĩa so với phương ngữ Bắc Bộ NXB Khoa học Xã hội [164] Trịnh Văn Loan, Nguyễn Nam Hà, Phạm Việt Hà (1999) Determining characteristics of Vietnamese non-accent vowels Post and telecommunication Journal, Special issue: R&D on telecommunication and IT, vol 2, pp 77-82 [165] Tuan Vu Hai, Kris Demuynck and Dirk Van Compernolle Vietnamese Automatic Speech Recognition: the FLaVoR Approach International Symposium on Chinese Spoken Language Processing, Singapore, p 2006 [166] V.B Le, D.D Tran, E Castelli, L Besacier, and J-F Serignat (2004) Spoken and written language resources for vietnamese LREC 2004, vol II, Lisbon, Portugal, pp 599–602 [167] Vapnik, Vladimir Naumovich (1982) Estimation of dependences based on empirical data New York Springer-Verlag, vol 40 [168] Vijayarani, S., & Muthulakshmi, M (2013) Comparative analysis of bayes and lazy classification algorithms International Journal of Advanced Research in Computer and Communication Engineering, vol 2, no 8, pp 3118-3124 [169] Vintsyuk, Taras K (1968) Speech 
discrimination by dynamic programming Cybernetics and Systems Analysis, vol 4(1), pp 52-57 [170] Viterbi, A (1967) Error bounds for convolutional codes and an asymptotically optimum IEEE transactions on Information Theory, vol 13, no 2, pp 260-269 [171] Viterbi, Andrew (1967) Error bounds for convolutional codes and an asymptotically optimum decoding algorithm IEEE Transactions on Information Theory, vol 13(2), pp 260–269 [172] Võ Xuân Trang (1997) Phương ngữ Bình Trị Thiên Nhà xuất Khoa học xã hội [173] Vu, Quan, Kris Demuynck, and Dirk Van Compernolle (2006) Vietnamese automatic speech recognition: the FLaVoR approach ISCSLP 2006, Kent Ridge, Singapore [174] W Labov (1972) Sociolinguistic Patterns Philadelphia: University of Pennsylvania [175] W Labov, C Boberg, and B Sharon (2006) The Atlas of North American English Walter de Gruyter [176] Wang, Y., M Mahajan, and X Huang (2000) A unified context-free grammar and n-gram model for spoken language processing Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, vol 3, Istanbul, Turkey, pp 1639-1642 [177] Witten, Ian H., and Eibe Frank (2005) Data Mining: Practical machine learning tools and techniques Morgan Kaufmann [178] Woodland, P C., Gales, M J F., Pye, D., & Young, S J (1997) The development of the 1996 HTK broadcast news transcription system DARPA speech recognition workshop, pp 73-78 [179] Xuedong Huang and Li Deng (2010) Handbook of Natural Language Processing, 138 [180] [181] [182] [183] [184] [185] [186] Fred J Damerau Nitin Indurkhya, Ed Chapman and Hall/CRC, vol Xuedong Huang, Alejandro Acero, Hsiao-Wuen Hon (2010) Spoken language processing Prentice Hall Ptr Young, S J., Odell, J J., & Woodland, P C (1994) Tree-based state tying for high accuracy acoustic modelling In Proceedings of the workshop on Human Language , pp 307-312 Young, S J., Russell, N H., & Thornton, J H S (1989) Token passing: a simple conceptual model for connected speech recognition systems 
Cambridge, UK Cambridge University Engineering Department Young, S J., Russell, N H., & Thornton, J H S (1991) The use of syntax and multiple alternatives in the VODIS voice operated database inquiry system Computer Speech & Language, vol 5, no 1, pp 65-80 Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., & Valtchev, V (2002) The HTK book, 175th ed., 3, Ed Cambridge university engineering department Yusnita, M A., et al (2013) Acoustic analysis of formants across genders and ethnical accents in Malaysian English using ANOVA Procedia Engineering 64, pp 385-394 Zissman, M A., Gleason, T P., Rekart, D M., & Losiewicz, B L (1996) Automatic dialect identification of extemporaneous conversational, Latin American Spanish speech In Acoustics, Speech, and Signal Processing, 1996 ICASSP-96 Conference Proceedings., pp 777-780 139 DANH MỤC CÁC CƠNG TRÌNH ĐÃ CƠNG BỐ CỦA LUẬN ÁN Nguyễn Hồng Quang, Trịnh Văn Loan, Phạm Ngọc Hưng, Trần Xuân Thương (2011) Một phương pháp lựa chọn nhanh tham số cho hệ thống nhận dạng tiếng nói tiếng Việt Tạp chí Nghiên cứu khoa học công nghệ quân sự, Số 16 năm 2011 (tháng 12), ISSN 1859-1043, trang 169-178 Nguyễn Hồng Quang, Trịnh Văn Loan, Phạm Ngọc Hưng, Đào Thị Thu Diệp (2012) Cải thiện hiệu hệ thống nhận dạng tiếng Việt nói phương pháp lưới từ hậu nghiệm Tạp chí Nghiên cứu khoa học cơng nghệ qn sự, Số đặc san ACEIT’12 năm 2012 (tháng 11), ISSN 1859-1043, trang 25-32 Phạm Ngọc Hưng, Trịnh Văn Loan, Nguyễn Hồng Quang (2013) Một hướng tiếp cận dựa tần số để phân biệt phương ngữ tiếng Việt theo phương thức phát âm Kỷ yếu Hội nghị Quốc gia lần thứ VI Nghiên cứu ứng dụng Công nghệ thông tin (FAIR) - Huế, ngày 20 – 21/6/2013, ISBN: 978-604-913-1653, trang 265-269 Diep Dao Thi Thu, Loan Trinh Van, Quang Nguyen Hong, Hung Pham Ngoc (2013) Text-dependent Speaker Recognition for Vietnamese 2013 Fixfth International Conference of Soft Computing and Pattern Recognition (SoCPaR 2013), Hanoi, Vietnam, 15-18 December 2013, pp 203-206, ISBN 
978-1-47993400-3, IEEE Catalog Number: CFP1395H-ART Phạm Ngọc Hưng, Trịnh Văn Loan, Nguyễn Hồng Quang, Phạm Quốc Hùng (2014) Nhận dạng phương ngữ tiếng Việt sử dụng mơ hình Gauss hỗn hợp Kỷ yếu Hội nghị Quốc gia lần thứ VII Nghiên cứu ứng dụng Công nghệ thông tin (FAIR) – Thái Nguyên, ngày 19-20/6/2014, ISBN: 978-604-913-300-8, trang 449-552 Phạm Ngọc Hưng, Trịnh Văn Loan, Nguyễn Hồng Quang (2015) Nhận dạng phương ngữ tiếng Việt sử dụng MFCC tần số Kỷ yếu Hội nghị Quốc gia lần thứ VIII Nghiên cứu ứng dụng Công nghệ thông tin (FAIR) – Hà Nội, 09-10/7/2015, ISBN: 978-604-913-397-8, trang 523-528 Pham Ngoc Hung, Trinh Van Loan, Nguyen Hong Quang (2015) Corpus and Statistical Analysis of F0 Variation for Vietnamese Dialect Identification The 3rd International Conference on Computer and Computing Science Proceedings, Hanoi, Vietnam, October 22-24, 2015 ISSN: 2287-1233 ASTL, Vol.111 (COMCOMS 2015), pp.205-210 Pham Ngoc Hung, Trinh Van Loan, Nguyen Hong Quang (2015) “Building of corpus for Vietnamese dialect identification”, Journal of Science and Technology Technical Universities, No.109-2015 ISSN 2354-1083, pp.49-55 Nguyễn Hồng Quang, Phạm Ngọc Hưng, Trịnh Văn Loan, Phạm Quốc Hùng (2016) “So sánh số phân lớp dùng cho nhận dạng phương ngữ tiếng Việt” Kỷ yếu Hội nghị Quốc gia lần thứ IX Nghiên cứu ứng dụng Công 140 nghệ thông tin (FAIR) – Cần Thơ, 4-5/8/2016 ISBN: 978-604-913-472-2, trang 663-667 10 Phạm Ngọc Hưng, Trịnh Văn Loan, Nguyễn Hồng Quang, Trần Vũ Duy (2016) “Cải thiện hiệu hệ thống nhận dạng tiếng việt với thông tin phương ngữ” Kỷ yếu Hội nghị Quốc gia lần thứ IX Nghiên cứu ứng dụng Công nghệ thông tin (FAIR) – Cần Thơ, 4-5/8/2016 ISBN: 978-604-913-472-2, trang 63-69 11 Pham Ngoc Hung, Trinh Van Loan, Nguyen Hong Quang (2016) “Automatic identification of Vietnamese dialects” Journal of Computer Science and Cybernetics, V.32, N.1 (2016), 18-29, DOI: 10.15625/1813-9663/32/1/7905 12 Pham Ngoc Hung, Trinh Van Loan, Nguyen Hong Quang (2016) “Statistical Analysis of 
Vietnamese Dialect Corpus and Dialect Identification Experiments” International Journal of Scientific Engineering and Applied Science (IJSEAS) – Volume-2, Issue-8, August 2016, ISSN: 2395-3470, pp 255-266 141 ... phát âm liên tục cho phương ngữ tiếng Việt theo phương thức phát âm nhằm nghiên cứu sâu vấn đề xử lý nhận dạng tiếng Việt nói, giải số hạn chế nhận dạng tiếng Việt nói liên quan đến phương ngữ. .. kết nhận dạng phương ngữ tiếng Việt vào hệ thống nhận dạng tự động tiếng Việt nói nhằm cải thiện hiệu nhận dạng, nhận dạng phương ngữ xem bước tiền xử lý hệ thống nhận dạng tự động tiếng Việt. .. biệt phương ngữ tiếng Việt làm sở cho nghiên cứu nhận dạng phương ngữ tiếng Việt Luận án đánh giá ảnh hưởng phương ngữ tới hệ thống nhận dạng tự động tiếng Việt nói (2) Xây dựng ngữ liệu phương ngữ

Date posted: 04/11/2018, 23:06


Table of contents

INTRODUCTION

CHAPTER 1: OVERVIEW OF SPEECH RECOGNITION AND DIALECT IDENTIFICATION

CHAPTER 2: BUILDING A CORPUS FOR RESEARCH ON VIETNAMESE DIALECT IDENTIFICATION

CHAPTER 3: VIETNAMESE DIALECT IDENTIFICATION

CHAPTER 4: IMPROVING VIETNAMESE SPEECH RECOGNITION PERFORMANCE WITH DIALECT INFORMATION

CONCLUSIONS AND RECOMMENDATIONS

REFERENCES
