... NIST Language Recognition Evaluation database. 1 Introduction Spoken language and written language are similar in many ways. Therefore, much of the research in spoken language identification, ... 2003. Acoustic, Phonetic and Discriminative Approaches to Automatic language recognition, In Proc. of Eurospeech Masahide Sugiyama. 1991. Automatic language recognition using acoustic features, ... of acoustic vocabulary (AV) with mixture of token unigram, bigram, and trigram: a) AV1: 32 broad class phonemes as unigram, selected from 12 languages, also referred to as P-ASM as detailed...
... Proceedings of the Human Language Technology Workshop, 272-277. ARPA. Raymond Lau, Ronald Rosenfeld, and Salim Roukos. 1993. Trigger-based language models: a maximum entropy approach. In Proceedings ... University, Baltimore, MD. Frederick Jelinek, John Lafferty, David M. Magerman, Robert Mercer, Adwait Ratnaparkhi, Salim Roukos. 1994. Decision Tree Parsing using a Hidden Derivational Model. ... those assigned manually in the Penn Treebank (Marcus95) after undergoing headword percolation and binarization. All four LMs predict a word wk and they were implemented using the Maximum...
... second dataset contains three annotated presidential debates (Boydstun et al., 2011) between Barack Obama and John McCain and a vice presidential debate between Joe Biden and Sarah Palin. Each ... Quintana, F. A. (2004). Nonparametric Bayesian data analysis. Statistical Science, 19(1):95–110. [Murray et al., 2005] Murray, G., Renals, S., and Carletta, J. (2005). Extractive summarization of meeting ... moderator. Similarly, the “Question” speaker had a relatively high variance, consistent with an amalgamation of many distinct speakers. These topic shift tendencies suggest that all candidates manage to...
... DLMs are trained using correct sentences from a corpus and negative examples from a Pseudo-Negative generator. An advantage of sampling is that as many negative examples can be collected as correct ... that they have the disadvantage of being computationally expensive, and not all relevant features can be included. A discriminative language model (DLM) assigns a score to a sentence, measuring ... specific applications and therefore were able to obtain real negative examples easily. For example, Roark (2007) proposed a discriminative language model, in which a model is trained so that a correct...
... features, as described below: a statistical language model and a measure of tense difficulty. 4.1 The language model The lexical difficulty of a text is quite an elaborate phenomenon to parameterise. ... poems as outliers). 4 Selection of lexical and syntactic variables Any text classification task requires an object (here a text) to be parameterised into variables, whether qualitative or quantitative. ... Belgium thomas.francois@uclouvain.be Abstract Reading is known to be an essential task in language learning, but finding the appropriate text for every learner is far from easy. In this context, automatic...
... Portugal. Federico Sangati and Chiara Mazza. 2009. An English Dependency Treebank à la Tesnière. In The 8th International Workshop on Treebanks and Linguistic Theories, pages 173–184, Milan, ... (92). Michael J. Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania. Marie-Catherine de Marneffe and Christopher D. Manning. ... coordination, a linguistic phenomenon highly abundant in natural language production, but often neglected when it comes to evaluating parsing resources. We have therefore proposed a special evaluation...
... significantly. Bear in mind that Charniak et al. (2003) integrated Charniak’s language model with the syntax-based translation model Yamada and Knight proposed (2001) to rescore a tree-to-string ... Stochastic analysis of lexical and semantic enhanced structural language model. The 8th International Colloquium on Grammatical Inference (ICGI), 97-111. K. Yamada and K. Knight. 2001. A syntax-based ... (EMNLP), 858-867. E. Charniak. 2001. Immediate-head parsing for language models. The 39th Annual Conference on Association of Computational Linguistics (ACL), 124-131. E. Charniak, K. Knight and K. Yamada. 2003....
... and Linda C. Bauman Peto. 1995. A hierarchical Dirichlet language model. Natural Language Engineering, 1(3):1–19. Y.W. Teh. 2006. A hierarchical Bayesian language model based on Pitman-Yor processes. ... n-grams: C(ab) − C(ab∗). A(ab) = max(1, K(C(ab) − C(ab∗))). A different K constant is chosen for each n-gram order. Using this formulation as an interpolated 5-gram language model gives a cross ... Speech and Language. R. Kneser and H. Ney. 1995. Improved backing-off for m-gram language modeling. In International Conference on Acoustics, Speech, and Signal Processing. David J. C. MacKay and...
... compression tasks achieved a significant compression rate without any loss. 1 Introduction There has been an increase in available N-gram data and a large amount of web-scaled N-gram data has been ... the ACL-IJCNLP 2009 Conference Short Papers, pages 341–344, Suntec, Singapore, 4 August 2009. ©2009 ACL and AFNLP A Succinct N-gram Language Model Taro Watanabe Hajime Tsukada Hideki Isozaki NTT ... Communication Science Laboratories 2-4 Hikaridai Seika-cho Soraku-gun Kyoto 619-0237 Japan {taro,tsukada,isozaki}@cslab.kecl.ntt.co.jp Abstract Efficient processing of tera-scale text data is an important...
... using matrix PRICAI-00, 2000, (to appear). Tanaka H. (1995) Statistical Learning of “Case Frame Tree” for Translating English Verbs, Journal of NLP, 2/3, pp. 49-72, (in Japanese). Yamada, ... Laboratories 2-2 Hikaridai, Seika, Soraku Kyoto 619-0288, Japan sumita@slt.atr.co.jp Abstract Building a bilingual dictionary for transfer in a machine translation system is conventionally ... generalization (Akiba et al., 1996 and Tanaka, 1995); (2) approaches using structural matching: to obtain transfer rules, several search methods have been proposed for maximal structural matching between...
... Japanese OCR Error Correction using Character Shape Similarity and Statistical Language Model Masaaki NAGATA NTT Information and Communication Systems Laboratories 1-1 Hikari-no-oka Yokosuka-Shi ... such as Japanese and Chinese. It consists of a statistical OCR model, an approximate word matching method using character shape similarity, and a word segmentation algorithm using a statistical ... Yokosuka-Shi Kanagawa, 239-0847 Japan nagata@nttnly.isl.ntt.co.jp Abstract We present a novel OCR error correction method for languages without word delimiters that have a large character...