... finding all
for a given is . Therefore,
the total cost is .
For all non-empty , we create a new state and
for all we set . We create a transition
, and for all such that ,
we set . For all such ... Generalized Algorithms for Constructing Statistical Language Models
Cyril Allauzen, Mehryar Mohri, Brian Roark
AT&T Labs – Research
180 ... in a general software library
for...
... known method
for estimating N-gram language models.
Kneser-Ney smoothing, however, requires
nonstandard N-gram counts for the lower-
order models used to smooth the highest-
order model. For some ... schema, C_n denotes
the counting method used for N-grams of
length n. For most smoothing methods, C_n
denotes actual training corpus counts for
all n. For KN smoothing and its var...
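The nonstandard lower-order counts that Kneser-Ney requires are continuation counts: an (n-1)-gram is credited once per distinct word that precedes it, rather than with its raw corpus frequency. A minimal sketch of that counting step (function and variable names are ours, for illustration only):

```python
from collections import defaultdict

def kn_lower_order_counts(corpus_ngrams):
    """Continuation counts used by Kneser-Ney for lower-order models:
    the count of an (n-1)-gram is the number of DISTINCT words that
    precede it in the training data, not its raw frequency."""
    preceding = defaultdict(set)
    for ngram in corpus_ngrams:          # each ngram is a tuple of words
        preceding[ngram[1:]].add(ngram[0])
    return {suffix: len(ctxs) for suffix, ctxs in preceding.items()}

# Toy example with bigrams: "francisco" occurs 3 times
# but follows only 2 distinct words, so its KN count is 2.
bigrams = [("san", "francisco"), ("los", "angeles"), ("san", "francisco"),
           ("new", "york"), ("in", "francisco")]
counts = kn_lower_order_counts(bigrams)
```

The toy corpus illustrates why this matters: a word that is frequent only inside one fixed collocation gets a small continuation count, as intended.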
... several tabular algorithms
for Tree Adjoining Grammar parsing,
creating a continuum from simple pure
bottom-up algorithms to complex pre-
dictive algorithms and showing what
transformations must ...
resenting structure. Several parsing algorithms
have been proposed for this formalism, most of
them based on tabular techniques, ranging from
simple bottom-up algorithms (Vij...
... translations for large do-
mains. Hence, in many applications, post-editing
*The author is now affiliated with the Information Science
Institute, University of Southern California, och@isi.edu.
of ... prototype sys-
tem.
2 Statistical Machine Translation
We are given a source language ('French') sentence
f_1^J = f_1 . . . f_J, which is to be translated
into a target lan...
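Though the continuation is elided here, the decision rule that standardly accompanies this setup in the source-channel formulation (writing e_1^I for the target sentence, an assumption on our part about the elided notation) is:

```latex
\hat{e}_1^I
  = \operatorname*{argmax}_{e_1^I} \Pr(e_1^I \mid f_1^J)
  = \operatorname*{argmax}_{e_1^I} \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I)
```

Here Pr(e_1^I) is the target language model and Pr(f_1^J | e_1^I) the translation model.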
... ways:
only for word selection, as a frequency measure,
or also for word representation, as a mapping for
common words. In the former, we preserve in-
flected variants that may be useful to model the
language ... LMs for
the target language modeling component of
a phrase-based statistical machine transla-
tion system.
1 Introduction
The translation of TED conference talks^1 is an...
... all value ranks for a
given language model will vary – we will refer to
this variable as v.
2.2 Trie-Based Language Models
The data structure of choice for the majority o...
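A trie over n-gram histories can be sketched as follows. This is a hypothetical minimal layout, not the representation used in any particular toolkit; production tries typically pack children into sorted arrays rather than hash maps:

```python
class TrieNode:
    """Minimal n-gram trie node: children keyed by word, with the
    log-probability and backoff weight stored at the node (hypothetical
    layout for illustration)."""
    __slots__ = ("children", "logprob", "backoff")

    def __init__(self):
        self.children = {}
        self.logprob = None   # None until an n-gram ends at this node
        self.backoff = 0.0

def insert(root, ngram, logprob, backoff=0.0):
    """Walk/extend the trie along the n-gram and store its parameters."""
    node = root
    for word in ngram:
        node = node.children.setdefault(word, TrieNode())
    node.logprob, node.backoff = logprob, backoff
    return node

root = TrieNode()
insert(root, ("the", "cat"), -1.5)
insert(root, ("the", "dog"), -2.0)
```

Lookup of an n-gram is then a walk of at most n child steps, and shared prefixes ("the") are stored once.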
... to
natural language understanding and is a useful in-
termediate step for many other language process-
ing tasks (Ide and Veronis, 1998). Many recent
approaches make use of ideas from statistical ... thus
providing huge resources of labeled data for super-
vised approaches to make use of.
For the rest of this paper, for simplicity we will
refer to the primary language of the p...
... = 0.001, which
we tuned for best performance on the test set, giving an unfair
advantage to our competitor.
Finally, there are some methods that use auxil-
iary tasks for training sequence models, ... their associated majority label.
Features for each label were chosen by the method de-
scribed in HK06 – top frequency for that label and not
higher frequency for any other label.
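The selection criterion described above can be sketched as follows. This is our reading of the HK06-style rule, with function and variable names of our own invention, not code from that paper:

```python
from collections import Counter, defaultdict

def select_label_features(examples, top_k=3):
    """For each label, keep up to top_k of its most frequent features,
    skipping any feature that occurs with higher frequency under some
    other label (sketch of the criterion described in the text)."""
    freq = defaultdict(Counter)
    for feats, label in examples:
        freq[label].update(feats)
    selected = {}
    for label, counter in freq.items():
        keep = []
        for feat, c in counter.most_common():
            # keep only if no other label uses this feature more often
            if all(freq[other][feat] <= c for other in freq if other != label):
                keep.append(feat)
            if len(keep) == top_k:
                break
        selected[label] = keep
    return selected
```

On a toy input such as `[(["a", "b"], "X"), (["a"], "X"), (["b"], "Y"), (["b"], "Y")]`, feature `b` is dropped for label X because it is more frequent under Y.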
+ SV...
... you will never cover all the things
that might reasonably be said. Language is
often too rich for the task being performed;
for example it can be difficult to establish that
two documents are d ... problem within
language processing is the over-specificity of
language, and the sparsity of data. Corpus-
based techniques depend on a sufficiency of
examples in order to model human language...
... permutation for
every vector (by choosing random values for a
and b, q number of times). Thus for every vec-
tor we have q different bit permutations for the
original bit stream.
5. For each permutation ... Association for Computational Linguistics
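The permutation step above can be sketched as follows, assuming the common trick (our assumption, not stated in the excerpt) of realizing each random permutation of bit positions as pi(x) = (a*x + b) mod p with p prime and a != 0:

```python
import random

def make_permutations(q, p, seed=0):
    """Draw q random (a, b) pairs, each defining a permutation
    pi(x) = (a*x + b) mod p of the p bit positions (p prime)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(q)]

def apply_permutation(bits, a, b, p):
    """Reorder a length-p bit vector by the position map x -> (a*x + b) mod p."""
    out = [0] * p
    for x, bit in enumerate(bits):
        out[(a * x + b) % p] = bit
    return out
```

Because a != 0 and p is prime, each map is a bijection on positions, so every permuted stream contains exactly the original bits in a new order.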
Randomized Algorithms and NLP: Using Locality Sensitive Hash Function
for High Speed Noun Clustering
Deepak Ravichandran, Patrick...