... then
the two keys (N-grams) are equal.
4.3 Searching for a Record
We construct a B+-tree for each N-gram file in the
dataset for N = 2, 3, 4, 5, and keep the key of the
first N-gram for each file in ... 103–108,
Portland, Oregon, USA, 21 June 2011.
© 2011 Association for Computational Linguistics
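The search scheme described above (a per-file B+-tree plus the in-memory first key of every file) can be illustrated with a stand-in: a binary search over the first keys selects the file holding the query N-gram, and a second search stands in for that file's B+-tree lookup. All names and data below are hypothetical, a minimal sketch rather than the paper's indexer:

```python
import bisect

# Hypothetical stand-in: each "file" is a sorted list of N-gram keys,
# and first_keys[k] is the first key of file k (kept in memory, as the
# paper keeps the key of the first N-gram of each file).
files = [
    ["aa bb", "aa cc", "ab aa"],
    ["ba aa", "bb cc", "bc aa"],
    ["ca aa", "cb bb", "cc dd"],
]
first_keys = [f[0] for f in files]

def find_record(key):
    # Pick the file whose first key is the greatest one <= key.
    k = bisect.bisect_right(first_keys, key) - 1
    if k < 0:
        return None  # key precedes every stored N-gram
    # In the real indexer this step is a B+-tree lookup; a plain
    # binary search over the file's sorted keys stands in for it here.
    pos = bisect.bisect_left(files[k], key)
    if pos < len(files[k]) and files[k][pos] == key:
        return (k, pos)  # (file index, position within file)
    return None

print(find_record("bb cc"))  # -> (1, 1)
print(find_record("zz zz"))  # -> None
```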
An Efficient Indexer for Large N-Gram Corpora
Hakan Ceylan
Department of C...
... fea-
tures are designed to be general and, for the most part,
grammar and domain independent. For each parse, the
heuristic computes a penalty score for each of the fea-
tures. The penalties ... AN INTEGRATED HEURISTIC SCHEME
FOR PARTIAL PARSE EVALUATION
Alon Lavie
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave.,
Pittsburgh,
PA 15213
email: ... tree...
... annotation format. Moreover,
the MACAON exchange format was defined from the
bottom up, originating from the authors’ need to use
several existing tools and adapt their input/output
formats in order for ... the
MACAON exchange format. htk2macaon
and fsm2macaon convert word lattices from
the HTK format (Young, 1994) and ATT
FSM format (Mohri et al., 2000) to the
MACAON exchange format. macaon...
... straightforward (we omit it for space),
but of course using such features (while interesting)
would complicate inference in decoding.
4 It may be helpful to think of i as forward probabilities, but
for ... symbol
S ∈ N, terminal alphabet Σ, and rules of the form
A → B C and A → x. (We assume Chomsky nor-
mal form for clarity; the generalization is straight-
forward.) Let r_A(B C) and r...
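The CNF setup above can be made concrete. In this hypothetical sketch, the rule probabilities r_A(B C) and r_A(x) are stored as tables, and inside probabilities (the CFG analogue of the forward probabilities mentioned in the footnote) are computed bottom-up by CKY; the toy grammar and sentence are invented for illustration, not taken from the paper:

```python
from collections import defaultdict

# A toy PCFG in Chomsky normal form: binary rules A -> B C with
# probability r_A(B C), and lexical rules A -> x with probability r_A(x).
binary = {("S", ("NP", "VP")): 1.0,
          ("NP", ("DT", "NN")): 1.0,
          ("VP", ("VB", "NP")): 1.0}
lexical = {("DT", "the"): 1.0, ("NN", "dog"): 0.5,
           ("NN", "cat"): 0.5, ("VB", "saw"): 1.0}

def inside(words):
    """CKY computation of inside probabilities beta[(i, j, A)]."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):           # lexical rules A -> x
        for (A, x), p in lexical.items():
            if x == w:
                beta[(i, i + 1, A)] += p
    for span in range(2, n + 1):            # binary rules A -> B C
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, (B, C)), p in binary.items():
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k, j, C)]
    return beta[(0, n, "S")]

print(inside("the dog saw the cat".split()))  # -> 0.25
```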
...
resulting language model, low-frequency n-grams
are filtered out by some thresholds. Moreover, an
n-gram cache is implemented to speed up n-gram
probability requests for decoding.
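An n-gram cache of the kind described can be sketched as a memoized lookup in front of the expensive probability store; the backing table, function name, and floor value below are assumptions for illustration, not the toolkit's actual API:

```python
from functools import lru_cache

# Hypothetical backing store standing in for the language model's
# on-disk probability table.
_table = {("the", "cat"): 0.01, ("cat", "sat"): 0.02}

@lru_cache(maxsize=100_000)
def ngram_prob(ngram):
    # The expensive lookup runs at most once per distinct n-gram;
    # repeated requests during decoding hit the in-memory cache.
    return _table.get(ngram, 1e-9)  # tiny floor for unseen n-grams

p = ngram_prob(("the", "cat"))
p_again = ngram_prob(("the", "cat"))  # served from the cache
print(p, ngram_prob.cache_info().hits)
```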
3.4 Weight Tuning ... of
minimum error rate training that allows for various
evaluation metrics for tuning the system. In
addition, the toolkit provides easy-to-use APIs for
the development of ne...
... m_j
represents the jth mention (e.g., m_6 for the pronoun
“he”). e_i^j represents the partial entity i before the
jth mention. For example, e_1^6 denotes the part of
e_1 before m_6, i.e., {“Microsoft Corp.”, ... model has a limitation that
information beyond mention pairs is ignored for training
and testing. As an individual mention usually lacks
adequate descriptive information of...
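The e_i^j notation can be made concrete: with mentions listed in document order and a (hypothetical) entity assignment, e_i^j is simply the set of entity-i mentions that precede m_j. A minimal sketch with invented data:

```python
# Mentions m_1..m_n in document order, with an assumed gold entity id
# for each (both lists are illustrative, not from the paper).
mentions = ["Microsoft Corp.", "it", "Bill Gates", "the company", "he"]
entity_of = [1, 1, 2, 1, 2]

def partial_entity(i, j):
    """e_i^j: mentions of entity i strictly before the jth mention
    (j is 1-based, matching m_j in the text)."""
    return [m for k, m in enumerate(mentions, start=1)
            if entity_of[k - 1] == i and k < j]

print(partial_entity(1, 5))  # -> ['Microsoft Corp.', 'it', 'the company']
print(partial_entity(2, 5))  # -> ['Bill Gates']
```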
...
Recently, large collections of electronic text have
become available, and computers are widely used to
process and analyze them. Determining important
keywords is crucial to successful modern Information
Retrieval (IR). ... with time-series variation is
considered, especially when searching for similar
texts.
This paper presents a new method for
automatically estimating the stability classes that
ind...
... adaptations
for our corpus and research issues. For details about
our scheme, see (Di Eugenio et al., 1997); for details
about features we added to DRI, but that are not
relevant for this paper, ... Kappas for Forward and
Backward Functions
exist to the set of constraint equations, each variable
in the set of equations must have a solution. For
example, if 5 instances of so...
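Since this section reports kappas for the forward/backward-function annotations, a minimal sketch of Cohen's kappa for two annotators may help; the label sequences below are invented for illustration:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    ca, cb = Counter(a), Counter(b)
    labels = set(ca) | set(cb)
    p_e = sum((ca[l] / n) * (cb[l] / n) for l in labels)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Toy labels for a forward/backward-function style coding task.
ann1 = ["fwd", "fwd", "bwd", "fwd", "bwd", "bwd"]
ann2 = ["fwd", "bwd", "bwd", "fwd", "bwd", "fwd"]
print(round(cohen_kappa(ann1, ann2), 3))  # -> 0.333
```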
...
satisfying:
• CS(N) + C_ CS(N), for each N;
• (N, L) ∈ CS(N), for each N such that N <~* l,
and each L ∈ Af*;
• N ∈ CS(N), for each N such that ¬(N <~* l);
and
• for each
N, children(N) ... needed for construction of parse trees
(or "derived trees" as they are often called for
TAGs) and the computation of features are al-
most identical to the correspondi...
...
i := 0;
forall x ∈ F do
    create ... N_i;
    N_i := solve(x); i := i + 1;
forallend
for j := 0 to i do
    R := R ∪ (Wait-for-result(N_j));
forend
return ... parallel system
return ... parallel system
for processing Typed Feature Structures (TFSs)
on shared-memory parallel machines. We call
the system Parallel Substrate for TFS (PSTFS).
PSTFS is designed for parallel c...
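The master/agent pattern in the pseudocode above (fork one solve task per element of F, then collect each result into R) can be sketched with a thread pool; this is a Python stand-in under assumed names, not the PSTFS implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def solve(x):
    # Stand-in for the per-agent solve() in the pseudocode;
    # returns a set so results can be unioned into R.
    return {x * x}

def parallel_solve(F):
    """Fork one task per x in F, then R := R U Wait-for-result(N_j)."""
    R = set()
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(solve, x) for x in F]  # forall x in F
        for f in futures:                             # for j := 0 to i
            R |= f.result()                           # blocks until done
    return R

print(sorted(parallel_solve([1, 2, 3])))  # -> [1, 4, 9]
```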