... then
the two keys (N-grams) are equal.
4.3 Searching for a Record
We construct a B+-tree for each N-gram file in the
dataset for N = 2, 3, 4, 5, and keep the key of the
first N-gram for each file in ... 103–108,
Portland, Oregon, USA, 21 June 2011.
© 2011 Association for Computational Linguistics
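The search scheme described above (a per-file B+-tree plus the in-memory first key of every file) can be illustrated with a stand-in: a binary search over the first keys selects the file holding the query N-gram, and a second search stands in for that file's B+-tree lookup. All names and data below are hypothetical, a minimal sketch rather than the paper's indexer:

```python
import bisect

# Hypothetical stand-in: each "file" is a sorted list of N-gram keys,
# and first_keys[k] is the first key of file k (kept in memory, as the
# paper keeps the key of the first N-gram of each file).
files = [
    ["aa bb", "aa cc", "ab aa"],
    ["ba aa", "bb cc", "bc aa"],
    ["ca aa", "cb bb", "cc dd"],
]
first_keys = [f[0] for f in files]

def find_record(key):
    # Pick the file whose first key is the greatest one <= key.
    k = bisect.bisect_right(first_keys, key) - 1
    if k < 0:
        return None  # key precedes every stored N-gram
    # In the real indexer this step is a B+-tree lookup; a plain
    # binary search over the file's sorted keys stands in for it here.
    pos = bisect.bisect_left(files[k], key)
    if pos < len(files[k]) and files[k][pos] == key:
        return (k, pos)  # (file index, position within file)
    return None

print(find_record("bb cc"))  # -> (1, 1)
print(find_record("zz zz"))  # -> None
```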
An Efficient Indexer for Large N-Gram Corpora
Hakan Ceylan
Department of C...
... fea-
tures are designed to be general and, for the most part,
grammar and domain independent. For each parse, the
heuristic computes a penalty score for each of the fea-
tures. The penalties ... AN INTEGRATED HEURISTIC SCHEME
FOR PARTIAL PARSE EVALUATION
Alon Lavie
School of Computer Science
Carnegie Mellon University
5000 Forbes Ave.,
Pittsburgh,
PA 15213
email: ... tree...
... annotation format. Moreover,
the MACAON exchange format was defined from the
bottom up, originating from the authors’ need to use
several existing tools and adapt their input/output
formats in order for ... the
MACAON exchange format. htk2macaon
and fsm2macaon convert word lattices from
the HTK format (Young, 1994) and ATT
FSM format (Mohri et al., 2000) to the
MACAON exchange format. macaon...
... straightforward (we omit it for space),
but of course using such features (while interesting)
would complicate inference in decoding.
4 It may be helpful to think of i as forward probabilities, but
for ... symbol
S ∈ N, terminal alphabet Σ, and rules of the form
A → B C and A → x. (We assume Chomsky nor-
mal form for clarity; the generalization is straight-
forward.) Let r_A(B C) and r...
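The CNF setup above can be made concrete. In this hypothetical sketch, the rule probabilities r_A(B C) and r_A(x) are stored as tables, and inside probabilities (the CFG analogue of the forward probabilities mentioned in the footnote) are computed bottom-up by CKY; the toy grammar and sentence are invented for illustration, not taken from the paper:

```python
from collections import defaultdict

# A toy PCFG in Chomsky normal form: binary rules A -> B C with
# probability r_A(B C), and lexical rules A -> x with probability r_A(x).
binary = {("S", ("NP", "VP")): 1.0,
          ("NP", ("DT", "NN")): 1.0,
          ("VP", ("VB", "NP")): 1.0}
lexical = {("DT", "the"): 1.0, ("NN", "dog"): 0.5,
           ("NN", "cat"): 0.5, ("VB", "saw"): 1.0}

def inside(words):
    """CKY computation of inside probabilities beta[(i, j, A)]."""
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):           # lexical rules A -> x
        for (A, x), p in lexical.items():
            if x == w:
                beta[(i, i + 1, A)] += p
    for span in range(2, n + 1):            # binary rules A -> B C
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, (B, C)), p in binary.items():
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k, j, C)]
    return beta[(0, n, "S")]

print(inside("the dog saw the cat".split()))  # -> 0.25
```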
...
resulting language model, low-frequency n-grams
are filtered out by some thresholds. Moreover, an
n-gram cache is implemented to speed up n-gram
probability requests for decoding.
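An n-gram cache of the kind described can be sketched as a memoized lookup in front of the expensive probability store; the backing table, function name, and floor value below are assumptions for illustration, not the toolkit's actual API:

```python
from functools import lru_cache

# Hypothetical backing store standing in for the language model's
# on-disk probability table.
_table = {("the", "cat"): 0.01, ("cat", "sat"): 0.02}

@lru_cache(maxsize=100_000)
def ngram_prob(ngram):
    # The expensive lookup runs at most once per distinct n-gram;
    # repeated requests during decoding hit the in-memory cache.
    return _table.get(ngram, 1e-9)  # tiny floor for unseen n-grams

p = ngram_prob(("the", "cat"))
p_again = ngram_prob(("the", "cat"))  # served from the cache
print(p, ngram_prob.cache_info().hits)
```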
3.4 Weight Tuning ... of
minimum error rate training that allows for various
evaluation metrics for tuning the system. In
addition, the toolkit provides easy-to-use APIs for
the development of ne...
... m_j
represents the jth mention (e.g., m_6 for the pronoun
“he”). e_i^j represents the partial entity i before the
jth mention. For example, e_1^6 denotes the part of
e_1 before m_6, i.e., {“Microsoft Corp.”, ... model has a limitation that
information beyond mention pairs is ignored for training
and testing. As an individual mention usually lacks
adequate descriptive information of...
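The e_i^j notation can be made concrete: with mentions listed in document order and a (hypothetical) entity assignment, e_i^j is simply the set of entity-i mentions that precede m_j. A minimal sketch with invented data:

```python
# Mentions m_1..m_n in document order, with an assumed gold entity id
# for each (both lists are illustrative, not from the paper).
mentions = ["Microsoft Corp.", "it", "Bill Gates", "the company", "he"]
entity_of = [1, 1, 2, 1, 2]

def partial_entity(i, j):
    """e_i^j: mentions of entity i strictly before the jth mention
    (j is 1-based, matching m_j in the text)."""
    return [m for k, m in enumerate(mentions, start=1)
            if entity_of[k - 1] == i and k < j]

print(partial_entity(1, 5))  # -> ['Microsoft Corp.', 'it', 'the company']
print(partial_entity(2, 5))  # -> ['Bill Gates']
```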
...
Recently, large collections of electronic text have
become available, and computers are widely used to
process and analyze them. Determining important
keywords is crucial to successful modern Information
Retrieval (IR). ... with time-series variation is
considered, especially when searching for similar
texts.
This paper presents a new method for
automatically estimating the stability classes that
ind...
... adaptations
for our corpus and research issues. For details about
our scheme, see (Di Eugenio et al., 1997); for details
about features we added to DRI, but that are not
relevant for this paper, ... Kappas for Forward and
Backward Functions
exist to the set of constraint equations, each variable
in the set of equations must have a solution. For
example, if 5 instances of so...
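Since this section reports kappas for the forward/backward-function annotations, a minimal sketch of Cohen's kappa for two annotators may help; the label sequences below are invented for illustration:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n           # observed agreement
    ca, cb = Counter(a), Counter(b)
    labels = set(ca) | set(cb)
    p_e = sum((ca[l] / n) * (cb[l] / n) for l in labels)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Toy labels for a forward/backward-function style coding task.
ann1 = ["fwd", "fwd", "bwd", "fwd", "bwd", "bwd"]
ann2 = ["fwd", "bwd", "bwd", "fwd", "bwd", "fwd"]
print(round(cohen_kappa(ann1, ann2), 3))  # -> 0.333
```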
...
satisfying:
• CS(N) + C_ CS(N), for each N;
• (N, L) ∈ CS(N), for each N such that N <~* l,
and each L ∈ Af*;
• N ∈ CS(N), for each N such that ¬(N <~* l);
and
• for each
N, children(N) ... needed for construction of parse trees
(or "derived trees" as they are often called for
TAGs) and the computation of features are al-
most identical to the correspondi...
...
i := 0;
forall x ∈ F do
    create ... N_i;
    N_i := solve(x); i := i + 1;
forallend
for j := 0 to i do
    R := R ∪ (Wait-for-result(N_j));
forend
return ... parallel system
return ... parallel system
for processing Typed Feature Structures (TFSs)
on shared-memory parallel machines. We call
the system Parallel Substrate for TFS (PSTFS).
PSTFS is designed for parallel c...
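The master/agent pattern in the pseudocode above (fork one solve task per element of F, then collect each result into R) can be sketched with a thread pool; this is a Python stand-in under assumed names, not the PSTFS implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def solve(x):
    # Stand-in for the per-agent solve() in the pseudocode;
    # returns a set so results can be unioned into R.
    return {x * x}

def parallel_solve(F):
    """Fork one task per x in F, then R := R U Wait-for-result(N_j)."""
    R = set()
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(solve, x) for x in F]  # forall x in F
        for f in futures:                             # for j := 0 to i
            R |= f.result()                           # blocks until done
    return R

print(sorted(parallel_solve([1, 2, 3])))  # -> [1, 4, 9]
```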