... Proceedings of the 43rd Annual Meeting of the ACL, pages 451–458,
Ann Arbor, June 2005.
© 2005 Association for Computational Linguistics
Using Conditional Random Fields For Sentence Boundary Detection ... not
annotated according to the guideline used for the training and test data
(Strassel, 2003). For BN, we use the training corpus for the LM for speech
recogni-...
... CRF:All in Table 5). We get about a 0.5% increase in accuracy, to 76.1%,
with a window of size w = 1.
Using larger windows resulted in minor increases
in the performance of the model, as summarized in
Table ... For our textual phonological features, we included the number of
syllables in a word and the number of phones (in both citation and
transcribed form). Instead...
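The syllable and phone counts described above can be sketched as a small feature extractor. The vowel-group syllable heuristic and the helper names below are illustrative assumptions, not the paper's implementation; in practice the phone sequences would come from a pronunciation lexicon.

```python
# Sketch of textual phonological features: syllable and phone counts.
# count_syllables is a rough vowel-group heuristic, assumed here for
# illustration only.
import re

def count_syllables(word: str) -> int:
    """Approximate syllable count as the number of vowel groups."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def phonological_features(word, citation_phones, transcribed_phones):
    # Counts in both citation and transcribed form, as in the text.
    return {
        "n_syllables": count_syllables(word),
        "n_phones_citation": len(citation_phones),
        "n_phones_transcribed": len(transcribed_phones),
    }

print(phonological_features("boundary",
                            ["B", "AW", "N", "D", "ER", "IY"],
                            ["B", "AW", "N", "D", "R", "IY"]))
```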
... linguistic expression used by a questioner to request information in the
form of an answer. The sentence containing the request focus is called the
question. Contexts are the sentences containing ... on forum data.
Experimental results show that 1) Linear CRFs outperform SVMs and decision
trees in both context and answer detection; 2) Skip-chain CRFs outperform
Linear CRFs for answer...
... experiment
simply for illustrative purposes: many columns
in this code were unnecessary, yielding only a
slight gain in performance over much simpler
codes while incurring a very large increase in
training ... errors by individual models.
Error-correcting CRF training is much less resource-intensive and trains
much faster than a standardly formulated CRF, while decoding
p...
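The error-correcting output coding idea referenced above, where each class is assigned a binary codeword with one column per individual model, can be sketched with a toy Hamming-distance decoder. The code matrix and class names below are invented for illustration and are not the code used in the paper.

```python
# Toy error-correcting output code: map a vector of per-model binary
# decisions to the class whose codeword is nearest in Hamming distance.
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

CODE = {                  # class -> codeword (one bit per model column)
    "A": (0, 0, 1, 1, 0),
    "B": (1, 0, 0, 1, 1),
    "C": (0, 1, 1, 0, 1),
}

def decode(bits):
    """Pick the class whose codeword is closest to the model outputs."""
    return min(CODE, key=lambda c: hamming(CODE[c], bits))

print(decode((1, 0, 1, 1, 1)))  # one bit flipped from B's codeword -> "B"
```

A code with redundant columns can absorb errors by individual models, which is the trade-off the text discusses: extra columns buy robustness at the cost of training time.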
... CRF training.
1 Introduction
Conditional random fields (CRFs) are a recently
introduced formalism (Lafferty et al., 2001) for
representing a conditional model p(y|x), where
both a set of inputs, ... isozaki}@cslab.kecl.ntt.co.jp
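The conditional model $p(y \mid x)$ referenced above takes, for a linear-chain CRF over a label sequence $y = (y_1, \dots, y_T)$ and input $x$, the standard form of Lafferty et al. (2001):

```latex
p_{\lambda}(y \mid x) \;=\; \frac{1}{Z_{\lambda}(x)}
  \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z_{\lambda}(x) \;=\; \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y'_{t-1}, y'_t, x, t) \Big),
```

where the $f_k$ are feature functions, the $\lambda_k$ their weights, and $Z_{\lambda}(x)$ normalizes over all label sequences.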
Abstract
This paper proposes a framework for training Conditional Random Fields
(CRFs) to optimize multivariate evaluation measures, including non-...
... Effect of Training Data Size
In order to allow for rapid examination of multiple feature combinations, we
restricted the size of the training set (S) to maintain manageable training
times. ... training data. We test this assumption by taking the
best-performing feature sets from Table 5 and training new models using
twice the training data (S = 4000). The results are shown in Table 6....
... distributions in Figure 1, except the user model, are known. The ASR
confusion model was estimated by transcribing 50 randomly chosen dialogs
from the training set in Section 4.2 and calculating the ... reasonable,
hand-crafted value for θ, and then generated synthetic dialogs by following
the probabilistic process depicted in Figure 1. In this way, we were able to
create synt...
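The generation step described above can be sketched as sampling from the probabilistic process: draw a true user action, then pass it through an ASR confusion step parameterized by θ. The uniform user model, the action set, and the θ value below are all placeholder assumptions for illustration, not the paper's estimates.

```python
# Sketch of synthetic dialog generation: sample a user action, then
# corrupt it with probability THETA to simulate ASR confusion.
import random

random.seed(0)

THETA = 0.15                           # hand-crafted error rate (assumed)
USER_ACTIONS = ["yes", "no", "repeat"]  # toy action set (assumed)

def sample_user_action():
    # Uniform stand-in for the unknown user model.
    return random.choice(USER_ACTIONS)

def asr_confuse(action):
    # With probability THETA the recognizer outputs a different action.
    if random.random() < THETA:
        return random.choice([a for a in USER_ACTIONS if a != action])
    return action

def synthetic_dialog(n_turns=5):
    # Each turn records (true action, recognized action).
    return [(a, asr_confuse(a)) for a in
            (sample_user_action() for _ in range(n_turns))]

print(synthetic_dialog())
```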
... sometimes eliminate the original meaning by incorrectly removing
important parts of sentences, because trimming probabilities depend only on
the parent and daughter non-terminals in the applied CFG ... tree of the
original sentence is marked if it exists in the compressed sentence. In the
figure, the marked terminals are represented by circles. Second, each
non-terminal in the origi-...
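The first marking step described above, where a terminal of the original sentence's parse tree is marked if its word survives in the compressed sentence, can be sketched as a tree traversal. The tuple-based tree encoding and the example sentence are illustrative assumptions; word-position alignment is ignored for simplicity.

```python
# Mark each terminal of the original parse tree that also occurs in
# the compressed sentence. Trees are (label, children) tuples where a
# terminal's "children" slot holds its word string.
def mark_terminals(tree, compressed_words):
    label, children = tree
    if isinstance(children, str):            # terminal node
        return (label, children, children in compressed_words)
    return (label, [mark_terminals(c, compressed_words) for c in children])

original = ("S", [("NP", [("DT", "the"), ("NN", "cat")]),
                  ("VP", [("VBD", "slept"), ("RB", "soundly")])])
compressed = {"the", "cat", "slept"}         # "soundly" was trimmed

print(mark_terminals(original, compressed))
```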
... word in WordNet is linked to a set of senses, with each sense
identifying one particular meaning of that word. For example, the noun
‘skin’ has senses representing (i) the cutis or skin of ... sense-pair,
using the first term of the score tuple as the main key for comparison
(lines 14 and 15), and using the second term as a tie-breaker (lines 16 to
18). If the PRO score for relation...
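The comparison described above, ranking sense-pairs by the first term of the score tuple and breaking ties on the second, corresponds to lexicographic tuple ordering. The sense-pairs and scores below are invented for illustration; only the main-key / tie-breaker logic is taken from the text.

```python
# Rank sense-pairs by score tuple: first term is the main key,
# second term breaks ties. Python compares tuples lexicographically,
# which gives exactly this behaviour.
sense_pairs = {
    ("skin#1", "hide#2"): (0.72, 0.10),
    ("skin#2", "peel#1"): (0.72, 0.35),   # ties with the pair above
    ("skin#3", "bark#1"): (0.55, 0.90),
}

best = max(sense_pairs, key=lambda sp: sense_pairs[sp])
print(best)  # -> ('skin#2', 'peel#1'), winning the tie on the second term
```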
... parses in TreeBank Release 2. As named entity information is not
available in PropBank/TreeBank, we tagged the training corpus with NE
information using an open-domain NE recognizer, having 96% ... were mainly
responsible for the improvements. This is the rationale for including them
in the FS2 set. We were also interested in comparing the results
These results, also listed on...