... 217–224,
Sydney, July 2006.
© 2006 Association for Computational Linguistics
Training Conditional Random Fields with Multivariate Evaluation
Measures
Jun Suzuki, Erik McDermott and Hideki Isozaki
NTT Communication ... isozaki}@cslab.kecl.ntt.co.jp
Abstract
This paper proposes a framework for training
Conditional Random Fields (CRFs) to optimize
multivariate evaluation measures, including
non-linear measures such as F-score. Our
proposed framework ... optimization results.
4 Multivariate Evaluation Measures
Thus far, we have discussed the error rate ver-
sion of MCE. Unlike ML/MAP, the framework of
MCE criterion training allows the embedding...
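The distinction between error rate and a non-linear measure such as F-score can be illustrated with a small sketch. This is not the paper's implementation: the function names `f_score` and `error_rate`, the token-level counting, and the toy label sequences are all assumptions made for illustration. The point it shows is that error rate decomposes into an average over parts of the data, while F-score does not.

```python
def error_rate(gold, pred):
    """Fraction of positions where the predicted label differs from the gold label."""
    return sum(1 for g, p in zip(gold, pred) if g != p) / len(gold)


def f_score(gold, pred, positive="B", beta=1.0):
    """Token-level F-score for one label (hypothetical helper, for illustration only)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data (assumed): four tokens, split into two equal halves.
gold = ["B", "O", "B", "O"]
pred = ["B", "B", "O", "O"]

whole_err = error_rate(gold, pred)
avg_err = (error_rate(gold[:2], pred[:2]) + error_rate(gold[2:], pred[2:])) / 2

whole_f = f_score(gold, pred)
avg_f = (f_score(gold[:2], pred[:2]) + f_score(gold[2:], pred[2:])) / 2

print("error rate:", whole_err, "average over halves:", avg_err)
print("F-score:", whole_f, "average over halves:", avg_f)
```

On this toy input the error rate of the whole equals the average over the two halves, whereas the F-score of the whole differs from the average of the per-half F-scores, which is why such a measure has to be evaluated over the whole data set at once.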
... 2006.
© 2006 Association for Computational Linguistics
Discriminative Word Alignment with Conditional Random Fields
Phil Blunsom and Trevor Cohn
Department of Software Engineering and Computer ... into the CRF, and
demonstrate that even with only a few hun-
dred word-aligned training sentences, our
model improves over the current state-of-
the-art with alignment error rates of 5.29
and ... work in Section 6.
Finally, we conclude in Section 7.
2 Conditional random fields
CRFs are undirected graphical models which de-
fine a conditional distribution over a label se-
quence given an...
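For reference, the conditional distribution of a linear-chain CRF is commonly written in the following form (after Lafferty et al., 2001). The notation here is generic and assumed for illustration: $\mathbf{x}$ is the observation sequence, $\mathbf{y}$ the label sequence, $f_k$ the feature functions and $\lambda_k$ their weights. It is not necessarily the notation of the excerpted paper.

```latex
p_\Lambda(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z_\Lambda(\mathbf{x})}
    \exp\Big( \sum_{t} \sum_{k} \lambda_k \, f_k(t, y_{t-1}, y_t, \mathbf{x}) \Big),
\qquad
Z_\Lambda(\mathbf{x})
  = \sum_{\mathbf{y}'} \exp\Big( \sum_{t} \sum_{k} \lambda_k \, f_k(t, y'_{t-1}, y'_t, \mathbf{x}) \Big)
```

The normaliser $Z_\Lambda(\mathbf{x})$ sums over all label sequences for the given observation sequence, which is what makes the model conditional rather than joint.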
... substantial improvements in accuracy
for tagging tasks in Collins (2002).
2.3 Conditional Random Fields
Conditional Random Fields have been applied to NLP
tasks such as parsing (Ratnaparkhi et al., ... some point during training. Thus the percep-
tron algorithm is in effect doing feature selection as a
by-product of training. Given N training examples, and
T passes over the training set, O(NT ... data.
This is a key contrast with conditional random fields,
which optimize the parameters of a fixed feature set. Fea-
ture selection can be critical in our domain, as training
and applying a discriminative...
... struc-
tured learning has been highly successful, with
sequence classification as its most important and
successful subfield, and with conditional random
fields (CRFs) as the most influential approach ... dictionaries, or in compound words such as
“sudden-acceleration” above.
3 Conditional random fields
A linear-chain conditional random field (Lafferty
et al., 2001) is a way to use a log-linear model
for ... 661–672. MIT Press, Cambridge, MA,
USA.
Fei Sha and Fernando Pereira. 2003. Shallow parsing
with conditional random fields. Proceedings of
the 2003 Conference of the North American Chapter
of the Association...
... variable z.
This type of training has been applied by Quattoni
et al. (2007) for hidden-state conditional random
fields, and can be equally applied to semi-supervised
conditional random fields. Note, ... information,
and making good selections requires significant in-
sight.
3 Conditional Random Fields
Linear-chain conditional random fields (CRFs) are a
discriminative probabilistic model over sequences ... tokens.
Training a GE model with only labeled features sig-
nificantly outperforms traditional log-likelihood training
with labeled instances for comparable numbers of labeled
tokens. When training...
... this experiment, we could
not examine the performance without filtering us-
ing all the training data, because training on all
the training data without filtering required much
larger memory resources ... compared the result of the recog-
nizers with and without filtering using only 2000
sentences as the training data. Table 5 shows the
result of the total system with different filtering
thresholds. ... Cohen. 2004. Semi-Markov
conditional random fields for information
extraction. In NIPS 2004.
Burr Settles. 2004. Biomedical named entity recognition
using conditional random fields and rich feature
sets....
... results
(Section 6) and conclude (Section 7).
2 Conditional Random Fields
CRFs can be considered as a generalization of lo-
gistic regression to label sequences. They define
a conditional probability distribution ... Models (McCallum et al., 2000),
Projection Based Markov Models (Punyakanok and
Roth, 2000), Conditional Random Fields (Lafferty
et al., 2001), Sequence AdaBoost (Altun et al.,
2003a), Sequence Perceptron ... them with acous-
tic features that have been demonstrated to be good
predictors of pitch accent (Sun, 2002; Conkie et al.,
1999; Wightman et al., 2000).
7 Conclusion
We used CRFs with new measures...
... result-
ing objective combines the likelihood of the CRF
on labeled training data with its conditional en-
tropy on unlabeled training data. Unfortunately,
the maximization objective is no longer ... observation
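sequence $x$, define the $|\mathcal{Y}| \times |\mathcal{Y}|$ matrix random
variable $M_i(x) = [M_i(y', y \mid x)]$ by
$M_i(y', y \mid x) = \exp\big(\Lambda_i(y', y \mid x)\big)$,
where
$\Lambda_i(y', y \mid x) = \sum_k \lambda_k f_k(e_i, Y|_{e_i} = (y', y), x) + \sum_k \mu_k g_k(v_i, Y|_{v_i} = y, x)$.
Here $e_i$ is the edge with labels $(Y_{i-1}, Y_i)$ and
$v_i$ is the vertex with label $Y_i$.
For each index $i = 0, \ldots, n+1$ define the forward
vectors $\alpha_i(x)$ with base case
$\alpha_0(y \mid x) = 1$ if $y = \text{start}$ and $0$ otherwise,
and recurrence
$\alpha_i(x) = \alpha_{i-1}(x)\, M_i(x)$.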
Similarly, ... semi-supervised training
procedure for conditional random fields
(CRFs) that can be used to train sequence
segmentors and labelers from a combina-
tion of labeled and unlabeled training data.
Our...
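The forward vectors with a base case and a recurrence, mentioned in the excerpt above, can be sketched in code. This is a minimal illustration under stated assumptions, not the procedure of the excerpted paper: the function names, the split of scores into a hypothetical `init` vector and per-position `trans` matrices, and the toy numbers are all invented for the example. It computes the log of the normalising constant in log space and checks it against brute-force enumeration.

```python
import math
from itertools import product


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_partition(init, trans):
    """Forward recursion in log space.

    init[b]        : log-score of label b at the first position (base case).
    trans[i][a][b] : log-score of label a at position i followed by label b
                     at position i + 1 (one matrix per adjacent pair).
    Returns log Z, the log of the sum of exp(score) over all label sequences.
    """
    alpha = list(init)  # base case
    for matrix in trans:  # recurrence: alpha_i = alpha_{i-1} "times" M_i
        alpha = [
            log_sum_exp([alpha[a] + matrix[a][b] for a in range(len(alpha))])
            for b in range(len(matrix[0]))
        ]
    return log_sum_exp(alpha)


def brute_force_log_partition(init, trans):
    """Same quantity by explicit enumeration; exponential, for checking only."""
    n, num_labels = len(trans) + 1, len(init)
    scores = []
    for y in product(range(num_labels), repeat=n):
        score = init[y[0]] + sum(trans[i][y[i]][y[i + 1]] for i in range(n - 1))
        scores.append(score)
    return log_sum_exp(scores)


# Toy scores (assumed): two labels, three positions.
init = [0.1, -0.3]
trans = [
    [[0.5, -1.0], [0.2, 0.3]],
    [[-0.4, 0.0], [1.0, -0.2]],
]
log_z_forward = forward_log_partition(init, trans)
log_z_brute = brute_force_log_partition(init, trans)
print(log_z_forward, log_z_brute)
```

The recursion costs time linear in the sequence length and quadratic in the number of labels, whereas enumeration grows exponentially with the sequence length.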
... Cohen. 2004. Semi-Markov
conditional random fields for information
extraction. In Proceedings of NIPS.
Fei Sha and Fernando Pereira. 2003. Shallow parsing
with conditional random fields. In Proceedings ... states
and edges combined with surface observations.
The weights of the features are determined in
such a way that they maximize the conditional log-
likelihood of the training data:
$L_\lambda = \sum_{i=1}^{N} \log$ ... 2009.
© 2009 Association for Computational Linguistics
Fast Full Parsing by Linear-Chain Conditional Random Fields
Yoshimasa Tsuruoka†‡ Jun’ichi Tsujii†‡∗ Sophia Ananiadou†‡
†School of Computer...
... on Conditional
Random Fields (Lafferty et al., 2001) (CRFs) which
are able to model the sequential dependencies be-
tween contiguous nodes. A CRF is an undirected
graphical model G of the conditional ... answers
together with the questions will yield not only
a coherent forum summary but also a valu-
able QA knowledge base. In this paper, we
propose a general framework based on Conditional
Random Fields ... question 1, but they
cannot be linked with any common word. Instead,
S8 shares word pet with S1, which is a context of
question 1, and thus S8 could be linked with ques-
tion 1 through S1. We call...
... recognition with conditional random fields, feature
induction and web-enhanced lexicons. In Proceedings of
CoNLL 2003, pages 188–191.
Andrew McCallum. 2003. Efficiently inducing features of
conditional random ... parsing with
conditional random fields. In Proceedings of HLT-NAACL
2003, pages 213–220.
Andrew Smith, Trevor Cohn, and Miles Osborne. 2005. Logarithmic
opinion pools for conditional random fields. ... task, with the model predicting both the
chunk tags and the POS tags. The training corpus
consisted of 8,936 sentences, with 47,377 tokens
and 118 labels.
A 200-bit random code was used, with...
... entity
recognition with conditional random fields, feature induction
and web-enhanced lexicons. In Proc. CoNLL-2003.
A. McCallum, K. Rohanimanesh, and C. Sutton. 2003. Dynamic
conditional random fields ... 29.13
Label PER 40.49
Label O 60.44
Random 1 70.34
Random 2 67.76
Random 3 67.97
Random 4 70.17
Table 1: Development set F scores for NER experts
6.2 LOP-CRFs with unregularised weights
In this ... a viable alternative to
CRF regularisation without the need for hyperpa-
rameter search.
2 Conditional Random Fields
A linear chain CRF defines the conditional probabil-
ity of a state or label...
... prosodic features ) is associated
with a state
.
The model is trained to maximize the conditional
log-likelihood of a given training set. Similar to the
Maxent model, the conditional likelihood is closely
related ... CRF differs from an HMM with respect to its
training objective function (joint versus conditional
likelihood) and its handling of dependent word fea-
tures. Traditional HMM training does not maxi-
mize ... words).
We also notice from the CTS results that when
only word N-gram information is used (with or
without combining with prosodic information), the
HMM is superior to the Maxent; only when various
additional...