... University of British Columbia, Canada
Abstract
We apply Stochastic Meta-Descent (SMD),
a stochastic gradient optimization method
with gain vector adaptation, to the train-
ing of Conditional Random Fields ... but as we show in Section 5, it is often better to
try to optimize the correct objective function.
Accelerated Training of Conditional Random
Fields with Stochastic Gradient Methods
S.V.N. ... exponential families, and describe
CRFs as conditional models in the exponential family.
Figure 6. Training objective (left) and percent...
... of the different feature set, as described in Sec. 5.2. However, MCE-F achieved the better performance of 85.29, compared with 84.04 for (McCallum and Li, 2003), which used MAP training of ... Linguistics and 44th Annual Meeting of the ACL, pages 217–224,
Sydney, July 2006.
© 2006 Association for Computational Linguistics
Training Conditional Random Fields with Multivariate Evaluation
Measures
Jun ... function of the CRFs into that of the MCE criterion:

g(y, x, λ) = log p(y|x; λ) ∝ λ · F(y, x)   (11)

Basically, CRF training with the MCE criterion optimizes Eq. 9 with Eq. 11 after the selection of an...
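Under the MCE view in Eq. 11, the CRF log-probability reduces (up to the normalizer) to the linear score λ · F(y, x). A minimal sketch of how such a discriminant score separates the correct label sequence from its competitors; the weight vector and feature vectors here are hypothetical toy values, not taken from the paper:

```python
# Minimal sketch: MCE-style discriminant g(y, x, λ) = λ · F(y, x).
# The weights and feature vectors are hypothetical toy values.

def g(lam, feats):
    """Linear discriminant score λ · F(y, x)."""
    return sum(l * f for l, f in zip(lam, feats))

lam = [0.5, -1.0, 2.0]               # weight vector λ
F_correct = [1.0, 0.0, 1.0]          # F(y_correct, x)
F_rivals = [[0.0, 1.0, 1.0],         # F(y, x) for competing label sequences
            [1.0, 1.0, 0.0]]

# MCE misclassification measure: best rival score minus correct score;
# a negative value means the correct sequence wins.
d = max(g(lam, f) for f in F_rivals) - g(lam, F_correct)
print(d)  # → -1.5
```

Training with the MCE criterion then pushes this measure further below zero on each training example.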
... variable z.
This type of training has been applied by Quattoni et al. (2007) for hidden-state conditional random fields, and can be equally applied to semi-supervised conditional random fields. Note, ... requires significant insight.
3 Conditional Random Fields
Linear-chain conditional random fields (CRFs) are a discriminative probabilistic model over sequences x of feature vectors and label sequences ... Semi-Supervised Learning of
Conditional Random Fields
Gideon S. Mann
Google Inc.
76 Ninth Avenue
New York, NY 10011
Andrew McCallum
Department of Computer Science
University of Massachusetts
140...
... Linguistics
Discriminative Word Alignment with Conditional Random Fields
Phil Blunsom and Trevor Cohn
Department of Software Engineering and Computer Science
University of Melbourne
{pcbl,tacohn}@csse.unimelb.edu.au
Abstract
In ... and thus the sparsity of the
index label set is not an issue.
3.1 Features
One of the main advantages of using a conditional
model is the ability to explore a diverse range of
features engineered ... as de ↔ of, which lie well off the
diagonal, are avoided.
The differing utility of the alignment word pair
feature between the two tasks is probably a result
of the different proportions of word-...
... label
of the preceding entity, the model can be solved
without approximation.
4 Reduction of Training/Inference Cost
The straightforward implementation of this modeling in semi-CRFs often results ... distribution of entities in the training set of the 2004 JNLPBA shared task.
Formally, the computational cost of training semi-CRFs is O(KLN), where L is the upper bound on entity length, ... thus compared the results of the recognizers with and without filtering, using only 2000 sentences as the training data. Table 5 shows the result of the total system with different filtering thresholds....
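The O(KLN) figure comes from the semi-Markov forward recursion, which at each of N positions considers every segment length up to L for each of K labels. A minimal sketch that counts score evaluations to make the scaling concrete; the segment-score function is a hypothetical stand-in for the real feature-based potentials:

```python
# Minimal sketch of the semi-CRF forward recursion, counting score
# evaluations to illustrate the O(K * L * N) training cost.
# seg_score is a hypothetical stand-in for the real segment potentials,
# and max is used in place of log-sum-exp for simplicity.
import math

def forward_ops(N, K, L, seg_score):
    alpha = [[-math.inf] * K for _ in range(N + 1)]
    alpha[0] = [0.0] * K
    ops = 0
    for t in range(1, N + 1):
        for y in range(K):
            total = -math.inf
            # consider every segment of length d <= L ending at position t
            for d in range(1, min(L, t) + 1):
                prev = max(alpha[t - d])
                total = max(total, prev + seg_score(t - d, t, y))
                ops += 1
            alpha[t][y] = total
    return ops

# With N=100 positions, K=5 labels, and L=4 max segment length,
# the inner loop runs just under K*L*N = 2000 times.
print(forward_ops(100, 5, 4, lambda s, e, y: 0.0))  # → 1970
```

Shrinking L via filtering, as the paper's thresholds do, cuts this cost linearly.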
... Proceedings of ACL-08: HLT, pages 710–718,
Columbus, Ohio, USA, June 2008.
© 2008 Association for Computational Linguistics
Using Conditional Random Fields to Extract Contexts and Answers of
Questions ... gaocong@cs.aau.dk
cyl@microsoft.com zxy-dcs@tsinghua.edu.cn
Abstract
Online forum discussions often contain vast numbers of questions that are the focus of discussion. Extracting contexts and answers together with ... S8 is an answer to question 1, but they cannot be linked by any common word. Instead, S8 shares the word pet with S1, which is a context of question 1, and thus S8 could be linked with question...
... Further, the CRF algorithm is parallelizable, so that most of the work of an
Discriminative Language Modeling with
Conditional Random Fields and the Perceptron Algorithm
Brian Roark Murat Saraclar
AT&T ... are often used for this task, whose parameters are optimized
to maximize the likelihood of a large amount of training
text. Recognition performance is a direct measure of the
effectiveness of ... selection.
The number of distinct n-grams in our training data is close to 45 million, and we show that CRF training converges very slowly even when trained with a subset (of size 12 million) of these features....
... max_ȳ p(ȳ | x̄; w)
for each training example x̄.
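For a linear-chain model, the maximization max_ȳ p(ȳ | x̄; w) is carried out by Viterbi decoding. A minimal sketch in log space; the emission and transition scores are hypothetical toy values (toolkits such as CRF++ implement this in optimized C++):

```python
# Minimal Viterbi sketch for a linear-chain model: finds the label
# sequence maximizing the total log-space score. The emission and
# transition scores below are hypothetical toy values.

def viterbi(emit, trans):
    """emit[t][y]: score of label y at position t;
    trans[p][y]: score of transition p -> y."""
    T, K = len(emit), len(emit[0])
    score = [emit[0][:]]          # best score of any path ending in y at t
    back = []                     # backpointers for path recovery
    for t in range(1, T):
        row, ptr = [], []
        for y in range(K):
            best = max(range(K), key=lambda p: score[-1][p] + trans[p][y])
            row.append(score[-1][best] + trans[best][y] + emit[t][y])
            ptr.append(best)
        score.append(row)
        back.append(ptr)
    y = max(range(K), key=lambda q: score[-1][q])
    path = [y]
    for ptr in reversed(back):    # follow backpointers from the end
        y = ptr[y]
        path.append(y)
    return path[::-1]

emit = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
trans = [[0.5, -0.5], [-0.5, 0.5]]
print(viterbi(emit, trans))  # → [0, 0, 0]
```

The decoder visits every (position, label, previous-label) triple once, so decoding costs O(T · K²) per sequence.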
The software we use as an implementation of conditional random fields is named CRF++ (Kudo, 2007). This implementation offers fast training since it uses ... version of TeX used a different, simpler method.
Liang's method was also used in troff and groff, which were TeX's main original competitors, and is part of many contemporary software products, ...
products, ... Sha and Fernando Pereira. 2003. Shallow pars-
ing withconditionalrandom fields. Proceedings of
the 2003 Conference of the North American Chapter
of the Association for Computational Linguistics...
... on a string of text, without the addition of
acoustic data, we have shown that adding aspects
of rhythm and timing aids in the identification of
accent targets. We used the number of words in
an ... (Section 7).
2 Conditional Random Fields
CRFs can be considered a generalization of logistic regression to label sequences. They define a conditional probability distribution over a label sequence ... features of Conditional Random Fields. In Proc. of Uncertainty in Artificial Intelligence.
T. Minka. 2001. Algorithms for maximum-
likelihood logistic regression. Technical report,
CMU, Department of...
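The logistic-regression view above corresponds to the standard CRF form p(y|x; λ) = exp(λ · F(y, x)) / Z(x), a multiclass logistic regression whose "classes" are whole label sequences. A minimal sketch with hypothetical toy features, normalizing by brute-force enumeration over a tiny label space (real CRFs compute Z(x) with forward-backward dynamic programming):

```python
# Minimal sketch: CRF as logistic regression over label sequences,
# p(y|x) = exp(λ·F(y,x)) / Z(x). Features and weights are hypothetical
# toy values; Z(x) is computed by brute-force enumeration here.
import math
from itertools import product

def F(y, x):
    # toy features: (label, observation) agreement count, plus a
    # count of consecutive equal labels (a toy transition feature)
    return [sum(1 for yi, xi in zip(y, x) if yi == xi),
            sum(1 for a, b in zip(y, y[1:]) if a == b)]

def p(y, x, lam, labels):
    score = lambda yy: math.exp(sum(l * f for l, f in zip(lam, F(yy, x))))
    Z = sum(score(yy) for yy in product(labels, repeat=len(x)))
    return score(tuple(y)) / Z

x = (0, 1, 1)
lam = [1.0, 0.5]
probs = {y: p(y, x, lam, (0, 1)) for y in product((0, 1), repeat=3)}
assert abs(sum(probs.values()) - 1.0) < 1e-9   # a proper distribution
print(max(probs, key=probs.get))  # → (0, 1, 1)
```

With a sequence of length 1 and one indicator feature per label, this reduces exactly to ordinary logistic regression.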
... N. Schraudolph, M. Schmidt and K. Murphy. (2006). Accelerated training of conditional random fields with stochastic meta-descent. Proceedings of the 23rd International Conference on Machine Learning.
D. ... number of states = number of training iterations.
Then the time required to classify a test sequence is , independent of training method, since the Viterbi decoder needs to access each path.
For training, ...
For training, ... of Grandvalet and Ben-
gio (2004) to structured predictors. The result-
ing objective combines the likelihood of the CRF
on labeled training data with its conditional en-
tropy on unlabeled training...
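This combination can be written in the standard entropy-regularization form (the trade-off weight γ and the indexing are assumptions here, following Grandvalet and Bengio's formulation rather than this excerpt's notation):

```latex
% Entropy-regularized semi-supervised objective (standard form; the
% trade-off weight \gamma and the data indexing are assumptions):
\max_{\theta} \;
  \underbrace{\sum_{i=1}^{n} \log p_\theta(y_i \mid x_i)}_{\text{labeled log-likelihood}}
  \;-\; \gamma \,
  \underbrace{\sum_{j=1}^{m} H\!\big(p_\theta(\,\cdot \mid x_j^{u})\big)}_{\text{conditional entropy on unlabeled data}}
```

Minimizing the entropy term drives the model toward confident predictions on unlabeled sequences, which is what extends the Grandvalet-Bengio idea to structured predictors.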
... Semi-Markov conditional random fields for information extraction. In Proceedings of NIPS.
Fei Sha and Fernando Pereira. 2003. Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL.
Erik ... parsing. We convert the task of full parsing into a series of chunking tasks and apply a conditional random field (CRF) model to each level of chunking. The probability of an entire parse tree ... states and edges combined with surface observations.
The weights of the features are determined in such a way that they maximize the conditional log-likelihood of the training data:

L_λ = Σ_{i=1}^{N} log p_λ(y_i | x_i) ...
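For completeness, the gradient of this log-likelihood takes the familiar "observed minus expected features" form (a standard identity for exponential-family conditional models, not spelled out in the excerpt; F_k denotes the k-th global feature as in Eq. 11):

```latex
% Gradient of the conditional log-likelihood (standard CRF identity):
\frac{\partial L_\lambda}{\partial \lambda_k}
  = \sum_{i=1}^{N} \Big( F_k(y_i, x_i)
      \;-\; \sum_{y} p_\lambda(y \mid x_i)\, F_k(y, x_i) \Big)
```

Setting this to zero shows that at the optimum the empirical feature counts match the model's expected feature counts.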
... 2002. Efficient training of conditional random fields. Master's thesis, University of Edinburgh.
3.3 Choice of code
The accuracy of ECOC methods is highly dependent on the quality of the code. ... recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of CoNLL 2003, pages 188–191.
Andrew McCallum. 2003. Efficiently inducing features of conditional random ... Osborne
Division of Informatics
University of Edinburgh
United Kingdom
miles@inf.ed.ac.uk
Abstract
Conditional Random Fields (CRFs) have
been applied with considerable success to
a number of natural...
... variety of types of expert, combination of expert CRFs with an unregularised standard CRF under a LOP with optimised weights can outperform the unregularised standard CRF and rival the performance of ... have considered training the weights of a LOP-CRF using pre-trained, static experts. In future we intend to investigate cooperative training of LOP-CRF weights and the parameters of each expert ...
each expert ... Proceedings of the 43rd Annual Meeting of the ACL, pages 18–25,
Ann Arbor, June 2005.
© 2005 Association for Computational Linguistics
Logarithmic Opinion Pools for Conditional Random Fields
Andrew...
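A logarithmic opinion pool combines the experts' conditional distributions as a renormalized weighted geometric mean, p_LOP(y|x) ∝ Π_α p_α(y|x)^{w_α}. A minimal sketch over a hypothetical three-way label distribution (the expert distributions and weights are toy values, not the paper's trained experts):

```python
# Minimal sketch of a logarithmic opinion pool (LOP): a renormalized,
# weighted geometric mean of expert distributions. The expert
# distributions and weights below are hypothetical toy values.
import math

def lop(experts, weights):
    """experts: list of distributions (dicts y -> p); weights sum to 1."""
    labels = experts[0].keys()
    raw = {y: math.exp(sum(w * math.log(p[y])
                           for p, w in zip(experts, weights)))
           for y in labels}
    Z = sum(raw.values())                 # renormalize the pooled scores
    return {y: v / Z for y, v in raw.items()}

e1 = {"A": 0.7, "B": 0.2, "C": 0.1}
e2 = {"A": 0.5, "B": 0.4, "C": 0.1}
pooled = lop([e1, e2], [0.6, 0.4])
assert abs(sum(pooled.values()) - 1.0) < 1e-9
print(max(pooled, key=pooled.get))  # → A
```

Because the pool multiplies probabilities, any label that some expert rules out confidently is suppressed in the combination, which is one source of the regularizing effect discussed above.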
... prosodic features) is associated with a state.
The model is trained to maximize the conditional log-likelihood of a given training set. Similar to the Maxent model, the conditional likelihood is closely related ... its training objective function (joint versus conditional likelihood) and its handling of dependent word features. Traditional HMM training does not maximize the posterior probabilities of ... 5.
Proceedings of the 43rd Annual Meeting of the ACL, pages 451–458,
Ann Arbor, June 2005.
© 2005 Association for Computational Linguistics
Using Conditional Random Fields For Sentence...
... Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL, pages 213–220, 2003.
P. Singla and P. Domingos. Discriminative training of Markov logic networks. In Proceedings of the Twentieth ... number of states is large, or the number of training sequences is very large, then this can become expensive. For example, on a standard named-entity data set, with 11 labels and 200,000 words of training ... training data, CRF training finishes in under two hours on current hardware. However, on a part-of-speech tagging data set, with 45 labels and one million words of training data, CRF training requires...
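The gap between these two settings follows from the forward-backward cost of roughly O(K² · T) per training iteration (K labels, T training positions). A back-of-the-envelope sketch using the dataset sizes quoted above (constant factors and iteration counts are ignored, so this is an assumption-laden estimate, not a runtime prediction):

```python
# Back-of-the-envelope sketch: per-iteration forward-backward cost of
# linear-chain CRF training scales as K^2 * T (K labels, T words).
# Dataset sizes come from the text; everything else is ignored.

def cost(K, T):
    return K * K * T

ner = cost(11, 200_000)      # named-entity setting: 11 labels, 200k words
pos = cost(45, 1_000_000)    # POS-tagging setting: 45 labels, 1M words
print(pos / ner)  # → ~83.7x more work per iteration
```

So even before counting the extra iterations needed to converge, the part-of-speech setting does nearly two orders of magnitude more work per pass, which is consistent with the much longer training time reported.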