... syntax
across languages, and try to extract a common
grammar from non-parallel multilingual corpora.
For this purpose, we propose a generative model
for multilingual grammars that is learned in an
unsupervised ... corpora,
where each sentence is generated from a
language dependent probabilistic context-
free grammar (PCFG), and these PCFGs
are generated from a prior grammar...
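The two-level generative story described here (a prior grammar, then one PCFG per language, then sentences) can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy rule set in PRIOR, the choice of a Dirichlet-style draw around the prior rule weights, the concentration value, and the function names draw_language_pcfg and sample_sentence.

```python
import random

# Hypothetical prior grammar: for each nonterminal, rule right-hand sides
# with prior weights. Symbols that never appear as keys are terminals.
PRIOR = {
    "S": {("NP", "VP"): 1.0},
    "NP": {("noun",): 0.6, ("det", "noun"): 0.4},
    "VP": {("verb",): 0.5, ("verb", "NP"): 0.5},
}


def draw_language_pcfg(prior, concentration, rng):
    """Draw one language-dependent PCFG centred on the prior grammar.

    Assumption: rule probabilities for each nonterminal are a Dirichlet
    draw with parameters concentration * prior weight, sampled here by
    normalising independent gamma variates.
    """
    pcfg = {}
    for lhs, rules in prior.items():
        draws = {rhs: rng.gammavariate(concentration * p, 1.0)
                 for rhs, p in rules.items()}
        total = sum(draws.values())
        pcfg[lhs] = {rhs: g / total for rhs, g in draws.items()}
    return pcfg


def sample_sentence(pcfg, symbol, rng):
    """Expand a symbol top-down and return the list of terminals."""
    if symbol not in pcfg:  # terminal symbol
        return [symbol]
    rhss = list(pcfg[symbol])
    weights = [pcfg[symbol][rhs] for rhs in rhss]
    chosen = rng.choices(rhss, weights=weights, k=1)[0]
    tokens = []
    for child in chosen:
        tokens.extend(sample_sentence(pcfg, child, rng))
    return tokens


rng = random.Random(0)
language_a = draw_language_pcfg(PRIOR, 10.0, rng)  # one PCFG per language
sentence = sample_sentence(language_a, "S", rng)
```

Calling draw_language_pcfg once per language yields grammars that share the prior's rules but differ in rule probabilities, which is the sense in which the prior acts as a common grammar in this sketch.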
... null label is NO-REL.
train/test split from Table 1 and the feature sets:
Syntactic The syntactic features from Section 4.
Semantic The semantic features from Section 4.
All Both syntactic and ... took, take, VBD
and began, to, trade, begin, trade, VBD,TO,VB.
• The syntactic paths from the first event to
the common ancestor to the second event, e.g.
VBD>VP, VP and VP<VBD.
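The path feature in the bullet above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Node class, the function names, and the toy tree are hypothetical, and the handling of paths longer than one step (joining labels with > on the way up and < on the way down) is an assumption extrapolated from the single example given.

```python
class Node:
    """A constituency-tree node with a label and a list of children."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []


def path_from_root(root, target, trail=()):
    """Return the tuple of nodes from root down to target, or None."""
    trail = trail + (root,)
    if root is target:
        return trail
    for child in root.children:
        found = path_from_root(child, target, trail)
        if found is not None:
            return found
    return None


def syntactic_paths(root, first, second):
    """Return (up path, common ancestor label, down path) for two events."""
    p1 = path_from_root(root, first)
    p2 = path_from_root(root, second)
    if p1 is None or p2 is None:
        raise ValueError("both event nodes must be in the tree")
    # The lowest common ancestor is the last node shared by both paths.
    shared = 0
    while shared < min(len(p1), len(p2)) and p1[shared] is p2[shared]:
        shared += 1
    anc = shared - 1
    up = ">".join(n.label for n in reversed(p1[anc:]))  # first event up to ancestor
    down = "<".join(n.label for n in p2[anc:])          # ancestor down to second event
    return up, p1[anc].label, down


# Hypothetical toy tree: (VP (VBD took) (CC and) (VBD began))
took = Node("VBD")
began = Node("VBD")
tree = Node("VP", [took, Node("CC"), began])
paths = syntactic_paths(tree, took, began)
```

On this toy tree the three returned strings match the example in the text: VBD>VP, VP and VP<VBD.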
1
Tra...
...
instances), from the TimeBank corpus annotated
in TimeML (Pustejovsky et al., 2003). The non-
WSJ articles (mainly political and disaster news)
include both print and broadcast news that are
from ... two peaks in this distribution. One is
from 5 to 7 in the natural logarithmic scale,
which corresponds to about 1.5 minutes to 30
minutes. The other is from 14 to 17 in the natural
l...
... both precision and recall.
We cast semantic category acquisition from
search logs as the task of learning labeled in-
stances from few labeled seeds. To our knowledge
this is the first study that ... different from ours. An-
other line of new research is to combine various re-
sources such as web documents with search query
logs (Paşca and Durme, 2008; Talukdar et al.,
2008). We differ...
... of the Constraint Grammar (CG)
(Karlsson et al., 1995) approach to part of
speech tagging and surface syntactic depen-
dency parsing is due to the minutely hand-
crafted grammar and two-level ... Progol machine learning system will be
presented very briefly.
1.1 Constraint Grammar POS tagging
Constraint Grammar is a system for part of
speech tagging and (shallow) syntactic dep...
... improvement over the 92.3 us-
ing only the Wiktionary lexicon. Of the true errors,
the most common arose from semantically related
words which had strong context feature correlations
(see table ... aria42,pliang,tberg,klein }@cs.berkeley.edu
Abstract
We present a method for learning bilingual
translation lexicons from monolingual cor-
pora. Word types in each language are charac-
terize...
... from corpora. The
EX approach aims to construct a large and up-to-
date transliteration lexicon from live corpora.
Towards this objective, some have proposed
extracting translation pairs from ...
extraction of transliteration pairs (EX) from
corpora.
The TM approach models phoneme-based or
grapheme-based mapping rules using a
generative model that is trained from a large
bi...
... called me up.
The following two grammar fragments describe
the relevant CVP syntax for English and Ger-
man. Every auxiliary verb governs only one
verb, so the CVP grammar is basically 2 regu- ... Fortunately, the task can be (partly)
automated if the tables associating words with
biases are learned from a corpus. Statistical
approaches also support empirical evaluation of
diffe...
... a word grammar could be learned in
conjunction with this acquisition process, and used
as a disambiguation step.
3 Tests and Results
To test the algorithm, we used 34438 utterances
from the ...
cgdemarc@ai.mit.edu
Abstract
We present work-in-progress on the ma-
chine acquisition of a lexicon from sen-
tences that are each an unsegmented phone
sequence paired with a primitive r...
... mapping which does
not arise from string operations but must instead
be learned.
We used the dataset created by Chen and
Mooney (2008), which contains 1919 scenarios
from the 2001–2004 Robocup ... 3,753
cities in the US (those with population at least
10,000) over three days (February 7–9, 2009) from
www.weather.gov. For each city and date, we
created two scenarios, one for the day fore...