... improve word alignment for languages with
scarce resources using bilingual corpora
of other language pairs. To perform word
alignment between languages L1 and L2,
we introduce a third language ... improve
word alignment for languages with scarce resources
using bilingual corpora of other language
pairs. To perform word alignment bet...
... framework for word
alignment that incorporates synonym
knowledge collected from monolingual
linguistic resources in a bilingual probabilistic
model. Synonym information is
helpful for word alignment ... occurrences of ‘chief’ and ‘forefront’ with
‘head’ do sometimes harm word alignment
accuracy, and we have to model either the context
or senses of words.
We propos...
... than 48
GB of memory is not widely available even today.
Therefore, we parallelized the clustering algorithm
to make it suitable for running on a cluster
of PCs with a moderate amount of memory ... Torisawa (2007),
which encodes the matching with a gazetteer entity
using IOB tags, with the modification for Japanese.
They describe using two types of gazetteer features....
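The idea of encoding gazetteer matches as IOB tags can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: it assumes whitespace-style tokens, a greedy left-to-right longest match, and a hypothetical `gazetteer` dictionary mapping token tuples to entity labels. The modification for Japanese mentioned above is not reproduced here.

```python
from typing import Dict, List, Tuple


def gazetteer_iob(tokens: List[str],
                  gazetteer: Dict[Tuple[str, ...], str]) -> List[str]:
    """Return one IOB tag per token for greedy longest gazetteer matches.

    `gazetteer` is a hypothetical lookup table (an assumption of this
    sketch): keys are token tuples, values are entity labels.
    """
    max_len = max((len(entry) for entry in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label          # first token of the match
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label      # remaining tokens of the match
                matched = n
                break
        # Skip past the match, or advance by one token if nothing matched.
        i += matched if matched else 1
    return tags
```

The resulting tag sequence can then be used as an additional per-token feature for a sequence labeler. Longest-match is only one possible policy; overlapping matches could also be kept as separate features.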
... alignment system of
GIZA++.
1 Introduction
Inversion transduction grammar (ITG) (Wu, 1997)
is an adaptation of SCFG to bilingual parsing. It
does synchronous parsing of two languages with
phrasal ...
expanding the list of alignment hypotheses of
minimal number of span pairs.
The first type of pruning is equivalent to minimizing
the number of hypernodes in...
... corpus of
160 million word tokens with a vocabulary size W
of 70K word types. There are 2·W types of context
(columns): The first or second W are counted if the
word c occurs within a window of 10 ... EACL.
Honkela, T. (1997). Self-organizing maps of
words for natural language processing applications.
Proceedings of the International ICSC
Symposium on Soft Computing.
Honkel...
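The 2·W context layout described in the excerpt above (a vocabulary of W word types, two blocks of W columns, a window of 10) can be sketched as below. The excerpt is truncated before it says what distinguishes the first W columns from the second, so this sketch assumes they separate left-side from right-side context. That split, the function name `context_counts`, and the sparse dictionary representation are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def context_counts(tokens: List[str], vocab: List[str],
                   window: int = 10) -> Dict[Tuple[int, int], int]:
    """Count co-occurrences into a sparse (row, column) -> count mapping.

    Rows are target words; columns 0..W-1 are context words seen to the
    left, columns W..2W-1 are context words seen to the right
    (the left/right split is an assumption of this sketch).
    """
    index = {w: k for k, w in enumerate(vocab)}
    W = len(vocab)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for i, word in enumerate(tokens):
        if word not in index:
            continue  # out-of-vocabulary target word
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i or tokens[j] not in index:
                continue
            offset = 0 if j < i else W  # left block or right block
            counts[(index[word], index[tokens[j]] + offset)] += 1
    return counts
```

A sparse mapping is used because a dense 70K × 140K matrix would be impractical to hold in memory for a sketch like this.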
... This tag weight
for each emotion tag has been calculated based
on the frequency of occurrence of an emotion tag
with respect to the total number of occurrences
of all six types of emotion tags ... Bengali part of
speech tagger (Ekbal et al. 2008) based on
Support Vector Machine (SVM) technique.
The POS tagger was developed
with a tagset of 26 POS tags, defined for...
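The tag weight described earlier in this excerpt (the frequency of an emotion tag relative to the total occurrences of all emotion tags) amounts to a relative frequency. A minimal sketch follows. The function name `tag_weights` is hypothetical, and the tag names in the usage note are placeholders, since the excerpt does not list the six tags.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_weights(tag_occurrences: Iterable[str]) -> Dict[str, float]:
    """Weight of each tag = its count / total count of all tags."""
    counts = Counter(tag_occurrences)
    total = sum(counts.values())
    if total == 0:
        return {}  # no annotations, so no weights can be computed
    return {tag: n / total for tag, n in counts.items()}
```

For example, calling `tag_weights(["happy", "happy", "sad", "fear"])` assigns the placeholder tag `"happy"` a weight of 0.5, and the weights over all observed tags sum to 1.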
... semantic processing.
Other methods use a variety of other information:
cooccurrence of two words (Burgess, 1998;
Schütze, 1998), occurrence of a word in the sense
definitions of a dictionary (Kasahara ... kind of semantic similarity between
words in the same level of categories or clusters of
the thesaurus, in particular synonyms, antonyms,
and other coordinates. Assoc...
... domain,
with positions for all of its dependents, or a
restricted phrase, which forms the verb cluster,
with no positions for dependents other than
predicative elements. These two kinds of
phrases ... infinitives
(with zu) and bare infinitives (without zu):
Bare infinitives cannot form an embedded
domain outside of the Vorfeld. Consequently,
there are two different prosodie...
... and repeatedly boosts the
performance of the classifiers by
further classifying data in each of the
two languages and by exchanging
between the two languages
information regarding the classified ... data in both languages, (2) using the
constructed classifiers in each of the languages to
classify some unclassified data and adding them
to the classified training data se...
...
primarily concerned with analysis of language at the
sentence level.
The most glamourous areas of natural language research
are at levels above the sentence, concerned with
dialogues and ... level. Yet, with
regard to most of the topics in this and other
sessions, there is a strong sense of déjà vu; the
earliest natural language studies featured automatic
extrac...