... correct Chinese
translation of the ambiguous English word, given
an English sentence which contains the word.
Word translation disambiguation is actually a
special case of word sense disambiguation ... 1999).
3 Bilingual Bootstrapping
3.1 Overview
Instead of using Monolingual Bootstrapping, we
propose a new method for word translation
disambiguation using Bili...
... between common English words and medical
terms. We measured word frequency by "disease occurrence"
(the number of disease definitions in which a
given word occurs one or more ...
tion measure, A.
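The "disease occurrence" count described above is a document frequency: each definition contributes at most one count per word, no matter how often the word repeats inside it. A minimal sketch, assuming whitespace tokenization and lowercasing (neither is specified in the text); the function name and the toy definitions are hypothetical:

```python
from collections import Counter


def disease_occurrence(definitions):
    """Count, for each word, the number of definitions containing it.

    Tokenization (lowercase, split on whitespace) is an assumption
    made for illustration only.
    """
    counts = Counter()
    for text in definitions:
        # set() ensures a word is counted once per definition.
        counts.update(set(text.lower().split()))
    return counts


# Hypothetical toy input, not data from the paper.
toy_definitions = ["fever and rash", "fever fever cough"]
toy_counts = disease_occurrence(toy_definitions)
```

Here "fever" occurs three times in total but in only two definitions, so its disease occurrence is 2.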
Word pairs which are found to be highly associated
appear to be so for two reasons. The first, which is
trivial, is that some word pairs are semantically one
word despite...
... windows. The “O”
tag means that the word is outside of any
chunk. The “I-XP” tag means that this
word is inside an XP chunk. The “B-XP”
by default means that the word is at the
beginning of an XP ... The clustering result using K-means; (c) Three
elongated clusters in the 2D clustering space using
Spectral clustering: two dominant eigenvectors; (d)
The clustering result using Spec...
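The chunk tags described earlier (an outside tag, "I-XP" for a word inside an XP chunk, "B-XP" for a word that begins one) can be decoded into chunk spans with a single pass. The sketch below is an illustration under assumptions: the outside tag is written "O", a chunk may start with either "B-XP" or "I-XP", and the function name is hypothetical.

```python
def iob_to_chunks(tags):
    """Turn per-word chunk tags into (type, start, end) spans, end exclusive."""
    chunks = []
    start, ctype = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, label = "O", None
        else:
            prefix, _, label = tag.partition("-")
        # Close the open chunk on an outside tag, an explicit B- tag,
        # or a change of chunk type.
        if ctype is not None and (prefix in ("O", "B") or label != ctype):
            chunks.append((ctype, start, i))
            start, ctype = None, None
        # Open a new chunk if this word belongs to one and none is open.
        if prefix in ("B", "I") and ctype is None:
            start, ctype = i, label
    if ctype is not None:
        chunks.append((ctype, start, len(tags)))
    return chunks


example_tags = ["I-NP", "I-NP", "O", "I-VP", "B-NP", "I-NP"]
example_chunks = iob_to_chunks(example_tags)
```

For the example tags this yields an NP over words 0 to 1, a VP over word 3, and a second NP over words 4 to 5.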
... contains
204K words (14K sentences, 946 documents), the
test set contains 46K words (3.5K sentences, 231
documents), and the development set contains
51K words (3.3K sentences, 216 documents).
We ... induce word
features—or to download word features that have
already been induced—plug these word features
into an existing system, and observe a significant
increase in accuracy. But which...
... a
method for using synonym information effectively
to improve word alignment quality.
In general, synonym relations are defined in
terms of word sense, not in terms of word form. In
other words, synonym ... models into low-frequency word
pairs in bilingual sentences, and then improved the
word alignment performance. The SRH regards
all of the different words coupled with the same...
... of a word w, the hits near positive
(Pwords) and negative (Nwords) seed words are
used. The SO-PMI equation is given as
$$
\text{SO-PMI}(word) = \log_2 \left(
\frac{\prod_{pword \in Pwords} \text{hits}(word \ \text{NEAR} \ pword)}
     {\prod_{nword \in Nwords} \text{hits}(word \ \text{NEAR} \ nword)}
\times
\frac{\prod_{nword \in Nwords} \text{hits}(nword)}
     {\prod_{pword \in Pwords} \text{hits}(pword)}
\right)
$$
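The SO-PMI ratio can be evaluated as a sum of base-2 logarithms, which avoids multiplying many large hit counts together. The following is a minimal sketch, not the authors' implementation: the function name, the dictionary-based inputs, and the small smoothing constant added to each count (to avoid taking the logarithm of zero) are all assumptions, and the hit counts are made up.

```python
import math


def so_pmi(word, pwords, nwords, near_hits, word_hits, smoothing=0.01):
    """Semantic orientation of `word` from co-occurrence hit counts.

    near_hits maps (word, seed) pairs to NEAR hit counts and word_hits
    maps each seed word to its own hit count. The smoothing term is an
    assumption, not part of the equation in the text.
    """
    score = 0.0
    for p in pwords:
        # Positive seeds: hits(word NEAR pword) / hits(pword)
        score += math.log2(near_hits.get((word, p), 0) + smoothing)
        score -= math.log2(word_hits.get(p, 0) + smoothing)
    for n in nwords:
        # Negative seeds: hits(nword) / hits(word NEAR nword)
        score += math.log2(word_hits.get(n, 0) + smoothing)
        score -= math.log2(near_hits.get((word, n), 0) + smoothing)
    return score


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
toy_near = {("w", "excellent"): 8, ("w", "poor"): 2}
toy_hits = {"excellent": 4, "poor": 4}
toy_score = so_pmi("w", ["excellent"], ["poor"], toy_near, toy_hits, smoothing=0.0)
```

With smoothing disabled, the toy counts give log2((8 / 2) × (4 / 4)) = 2, a positive orientation.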
5.2 Data Acquisition
We used t...
... this translation.
• Is-Best percentage: how often the translation
was top-ranked among the four translations.
• Is-Better percentage: how often the translation
was judged as the better translation, ... Introduction
In natural language processing research, translations
are most often used in statistical machine translation
(SMT), where systems are trained using bilingual
sentence-...
... capitalized words (with a few exceptions).
We use a list of about 200 Arabic and English
stopwords and stopword pairs.
We use lists of countries and their adjective
forms to bridge cross-POS translations ... the NEWA metric (section 2) to both
our SMT translations and the four human reference
translations, using both the original named-entity
translation annotation and the re-...
... Gigaword have billions of words,
but the parallel data has only about 30 million words.
Step-4 and -5 are natural ways to integrate the
abbreviation translation component with the baseline
translation ... 0.133
phrase translation 0.066 0.023
lexical translation 0.061 0.078
reverse phrase translation 0.059 0.103
reverse lexical translation 0.112 0.090
phrase penalty -0.150 -0.162...
... emotion (Roth et al., 2005).
SentiWordNet emotion word: A word
appearing in the SentiWordNet (Bengali)
contains an emotion.
Reduplication: The reduplicated words
(e.g., bhallo bhallo [good ... likely emotion words.
Question words: It has been observed
that the question words generally contribute
to the emotion in a sentence.
Colloquial / Foreign words: The colloquial
wo...