... generation.
In this paper, we exploit multiple resources to
improve SMT-based paraphrase generation. Specifically,
six kinds of resources are utilized, including:
(1) an automatically constructed thesaurus, ... USA, June 2008.
© 2008 Association for Computational Linguistics
Combining Multiple Resources to Improve SMT-based Paraphrasing Model∗
Shiqi Zhao¹, Cheng...
... chose well-formatted
resources (or manually format the resource)
so as to get reliable and usable results;
a semi-automatic rather than fully automatic approach
is adopted to ensure accuracy; ... domain-specific
knowledge may need to be added to the lexicon.
The problem of how to adapt a general lexicon
to a particular application domain and merge
domain ontologies with...
... automatic method to create
a thesaurus that is sensitive to the sentiment of
words expressed in different domains.
• We describe a method to use the created thesaurus
to expand feature vectors ... vector $d \in \mathbb{R}^N$, where the
value of the $j$-th element $d_j$ is set to the total number
of occurrences of the unigram or bigram $w_j$ in the
review d. To find the suitable candidates to exp...
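The count-based review vector described above can be illustrated with a minimal sketch. The whitespace tokenization, the function names `ngram_counts` and `review_vector`, and the toy vocabulary are assumptions made for illustration only; they are not taken from the original work.

```python
from collections import Counter


def ngram_counts(tokens):
    """Count every unigram and bigram in a tokenized review."""
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(tokens) + Counter(bigrams)


def review_vector(tokens, vocabulary):
    """Return d, where d[j] is the number of occurrences of vocabulary[j]."""
    counts = ngram_counts(tokens)
    # Counter returns 0 for n-grams that never occur in the review.
    return [counts[w] for w in vocabulary]


# Hypothetical review and vocabulary (w_1 ... w_N), for illustration only.
review = "the plot is good but the acting is not good".split()
vocabulary = ["good", "not good", "is", "the plot", "excellent"]
d = review_vector(review, vocabulary)
```

Here `d` has one element per vocabulary entry, so its length equals N, and entries for n-grams absent from the review are zero.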
... dialogue
segmentation and topic labels. In the annotation process,
annotators were given the freedom to subdivide
a segment into subsegments to indicate when
the group was discussing a subtopic. Annotators
were ... of automatic dialogue
segmentation is often considered similar to
the problem of topic segmentation. Therefore, research
has adopted techniques previously develope...
...
varies sufficiently from language to language to
make automatic extraction difficult. Together,
these allow phrases like this (taken from the
French Wikipedia) to be correctly marked in its
entirety ... following
forms of the verb to be” to derive a label. For example,
they used the sentence “Franz Fischler is
an Austrian politician” to associate the label “politician”...
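The idea of deriving a label from a copular sentence can be sketched roughly as follows. The regular expression, the function name `derive_label`, and the heuristic of taking the last word of the noun phrase as the label are all assumptions made for illustration; the cited approach may select the label differently (for instance with a parser).

```python
import re

# Hypothetical pattern: "<subject> is/was a/an/the <noun phrase>".
IS_A = re.compile(
    r"^(?P<subject>.+?)\s+(?:is|was)\s+(?:a|an|the)\s+(?P<np>[^.,;]+)"
)


def derive_label(sentence):
    """Return (subject, label) for a copular sentence, or None if no match."""
    match = IS_A.match(sentence.strip())
    if match is None:
        return None
    np_tokens = match.group("np").split()
    if not np_tokens:
        return None
    # Assumed heuristic: the last word of the noun phrase is its head noun.
    return match.group("subject"), np_tokens[-1].lower()


result = derive_label("Franz Fischler is an Austrian politician")
```

Taking the last token is a crude approximation of the head noun and will fail on post-modified phrases such as "a politician from Austria".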
... hard and time-consuming task
to hand-align bilingual data, the automation of this
task receives a fair amount of attention. In this paper,
we present an approach to improve the bilingual
dictionary ... dictionaries Dict0.01 for
up to one link per word
rebuilding algorithm is independent of the actual
word alignment method used.
Furthermore, we plan to explore ways to improve...
... set.
The simplest way to match query tokens to snippet
tokens is to allow a query token to match any
snippet token. This can be problematic when we
have queries that have a token repeated with ... surprisingly powerful
one – is to POS tag some relevant snippets for
a given query, and then to transfer the tags from the
snippet tokens to matching query tokens. This “di-
re...
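The tag transfer described above can be illustrated with a minimal sketch in which a query token may match any snippet token with the same lowercased form. The majority vote over matching snippet tokens, the function name `transfer_tags`, and the toy tagged snippets are assumptions made for illustration; the text above does not specify how conflicting tags are resolved.

```python
from collections import Counter, defaultdict


def transfer_tags(query_tokens, tagged_snippets):
    """Give each query token the most frequent tag of matching snippet tokens."""
    votes = defaultdict(Counter)
    for snippet in tagged_snippets:
        for token, tag in snippet:
            # A query token may match any snippet token (case-insensitive).
            votes[token.lower()][tag] += 1
    tagged_query = []
    for token in query_tokens:
        tag_counts = votes.get(token.lower())
        # Assumed tie-breaking: whatever Counter.most_common returns first.
        tag = tag_counts.most_common(1)[0][0] if tag_counts else None
        tagged_query.append((token, tag))
    return tagged_query


# Hypothetical, already POS-tagged snippets for the query below.
snippets = [
    [("cheap", "ADJ"), ("flights", "NOUN"), ("to", "ADP"), ("Paris", "PROPN")],
    [("book", "VERB"), ("flights", "NOUN"), ("to", "ADP"), ("paris", "PROPN")],
]
query = ["flights", "paris", "hotel"]
tagged = transfer_tags(query, snippets)
```

Query tokens with no matching snippet token receive `None`. Because any snippet token may match, this sketch ignores position and so does not address queries in which a token is repeated.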
... in comparison to documents
from other domains: Turney (2002) observes that
movie reviews are the hardest to classify since the
review authors tend to give information about the
storyline of the ... are about. This
aboutness has been referred to as the opinion target
or opinion topic in the literature.
In this work, our goal is to extract opinion target
- opinion word...
... N-grams+lem.
to the improvements gained in Table 3). The largest
improvement comes with the addition of the bigram
(thus introducing context into the model), but the trigram
provides only a slight improvement ... handwritten text may be too small,
light or closely spaced to readily distinguish, causing
the system to drop them entirely. While Arabic
disconnective letters may make it...
... and shuffled to produce a
mutant library, the members of which were then monitored
for their ability to confer increased TMP resistance
when fused to DHFR. The genes corresponding
to resistant ... ligand to a residue that is unlikely
to bind metal. The G270V residue is located next
to a metal-binding residue; this mutation is likely to
cause a conformational change that...