... two approaches
using unlabeled data in text categorization; one
approach combines unlabeled data and labeled data,
and the other approach uses the clustering
technique for text categorization. ... labeled data.
While labeled data are difficult to obtain,
unlabeled data are readily available and plentiful.
Therefore, this paper advocates using a
bootstrap...
... learning,
definitions are used to create and enrich concepts
with textual information (Gangemi et al., 2003),
and extract taxonomic and non-taxonomic relations
(Snow et al., 2004; Navigli and Velardi,
2006; Navigli, ... $|s_k|\}$ and $b \in \{1, \ldots, |s_j|\}$,
$S_{a,b}$ is a score of the matching between the $a$-th
token of $s_k$ and the $b$-th token of $s_j$, and $M_{0,0}$,
$M_{0,b}$ and $M_{a,0}$...
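
The excerpt breaks off before the recurrence for the table M is given, so the following is only a sketch under stated assumptions, not the authors' method. It assumes that the boundary cells (row 0 and column 0) are zero and that each remaining cell takes the best of a diagonal move, which adds the matching score S for the two tokens, and the two skip moves, as in a longest-common-subsequence style dynamic program. The names `alignment_table` and `exact_match` are hypothetical.

```python
from typing import Callable, List, Sequence


def alignment_table(
    s_k: Sequence[str],
    s_j: Sequence[str],
    score: Callable[[str, str], float],
) -> List[List[float]]:
    """Fill a (|s_k|+1) x (|s_j|+1) table M.

    Assumed recurrence (not taken from the excerpt):
        M[a][b] = max(M[a-1][b-1] + S(a, b), M[a-1][b], M[a][b-1])
    with M[0][0], M[0][b] and M[a][0] all set to 0.
    """
    rows, cols = len(s_k), len(s_j)
    M = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for a in range(1, rows + 1):
        for b in range(1, cols + 1):
            # S(a, b): matching score of the a-th token of s_k
            # and the b-th token of s_j (1-based, as in the text).
            s_ab = score(s_k[a - 1], s_j[b - 1])
            M[a][b] = max(M[a - 1][b - 1] + s_ab, M[a - 1][b], M[a][b - 1])
    return M


def exact_match(x: str, y: str) -> float:
    """Hypothetical matching score: 1 for identical tokens, 0 otherwise."""
    return 1.0 if x == y else 0.0


table = alignment_table("the cat sat".split(), "the black cat sat".split(), exact_match)
best = table[-1][-1]
```

With the exact-match score, the bottom-right cell counts the tokens the two sentences share in order; a graded score function could be substituted without changing the table construction.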
... words;
the collection of $\sigma$s forms the lexicon. Each unit
$\sigma$ is present in a segmentation with some context
$c = (\phi_l, \phi_r)$ of the form $\phi_l \sigma \phi_r$. Features based on
the context and the unit itself parameterize ... corpus with segmentations
and corresponding features. The notation m ih/1:1
represents unit/label:feature-value. Overlapping context
features capture rich segmentation regul...
... proposed for
sentence compression (Witbrock and Mittal, 1999;
Jing and McKeown, 1999; Vandeghinste and Pan,
2004), this paper focuses on Knight and Marcu’s
noisy-channel model (Knight and Marcu, ... Y), is produced, and
contextual information, x (∈ X), is observed. To
represent whether the event (x, y) satisfies a certain
feature, we introduce a feature function. A
fea...
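
The definition of the feature function is cut off in this excerpt. A common convention, assumed here rather than confirmed by the text, is a binary indicator that returns 1 when the event (x, y) satisfies the feature and 0 otherwise. The sketch below illustrates that convention; the helper `make_feature`, the context key `"pos"` and the labels `"drop"` and `"keep"` are hypothetical.

```python
from typing import Callable, Dict

Context = Dict[str, str]  # hypothetical representation of the observed context x


def make_feature(key: str, value: str, label: str) -> Callable[[Context, str], int]:
    """Build an indicator feature f(x, y).

    f(x, y) = 1 if x[key] == value and y == label, else 0.
    The binary form is an assumption, not taken from the excerpt.
    """

    def feature(x: Context, y: str) -> int:
        return 1 if x.get(key) == value and y == label else 0

    return feature


# Hypothetical feature: the word is a determiner and the event is "drop".
f = make_feature("pos", "DT", "drop")
fires = f({"pos": "DT"}, "drop")
silent = f({"pos": "DT"}, "keep")
```

Each such function tests one property of the pair (x, y), so a model can combine many of them without any single feature needing to describe the whole event.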
... post-hydrolysis. Therefore, the dissociation
of phosphate and/or ADP is likely to be responsible for resetting
of the transporter. The data indicate that, like ABCB1 and ABCC1,
the ‘power stroke’ for translocation ... provided the most information. For two of the
proteins, ABCB1 and ABCC1, it has been demonstrated
that the binding of nucleotide imparts marked
and essenti...
... null
mutants [72]. Immunoprecipitation and ligand binding
studies [21] confirmed that α4β2* (with possible inclusion
of α5 subunits) and α6β2* (with possible inclusion of α4
and β3 subunits) are the main ... obtained with EpI. This
α-conotoxin was originally characterized on rat intracardiac
ganglia neurons and bovine chromaffin cells and assumed to
be selective for α3β2 and α3β4...
... paraphrasers”, with the result that there are
no readily available large corpora and no consistent
standards for what constitutes a high-quality paraphrase.
In addition to the lack of standard datasets
for ... our data collection framework for
use on crowdsourcing platforms such as Amazon’s
Mechanical Turk. Crowdsourcing can allow inexpensive
and rapid data collection for...
... the information
space, the current search engine paradigm
does not provide enough assistance for these kinds
of searches. The user has to read through the documents
and then eventually reformulate ... actually
labeled with the specific relation that exists between
the nodes.
In this way the user can explore in a uniform way
both new information nuggets and validated background
in...
... Outilex, a generalist linguistic
platform for text processing. The platform
includes several modules implementing
the main operations for text processing
and is designed to use large-coverage ... coverage
for French and English, originating from the
former LADL¹, will be distributed with the platform
under the LGPL-LR² license.
The platform aims to be a generalist bas...
... #
of features, and metrics used. Our MT models are trained
with standard phrase-based Moses software (Koehn et al.,
2007), with IBM Model 4 alignments, 4-gram SRILM,
lexical ordering for PubMed and ... combining metrics
using machine learning for better correlation with
human judgments (Liu and Gildea, 2007; Albrecht
and Hwa, 2007; Giménez and Màrquez, 2008) and
may...