... Republic, June 2007.
© 2007 Association for Computational Linguistics
Detecting Erroneous Sentences using Automatically Mined Sequential
Patterns
Guihua Sun∗ Xiaohua Liu Gao Cong Ming Zhou
Chongqing ... identifying
erroneous/ correct sentences. A set of training data
containing correct and erroneous sentences is given.
Unlike some previous work, our technique requires
neit...
... Effects of using the empty categories
5 Experiments with Automatically
Parsed data
The next set of experiments use the BNC and
Treebank, but strip POS and parse information,
and parse them automatically ... summarise the findings:
• Using the BNC, which is tagged with a com-
plex tagging scheme but has no parse data, it
is possible to get 76% F1 using lexical forms
and POS data alon...
... problem by using pseudo-error
sentences generated automatically. Furthermore,
we apply domain adaptation: the
pseudo-error sentences are from the source
domain, and the real-error sentences ... and the correct sentences. However, collecting
a sufficient number of pairs is expensive. To
avoid this problem, we use an additional corpus consisting
of pseudo-error sentences automat...
... finally annotated 7,384 sentences. Table 3
shows the number of comparative sentences and
non-comparative sentences in our corpus.
Table 3. The numbers of annotated sentences
Total Comparative ... ([gat]: same)’. But many sentences also ex-
press comparison without those keywords. Simi-
larly, although some sentences contain some
keywords, they cannot be comparative sentence...
... following NE is
used as a feature. We call this feature Centering Top
(CT).
2.4 Using Stack Structure
The sorting algorithm using centering theory tends
to rank highly those words that easily become ... are
now so advanced that named entity (NE) taggers are
in practical use. Researchers are now focusing on
extracting semantic relations between NEs, such as
“George Bush (person)” is “presi...
... languages, and ii) entail-
ment relations between T and H have to be
checked in both directions. Using a combi-
nation of lexical, syntactic, and semantic fea-
tures to train a cross-lingual textual ... of lexical evidence.
When only unidirectional entailment relations from
T to H have to be determined (RTE-like setting), the
full mapping of the hypothesis into the text usually
provides eno...
... achieves better performance by only using
WordNet synonym, hypernym and similar-to relations.
Adding co-occurrence statistics slightly improved
performance, while using glosses did not
help at ... and it has a wide variety of applications.
We proposed a method for automatically predict-
ing the semantic orientation of words using ran-
dom walks and hitting time. The proposed metho...
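The idea of scoring a word by random-walk hitting time can be illustrated with a minimal sketch. The excerpt does not give the method's details, so everything here is an assumption made for illustration: the toy word graph, the seed sets, the helper names `hitting_times` and `orientation`, and the decision rule (a word leans positive if a random walk starting from it is expected to reach a positive seed sooner than a negative seed).

```python
# Toy word-relatedness graph (hypothetical): node -> {neighbour: edge weight}.
GRAPH = {
    "good":  {"great": 1.0},
    "great": {"good": 1.0, "okay": 1.0},
    "okay":  {"great": 1.0, "poor": 1.0},
    "poor":  {"okay": 1.0, "bad": 1.0},
    "bad":   {"poor": 1.0},
}
POSITIVE_SEEDS = {"good"}  # assumed seed words
NEGATIVE_SEEDS = {"bad"}


def hitting_times(graph, targets, iterations=1000):
    """Expected number of steps for a random walk to first reach `targets`.

    Solves h(i) = 0 for i in targets, h(i) = 1 + sum_j P(i, j) * h(j) otherwise,
    by fixed-point iteration. Assumes every node can reach the target set.
    """
    h = {node: 0.0 for node in graph}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in graph.items():
            if node in targets:
                updated[node] = 0.0
                continue
            total = sum(neighbours.values())
            updated[node] = 1.0 + sum(
                weight / total * h[nbr] for nbr, weight in neighbours.items()
            )
        h = updated
    return h


def orientation(word, graph=GRAPH, pos=POSITIVE_SEEDS, neg=NEGATIVE_SEEDS):
    """Label a word by comparing its hitting times to the two seed sets."""
    to_pos = hitting_times(graph, pos)[word]
    to_neg = hitting_times(graph, neg)[word]
    if abs(to_pos - to_neg) < 1e-9:
        return "neutral"
    return "positive" if to_pos < to_neg else "negative"


for w in ("great", "okay", "poor"):
    print(w, orientation(w))
```

On this toy chain, words closer to the positive seed receive a shorter expected hitting time to it and are labelled accordingly; a real graph would be built from lexical resources or corpus statistics rather than written by hand.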
... have shown the effectiveness of using character
cross-correlation in choosing the best headline
out of nominated sentences from an Arabic document.
The advantage of using character cross-correlation ... and
S is the total number of sentences.
Figure 1: Scaling function of a 1000 nominated
headline document.
According to the nominating mechanism, hundreds
of sentences could be...
... used. We
do not use the full paper, as PDFs are not available for
all papers in publication pages (due to copyright and
other issues). The titles are then parsed using the
RASP parser (Briscoe and ... stylesheets are often considered inappropriate
for diverse organisations.
Research summary pages using stylesheets can
offer alternative methods of information access and
browsing, aiding na...
... data are as follows:
• Training data: 24,263 sentences, 234,474
bunsetsus
• Development data: 4,833 sentences, 47,580
bunsetsus
• Test data: 9,287 sentences, 89,982 bunsetsus
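As a quick sanity check on the figures above, the average number of bunsetsus per sentence can be computed for each split. The counts are copied from the list; the names `splits` and `bunsetsus_per_sentence` are illustrative only.

```python
# Corpus sizes as (sentences, bunsetsus), copied from the list above.
splits = {
    "training": (24263, 234474),
    "development": (4833, 47580),
    "test": (9287, 89982),
}


def bunsetsus_per_sentence(sentences, bunsetsus):
    """Average number of bunsetsus per sentence for one split."""
    return bunsetsus / sentences


averages = {
    name: bunsetsus_per_sentence(*counts) for name, counts in splits.items()
}
for name, avg in averages.items():
    print(f"{name}: {avg:.2f} bunsetsus per sentence")
```

All three splits come out at a little under ten bunsetsus per sentence, so the splits are comparable in average sentence length.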
The test data contained ... r, and verb v)
by using probabilistic latent semantic indexing
(PLSI) (Hofmann, 1999). If n, r, v is the
co-occurrence of n and r, v, we can calculate
P (n, r, v) by using the...