Scientific report: "Detecting Erroneous Sentences using Automatically Mined Sequential Patterns" pdf

... Republic, June 2007.c2007 Association for Computational LinguisticsDetecting Erroneous Sentences using Automatically Mined Sequential PatternsGuihua Sun∗Xiaohua Liu Gao Cong Ming ZhouChongqing ... identifying erroneous/ correct sentences. A set of training datacontaining correct and erroneous sentences is given.Unlike some previous work, our technique requiresneither that the erroneous sentences ... and “correct.” To build the learning model,we automatically extract labeled sequential patterns(LSPs) from both erroneous sentences and correct sentences, and use them as input features for...

Scientific report: "Robust VPE detection using Automatically Parsed Text" pdf

... Effects of using the empty categories. 5 Experiments with Automatically Parsed data. The next set of experiments uses the BNC and Treebank, but strips POS and parse information and parses them automatically ... summarise the findings:
• Using the BNC, which is tagged with a complex tagging scheme but has no parse data, it is possible to get 76% F1 using lexical forms and POS data alone
• Using the Treebank, ... raised to 68% using extra features
• Parsing the BNC, top performance is 71%, raised to 72% using extra features
• Combining the parsed data, top performance is 67%, raised to 71% using extra...

Scientific report document: "Grammar Error Correction Using Pseudo-Error Sentences and Domain Adaptation" pdf

... problem by using pseudo-error sentences generated automatically. Furthermore, we apply domain adaptation, in which the pseudo-error sentences are from the source domain and the real-error sentences ... and the correct sentences. However, collecting a sufficient number of pairs is expensive. To avoid this problem, we use an additional corpus consisting of pseudo-error sentences automatically generated ... Pseudo-error Sentences and Domain Adaptation. The error corrector described in Section 2 requires paired sentences. However, it is expensive to collect them. We resolve this problem by using pseudo-error...

Scientific report document: "Extracting Comparative Sentences from Korean Text Documents Using Comparative Lexical Patterns and Machine Learning Techniques" doc

... finally annotated 7,384 sentences. Table 3 shows the number of comparative and non-comparative sentences in our corpus. Table 3. The numbers of annotated sentences: Total, Comparative ... ([gat]: same)’. But many sentences also express comparison without those keywords. Similarly, although some sentences contain some of the keywords, they are not comparative sentences. For these reasons, ... Thus all the sentences can be divided into four categories as follows: Table 2. The four categories of the sentences. Single-keyword: Contain, Not contain. Comparative Sentences: S1, S2...

Scientific report: "Detecting Semantic Relations between Named Entities in Text Using Contextual Features" pdf

... following NE is used as a feature. We call this feature Centering Top (CT). 2.4 Using Stack Structure. The sorting algorithm using centering theory tends to rank highly those words that easily become ... are now so advanced that named entity (NE) taggers are in practical use. Researchers are now focusing on extracting semantic relations between NEs, such as “George Bush (person)” is “president ... Tom₁₃ in New York₁₄.) To solve the above problems, we propose a supervised learning method using contextual features. The rest of this paper is organized as follows. Section 2 describes...

Scientific report document: "Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents" doc

... languages, and ii) entailment relations between T and H have to be checked in both directions. Using a combination of lexical, syntactic, and semantic features to train a cross-lingual textual ... of lexical evidence. When only unidirectional entailment relations from T to H have to be determined (RTE-like setting), the full mapping of the hypothesis into the text usually provides enough ... semantically augmented corpora. Finally, we extract the semantic phrase table from the augmented aligned corpora using the Moses toolkit (Koehn et al., 2007). For the matching phase, we first annotate T and H in...
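
Checking entailment in both directions turns two yes/no decisions into four outcomes. A minimal sketch, assuming the two unidirectional decisions are already available as booleans (the lexical, syntactic, and semantic classifier that would produce them is not modeled) and assuming a four-way labeling of bidirectional, forward, backward, and no entailment. The names `CLTELabel` and `combine_directions` are hypothetical.

```python
from enum import Enum


class CLTELabel(Enum):
    """Hypothetical four-way label set for a text/hypothesis (T/H) pair."""
    BIDIRECTIONAL = "bidirectional"  # T => H and H => T: semantically equivalent
    FORWARD = "forward"              # only T => H: T carries information missing from H
    BACKWARD = "backward"            # only H => T: H carries information missing from T
    NO_ENTAILMENT = "no_entailment"  # neither direction holds


def combine_directions(t_entails_h: bool, h_entails_t: bool) -> CLTELabel:
    """Merge two unidirectional entailment decisions into a single label."""
    if t_entails_h and h_entails_t:
        return CLTELabel.BIDIRECTIONAL
    if t_entails_h:
        return CLTELabel.FORWARD
    if h_entails_t:
        return CLTELabel.BACKWARD
    return CLTELabel.NO_ENTAILMENT


if __name__ == "__main__":
    for pair in [(True, True), (True, False), (False, True), (False, False)]:
        print(pair, combine_directions(*pair).value)
```

In an RTE-like setting only `t_entails_h` matters; the FORWARD and BACKWARD cases are where the information disparity between the two documents shows up.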

Scientific report document: "Identifying Text Polarity Using Random Walks" pptx

... achieves better performance by only using WordNet synonym, hypernym and similar-to relations. Adding co-occurrence statistics slightly improved performance, while using glosses did not help at ... and it has a wide variety of applications. We proposed a method for automatically predicting the semantic orientation of words using random walks and hitting time. The proposed method is based ... handful of seeds is used to define the two polarity classes. The method is experimentally tested using a manually labeled set of positive and negative words. It outperforms the state of the art...
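
The hitting-time idea can be sketched on a toy graph. Assumptions: words are nodes of an undirected graph (per the snippet, the paper's edges come from WordNet relations); a walker moves to a uniformly random neighbor; and a word is labeled by whichever seed set a walk from it reaches in fewer expected steps. This sketch computes expected hitting times exactly by fixed-point iteration rather than by sampling walks. The graph, seeds, and function names are hypothetical.

```python
from typing import Dict, List, Set

# Undirected word graph as adjacency lists; each step picks a neighbor uniformly.
Graph = Dict[str, List[str]]


def hitting_times(graph: Graph, targets: Set[str],
                  tol: float = 1e-9, max_iters: int = 100_000) -> Dict[str, float]:
    """Expected number of steps for a walk from each node to first reach `targets`.

    Iterates h(i) = 0 for i in targets, h(i) = 1 + mean of h over i's neighbors
    otherwise. Assumes the graph is connected and every node has a neighbor.
    """
    h = {node: 0.0 for node in graph}
    for _ in range(max_iters):
        new_h = {
            node: 0.0 if node in targets
            else 1.0 + sum(h[n] for n in nbrs) / len(nbrs)
            for node, nbrs in graph.items()
        }
        delta = max(abs(new_h[n] - h[n]) for n in graph)
        h = new_h
        if delta < tol:
            break
    return h


def polarity(word: str, graph: Graph, positive: Set[str], negative: Set[str],
             margin: float = 1e-6) -> str:
    """Label `word` by the seed set that a walk from it reaches sooner on average."""
    to_pos = hitting_times(graph, positive)[word]
    to_neg = hitting_times(graph, negative)[word]
    if abs(to_pos - to_neg) <= margin:
        return "neutral"
    return "positive" if to_pos < to_neg else "negative"


# Hypothetical toy graph: a chain from a positive seed to a negative seed.
TOY_GRAPH: Graph = {
    "good": ["nice"],
    "nice": ["good", "decent"],
    "decent": ["nice", "awful"],
    "awful": ["decent", "bad"],
    "bad": ["awful"],
}

if __name__ == "__main__":
    for w in ("nice", "decent", "awful"):
        print(w, polarity(w, TOY_GRAPH, {"good"}, {"bad"}))
```

On this chain, "nice" is 7 expected steps from "good" but 15 from "bad", so it comes out positive; "decent" sits symmetrically in the middle and is reported as neutral.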

Scientific report document: "Automatic Headline Generation using Character Cross-Correlation" doc

... have shown the effectiveness of using character cross-correlation in choosing the best headline out of nominated sentences from an Arabic document. The advantage of using character cross-correlation ... and S is the total number of sentences. Figure 1: Scaling function of a document with 1000 nominated headlines. According to the nominating mechanism, hundreds of sentences could be nominated as ... common subsequence (LCS) of an automatically generated headline and the reference headlines. It is clear that MAX-CCC achieves the highest score among the automatically generated headlines...
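
The LCS comparison between a generated headline and the reference headlines can be sketched with the standard dynamic program. The snippet does not show how the paper normalizes the LCS or combines several references. This sketch assumes a recall-style score (LCS length divided by reference length) and keeps the best score over all references. The function names and example headlines are hypothetical.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)  # DP row for the tokens of `a` seen so far
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, or keep the best of skipping x or skipping y.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def best_lcs_recall(candidate: Sequence[str], references: List[Sequence[str]]) -> float:
    """Highest LCS / len(reference) over the (non-empty) reference headlines."""
    return max(lcs_length(candidate, ref) / len(ref) for ref in references)


if __name__ == "__main__":
    generated = "minister visits cairo today".split()
    refs = ["the minister visits cairo".split(), "cairo talks today".split()]
    print(best_lcs_recall(generated, refs))
```

The DP keeps only one previous row, so memory is linear in the reference length. That is ample for headline-sized inputs.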

Scientific report document: "Generating research websites using summarisation techniques" pptx

... used. We do not use the full paper, as PDFs are not available for all papers in publication pages (due to copyright and other issues). The titles are then parsed using the RASP parser (Briscoe and ... stylesheets are often considered inappropriate for diverse organisations. Research summary pages using stylesheets can offer alternative methods of information access and browsing, aiding navigation ... needs, but these are time-consuming to create and maintain by hand. We are exploring the idea of automatically generated and updated web pages that accurately reflect the research interests being...

Scientific report document: "Japanese Dependency Parsing Using Co-occurrence Information and a Combination of Case Elements" pdf

... data are as follows:
• Training data: 24,263 sentences, 234,474 bunsetsus
• Development data: 4,833 sentences, 47,580 bunsetsus
• Test data: 9,287 sentences, 89,982 bunsetsus
The test data contained ... r, and verb v) by using probabilistic latent semantic indexing (PLSI) (Hofmann, 1999)⁵. If ⟨n, r, v⟩ is the co-occurrence of n and ⟨r, v⟩, we can calculate P(⟨n, r, v⟩) by using the following ... to rerank the results obtained using an existing machine-learning-based parsing method showed that our method can improve the accuracy of the results obtained using the existing method. 1 Introduction Dependency...
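
For orientation, PLSI (Hofmann, 1999) models a co-occurrence of two variables through a set of latent classes. The equation the snippet cuts off is not shown. Assuming it follows the standard aspect-model form, with n as one variable and the pair ⟨r, v⟩ as the other, it would read as follows (Z, the set of latent classes, is a symbol introduced for this sketch):

$$
P(\langle n, r, v \rangle) = \sum_{z \in Z} P(z)\, P(n \mid z)\, P(\langle r, v \rangle \mid z)
$$

The number of latent classes and how the parameters are estimated (typically by EM) would need to be checked against the paper itself.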
