Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web"

... web mining scheme for parallel data acquisition. Based on the Document Object Model (DOM), a web page is represented as a DOM tree. Then a DOM tree alignment model is proposed to identify the ... DOM and Our Document Tree. Despite these minor differences, our document tree is still referred to as the DOM tree throughout this paper. 4.1 DOM Tree A...

Uploaded: 08/03/2014, 02:21
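
The first entry says that a web page is represented as a DOM tree before two such trees are aligned. As a minimal sketch of only that first step (the alignment model itself is not described in the excerpt and is not attempted here), the following builds a simplified element tree with Python's standard-library `html.parser`. The `Node`, `TreeBuilder`, `build_tree`, and `show` names are hypothetical, and the tree keeps only tag names and text, not attributes.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    """Hypothetical tree node: a tag name, its text chunks, and its children."""
    tag: str
    text: list = field(default_factory=list)
    children: list = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree from HTML start/end tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the matching open element; ignore unmatched end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].text.append(data.strip())


def build_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def show(node, depth=0):
    """Return an indented outline of tag names and their text."""
    label = node.tag + (": " + " ".join(node.text) if node.text else "")
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines
```

For example, in `build_tree("<p>Hello<br>world</p>")` the `br` element becomes a child of `p`, and both text fragments are attached to `p`. A real alignment model would need richer nodes (attributes, text positions), which this sketch deliberately omits.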

Scientific report: "A Syntax-Driven Bracketing Model for Phrase-Based Translation"

... used, the SDB model gives a probability to the span s covered by the rule, which estimates the extent to which the span is bracketable. For the unary SDB model, we only consider the features from ... cross boundaries of the lower VP on the right, therefore CBMF is “VP-RC”. 3.3 The Integration of the SDB Model into Phrase-Based SMT We integrate the SDB model...

Uploaded: 20/02/2014, 07:20

Scientific report: "A Unified Syntactic Model for Parsing Fluent and Disfluent Speech"

... by (Hale et al., 2006) works because the information about the transition to an error state is propagated up the tree, in the form of the -UNF tags. As the parsing chart is filled in bottom up, ... repair rule set, and so at the top of the tree the EDITED hypothesis is much more likely. However, this requires that several fluent speech rules from the data set be m...

Uploaded: 20/02/2014, 09:20

Scientific report: "A Phrase-based Statistical Model for SMS Text Normalization"

... This is the basic function of the channel model for the phrase-based SMS normalization model, where we used the maximum approximation for the sum over all segmentations. Then we further decompose ... able to model the three transformations through the normalization pair $(s_k, e_{a_k})$, with the mapping probability $P(s_k \mid e_{a_k})$. The following shows the scenari...

Uploaded: 20/02/2014, 12:20
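
The SMS normalization entry mentions replacing the sum over all segmentations with a maximum approximation. The sketch below illustrates that idea under simplifying assumptions that are not taken from the paper: a monotone (no reordering) channel model, a tiny hypothetical phrase table `PHRASE_TABLE` of $P(\text{SMS phrase} \mid \text{English phrase})$ values with made-up numbers, and no segmentation or length model. The same dynamic program computes the full sum when `combine=sum` and the maximum approximation when `combine=max`.

```python
from functools import lru_cache

# Hypothetical toy phrase table: P(sms_phrase | english_phrase). Numbers are made up.
PHRASE_TABLE = {
    (("c",), ("see",)): 0.6,
    (("u",), ("you",)): 0.7,
    (("cu",), ("see", "you")): 0.5,
    (("c", "u"), ("see", "you")): 0.2,
}


def channel_score(sms, eng, combine):
    """Score P(sms | eng) over all monotone joint phrase segmentations.

    combine=sum adds up every segmentation; combine=max keeps only the
    best one (the maximum, or Viterbi, approximation).
    """
    sms, eng = tuple(sms), tuple(eng)

    @lru_cache(maxsize=None)
    def score(i, j):
        # Score of generating sms[:i] from eng[:j].
        if i == 0 and j == 0:
            return 1.0
        candidates = []
        for i0 in range(i):
            for j0 in range(j):
                p = PHRASE_TABLE.get((sms[i0:i], eng[j0:j]), 0.0)
                if p > 0.0:
                    candidates.append(score(i0, j0) * p)
        return combine(candidates) if candidates else 0.0

    return score(len(sms), len(eng))
```

With this toy table, the SMS tokens `c u` given `see you` score about 0.62 under the sum (0.6 × 0.7 + 0.2) and 0.42 under the maximum, which keeps only the best segmentation. The maximum is cheaper to search with and yields the single best segmentation, at the cost of ignoring the probability mass of the others.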

Scientific report: "A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness"

... whether the proposed GVSM can aid the VSM performance, we ran the GVSM on the same retrieved documents. The interpolated precision-recall values at the 11 standard recall points for these ... VSM, for the first 4 recall points. For TRECs 4 and 6, we did the same for the first 9 and 8 recall points, respectively. As shown in Figure 3, the proposed GVSM may...

Uploaded: 08/03/2014, 21:20
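
The GVSM entry reports interpolated precision at the 11 standard recall points. The sketch below shows the usual definition for a single query: the interpolated precision at recall level r is the highest precision observed at any recall of at least r. The function name and input format are assumptions, and the excerpt does not say how the paper averaged over queries.

```python
def eleven_point_interpolated_precision(ranked_relevance, num_relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one query.

    ranked_relevance: booleans in rank order, True if that document is relevant.
    num_relevant: total number of relevant documents for the query.
    """
    if num_relevant <= 0:
        raise ValueError("num_relevant must be positive")

    # (recall, precision) after each retrieved document.
    points = []
    hits = 0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
        points.append((hits / num_relevant, hits / rank))

    result = []
    for level in (i / 10 for i in range(11)):
        # Highest precision at any recall >= this level (0.0 if never reached).
        candidates = [p for r, p in points if r >= level]
        result.append(max(candidates) if candidates else 0.0)
    return result
```

A ranking `[True, False, True, False, False]` with 2 relevant documents gives 1.0 at recall levels 0.0 through 0.5 and 2/3 at levels 0.6 through 1.0. Averaging such vectors over queries, level by level, is a common way to draw the precision-recall curves that TREC-style comparisons report.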

Scientific report: "A Class-Based Agreement Model for Generating Accurately Inflected Translations"

... morphological information in the target. These relations are best captured in a target-side model because they are mostly unobserved (from lexical clues) in the English source. The agreement model scores ... available; the MT system has yet to generate the rest of the translation when the tagging features for a position are scored. Therefore, we only define emission featu...

Uploaded: 16/03/2014, 19:20

Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

... standard. For the whole task, both the boundaries and the POS tag have to be correctly identified. 4.2 Performance of the Coarse-grained Solvers. Table 3 shows the performance on the development data ... “C:±3 T:±1” model performs the same as the “C:±3 T:±2” model. However, the sub-word classification accuracy of the “C:±3 T:±1” model is higher, so in the followi...

Uploaded: 17/03/2014, 00:20
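
The stacked sub-word entry scores the joint task so that a word counts as correct only when both its boundaries and its POS tag match the gold standard. One common way to implement that criterion, sketched here as an assumption rather than the paper's own scorer, is to compare sets of character spans, keeping the tag in each span for the joint score and dropping it for segmentation-only scoring. The function names and the example sentence and tags are hypothetical.

```python
def to_spans(tagged_words, use_tags=True):
    """Convert [(word, tag), ...] into a set of character spans.

    Each span is (start, end, tag) when use_tags is True, else (start, end).
    """
    spans, start = set(), 0
    for word, tag in tagged_words:
        end = start + len(word)
        spans.add((start, end, tag) if use_tags else (start, end))
        start = end
    return spans


def prf(gold, predicted, use_tags=True):
    """Precision, recall, and F1 over words.

    With use_tags=True a predicted word is correct only if its boundaries
    and its POS tag both match a gold word.
    """
    g = to_spans(gold, use_tags)
    p = to_spans(predicted, use_tags)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical example: segmentation is perfect, but one tag is wrong.
gold = [("中国", "NR"), ("经济", "NN"), ("发展", "VV")]
pred = [("中国", "NR"), ("经济", "NN"), ("发展", "NN")]
```

Here segmentation-only scoring gives 1.0 across the board, while the joint score drops to 2/3 because the last word's tag is wrong, which is exactly the stricter criterion the excerpt describes.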

Scientific report: "A Language-Independent Unsupervised Model for Morphological Segmentation"

... in two trees, the forward tree and the backward tree. Branches correspond to letters, and nodes are annotated with the total corpus frequency of the letter sequence from the root of the tree ... The second step is the affix acquisition step, during which a set of morphemes is identified from the corpus data. The third step uses these morphemes to segment words. 3...

Uploaded: 17/03/2014, 04:20
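
The morphological segmentation entry stores words in a forward tree and a backward tree whose branches are letters and whose nodes carry the total corpus frequency of the letter sequence from the root. A minimal sketch of such trees follows, assuming the backward tree is simply the same construction applied to reversed words; `LetterTree` and `build_trees` are hypothetical names, and the later affix-acquisition and segmentation steps are not shown.

```python
from collections import Counter


class LetterTree:
    """Trie whose nodes store the total corpus frequency of the letter
    sequence spelled from the root down to that node."""

    def __init__(self):
        self.children = {}
        self.freq = 0

    def add(self, letters, count):
        node = self
        node.freq += count  # the root (empty sequence) counts every token
        for ch in letters:
            node = node.children.setdefault(ch, LetterTree())
            node.freq += count

    def frequency(self, letters):
        """Corpus frequency of a letter sequence read from the root (0 if absent)."""
        node = self
        for ch in letters:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.freq


def build_trees(corpus_tokens):
    """Forward tree over words (prefixes); backward tree over reversed words (suffixes)."""
    counts = Counter(corpus_tokens)
    forward, backward = LetterTree(), LetterTree()
    for word, count in counts.items():
        forward.add(word, count)
        backward.add(word[::-1], count)
    return forward, backward
```

For the toy corpus `walked walking walks talked walked`, the forward node for `walk` has frequency 4, and the backward node for `dek` (the reversed ending `ked`) has frequency 3. Frequency drops along such paths are the kind of signal an affix-acquisition step could inspect.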

Scientific report: "A Hierarchical Phrase-Based Model for Statistical Machine Translation"

... some distortion model; 3. translate each of the $\bar{e}_i$ into French phrases according to a model $P(\bar{f} \mid \bar{e})$ estimated from the training data. Other phrase-based models model the joint distribution ... It translates the above example almost exactly as we have shown, the only error being that it omits the word ‘that’ from (6) and therefore (8). These hierarchical phra...

Uploaded: 17/03/2014, 05:20

Scientific report: "A Generative Constituent-Context Model for Improved Grammar Induction"

... omitted by the treebank. Also shown is F1 for the induced PCFG. The PCFG shows higher accuracy on small spans, while the CCM is more even. ... at random from the set of binary trees. This is the unsupervised ... example is the experiments from (Carroll and Charniak, 1992). They restricted the space of grammars to those isomorphic to a dependency grammar over the POS s...

Uploaded: 17/03/2014, 08:20
