Improving Statistical Machine Translation by Paraphrasing the Training Data

Scientific report: "Improving Statistical Machine Translation with Monolingual Collocation" (pdf)

... alignments on the phrase-based SMT system. The bi-directional alignments are obtained ... Figure 3. Example of the translations generated by the baseline system and the system where the phrase ... that of the translation "can we avoid". Thus, our method selects the former as the translation. Although the phrase "我们 必须 采取 有效 措施" in the source sentence has the same ... the systems using the improved bi-directional alignments achieve higher quality of translation than the baseline system. If the same alignment method is used, the systems using CM-3 got the...

Scientific report: "Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation" (docx)

... Setup. We set out to see whether we could use the HNG method to achieve translation quality improvements by gathering additional translations to add to the training data of the entire LDC language pack, ... widely used metric in the MT community. The x-axis measures the number of sentence translation pairs in the training data. The VG curves are cut off at the point at which the stopping criterion ... returns. We further confirmed this by running a least-squares linear regression on the points of the last 700,000 words annotated in the LDC data and also for the points in the new data that we...

Scientific report: "Unsupervised Search for The Optimal Segmentation for Statistical Machine Translation" (doc)

... respectively. trained on the English side of the training corpus using the SRILM toolkit (Stolcke, 2002). The BLEU metric (Papineni et al., 2002) was used for translation evaluation. Figure 1 compares the translation ... 2006. Word-based alignment, phrase-based translation: What’s the link? In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas (AMTA-06), pages 90–99. Franz ... Oflazer. 2006. Initial explorations in English to Turkish statistical machine translation. In Proceedings of the Workshop on Statistical Machine Translation, pages 7–14, New York City, New York,...

Scientific report: "Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information" (doc)

... represents the probability that $t_{f\_in}$ is not the topic of the phrase $\tilde{f}$. Similarly, $P(\bar{t}_{f\_in} \mid f_j)$ indicates the probability that $t_{f\_in}$ is not the topic of the word $f_j$. The other method ... and context-dependent translation selection both of which put emphasis on solving the translation ambiguity by the exploitation of the context information at the sentence level, we adopt the topical ... that the more data, the better translation quality when the corpus size is less than 30K. The overall BLEU scores corresponding to the range of great N values are generally higher than the ones...

Scientific report: "Modified Distortion Matrices for Phrase-Based Statistical Machine Translation" (doc)

... that is parsed by the decoder to modify the distortion matrix just before starting the search. As usual, the distortion matrix is queried by the distortion penalty generator and by the hypothesis expander [9]. 7.1 ... hypothesis, thus they are not affected by changes in the matrix. [10] That is everything except the small GALE corpus and the UN corpus. As reported by Green et al. (2010) the removal of UN data ... Arabic-English statistical machine translation. Machine Translation, Published Online. David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the...

Scientific report: "A Ranking-based Approach to Word Reordering for Statistical Machine Translation" (doc)

... parallel data, viewing the source tree nodes to be reordered as list items to be ranked. The ranks of tree nodes are determined by their relative positions in the target language – the node in the ... gets the highest rank, while the ending word in the target sentence gets the lowest rank. The ranking model is trained to directly minimize the mis-ordering of tree nodes, which differs from the ... English-Hindi Statistical Machine Translation. In Proc. IJCNLP. Roy Tromble. 2009. Search and Learning for the Linear Ordering Problem with an Application to Machine Translation. Ph.D. Thesis. Karthik...

Scientific report: "Mixing Multiple Translation Models in Statistical Machine Translation" (docx)

... hypotheses. Union, on the other hand, uses hypotheses from all the phrase tables. The feature set of these hypotheses is expanded to include one feature set for each table. However, for the ... only the scores of the best rules, the model with the highest weighted sum of the probabilities of the rules wins. This sum has to take into account the translation table limit (ttl), on the ... shows the results of the baselines. The first group are the baseline results on the phrase-based system discussed in Section 2 and the second group are those of our hierarchical MT system. Since the Hiero...

Scientific report: "Bilingual Sense Similarity for Statistical Machine Translation" (ppt)

... of the rule (a particular phrase pair in the training corpus), the context will be the words instantiating the non-terminals. In turn, the context for the sub-phrases that instantiate the ... large data (CE_LD) and small data (CE_SD) NIST task by applying one feature. 6.2 Effect of Combining the Two Similarities. We then combine the two similarity scores by using both of them ... different languages by using their contexts. Second, we use the sense similarities between the source and target sides of a translation rule to improve statistical machine translation performance....

Scientific report: "Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules" (doc)

... algorithm during the substitution. If the inserted phrase has a prefix or suffix sub-phrase that is the same as the suffix or prefix of the adjacent parts of the original sentence, then the duplication will ... scores of the experiments. As we can see in the results, by using only the generated sentence pairs, the performance of the system drops. However, the interpolated phrase tables outperform the baseline. ... Because of this, we use the features in the phrase table to sort the rules, and keep 100 rules with the highest arithmetic mean of the feature values. The second problem is the phrase boundaries...

Scientific report: "Phrase-Based Statistical Machine Translation as a Traveling Salesman Problem" (docx)

... time for the two corpuses. Since in the real translation task, the size of the TSP graph is much larger than in the artificial reordering task (in our experiments the median size of the TSP graph ... the source side of b (whatever their relative positions in the source sentence), and to producing the target side of b′ directly after the target side of b; the transition cost is then the ... between the reconstructed and the original sentences, which allows us to check how well the quality of reconstruction correlates with the internal score. The training dataset for learning the LM consists...

Scientific report: "Name Translation in Statistical Machine Translation Learning When to Transliterate" (pptx)

... the transliterations then compete with the translations generated by the general SMT system. This means that the MT system will not always use the transliterator suggestions, depending on the ... substring pairs where the English side is preceded by the letters a, o, or u. The fourth rule specifies a cost of 0.1 if the substrings occur at the end of (both) names, 0.2 otherwise. According to the fifth ... fifth rule, the Arabic letter may match an empty string on the English side, if there is an English consonant (EC) in the right context of the English side. The total cost is computed by always...

Scientific report: "Segmentation for English-to-Arabic Statistical Machine Translation" (ppt)

... without the table difficult. 3.3 Factored Models. For the Factored Translation Models experiment, the factors on the English side are the POS tags and the surface word. On the Arabic side, we use the ... Arabic. The Factored Translation Models experiments use the MOSES system. 4.1 Data Used. We experiment with two domains: text news and spoken dialogue from the travel domain. For the news training data ... later, we observe the same trend, which is due to the fact that the model becomes less sparse with more training data. There has been work on translating from English to other morphologically...

Scientific report: "Combination of Arabic Preprocessing Schemes for Statistical Machine Translation" (ppt)

... for Phrase-based Statistical Machine Translation Models. In Proc. of the Association for Machine Translation in the Americas (AMTA). P. Koehn. 2004b. Statistical Significance Tests for Machine Translation ... able to disambiguate amongst the possible analyses of a word, identify the features addressed by the scheme in the chosen analysis and process them as specified by the scheme. In this section we ... constructed from the Arabic side of the parallel corpus used in the MT experiments (Section 5). Obviously the more verbose a scheme is, the bigger the number of tokens in the text. The ST, ON,...