using bilingual parallel corpora

Scientific report: "Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment"

... more generic multilingual resources (e.g. bilingual dictionaries). 3. Using Parallel Corpora for CLTE. Bilingual parallel corpora represent a possible solution to overcome the inadequacy of ... TE. (4) Can parallel corpora be useful also for monolingual TE? To answer this question, we experiment on monolingual RTE datasets using paraphrase tables extracted from bilingual parallel corpora. ... extracted from bilingual corpora, we conducted a series of experiments using the different resources mentioned in Section 4.2. As can be observed in Table 1, the highest results are achieved using...

Scientific report: "Paraphrasing with Bilingual Parallel Corpora"

... word- and sentence-aligned parallel corpora. In Proceedings of ACL. Mona Diab and Philip Resnik. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of ... and comfort as console. While monolingual parallel corpora often have identical contexts that can be used for identifying paraphrases, bilingual parallel corpora do not. Instead, we use phrases ... probability to include multiple corpora, as follows: $\hat{e}_2 = \arg\max_{e_2 \neq e_1} \sum_{C} \sum_{f \in C} p(f \mid e_1)\, p(e_2 \mid f)$ (5), where C is a parallel corpus from a set of parallel corpora. For this condition we...
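
Equation (5) pivots through foreign phrases: a candidate $e_2$ scores highly when it shares many likely translations $f$ with $e_1$, and the sums over corpora simply accumulate. The following is a minimal Python sketch of that scoring, assuming toy phrase tables stored as nested dictionaries; the dictionary layout, the function names and every probability in it are hypothetical illustrations, not the paper's data or code.

```python
from collections import defaultdict


def paraphrase_scores(e1, corpora):
    """Sum p(f|e1) * p(e2|f) over every corpus C and every pivot phrase f in C.

    Each corpus is a (p_f_given_e, p_e_given_f) pair of nested dicts, a
    hypothetical toy layout: p_f_given_e[e][f] = p(f|e), p_e_given_f[f][e] = p(e|f).
    """
    scores = defaultdict(float)
    for p_f_given_e, p_e_given_f in corpora:
        for f, p_f in p_f_given_e.get(e1, {}).items():
            for e2, p_e2 in p_e_given_f.get(f, {}).items():
                if e2 != e1:  # the arg max ranges over e2 != e1
                    scores[e2] += p_f * p_e2
    return dict(scores)


def best_paraphrase(e1, corpora):
    """Return the arg max of equation (5), or None if e1 has no pivot phrases."""
    scores = paraphrase_scores(e1, corpora)
    return max(scores, key=scores.get) if scores else None


# Toy German-English tables (invented numbers, for illustration only).
de_en = (
    {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}},
    {
        "unter kontrolle": {"under control": 0.6, "in check": 0.4},
        "im griff": {"under control": 0.5, "in check": 0.3, "on top of": 0.2},
    },
)

print(best_paraphrase("under control", [de_en]))  # in check
```

Here "in check" scores 0.8 × 0.4 + 0.2 × 0.3 = 0.38, ahead of "on top of" at 0.04. Passing several corpora in the list adds their contributions, which mirrors the outer sum over C in (5).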

Scientific report: "Word Alignment for Languages with Scarce Resources Using Bilingual Corpora of Other Language Pairs"

... amounts of bilingual data are available for the desired language pair L1-L2, large-scale bilingual corpora in L1-L3 and L2-L3 are available. Using these two additional bilingual corpora, we ... as compared with the method using the two corpora in L1-L3 and L3-L2, and a relative error rate reduction of 21.30% as compared with the method using the small bilingual corpus in L1 and ... Robert Gaizauskas. 2005. Aligning Words in English-Hindi Parallel Corpora. In Proc. of the ACL 2005 Workshop on Building and Using Parallel Texts: Data-driven Machine Translation and Beyond,...

Scientific report: "Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora"

... those with parallel data, we can still obtain improvements using the pseudo-parallel data, especially in the first setting. The difference between using parallel versus pseudo-parallel data ... show the performance of our model using the pseudo-parallel data versus the real parallel data, in the two settings, respectively. The EN->CH pseudo-parallel data consists of the English ... is reasonable since the quality of the pseudo-parallel data is not as good as that of the parallel data. Therefore, the performance using pseudo-parallel data is better with a small weight (e.g....

Scientific report: "Exploring Syntactic Structural Features for Sub-Tree Alignment using Bilingual Tree Kernels"

... Conclusion. In this paper, we explore syntactic structure features by means of Bilingual Tree Kernels and apply them to bilingual sub-tree alignment along with various lexical and plain structural ... translation, tree kernels are seldom applied. In this paper, we propose Bilingual Tree Kernels (BTKs) to model the bilingual translational equivalences, in our case, to conduct sub-tree alignment. ... structures. We propose two kinds of BTKs named dependent Bilingual Tree Kernel (dBTK), which takes the sub-tree pair as a whole and independent Bilingual Tree Kernel (iBTK), which individually models...

Scientific report: "ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation"

... as there are parallel corpora available for the targeted languages. Although large multilingual corpora are still rather scarce, we strongly believe there will be more parallel corpora available in ... metrics. Several studies have already shown the validity of using parallel corpora for sense discrimination (e.g. (Ide et al., 2002)), for bilingual WSD modules (e.g. (Gale and Church, 1993; Ng ... disambiguation using a second language monolingual corpus. Computational Linguistics, 20(4):563–596. M. Diab and P. Resnik. 2002. An Unsupervised Method for Word Sense Tagging Using Parallel Corpora. ...

Scientific report: "Semi-Supervised Learning of Partial Cognates using Bilingual Bootstrapping"

... monolingual bootstrapping technique, we also use bilingual bootstrapping. Diab (2002) has shown that unsupervised WSD systems that use parallel corpora can achieve results that are close to ... Some improvement might be achieved when using lemmatization. We wanted to see how well we can do by using sentences as they are extracted from the parallel corpus, with no additional pre-processing ... we use the parallel data in a different way: we use words from parallel sentences as features for Machine Learning (ML). Li and Li (2004) have shown that word translation and bilingual bootstrapping...

Scientific report: "Context-dependent SMT Model using Bilingual Verb-Noun Collocation"

... automatically acquired from the chunk-aligned bilingual corpora. 4.1 Automatic Extraction of Bilingual Verb-Noun Collocation (BiVN). To automatically extract the bilingual verb-noun collocations, we utilize ... obtain more semantically plausible translation results, we use bilingual verb-noun collocations; these are automatically extracted by using chunk alignment and a monolingual dependency parser. ... Arbor, June 2005. ©2005 Association for Computational Linguistics. Context-dependent SMT Model using Bilingual Verb-Noun Collocation. Young-Sook Hwang, ATR SLT Research Labs, 2-2-2 Hikaridai, Seika-cho, Soraku-gun...

Scientific report: "Word Translation Disambiguation Using Bilingual Bootstrapping"

... Learning, vol. 34, pp. 107-130. G. Kikui, 1999. Resolving Translation Ambiguity Using Non-parallel Bilingual Corpora. In Proceedings of ACL ’99 Workshop on Unsupervised Learning in Natural ... Output: classifiers in English and Chinese. Figure 2: Bilingual Bootstrapping. Word Translation Disambiguation Using Bilingual Bootstrapping. Cong Li, Microsoft Research Asia, 5F ... paper proposes a new method for word translation disambiguation using a machine learning technique called ‘Bilingual Bootstrapping’. Bilingual Bootstrapping makes use of, in learning, a small...

Scientific report: "ALIGNING SENTENCES IN PARALLEL CORPORA"

... sentences of each length less than 8]. We estimated the probabilities ... ALIGNING SENTENCES IN PARALLEL CORPORA. Peter F. Brown, Jennifer C. Lai, and Robert L. Mercer, IBM Thomas J. Watson Research ... describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information ... sentence, or even a whole passage, may be missing from one or the other of the corpora. If a person is given two parallel texts and asked to match up the sentences in them, it is natural for...
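
The excerpt describes statistical sentence alignment that must cope with sentences or passages missing from one side. As an illustration only, here is a simplified dynamic-programming aligner over groupings of sentences (1-1, 1-0, 0-1, 2-1, 1-2). Its cost is just the difference in whitespace-token counts plus a fixed penalty for any grouping other than 1-1; this toy cost, its penalty values, the function name and the example sentences are assumptions for the sketch, not the probabilistic model from the paper.

```python
def align_by_length(src, tgt):
    """Align two sentence lists with DP over small sentence groupings.

    Returns a list of (source_indices, target_indices) pairs in order.
    The cost model is a hypothetical stand-in: |len difference| + penalty.
    """
    # (source sentences consumed, target sentences consumed, penalty)
    groupings = [(1, 1, 0), (1, 0, 10), (0, 1, 10), (2, 1, 3), (1, 2, 3)]
    ls = [len(s.split()) for s in src]
    lt = [len(t.split()) for t in tgt]
    n, m = len(ls), len(lt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    # Row-major order visits every predecessor of a cell before the cell.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in groupings:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + abs(sum(ls[i:ni]) - sum(lt[j:nj]))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from the end of both texts.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return pairs[::-1]


src_sents = ["the cat sat", "it was happy today and slept"]
tgt_sents = ["le chat", "s'est assis", "il était content aujourd'hui et a dormi"]
print(align_by_length(src_sents, tgt_sents))  # [([0], [0, 1]), ([1], [2])]
```

Here the first English sentence (3 tokens) is paired with the first two French sentences (2 + 2 tokens) through a 1-2 grouping, which is cheaper than forcing a 1-1 match and absorbing the mismatch later.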

Scientific report: "A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora"

... primary lexicon using the scores. A threshold is applied to the DTW score of each pair, selecting the most correlated pairs as the first bilingual lexicon. 5. Find anchor points using the primary ... secondary lexicon. 3. Finding high frequency bilingual word pairs. When the sentence alignments for the corpus are unknown, standard techniques for extracting bilingual lexicons cannot apply. To ... in its translation¹, suggesting a discontinuous mapping between some parallel texts. We have previously shown that using a vector representation of the frequency and positional informa...
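
The excerpt selects the most correlated word pairs by thresholding a DTW (dynamic time warping) score. Below is a minimal Python sketch of that selection step, assuming each word is represented by the gaps between its successive positions in the text and that a lower DTW distance means a more correlated pair; the representation, the function names, the position data and the threshold are all hypothetical choices for illustration, not the paper's exact procedure.

```python
import math


def recency_vector(positions):
    """Gaps between successive occurrences of a word (assumed representation)."""
    return [b - a for a, b in zip(positions, positions[1:])]


def dtw_distance(a, b):
    """Classic DTW distance with absolute difference as the local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def primary_lexicon(src_positions, tgt_positions, threshold):
    """Return (source, target, score) pairs with score <= threshold, best first."""
    pairs = []
    for s, sp in src_positions.items():
        for t, tp in tgt_positions.items():
            if len(sp) < 2 or len(tp) < 2:
                continue  # a gap vector needs at least two occurrences
            score = dtw_distance(recency_vector(sp), recency_vector(tp))
            if score <= threshold:
                pairs.append((s, t, score))
    return sorted(pairs, key=lambda p: p[2])


# Hypothetical token positions of words in a noisy parallel text pair.
src = {"Haus": [10, 50, 120], "Baum": [5, 300]}
tgt = {"house": [12, 49, 125], "tree": [8, 310], "car": [100, 105, 400]}
print(primary_lexicon(src, tgt, threshold=20))
# [('Baum', 'tree', 7.0), ('Haus', 'house', 9.0)]
```

DTW lets the two gap sequences stretch against each other, so a translation pair still matches when sentences are inserted or dropped on one side; "car" is rejected because its gap pattern (5, then 295) does not track either German word closely.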

Scientific report: "Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora"

... for parallel data acquisition is highly beneficial for the SMT field. Comparable corpora exhibit various degrees of parallelism. Fung and Cheung (2004a) describe corpora ranging from noisy parallel, ... very non-parallel. Corpora from the last category contain “disparate, very non-parallel bilingual documents that could either be on the same topic (on-topic) or not”. This is the kind of corpora ... information concerning these corpora. 3.2 Extraction Experiments. On each of our comparable corpora, and using each of our initial parallel corpora, we apply both the fragment extraction and the sentence extraction...

Scientific report: "Unsupervised Sense Disambiguation Using Bilingual Probabilistic Models"

... have presented two novel probabilistic models for unsupervised word sense disambiguation using parallel corpora and have shown that both models outperform existing unsupervised approaches. In addition, ... experiments with real data, we make use of the parallel corpora constructed by Diab and Resnik (2002) for evaluation purposes. We chose to work on these corpora in order to permit a direct comparison ... unsupervised word-sense disambiguation using parallel corpora. The first model, which we call the Sense model, builds on the work of Diab and Resnik (2002) that uses both parallel text and a sense inventory...

Scientific report: "Unsupervised Learning of Arabic Stemming using a Parallel Corpus"

... Use the small parallel corpus. • Step 2 (optional): Use the monolingual corpus. The two steps are described in detail in the following subsections. 2.1 Step 1: Using the Small Parallel Corpus. Figure ... 55–60. Mona Diab and Philip Resnik. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of the 40th Annual Meeting of the Association for Computational ... multilingual text analysis tools via robust projection across aligned corpora. Unsupervised Learning of Arabic Stemming using a Parallel Corpus. Monica Rogati, Computer Science Department, Carnegie...

Scientific report: "AUTOMATIC ALIGNMENT IN PARALLEL CORPORA"

... (associated with content words) reduces the number of parameters ... AUTOMATIC ALIGNMENT IN PARALLEL CORPORA. Harris Papageorgiou, Lambros Cranias, Stelios Piperidis, Institute for Language ... the optimum alignment of units. The proposed scheme has been tested at sentence level on parallel corpora of the CELEX database. The success rate exceeded 99%. The next steps of the work ... scheme's efficiency at lower levels endowed with necessary bilingual information about potential delimiters. INTRODUCTION. Parallel linguistically meaningful text units are indispensable...