... more generic multilingual resources (e.g. bilingual dictionaries). 3 Using Parallel Corpora for CLTE Bilingual parallel corpora represent a possible solution to overcome the inadequacy of ... TE. (4) Can parallel corpora be useful also for monolingual TE? To answer this question, we experiment on monolingual RTE datasets using paraphrase tables extracted from bilingual parallel corpora. ... extracted from bilingual corpora, we conducted a series of experiments using the different resources mentioned in Section 4.2. As it can be observed in Table 1, the highest results are achieved using...
... word- and sentence-aligned parallel corpora. In Proceedings of ACL. Mona Diab and Philip Resnik. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of ... and comfort as console. While monolingual parallel corpora often have identical contexts that can be used for identifying paraphrases, bilingual parallel corpora do not. Instead, we use phrases ... probability to include multiple corpora, as follows: $\hat{e}_2 = \arg\max_{e_2 \neq e_1} \sum_{C} \sum_{f \in C} p(f \mid e_1)\, p(e_2 \mid f)$ (5) where C is a parallel corpus from a set of parallel corpora. For this condition we...
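The multi-corpus paraphrase probability in the snippet above can be illustrated with a minimal Python sketch. It assumes that each parallel corpus C supplies its own phrase translation tables p(f|e) and p(e|f); the dictionary layout, the function name `best_paraphrase`, and all toy probabilities are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def best_paraphrase(e1, corpora):
    """Score candidate paraphrases e2 != e1 by pivoting through foreign phrases f.

    corpora: list of dicts (an assumed layout), one per parallel corpus C, with
      "f_given_e": {e: {f: p(f|e)}}
      "e_given_f": {f: {e: p(e|f)}}
    Returns (best candidate or None, dict of all candidate scores).
    """
    scores = defaultdict(float)
    for corpus in corpora:                                   # sum over C
        for f, p_f in corpus["f_given_e"].get(e1, {}).items():   # sum over f in C
            for e2, p_e2 in corpus["e_given_f"].get(f, {}).items():
                if e2 != e1:                                 # exclude e1 itself
                    scores[e2] += p_f * p_e2
    if not scores:
        return None, {}
    return max(scores, key=scores.get), dict(scores)

# Toy tables for two hypothetical corpora (values invented for illustration).
toy_corpora = [
    {
        "f_given_e": {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}},
        "e_given_f": {
            "unter kontrolle": {"under control": 0.6, "in check": 0.4},
            "im griff": {"under control": 0.5, "in hand": 0.5},
        },
    },
    {
        "f_given_e": {"under control": {"sous contrôle": 1.0}},
        "e_given_f": {"sous contrôle": {"under control": 0.7, "in check": 0.3}},
    },
]

best, scores = best_paraphrase("under control", toy_corpora)
```

On the toy data, "in check" collects probability mass from both corpora, which is the intended effect of summing over C: a candidate supported by several language pairs outranks one supported by a single pivot phrase.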
... amounts of bilingual data are available for the desired language pair L1-L2, large-scale bilingual corpora in L1-L3 and L2-L3 are available. Using these two additional bilingual corpora, we ... as compared with the method using the two corpora in L1-L3 and L3-L2, and a relative error rate reduction of 21.30% as compared with the method using the small bilingual corpus in L1 and ... Robert Gaizauskas. 2005. Aligning Words in English-Hindi Parallel Corpora. In Proc. of the ACL 2005 Workshop on Building and Using Parallel Texts: Data-driven Machine Translation and Beyond,...
... those with parallel data, we can still obtain improvements using the pseudo-parallel data, especially in the first setting. The difference between using parallel versus pseudo-parallel data ... show the performance of our model using the pseudo-parallel data versus the real parallel data, in the two settings, respectively. The EN->CH pseudo-parallel data consists of the English ... is reasonable since the quality of the pseudo-parallel data is not as good as that of the parallel data. Therefore, the performance using pseudo-parallel data is better with a small weight (e.g....
... Conclusion In this paper, we explore syntactic structure features by means of Bilingual Tree Kernels and apply them to bilingual sub-tree alignment along with various lexical and plain structural ... translation, tree kernels are seldom applied. In this paper, we propose Bilingual Tree Kernels (BTKs) to model the bilingual translational equivalences, in our case, to conduct sub-tree alignment. ... structures. We propose two kinds of BTKs named dependent Bilingual Tree Kernel (dBTK), which takes the sub-tree pair as a whole and independent Bilingual Tree Kernel (iBTK), which individually models...
... as there are parallel corpora available for the targeted languages. Although large multilingual corpora are still rather scarce, we strongly believe there will be more parallel corpora available in ... metrics. Several studies have already shown the validity of using parallel corpora for sense discrimination (e.g. (Ide et al., 2002)), for bilingual WSD modules (e.g. (Gale and Church, 1993; Ng ... disambiguation using a second language monolingual corpus. Computational Linguistics, 20(4):563–596. M. Diab and P. Resnik. 2002. An Unsupervised Method for Word Sense Tagging Using Parallel Corpora. ...
... monolingual bootstrapping technique, we also use bilingual bootstrapping. Diab (2002) has shown that unsupervised WSD systems that use parallel corpora can achieve results that are close to ... Some improvement might be achieved when using lemmatization. We wanted to see how well we can do by using sentences as they are extracted from the parallel corpus, with no additional pre-processing ... we use the parallel data in a different way: we use words from parallel sentences as features for Machine Learning (ML). Li and Li (2004) have shown that word translation and bilingual bootstrapping...
... automatically acquired from the chunk-aligned bilingual corpora. 4.1 Automatic Extraction of Bilingual Verb-Noun Collocation (BiVN) To automatically extract the bilingual verb-noun collocations, we utilize ... obtain more semantically plausible translation results, we use bilingual verb-noun collocations; these are automatically extracted by using chunk alignment and a monolingual dependency parser. ... Arbor, June 2005. ©2005 Association for Computational Linguistics Context-dependent SMT Model using Bilingual Verb-Noun Collocation Young-Sook Hwang ATR SLT Research Labs 2-2-2 Hikaridai Seika-cho Soraku-gun...
... Learning, vol. 34, pp. 107-130. G. Kikui, 1999. Resolving Translation Ambiguity Using Non-parallel Bilingual Corpora. In Proceedings of ACL ’99 Workshop on Unsupervised Learning in Natural ... Output: classifiers in English and Chinese Figure 2: Bilingual Bootstrapping Word Translation Disambiguation Using Bilingual Bootstrapping Cong Li Microsoft Research Asia 5F ... paper proposes a new method for word translation disambiguation using a machine learning technique called ‘Bilingual Bootstrapping’. Bilingual Bootstrapping makes use of, in learning, a small...
... sentences of each length less than 8]. We estimated the probabilities 173 ALIGNING SENTENCES IN PARALLEL CORPORA Peter F. Brown, Jennifer C. Lai, and Robert L. Mercer IBM Thomas J. Watson Research ... describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information ... sentence, or even a whole passage, may be missing from one or the other of the corpora. If a person is given two parallel texts and asked to match up the sentences in them, it is natural for...
... primary lexicon using the scores. A threshold is applied to the DTW score of each pair, selecting the most correlated pairs as the first bilingual lexicon. 5. Find anchor points using the primary ... secondary lexicon. 3 Finding high frequency bilingual word pairs When the sentence alignments for the corpus are unknown, standard techniques for extracting bilingual lexicons cannot apply. To ... in its translation 1, suggesting a discontinuous mapping between some parallel texts. We have previously shown that using a vector representation of the frequency and positional informa-...
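The thresholding step mentioned in the snippet above (scoring candidate word pairs with DTW and keeping the best-matching ones) can be sketched as follows. This is only an illustration under stated assumptions: each word is represented by a numeric vector (for example, differences between successive positions of the word in the text), the DTW score is a distance where lower means more similar, and the names `dtw_distance` and `select_pairs` as well as the toy vectors are hypothetical rather than the authors' implementation.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def select_pairs(source_vecs, target_vecs, threshold):
    """Keep word pairs whose DTW distance is at most `threshold` (assumed cutoff)."""
    pairs = []
    for s_word, s_vec in source_vecs.items():
        for t_word, t_vec in target_vecs.items():
            score = dtw_distance(s_vec, t_vec)
            if score <= threshold:
                pairs.append((s_word, t_word, score))
    return sorted(pairs, key=lambda p: p[2])  # best-matching pairs first

# Toy positional vectors (invented values, for illustration only).
source_vecs = {"house": [10, 20, 30]}
target_vecs = {"maison": [10, 21, 30], "chat": [100, 5, 60]}
lexicon = select_pairs(source_vecs, target_vecs, threshold=5)
```

Comparing every source word with every target word is quadratic in vocabulary size, so a practical system would likely restrict the candidates first; the snippet's focus on high-frequency word pairs suggests one such restriction.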
... for parallel data acquisition is highly beneficial for the SMT field. Comparable corpora exhibit various degrees of parallelism. Fung and Cheung (2004a) describe corpora ranging from noisy parallel, ... very non-parallel. Corpora from the last category contain “disparate, very non-parallel bilingual documents that could either be on the same topic (on-topic) or not”. This is the kind of corpora ... information concerning these corpora. 3.2 Extraction Experiments On each of our comparable corpora, and using each of our initial parallel corpora, we apply both the fragment extraction and the sentence extraction...
... have presented two novel probabilistic models for unsupervised word sense disambiguation using parallel corpora and have shown that both models outperform existing unsupervised approaches. In addition, ... experiments with real data, we make use of the parallel corpora constructed by Diab and Resnik (2002) for evaluation purposes. We chose to work on these corpora in order to permit a direct comparison ... unsupervised word-sense disambiguation using parallel corpora. The first model, which we call the Sense model, builds on the work of Diab and Resnik (2002) that uses both parallel text and a sense inventory...
... Use the small parallel corpus • Step 2: (optional) Use the monolingual corpus The two steps are described in detail in the following subsections. 2.1 Step 1: Using the Small Parallel Corpus Figure ... 55–60. Mona Diab and Philip Resnik. 2002. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of the 40th Annual Meeting of the Association for Computational ... multilingual text analysis tools via robust projection across aligned corpora. Unsupervised Learning of Arabic Stemming using a Parallel Corpus Monica Rogati† Computer Science Department, Carnegie...
... (associated with content words) reduces the number of parameters 335 AUTOMATIC ALIGNMENT IN PARALLEL CORPORA Harris Papageorgiou, Lambros Cranias, Stelios Piperidis I Institute for Language ... the optimum alignment of units. The proposed scheme has been tested at sentence level on parallel corpora of the CELEX database. The success rate exceeded 99%. The next steps of the work ... scheme's efficiency at lower levels endowed with necessary bilingual information about potential delimiters. INTRODUCTION Parallel linguistically meaningful text units are indispensable...