comparable corpora and phrasal

Scientific report: "Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval"

Uploaded: 20/02/2014, 16:20

... involving the comparable corpora. Re-scoring through the Comparable Corpora. Comparable corpora could be considered for the disambiguation of translation alternatives and thus selection of best phrasal ... with 10.35%, 8.27% and 3.08% for the WWW-based, the NTCIR-based and comparable corpora-based techniques, respectively, compared to the hybrid two-stages comparable corpora and linguistics-based ... (1998-1999) for Japanese and Mainichi Daily News (1998-1999) for English were considered as comparable corpora. We have also considered documents of NTCIR-2 test collection as comparable corpora in order...

Scientific report: "Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization"

Uploaded: 17/03/2014, 04:20

... WordNet) and Collins bilingual dictionary. Section 5 evaluates our methodologies and Section 6 concludes the paper suggesting some future developments. 2 Comparable Corpora. Comparable corpora ... t_i^z ∈ T* and for every language L_j, and is known, then the corpus is parallel and aligned at document level. For the purpose of this paper it is enough to assume that two corpora are comparable, ... Italian and 27,821 English news partitioned by AdnKronos into four fixed categories: QUALITY OF LIFE, MADE IN ITALY, TOURISM, CULTURE AND SCHOOL. The English and the Italian corpora are comparable, ...

Scientific report: "Using Bilingual Comparable Corpora and Semi-supervised Clustering for Topic Tracking"

Uploaded: 17/03/2014, 04:20

... Research, and International Communications Foundation. References: J. Allan, R. Papka and V. Lavrenko, On-line new event detection and tracking, Proc. of the DARPA Workshop, 1998. J. Allan and V. Lavrenko ... chose the TDT3 English corpora as our gold standard corpora. TDT3 consists of 34,600 stories with 60 manually identified topics. We then created Japanese corpora (Mainichi and Yomiuri newspapers) ... stories, and tracking. 3.1 Extracting Bilingual Story Pairs. We extract story pairs which consist of positive English story and its associated Japanese stories using the TDT English and Mainichi and...

Scientific report: "Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora"

Uploaded: 20/02/2014, 12:20

... (Cucerzan and Yarowsky, 1999) and (Collins and Singer, 1999) present algorithms to obtain NEs from untagged corpora. However, they focus on the classification stage of already segmented entities, and ... transliteration candidates in another language. Time sequence scoring is then used to re-rank the list and choose the candidate best temporally aligned with the NE. Pairs of NEs and the best candidates ... pairs of English NEs and their Russian transliterations. Negative examples here and during the rest of the training were pairs of randomly selected non-NE English and Russian words. New...
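
The time-sequence re-ranking step can be pictured with a minimal Python sketch. It assumes each word is represented by its frequency per time bin (e.g. per day) across the comparable news corpus and that temporal alignment is scored by cosine similarity; the paper's exact scoring function is not shown in the snippet, and the names cosine and rerank_by_time are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length frequency series; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank_by_time(ne_series: Sequence[float],
                   candidates: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs, best temporal match first.

    Assumption: ne_series and every candidate series count occurrences per
    time bin over the same date range of the comparable corpus.
    """
    scored = [(word, cosine(ne_series, series)) for word, series in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A candidate whose mentions peak on the same days as the source NE ends up at the top of the list, which is the intuition behind choosing the candidate "best temporally aligned with the NE".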

Scientific report: "Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora"

Uploaded: 16/03/2014, 20:20

... terms and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. ... from parallel corpora by means of symmetrical word alignment and/or by phrase generation (Koehn et al., 2003). Our toolkit exploits comparable corpora in order to find and extract comparable ... extraction from comparable corpora. It consists of tools bundled in two workflows: (1) alignment of comparable documents and extraction of parallel sentences and (2) extraction and bilingual...

Scientific report: "A Method for Effective and Scalable Mining of Named Entity Transliterations from Large Comparable Corpora"

Uploaded: 24/03/2014, 03:20

... 1: Comparable Corpora. The corpora can be categorized into two separate groups, group S (for Small) consisting of EK-S, ET-S, ER-S, and EH-S and group L (for Large) consisting of EK-L and ... language and the pairing of articles in the comparable corpora is known in advance. We want to emphasize here that such corpora are indeed available in many domains such as technical documents and ... $P(w_T \mid w_S) = \sum_{A} \prod_{j=1}^{m} p(a_j \mid a_{j-1}, n)\, p(t_j \mid s_{a_j})$. Here, $t_j$ (and resp. $s_i$) denotes the $j$th (and resp. $i$th) character in $w_T$ (and resp. $w_S$) and $A = a_1^m$ is the hidden alignment between $w_T$ and $w_S$ where $t_j$ is aligned...
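
A transliteration model that sums over every hidden alignment A = a_1^m is usually evaluated with the HMM forward recursion rather than by enumerating all n^m alignments. The following is a minimal Python sketch under stated assumptions: the transition probabilities p(a_j | a_{j-1}, n) and emission probabilities p(t_j | s_i) are supplied as callables (the parameters trans and emit are hypothetical), positions are 0-based, and -1 is an assumed start state standing in for a_0. It illustrates the technique, not the paper's own code.

```python
from typing import Callable

# trans(i, prev, n) ~ p(a_j = i | a_{j-1} = prev, n); prev == -1 marks the assumed start state.
Transition = Callable[[int, int, int], float]
# emit(t_char, s_char) ~ p(t_j | s_i)
Emission = Callable[[str, str], float]


def hmm_translit_prob(w_s: str, w_t: str, trans: Transition, emit: Emission) -> float:
    """Forward algorithm for P(w_T | w_S), summing over all alignments of w_T to w_S."""
    n, m = len(w_s), len(w_t)
    if n == 0 or m == 0:
        return 0.0
    # alpha[i]: probability of generating t_1..t_j with t_j aligned to s_i.
    alpha = [trans(i, -1, n) * emit(w_t[0], w_s[i]) for i in range(n)]
    for j in range(1, m):
        alpha = [
            sum(alpha[k] * trans(i, k, n) for k in range(n)) * emit(w_t[j], w_s[i])
            for i in range(n)
        ]
    return sum(alpha)
```

Each step of the recursion costs O(n²), so the whole sum costs O(m·n²) instead of O(n^m).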

Scientific report: "Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora"

Uploaded: 22/02/2014, 02:20

... Acquisition from Comparable Corpora. Takehito Utsuro (Graduate School of Informatics, Kyoto University), Takashi Horiuchi, Kohei Hino, Takeshi Hamamoto and Takeaki Nakayama (Dpt. Information and Computer ... sequences, and word frequency vectors v(d_j^J) and v(d_l^E) are generated. Then, cosine similarities between v(d_j^J) and v(d_l^E) are calculated³ and pairs of articles d_j^J and d_l^E ... help of any existing bilingual lexicons. On the other hand, later works such as Kaji and Aizono (1996), Fung and Yee (1998), Rapp (1999), and Tanaka (2002) studied to exploit existing bilingual lexicons...
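
The article-pairing step (word frequency vectors compared by cosine similarity) can be sketched in Python as follows. Assumptions: both sides' articles are already tokenized into a shared vocabulary (for example, one side mapped through a bilingual lexicon), each source article is paired with its single most similar target article, and the 0.3 cut-off is a hypothetical threshold, not a value from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse word frequency vectors."""
    dot = sum(count * v[word] for word, count in u.items())  # missing words count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pair_articles(src_docs: Dict[str, List[str]],
                  tgt_docs: Dict[str, List[str]],
                  threshold: float = 0.3) -> List[Tuple[str, str, float]]:
    """Pair each source article with its most similar target article if the
    cosine similarity reaches the (hypothetical) threshold."""
    tgt_vecs = {doc_id: Counter(tokens) for doc_id, tokens in tgt_docs.items()}
    pairs: List[Tuple[str, str, float]] = []
    for src_id, tokens in src_docs.items():
        src_vec = Counter(tokens)
        best_id: Optional[str] = None
        best_sim = 0.0
        for tgt_id, tgt_vec in tgt_vecs.items():
            sim = cosine(src_vec, tgt_vec)
            if sim > best_sim:
                best_id, best_sim = tgt_id, sim
        if best_id is not None and best_sim >= threshold:
            pairs.append((src_id, best_id, best_sim))
    return pairs
```

Source articles that share no vocabulary with any target article are simply left unpaired.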

Commonly-Used Idioms, Sayings and phrasal verbs

Uploaded: 02/03/2014, 20:36

... The taxes increased across the board and everyone must pay more. act high and mighty - to act proud and powerful The woman always acts high and mighty and nobody likes her. act one's ... energy The man stood up and belted out several old songs. Mr_doody2004@yahoo.com 29 B back Idioms back and forth - backwards and forwards, first one way and then the other way ... bite the hand that feeds you - to harm or turn against someone who does good things for you He is biting the hand that feeds him when he criticizes and fights against his boss....

Scientific report: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge"

Uploaded: 08/03/2014, 21:20

... pair (isola, island). 2. Remove the words isola and island from their respective vocabularies. 3. Since island is not in the vocabulary, the indirect association between arcipelago and island is not ... on precision and recall of bilingual lexicon extraction from parallel corpora. This assumption should also be reasonable for many types of comparable corpora such as Wikipedia or news corpora, which are ... promising performance. 5 Conclusions and Future Work. We have designed an algorithm that focuses on acquiring and keeping only highly confident translation candidates from multilingual comparable corpora. By employing...
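
The "accept the best pair, then remove both words from their vocabularies" steps behave like a greedy one-to-one matching. Here is a minimal Python sketch, assuming the association scores are precomputed and fixed (the hypothetical scores dictionary and min_score cut-off are illustrative). Unlike the full algorithm, it does not recompute associations after each removal.

```python
from typing import Dict, List, Tuple


def extract_confident_pairs(scores: Dict[Tuple[str, str], float],
                            min_score: float = 0.0) -> List[Tuple[str, str]]:
    """Greedily accept the highest-scoring (source, target) pair whose words are
    both still in their vocabularies, then remove those words."""
    src_vocab = {s for s, _ in scores}
    tgt_vocab = {t for _, t in scores}
    accepted: List[Tuple[str, str]] = []
    for (s, t), score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if score < min_score:
            break  # remaining pairs are all weaker
        if s in src_vocab and t in tgt_vocab:
            accepted.append((s, t))
            src_vocab.discard(s)  # the word can no longer pair with anything else
            tgt_vocab.discard(t)
    return accepted
```

With (isola, island) accepted first, the weaker indirect pair (arcipelago, island) is skipped because island has already left the target vocabulary.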

Scientific report: "Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases"

Uploaded: 17/03/2014, 05:20

... including the joint probability phrase-based model (Marcu and Wong, 2002) and a variant on the alignment template approach (Och and Ney, 2004), and contrast them to the performance of the word-based ... pairs, and has 127 million words in English, and 106 million words in Arabic. The table shows the number of unique Arabic phrases, and gives the average number of translations into English and their ... corpus, and another created from the target language portion of the corpus, • An index that tells us the correspondence between sentence numbers and positions in the source and target language corpora, •...
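
An index from corpus positions to sentence numbers can be kept as a sorted list of sentence start offsets and queried by binary search. The following is a minimal Python sketch, assuming each side of a sentence-aligned corpus is stored as one concatenated token array and that sentence k on the source side corresponds to sentence k on the target side. The class SentenceIndex is hypothetical, and the suffix-array lookup that would produce the positions is omitted.

```python
from bisect import bisect_right
from typing import List, Tuple


class SentenceIndex:
    """Maps token positions in a concatenated corpus to sentence numbers and back."""

    def __init__(self, sentences: List[List[str]]):
        self.tokens: List[str] = []
        self.starts: List[int] = []  # starts[k] = position of the first token of sentence k
        for sentence in sentences:
            self.starts.append(len(self.tokens))
            self.tokens.extend(sentence)

    def sentence_of(self, position: int) -> int:
        """Sentence number containing the token at `position`."""
        if not 0 <= position < len(self.tokens):
            raise IndexError(position)
        return bisect_right(self.starts, position) - 1

    def span_of(self, sentence: int) -> Tuple[int, int]:
        """Half-open [start, end) token range of `sentence`."""
        end = self.starts[sentence + 1] if sentence + 1 < len(self.starts) else len(self.tokens)
        return self.starts[sentence], end
```

A phrase match found at some source position can then be mapped to its sentence number and, through the same number, to the target sentence's token span.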

McGraw-Hill's Dictionary of American Idioms and Phrasal Verbs

Uploaded: 20/03/2014, 17:47

... on me. backhanded compliment and left-handed compliment an unintended or ambiguous compliment. □ Backhanded compliments are the only kind he ever gives! □ And I think his left-handed compliments ... abbreviations and symbols are used, and these are explained in the section “Terms and Symbols.” The user who understands the meaning of entry head, variable, and wild card term is equipped to understand ... back and made a motion with his hand indicating that Mary should go first. “After you,” smiled Bob. again and again repeatedly; again and even more [times]. □ He knocked on the door again and...

Scientific report: "Clustering Comparable Corpora For Bilingual Lexicon Extraction"

Uploaded: 23/03/2014, 16:20

... and that of P_2 is 0.939. Both corpora are more comparable than P_0 of which the comparability is 0.881. Furthermore, both P_1 and P_2 are more comparable than P_1' (comparability 0.912) and ... studies and is now standard. 3.2.2 Results and Analysis. In a first series of experiments, bilingual lexicons were extracted from the corpora obtained by our approach (P_1 and P_2), the corpora ... English documents and 87k French documents) consisting of the corpora LAT94, MON94 and SDA94; P_2^T (368k English documents and 378k French documents) consisting of Wiki-En and Wiki-Fr. ¹ http://trec.nist.gov ² http://www.clef-campaign.org ³ The...

Scientific report: "Identifying Word Translations from Comparable Corpora Using Latent Topic Models"

Uploaded: 23/03/2014, 16:20

... acquire translation candidates based on comparable and unrelated corpora comes from (Rapp, 1995). Similar approaches are described in (Diab and Finch, 2000), (Koehn and Knight, 2002) and (Gaussier et ... behind our work and gives an overview and a theoretical background of the methods. Section 4 evaluates and discusses initial results. Finally, section 5 proposes several extensions and gives a ... 479–484, Portland, Oregon, June 19-24, 2011. ©2011 Association for Computational Linguistics. Identifying Word Translations from Comparable Corpora Using Latent Topic Models. Ivan Vulić, Wim De Smet and...

Scientific report: "Extracting loanwords from Mongolian corpora and producing a Japanese-Mongolian bilingual dictionary"

Uploaded: 23/03/2014, 18:20

... corpora (Myaeng and Jeong, 1999; Oh and Choi, 2001) use the phonetic differences between conventional Korean words and loanwords. However, these methods require manually tagged training corpora, ... or comparable corpus is not available, such as Mongolian and Japanese. Fujii et al. (2004) proposed a method that does not require tagged corpora or parallel corpora to extract loanwords and ... bilingual comparable corpora, and matched named entities in each language corpus if they were similar to each other. Thus, Lam et al.’s method cannot be used for a language pair where comparable corpora...

Scientific report: "Using comparable corpora to solve problems difficult for human translators"

Uploaded: 31/03/2014, 01:20

... several comparable corpora for English and Russian, including large reference corpora (the BNC and the Russian Reference Corpus) and corpora of major British and Russian newspapers. All corpora ... Barcelona. Michael Carl and Andy Way, editors. 2003. Recent advances in example-based machine translation. Kluwer, Dordrecht. Ido Dagan and Kenneth Church. 1997. Termight: Coordinating humans and machines ... corpus-based extraction and the very large lexicon. In Lars Borin, editor, Language and Computers, Parallel corpora, parallel worlds, pages 137–149. Rodopi. John S. Justeson and Slava M. Katz. 1995....