Automatic Identification of Word Translations from Unrelated English and German Corpora

Scientific report: "Automatic Identification of Word Translations from Unrelated English and German Corpora"

Scientific report

... in terms of corpus frequencies: k11 = frequency of common occurrence of word A and word B; k12 = corpus frequency of word A - k11; k21 = corpus frequency of word B - k11; k22 = size of corpus ... 520 Automatic Identification of Word Translations from Unrelated English and German Corpora. Reinhard Rapp, University of Mainz, FASK, D-76711 Germersheim, Germany, rapp@usun2.fask.uni-mainz.de ... example, if the words teacher and school co-occur more often than expected by chance in a corpus of English, then the German translations of teacher and school, Lehrer and Schule, should...

Scientific report document: "Automatic Identification of Pro and Con Reasons in Online Reviews"

Scientific report

... Opinion Bearing Words and Sentences. In the Companion Volume of the Proceedings of IJCNLP-05, Jeju Island, Republic of Korea. Kim, Soo-Min and Eduard Hovy. 2006. Identifying and Analyzing ... Sources of Opinions with Conditional Random Fields and Extraction Patterns. Proceedings of HLT/EMNLP-05. Esuli, Andrea and Fabrizio Sebastiani. 2005. Determining the semantic orientation of ... Soo-Min and Eduard Hovy. 2004. Determining the Sentiment of Opinions. Proceedings of COLING-04. pp. 1367-1373. Geneva, Switzerland. Kim, Soo-Min and Eduard Hovy. 2005. Automatic Detection of...

Scientific report document: "Automatic Collection of Related Terms from the Web"

Scientific report

... term of the original seed term by hand. The result is shown in the left half (Evaluation I) of Table 2. In this evaluation, 519 terms out of 610 terms were correct: the precision is 85%. From ... terms that should be collected from each seed word, and then checked whether each of the target terms was included in the system output. We counted the number of target terms in the following ... (20%) out of 210 terms were collected by the system. This low recall primarily comes from the failure of automatic term recognition (case A in the above classification). Improvement of this...

Scientific report document: "AUTOMATIC ACQUISITION OF SUBCATEGORIZATION FRAMES FROM UNTAGGED TEXT"

Scientific report

... infinitive marker and to as a preposition. Then he measures the mutual information between occurrences of the verb and occurrences of infinitives following within a certain number of words. Unlike ... Table 2: Efficiency of verb detection for each of the five SFs, as tested on 2.6 million words of the Wall Street Journal and controlled by the Penn Treebank's hand-verified tagging ... be of limited value. Statistical disambiguators make dictionaries more useful, but they have a fairly high error rate, and degrade in the presence of many unfamiliar words. Further, it is often...

Scientific report: "Automatic Compilation of Travel Information from Automatically Identified Travel Blogs"

Scientific report

... blogs from a blog database, and extracting travel information from them. In the identification of travel blogs, we obtained 38.1% for Recall and 86.7% for Precision. In the extraction of ... identification of travel blogs, and (2) extraction of travel information from blogs. We reported on them in Sections 4.1 and 4.2. 4.1 Identification of Travel Blogs Data sets and experimental settings ... method 51.1 60.5 Table 1: Identification of travel blogs Our method could not identify 266 of the travel blogs. We randomly selected 50 entries from these 266, and analysed the errors. Among...

Scientific report: "Automatic Acquisition of Adjectival Subcategorization from Corpora"

Scientific report

... Briscoe, D. Carter, and C. Grover. 1987. The derivation of a grammatically-indexed lexicon from the Longman Dictionary of Contemporary English. In Proceedings of the 25th Annual Meeting of the Association for ... acquisition. 1 Introduction. Research into automatic acquisition of lexical information from large repositories of unannotated text (such as the web, corpora of published text, etc.) is starting ... University of Edinburgh Laboratory for Foundations of Computer Science. state-of-art statistical systems and for improving the portability of these systems between domains. One type of lexical...

Scientific report: "A Practical Solution to the Problem of Automatic Part-of-Speech Induction from Text"

Scientific report

... problem of automatic word sense induction. Proceedings of ACL (Companion Volume), Barcelona, 195-198. Schütze, Hinrich (1993). Part-of-speech induction from scratch. Proceedings of ACL, Columbus, ... we only selected words where we felt reasonably confident about their possible parts of speech. Note that the list of words was compiled before the start of our experiments and remained unchanged ... ambiguous words to clusters is not required at this stage, as this is taken care of in the next step. This step involves computing the differential vector of each word from the centroid of its...

Scientific report: "TOWARDS THE AUTOMATIC IDENTIFICATION OF ADJECTIVAL SCALES: CLUSTERING ADJECTIVES ACCORDING TO MEANING"

Scientific report

... towards the automatic identification of adjectival scales. We discuss the properties of adjectival scales and of groups of semantically related adjectives and how they imply sources of linguistic ... between X and Y. An unbiased estimator of τ is the statistic T = (C - Q) / (n(n-1)/2), where n is the number of paired observations in the sample and C and Q are the numbers of observed concordances and discordances ... assessment of the significance of an improvement over the base line of the random algorithm much harder. As a consequence of point (3) made above, we need to understand the significance of the...

Scientific report: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge"

Scientific report

... Automatic identification of word translations from unrelated English and German corpora. In Proceedings of the 37th Annual 458 laxes them by lowering the threshold and expanding the search space ... list. In other words, if the first translation candidate for the source word isola is the target word island, and, vice versa, the first translation candidate for the target word island is isola, ... comparable data. Corpora and software for 4 In case of Europarl, we use only the evidence of document alignment during the training and do not benefit from the parallelness of the sentences...

Support to the identification of potential risks for the environment and human health arising from hydrocarbons operations involving hydraulic fracturing in Europe

Electrical - Electronics

... limited to lower volume fracturing of some tight gas and conventional reservoirs in the southern part of the North Sea and in onshore Germany, the Netherlands, Denmark and the UK. Preliminary indications ... Because of the large areas of clay deposition in tidal flats and deep water, shale gas reserves can cover wider areas extending to tens of thousands of square km (US EIA 2011 NPR Sections V, VI and ... hundreds to thousands of feet below the land surface and lateral sections may extend 1000 to 6000 feet [300 to 2000 metres] away from the well. Fluids, commonly made up of water and chemical additives,...

Scientific report: "Identifying Word Translations from Comparable Corpora Using Latent Topic Models"

Scientific report

... number of words assigned to the topic zk, and |WS| is the total number of distinct words in the vocabulary. The formula for a set of ψ word-topic probability distributions for the target side of ... Italian word vectors and English word vectors with TF-IDF scores in the original word-document space (Cos), with aligned documents. Table 1 shows the Precision@1 scores (the percentage of words ... knowledge. Two words are potential translation candidates if they are often present in the same cross-lingual topics and not observed in other cross-lingual topics. In other words, a word w2 from a...

Scientific report: "AN EXPERT SYSTEM FOR THE PRODUCTION OF PHONEME STRINGS FROM UNMARKED ENGLISH TEXT USING MACHINE-INDUCED RULES"

Scientific report

... scanned for word and cluster boundaries, and lists of pointers to boundary locations in the string are constructed. The parser also counts the number of syllables in each word, and constructs ... parser which is capable of making these sorts of part-of-speech distinctions. 9 Such a parser need not be very accurate in terms of the proper assignment of words to part-of-speech classes. However, ... execution and storage requirements, and desirable in terms of transcription accuracy. One of the unique features of UTTER is the possibility of creating "mini-implementations" of UTTER...

Scientific report: "Automatic Identification of Non-compositional Phrases"

Scientific report

... resolve word sense ambiguity. In Proceedings of ACL/EACL-97, pages 64-71, Madrid, Spain, July. Dekang Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of COLING/ACL-98, ... mutual information of a collocation is the logarithm of the ratio between the probability of the collocation and the probability of events A, B, and C co-occur if we assume B and C are conditionally ... mutual information filter (3). 320 Automatic Identification of Non-compositional Phrases Dekang Lin Department of Computer Science University of Manitoba and Winnipeg, Manitoba, Canada, R3T...

Scientific report: "Automatic Acquisition of Script Knowledge from a Text Collection"

Scientific report

... Automatic Acquisition of Script Knowledge from a Text Collection Toshiaki Fujiki Hidetsugu Nanba Interdisciplinary Graduate School of Graduate School of Science and Engineering Information ... wrong results of text clustering (these are the errors of the tools used), errors in supplementing omitted subjects and objects, and errors resulting from incorrect interpretation of passive sentences ... preliminary experiment with our system for automatic acquisition of script knowledge and investigated the effectiveness of our method. We used issues of Nihon Keizai Shimbun for the past 11...

Consultation on the Removal of Speaking and Listening Assessment from GCSE English and GCSE English Language

English listening skills

... of assessment of GCSE qualifications in English and/or English language for award of qualifications in 2014, 2015 and 2016, paragraphs 11 and 12 of the GCSE Subject Criteria for English (English ... the Removal of Speaking and Listening Assessment from GCSE English and GCSE English Language Ofqual 2013 7 Our proposals Proposal 1: add a requirement for GCSE English and GCSE English language ... the Removal of Speaking and Listening Assessment from GCSE English and GCSE English Language Ofqual 2013 3 Background The current qualifications GCSE English and GCSE English language...