
named entities from large corpora

Scientific report document: "Creating a Multilingual Collocation Dictionary from Large Text Corpora" docx

... Cooccurrence Extraction with Fips. Collocations are extracted from syntactically analysed corpora. The analysis is performed by Fips, a large-scale parser based on an adaptation of Chomsky's ... returns chunks of partial analyses. If ... Creating a Multilingual Collocation Dictionary from Large Text Corpora. Luka Nerima, Violeta Seretan, Eric Wehrli. Language Technology Laboratory (LATL), ... linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from raw texts, but rather from syntactically parsed texts. The linguistic analysis...
Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora" ppt

... (paragraph-level) structure of documents is examined, possibly using mark-up from text encoding. Creating a Multilingual Collocation Dictionary from Large Text Corpora. Luka Nerima, Violeta Seretan, Eric Wehrli. Language ... linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from raw texts, but rather from syntactically parsed texts. The linguistic analysis ... textual corpora from the World Trade Organisation (WTO), which consist in parallel documents in three languages: English, French and Spanish. All the examples given in this paper are taken from...
Scientific report document: "Bilingual Terminology Acquisition from Comparable Corpora and Phrasal Translation to Cross-Language Information Retrieval" pptx

... Japanese-English language pair, especially if involving the comparable corpora. Re-scoring through the Comparable Corpora. Comparable corpora could be considered for the disambiguation of translation ... comparable corpora-based techniques, respectively compared to the hybrid two-stages comparable corpora and linguistics-based pruning. The proposed approach based on bi-directional comparable corpora ... TR2-007. P. Fung. 2000. A Statistical View of Bilingual Lexicon Extraction: From Parallel Corpora to Non-Parallel Corpora. In Jean Veronis, Ed. Parallel Text Processing. G. Grefenstette. 1999....
Scientific report document: "Finding Parts in Very Large Corpora" pdf

... the machines at our disposal, so still larger corpora would not be out of the question. Finally, as noted above, Hearst [2] tried to find parts in corpora but did not achieve good results. ... Lexicography 3 (1990), 235-245. [2] Marti Hearst, "Automatic acquisition of hyponyms from large text corpora," in Proceedings of the Fourteenth International Conference on Computational ... Abstract. We present a method for extracting parts of objects from wholes (e.g. "speedometer" from "car"). Given a very large corpus our method finds part words with 55% accuracy...
Scientific report document: "Effect of Cross-Language IR in Bilingual Lexicon Acquisition from Comparable Corpora" pot

... translation knowledge acquisition from WWW news sites, this paper studies issues on the effect of cross-language retrieval of relevant texts in bilingual lexicon acquisition from comparable corpora. We experimentally ... parallel/comparative corpora. However, the sizes as well as the domain of existing parallel/comparative corpora are limited, while it is very expensive to manually collect parallel/comparative corpora. ... approach of acquiring translation knowledge of domain specific named entities, event expressions, and collocational expressions from the collection of bilingual news articles on WWW news sites...
Scientific report: "Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning" docx

... utilize a large amount of unsupervised data to supplement supervised data. Specifically, an approach that involves incorporating ‘clustering-based word representations (CWR)’ induced from unsupervised ... Limited Memory BFGS Method for Large Scale Optimization. Math. Programming, Ser. B, 45(3):503–528. Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1994. Building a Large Annotated Corpus ... 2011. ©2011 Association for Computational Linguistics. Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning. Jun Suzuki, Hideki Isozaki, and Masaaki...
Scientific report: "Annotating and Recognising Named Entities in Clinical Notes" pot

... 15000 clinical named entities in 11 entity types. This paper reports on the challenges involved in creating the annotation schema, and recognising and annotating clinical named entities. ... step to the extraction of structured information from these clinical notes is to achieve accurate identification of clinical concepts or named entities. An entity may refer to a concrete object ... 3 named entities - CT, pituitary macroadenoma and suprasellar cisterns in the sentence: CT revealed pituitary macroadenoma in suprasellar cisterns. In recent years, the recognition of named...
Scientific report: "Translating Named Entities Using Monolingual and Bilingual Resources" ppt

... IdentiFinder named entity identifier (Bikel et al., 1999) to identify all named entities in the top retrieved documents for each sub-phrase. All named entities of the type of the named entity ... articles and hence the named entities will most likely be reported in many languages including the target language. Instead of having to come up with translations for the named entities often with ... While the identification of named entities in text has received significant attention (e.g., Mikheev et al. (1999) and Bikel et al. (1999)), translation of named entities has not. This translation...
Scientific report: "Detecting Highly Confident Word Translations from Comparable Corpora without Any Prior Knowledge" doc

... of bilingual lexicon extraction from parallel corpora. This assumption should also be reasonable for many types of comparable corpora such as Wikipedia or news corpora, which are topically aligned ... translation candidates from multilingual comparable corpora. By employing the algorithm we have improved precision scores of the methods relying on per-topic word distributions from a cross-language ... efficiently bridge the gap between languages. That seed lexicon is usually crawled from the Web or obtained from parallel corpora. Recently, Li et al. (2011) have proposed an approach that improves...
Scientific report: "CSNIPER - Annotation-by-query for non-canonical constructions in large corpora" pdf

... annotation tasks that require manual analysis over large corpora. The approach is generalizable to any kind of linguistic phenomena that can be located in corpora on the basis of queries and require manual ... suitable software. Their empirical distribution in corpora is thus largely unknown. A major task in recognizing NCCs is distinguishing them from structurally similar construc- ... Figure 3: KWIC ... investigation requires the analysis of large corpora due to a relatively low frequency of instances and whose identification requires expert knowledge to distinguish them from other similar constructions....
Scientific report: "Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora" pptx

... sentence pairs are extracted from the aligned comparable corpora (section 2.2). The workflow for named entity (NE) and terminology extraction and mapping from comparable corpora extracts data in ... and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. ... translation; • named entity dictionaries. The demonstration showcases two general use case scenarios defined in the toolkit: “parallel data mining from comparable corpora and named entity/terminology...
Scientific report: "Recognizing Named Entities in Tweets" docx

... semi-supervised learning. 1 Introduction. Named Entities Recognition (NER) is generally understood as the task of identifying mentions of rigid designators from text belonging to named-entity types such as ... Extracting personal names from email: applying named entity recognition to informal text. In HLT, pages 443–450. David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and ... challenges and misconceptions in named entity recognition. In CoNLL, pages 147–155. Sameer Singh, Dustin Hillard, and Chris Leggetter. 2010. Minimally-supervised extraction of entities from text advertisements....
Scientific report: "Building Emotion Lexicon from Weblog Corpora" potx

... 133–136, Prague, June 2007. ©2007 Association for Computational Linguistics. Building Emotion Lexicon from Weblog Corpora. Changhua Yang, Kevin Hsin-Yih Lin, Hsin-Hsi Chen. Department of Computer Science and ... mine the relationships between words and emotions using weblog corpora. A collocation model is proposed to learn emotion lexicons from weblog articles. Emotion classification at sentence level ... Blog from January to July, 2006, spanning a period of 212 days. In total, 336,161 bloggers’ articles were collected. Each blogger posts 16 articles on average. We used the articles from...
Scientific report: "Detecting Semantic Relations between Named Entities in Text Using Contextual Features" pdf

... solves problems, which result from when a parallel sentence arises from predication ellipsis. However, there are several types of parallel sentence that differ from the one we explained. (For ... are sorted in order of likelihood of being the antecedent. The sorting algorithm has two steps. First, from the beginning of the text until the pronoun appears, noun ... anaphora resolutions here. Applied centering theory to relation detection is as follows. First, from the beginning of the text until the following NE appears, noun phrases are stacked depending...
Scientific report: "Mapping Concrete Entities from PAROLE-SIMPLE-CLIPS to ItalWordNet: Methodology and Results" potx

... (henceforth TCs) clustered in three categories distinguishing 1stOrderEntities, 2ndOrderEntities and 3rdOrderEntities. Their subclasses, hierarchically ordered by means of a subsumption ... ontology of semantic types. 2 Corpora e Lessici dell'Italiano Parlato e Scritto. The IWN Top Ontology (TO) (Roventini et al., 2003), which slightly differs from the EWN TO3, consists ... 161–164, Prague, June 2007. ©2007 Association for Computational Linguistics. Mapping Concrete Entities from PAROLE-SIMPLE-CLIPS to ItalWordNet: Methodology and Results. Adriana Roventini, Nilda...
