Correcting a POS-tagged corpus

Scientific report: "Unsupervised Learning of Arabic Stemming using a Parallel Corpus"

Scientific report

... subsequently aligned by automatic means. A small parallel corpus can be available when native speakers and translators are not, which makes building a stemmer out of such corpus a preferable direction. rule ... [Figure 3: Scoring the Stem. Task: stem AlAst$Aryp; candidate choices with scores 0.2, 0.7, 0.8, 0.1] However, this approach has several drawbacks that ... improvement obtained over using unstemmed text. 1.1 Arabic details In this paper, Arabic was the target language but the approach is applicable to any language that needs affix removal. In Arabic, unlike...
Scientific report: "Creating a manually error-tagged and shallow-parsed learner corpus"

Scientific report

... Computational Linguistics Creating a manually error-tagged and shallow-parsed learner corpus Ryo Nagata Konan University 8-9-1 Okamoto, Kobe 658-0072 Japan rnagata @ konan-u.ac.jp. Edward Whittaker ... 44th Annual Meeting of ACL, pages 241–248. Katsuaki Okihara. 1985. English writing (in Japanese). Taishukan, Tokyo. Alla Rozovskaya and Dan Roth. 2010a. Annotating ESL errors: Challenges and rewards. ... Vera Sheinman The Japan Institute for Educational Measurement Inc. 3-2-4 Kita-Aoyama, Tokyo, 107-0061 Japan whittaker,sheinman @jiem.co.jp Abstract The availability of learner corpora, especially those...
Scientific report: "Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political debates"

Scientific report

... sentiments for a variety of topics and corresponding targets are potentially involved (Riloff and Wiebe, 2003; Sarmento et al., 2009). Alternative approaches to automatic and manual construction ... Natural Language Processing and Computational Natural Language Learning, Prague. Krippendorff, Klaus. 2004. Content Analysis: An Introduction to Its Methodology, 2nd Edition. Sage Publications, ... 564–568, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics Liars and Saviors in a Sentiment Annotated Corpus of Comments to Political debates Paula Carvalho...
Scientific report: "Collecting a Why-question corpus for development and evaluation of an automatic QA-system"

Scientific report

... each paid reward. • Qualifications To improve the data quality, a HIT can also be attached to certain tests, “qualifications” that are either system-provided or created by the requester. An example ... the assignments have been completed. • Rewards At upload time, each HIT has to be assigned a fixed reward, that cannot be changed later. Minimum reward is $0.01. Amazon.com collects a 10% (or a ... excess of information. FAQ-pages tend to also answer questions which are not asked, and also contain practical examples. Human-powered answers often contain unrelated information and discourse-like...
Scientific report: "ModelTalker Voice Recorder – An Interface System for Recording a Corpus of Speech for Synthesis"

Scientific report

... pitch, amplitude and pronunciation and users are given immediate feedback on the acceptability of each recording. Users can then rerecord an unacceptable utterance. Recordings are automatically ... utterance. This alignment is retained so that each utterance is automatically labeled. Once the entire corpus has been recorded, alignments are automatically refined based on specific individual ... naturalness and individuality one associates with one’s own voice. Individuals with difficulty speaking can be any age, gender, and from any part of the country, with regional dialects and...
Scientific report: "A Method for Correcting Errors in Speech Recognition Using the Statistical Features of Character Co-occurrence"

Scientific report

... grammatical and n-gram based statistical language constraints, and uses a robust parsing technique to apply the grammatical constraints described by context-free grammar (Tsukada et al., 97). ... the Error-Pattern-Database and String-Database can be mechanically prepared, which reduces the effort required to prepare the databases and makes it possible to apply this method to a new recognition ... Error-Pattern examples. [Table 2-1: Examples of Error-Patterns (columns: Correct-Part, Error-Part)] 2.1.1 Extraction of Error-Patterns The Error-Pattern-Database is mechanically prepared using a pair of parts...
Scientific report: "WebCAGe – A Web-Harvested Corpus Annotated with GermaNet Senses"

Scientific report

... hand-crafted sense-annotated corpora have been available (Agirre et al., 2007; Erk and Strapparava, 2012; Mihalcea et al., 2004), while WSD research for languages that lack these corpora has lagged behind ... the 3rd International Language Resources and Evaluation (LREC’02), Las Palmas, Canary Islands, pp. 609–612. Santamaría, C., Gonzalo, J., Verdejo, F. 2003. Automatic Association of Web Directories ... representative examples in Yarowsky’s approach is performed completely manually and is therefore limited to the amount of data that can reasonably be annotated by hand. Leacock et al. (1998), Agirre...
Scientific report: "Using an Annotated Corpus as a Stochastic Grammar"

Scientific report

... ~ A may be 40 M. Marcus, 1991. "Very Large Annotated Database of American English". DARPA Speech and Natural Language Workshop, Pacific Grove, Morgan Kaufmann. F. Pereira and Y. Schabes, ... the Air Travel Information System (ATIS) spoken language corpus. Preliminary experiments yield 96% test set parsing accuracy. 1 Motivation As soon as a formal grammar characterizes a non- ... trivial part of a natural language, almost every input string of reasonable length gets an unmanageably large number of different analyses. Since most of these analyses are not perceived as...
Scientific report: " a Movie Dialogue Corpus for Research and Development"

Scientific report

... Several factors, such as the availability of more powerful computers, an almost unlimited storage capacity, the availability of large volumes of data in digital format, as well as the ... dialogue management and natural language generation. Springer. Stallard D (2000) Talk’n’travel: a conversational system for air travel planning. In Proceedings of the 6th Conference on Applied ... hand, contain all additional information/texts appearing in the scripts, which are typically of narrative nature and explain what is happening in the scene. Figure 1 depicts a browser snapshot...
Scientific report: "A Corpus for Modeling Morpho-Syntactic Agreement in Arabic: Gender, Number and Rationality"

Scientific report

... Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus. In NEMLAR Conference on Arabic Language Resources and Tools, pages 102–109, Cairo, Egypt. Yuval Marton, Nizar Habash, and ... Functional Approach. In Proceedings of the seventh International Conference on Language Resources and Evaluation (LREC), Valletta, Malta. Mohammed Attia. 2008. Handling Arabic Morphological and ... Society for Information Science and Technology, 55(3):189–213. Mohamed Altantawy, Nizar Habash, Owen Rambow, and Ibrahim Saleh. 2010. Morphological Analysis and Generation of Arabic Nouns: A Morphemic...
Scientific report: "Solving Relational Similarity Problems Using the Web as a Corpus"

Scientific report

... relations like equative, e.g., finding player and coach on the Web suggests an equative relation for player coach (and for coach player). As Table 3 shows, this is different for SAT verbal analogy, ... helpful. References Hiyan Alshawi and David Carter. 1994. Training and scaling preference functions for disambiguation. Computational Linguistics, 20(4):635–648. Ken Barker and Stan Szpakowicz. 1998. Semi-automatic recognition ... Science and Engineering. Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press. Roxana Girju, Dan Moldovan, Marta Tatu, and Daniel Antohe. 2005. On the semantics...
Scientific report: "Learning Semantic Links from a Corpus of Parallel Temporal and Causal Relations"

Scientific report

... e.g. took and began meet only at their roots, so the LCA senses are act#0 and be#0. We also extracted temporal and causal word associations from the Google N-gram corpus (Brants and Franz, 2006), ... achieving an F-measure of 49.0 for temporals and 52.4 for causals. Analysis of these models suggests that additional data will improve performance, and that temporal information is crucial to causal ... existing corpora are missing some crucial pieces for studying temporal-causal interactions. Our research aims to fill these gaps by building a corpus of parallel temporal and causal relations and exploring...
Scientific report: "Creating a Corpus of Parse-Annotated Questions"

Scientific report

... data; repeat: parse a new section of raw data; manually correct errors in the parser output; add the corrected data to the training set; extract a new grammar for the parser; until all the data has been processed. Algorithm ... of Pennsylvania, Philadelphia, PA. Daniel Gildea. 2001. Corpus variation and parser performance. In Lillian Lee and Donna Harman, editors, Proceedings of EMNLP, pages 167–202, Pittsburgh, PA. Charles ... can be rapidly induced from appropriate treebank material. However, treebank- and machine learning-based grammatical resources reflect the characteristics of the training data. They generally...
Scientific report: "Test Collection Selection and Gold Standard Generation for a Multiply-Annotated Opinion Corpus"

Scientific report

... strict and lenient metrics are also applied in annotations of relevance. 4.2 High agreement To see how the generated gold standards agree with the annotations of all annotators, we analyze ... gold standard; for the lenient metric, sentences with annotations agreed by at least two annotators are selected as the testing collection and the majority of annotations are treated as the ... of annotations are listed and two methods are introduced to evaluate the quality of the human-tagged opinion corpora. 3.1 Combinations of annotations Three major properties are annotated for...
Scientific report: "Generating Usable Formats for Metadata and Annotations in a Large Meeting Corpus"

Scientific report

... metadata and annotations. The annotation files are converted to a tabular format using an easily adaptable XSLT-based mechanism, and their consistency is verified in the process. Metadata files are ... order to generate tabular files (TSV) and a table-creation script. 4. Create and populate metadata tables within database. 5. Adapt the XSLT stylesheet as needed for various table formats. 5 Results: ... names or analyse folders. Moreover, the advantage of creating IMDI files is that the metadata is compliant with a widely used standard accompanied by freely available tools such as the metadata browser....
