large linguistically-processed web corpora

Document - Scientific report: "Large linguistically-processed Web corpora for multiple languages" doc

Scientific report

... model, and thus identifying it with automated techniques is far from trivial. Large linguistically-processed Web corpora for multiple languages. Marco Baroni, SSLMIT, University of Bologna, Italy, baroni@sslmit.unibo.it, Adam ... sizes of over 1 billion words in each case. We provide Web access to the corpora in our query tool, the Sketch Engine. 1 Introduction. The Web contains vast amounts of linguistic data for many ... contain connected text. Second, the TreeTagger was not trained on Web data, and thus its performance on texts that are heavy on Web-like usage (e.g., texts all in lower-case, colloquial forms...

Scientific report: "An Efficient Indexer for Large N-Gram Corpora" docx

Scientific report

... tool that implements efficient indexing and retrieval of large N-gram datasets, such as the Web1T 5-gram corpus. Our tool indexes the entire Web1T dataset with an index size of only 100 MB and performs ... Language Processing (NLP), the models give a much better performance with larger data sets. However the large data sets, such as the Web1T 5-Gram corpus of (Brants and Franz, 2006), present a major challenge. ... language models and applying them to various problems, they are not designed for very large corpora, such as the Web1T 5-gram corpus (Brants and Franz, 2006), hence they do not provide efficient implementations...

Scientific report: "Constructing Transliteration Lexicons from Web Corpora" docx

Scientific report

... existing dictionaries. Regularly exploring Web corpora is a good way to update dictionaries. Transliterated-term extraction using non-parallel corpora has also been conducted (Kuo, 2003). ... into Chinese syllables using the trained cross- ... Constructing Transliteration Lexicons from Web Corpora. Jin-Shea Kuo (1, 2), Ying-Kuei Yang (2). (1) Chung-Hwa Telecommunication Laboratories, ... Internet is one of the largest distributed databases in the world. It comprises various kinds of data and at the same time is growing rapidly. Though the World Wide Web is not systematically...

Document: The Anatomy of a Large-Scale Hypertextual Web Search Engine ppt

Standards - Regulations

... in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine ... World Wide Web Worm (WWWW) [McBryan 94] had an index of 110,000 web pages and web accessible documents. As of November, 1997, the top search engines claim to index from 2 million (WebCrawler) ... PageRank: Bringing Order to the Web. The citation (link) graph of the web is an important resource that has largely gone unused in existing web search engines. We have created maps containing as many as...

Document: Large debt financing: syndicated loans versus corporate bonds docx

Banking - Credit

... financing for large firms. Since the introduction of the euro, syndicated loans and corporate bonds have become the main sources for large debt financing: in both markets, firms can raise large amounts ... find that larger firms are more likely to issue debt in the syndicated loan markets than the corporate bond market. Secondly, when including a larger sample with smaller firms from the larger ... direct corporate bond financing: In both markets, firms can tap the financial markets to raise large amounts of funds with medium and long-term maturities. Today, many of Europe’s largest...

Document - Scientific report: "Finding Parts in Very Large Corpora" pdf

Scientific report

... the machines at our disposal, so still larger corpora would not be out of the question. Finally, as noted above, Hearst [2] tried to find parts in corpora but did not achieve good results. ... Lexicography 3 (1990), 235-245. [2] Marti Hearst, "Automatic acquisition of hyponyms from large text corpora," in Proceedings of the Fourteenth International Conference on Computational ... Finding Parts in Very Large Corpora. Matthew Berland, Eugene Charniak, rob, ec @ cs.brown.edu, Department of Computer Science...

Document - Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora" docx

Scientific report

... Extraction with Fips. Collocations are extracted from syntactically analysed corpora. The analysis is performed by Fips, a large-scale parser based on an adaptation of Chomsky's "Principles ... returns chunks of partial analyses. If ... Creating a Multilingual Collocation Dictionary from Large Text Corpora. Luka Nerima, Violeta Seretan, Eric Wehrli, Language Technology Laboratory (LATL), Dept. ... Linguistics, 19(1):61-74. Gale W. and Church K. (1991). A program for aligning sentences in bilingual corpora. Computational Linguistics, 19(1):75-102. Gross, G. (1996). Les expressions figées en...

Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora" ppt

Scientific report

... coherent text spans found in the corpora resources. At the same time, we intend to provide a quite precise and delimited context, that's why we do not consider a larger context (such as the whole ... using mark-up from text encoding. Creating a Multilingual Collocation Dictionary from Large Text Corpora. Luka Nerima, Violeta Seretan, Eric Wehrli, Language Technology Laboratory (LATL), Dept. ... collocation's keys occur on the same sentence, as they are in a syntactical relation). When parallel corpora are available, also the translation equivalents of the collocation context are displayed,...

Scientific report: "CSNIPER Annotation-by-query for non-canonical constructions in large corpora" pdf

Scientific report

... annotation tasks that require manual analysis over large corpora. The approach is generalizable to any kind of linguistic phenomena that can be located in corpora on the basis of queries and require manual ... (Corpus Sniper), a tool that implements (i) a web-based multi-user scenario for identifying and annotating non-canonical grammatical constructions in large corpora based on linguistic queries and (ii) ... for the annotation of linguistic phenomena whose investigation requires the analysis of large corpora due to a relatively low frequency of instances and whose identification requires expert...

Scientific report: "Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation" ppt

Scientific report

... Linguistically motivated large-scale NLP with C&C and Boxer. In Proc. ACL Demo and Poster Sessions, pages 33–36. Ido Dagan and Alan Itai. 1990. Automatic processing of large corpora for the resolution ... parallel text. 5 Data. Web-scale text data is used for monolingual feature counts, parallel text is used for classifier co-training, and labeled data is used for training and evaluation. Web-scale N-gram ... Mary Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330. Preslav Nakov and Marti Hearst. 2005. Using the web as an implicit training...

Scientific report: "A System for Large-Scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora" pot

Scientific report

... acquisition of lexical information from large repositories of unannotated text (such as the web, corpora of published text, etc.) is starting to produce large scale lexical resources which include ... for large-scale acquisition of subcategorization frames (SCFs) from English corpus data which can be used to acquire comprehensive lexicons for verbs, nouns and adjectives. The system incorporates ... Association for Computational Linguistics. A System for Large-Scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora. Judita Preiss, Ted Briscoe, and Anna Korhonen, Computer...

Scientific report: "Discovering Relations among Named Entities from Large Corpora" pot

Scientific report

... discovery, however, needed large annotated corpora which cost a great deal of time and effort. We propose an unsupervised method for relation discovery from large corpora. The key idea is clustering ... discovering relations among various entities from large text corpora. Our method does not need the richly annotated corpora required for supervised learning, corpora which take great time and effort ... frequently mentioned in large corpora. Conversely, relations mentioned once or twice are not likely to be important. Our basic idea is as follows: 1. tagging named entities in text corpora 2. getting...

Developing Large Web Applications pdf

Web administration

... contribute to the complexity of many large web applications. Typically, large web applications have the following characteristics: Continuous availability: Most large web applications must be running ... load times. Web developers need to write code that is especially robust. Large user base: Large web applications usually have large numbers of users. This necessitates management of a large number ... information architecture of the module. CHAPTER 1: The Tenets. As applications on the Web become larger and larger, how can web developers manage the complexity?...

Scientific report: "Scaling to Very Very Large Corpora for Natural Language Disambiguation" potx

Scientific report

... unsupervised learning with large training corpora, in hopes of being able to obtain the benefits that come from significantly larger training corpora without incurring too large a cost. 2 Confusion ... exploiting very large corpora when labeled data comes at a cost. 1 Introduction. Machine learning techniques, which automatically learn linguistic information from online text corpora, have ... a large corpus. Computers and the Humanities, 26:415-439. Golding, A. R. (1995). A Bayesian hybrid method for context-sensitive spelling correction. In Proc. 3rd Workshop on Very Large Corpora, ...

Scientific report: "AUTOMATIC ACQUISITION OF A LARGE SUBCATEGORIZATION DICTIONARY FROM CORPORA" doc

Scientific report

... (ed.). 1977. Webster's seventh new collegiate dictionary. Springfield, MA: G. & C. Merriam. Hearst, Marti. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. In Pro- ... frames were determined are in Webster's (Gove 1977) (the only noticed exceptions being certain instances of prefixing, such as overcook and repurchase), but a larger number of the verbs ... AUTOMATIC ACQUISITION OF A LARGE SUBCATEGORIZATION DICTIONARY FROM CORPORA. Christopher D. Manning, Xerox PARC and Stanford University, Stanford...