Scientific report: "Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs"

... contents of query logs during the extraction of labeled classes of instances from Web documents, we acquire thousands (4,583, to be exact) of open-domain classes covering a wide range of topics and ... period, symptoms, ] Query logs Web documents (1) (2) Figure 1: Overview of weakly-supervised extraction of class instances, class labels and class attributes from Web documents and query logs study ... information extraction exploits both Web documents and query logs to acquire thousands of open-domain classes of instances, along with relevant sets of open-domain class attributes at precision levels...

Document: Scientific report: "AUTOMATIC ACQUISITION OF SUBCATEGORIZATION FRAMES FROM UNTAGGED TEXT"

... nique based on the Case Filter of Rouvret and Vergnaud (1980). The completeness of the output list increases monotonically with the total number of occurrences of each verb in the corpus. False ... immediately to the left of a tensed verb, immediately to the right of a preposition, or immediately to the right of a main verb. Adverbs and adverbial phrases (including days and dates) are ignored ... Table 2: Efficiency of verb detection for each of the five SFs, as tested on 2.6 million words of the Wall Street Journal and controlled by the Penn Treebank's hand-verified tagging...

Scientific report: "Automatic Acquisition of Ranked Qualia Structures from the Web"

... are made up of” NPQT is made up of NP’C “p(x) are made of” NPQT are made of NP’C “p(x) comprise” NPQT comprise (of)? NP’C “p(x) consist of” NPQT consist of NP’C Table 2: Clues and Patterns ... (Web-Jac) measure relies on the web search engine to calculate the number of documents in which x and y co-occur close to each other, divided by the number of documents each one occurs, i.e. Web-Jac(x, ... defined over part-of-speech tags and c a function c : string → string called the clue. Given a nominal n and a clue c, the query c(n) is sent to the web search engine and the abstracts of the first...

Scientific report: "Automatic Acquisition of Adjectival Subcategorization from Corpora"

... for subcategorization acquisition. 1 Introduction Research into automatic acquisition of lexical information from large repositories of unannotated text (such as the web, corpora of published text, etc.) ... University of Edinburgh Laboratory for Foundations of Computer Science. state-of-art statistical systems and for improving the portability of these systems between domains. One type of lexical ... further classified according to the nature of the arguments with which they combine — finite and non-finite clauses and noun phrases, phrases with and without complementisers, etc. — and whether...

Scientific report: "Automatic Acquisition of English Topic Signatures Based on a Second Language"

... Chinese-English and English-Chinese bilingual lexicons and a large amount of Chinese text, which can be collected either from the Web or from Chinese corpora. Since topic signatures are potentially ... instances of the financial sense of interest. One set was extracted from a hand-tagged corpus (Bruce and Wiebe, 1994) and the other by our algorithm. 3 Application on WSD To evaluate the usefulness of ... mainly from the Mandarin portion of the Chinese Gigaword Corpus (CGC), produced by the LDC3, which contains 1.3GB of newswire text drawn from Xinhua newspaper. Some Chinese translations of English...

Scientific report: "Automatic Acquisition of Named Entity Tagged Corpus from World Wide Web"

... Collecting Web Documents It is not appropriate for our purpose to randomly collect documents from the web. This is because not all web documents actually contain some NE instances and we also ... web to be used for learning of Named Entity Recognition systems. We use an NE list and a web search engine to collect web documents which contain the NE instances. The documents are refined through ... automatically constructs an NE tagged corpus from the web to be used for learning of NER systems. We use an NE list and a web search engine to collect web documents which contain the NE instances....

Scientific report: "Automatic Acquisition of Language Model based on Head-Dependent Relation between Words"

... complete-sequence composed of two complete-links, and (b) is a leftward one. (c) is a complete-sequence composed of zero complete-links, and it can be both leftward and rightward. The word of "complete" ... list of all pairs of unique words in the corpus. The initial pairs represent the tentative head-dependent relations of the words. And the initial probabilities of the pairs can be given randomly. ... gram-based and the other is grammar-based. N-gram model estimates the probability of a sentence as the product of the probability of each word in the sentence. It assumes that probability of the...

Scientific report: "AUTOMATIC ACQUISITION OF A LARGE SUBCATEGORIZATION DICTIONARY FROM CORPORA"

... basic measures of results are the information retrieval notions of recall and precision: How many of the subcategorization frames of the verbs were learned and what percentage of the things ... many of the uses of verbs in a text are captured by our subcategorization dictionary. For two randomly selected pieces of text from other parts of the New York Times newswire, a portion of ... baseline accuracy of 32% that would result from always guessing TV (transitive verb) and a performance figure of 62% that would result from a system that correctly classified all TV and THAT verbs...

Scientific report: "AUTOMATIC ACQUISITION OF THE LEXICAL SEMANTICS OF VERBS FROM SENTENCE FRAMES*"

... AUTOMATIC ACQUISITION OF THE LEXICAL SEMANTICS OF VERBS FROM SENTENCE FRAMES* Mort Webster and Mitch Marcus Department of Computer and Information Science University of Pennsylvania ... analysis of the classes currently handled. It is interesting to note that although the partial ordering of verb classes is defined in terms of features defined over syntactic and theta ... Figure 1 shows our classification of some verb classes of English, given this feature set. (This classification owes much to Levin (1985), as well as to Grimshaw (1983) and Jackendoff (1983).) This...

Scientific report: "Automatic Acquisition of Script Knowledge from a Text Collection"

... Automatic Acquisition of Script Knowledge from a Text Collection Toshiaki Fujiki Hidetsugu Nanba Interdisciplinary Graduate School of Graduate School of Science and Engineering Information ... automatic acquisition of script knowledge and investigated the effectiveness of our method. We used issues of Nihon Keizai Shimbun for the past 11 years (1990-2000) as a newspaper corpus and ... we extracted only the first paragraph from each report, and arranged the paragraphs in clusters based on the date of issue of the report. We used only the first paragraphs of the news reports because they tend...
