Keyword extraction from the Web for FOAF metadata

Scientific report document: "Extraction and Approximation of Numerical Attributes from the Web" pdf

Scientific report

... and reliable Web data. For some questions, the exact answer is the only possible one (e.g., the height of a person), while for others it is only the center of a distribution (e.g., the weight of ... datasets, Web and TREC based. Web-based QA dataset. We created QA datasets for size, height, width, weight, and depth attributes. For each attribute we extracted from the Web 250 questions in the following ... based on the number of web snippets retrieved during the value acquisition stage. If there are several values with the same frequency we select the median of these values. Approximating the attribute...

Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web" doc

Scientific report

... (1) Given a web site, the root page and web pages directly linked from the root page are downloaded. Then for each of the downloaded web pages, all of its anchor texts (i.e. the hyperlinked ... English-Chinese parallel data from the web. The mining procedure is initiated by acquiring a Chinese website list. We have downloaded about 300,000 URLs of Chinese websites from the web directories at ... that, using the new web mining scheme, the web mining throughput is increased by 32%; (ii) The quality of the mined data is improved. By leveraging the web pages’ HTML structures, the sentence...

Scientific report document: "Mining the Web for Language Learning" pdf

Scientific report

... verbs describing actions for the noun “TV.” In the results, we find fresh and authentic sample sentences mined from the web, the first of which contains “watch TV,” the most common collocation, as the top result. Additionally, ... consists of the crawler and the raw web page storage. The crawler periodically downloads two kinds of web pages, which are put into the storage. The first kind of web pages are parallel web pages ... round of the mining process. The second layer consists of the extractor, the filter, the classifiers and the readability evaluator, which are applied sequentially. The extractor scans the raw web page...

Scientific report document: "Automatic Collection of Related Terms from the Web" pptx

Scientific report

... query. In case the query is a term, its hit is the number of pages that contain the term on the Web. We use the following notation. H(x) = the number of pages that contain the term “x”. The number H ... half (Evaluation II) in Table 2 shows the result. S: the target term was collected by the system. F: the target term was removed in the filtering step. A: the target term existed in the compiled corpus, but ... automatic term extraction. C: the target term existed in the collected web pages, but did not exist in the compiled corpus. R: the target term did not exist on the collected web pages. Only 43 terms...

Document: "This material is from the Council for Economic Education" docx

College - University

... by the Indiana Council for Economic Education (ICEE). For further information see www.econed-in.org ☞ Parent and Community Support: Parents are strong supporters of the mini-economy program. They ... become rather hectic, these tips should be helpful: Display the Items/Privileges Before the Auction Takes Place. This allows the students to examine carefully what will be offered for sale ... usually open before school or at the end of the day. Others have special “store times” for students to shop. ✎ Setting Prices: Unlike the class auction, it will be necessary to set prices for store...

Scientific report: "Automatic Set Instance Extraction using the Web" pptx

Scientific report

... and the Bootstrapper then enhances it further. On average, the Expander improves the performance of the Provider from 37% to 80% for English, 24% to 82% for Chinese, and 12% to 89% for ... components: the Fetcher, Extractor, and Ranker. The Fetcher is responsible for fetching web documents, and the URLs of the documents come from top results retrieved from the search engine using the ... effective even without the Expander; it directly improves the performance of the Provider from 37% to 77% for English, 24% to 52% for Chinese, and 12% to 39% for Japanese. The simple back-off strategy...

Scientific report: "Automatic Acquisition of Ranked Qualia Structures from the Web" potx

Scientific report

... results for the formal and agentive roles, while for the constitutive and telic roles the Web-Jac measure performed best. (Figure 1: Average F1 measure for the different ranking measures.) The ... coefficient (Web-Jac), the Pointwise Mutual Information (Web-PMI) and the conditional probability (Web-P). We also present a version of the conditional probability which does not use the Web but merely ... presents an approach for the automatic acquisition of qualia structures for nouns from the Web and thus opens the possibility to explore the impact of qualia structures for natural language...

Scientific report: "Mining the Web for Bilingual Text" pot

Scientific report

... [END:TITLE]. The number inside the chunk token is the length of the text chunk, not counting whitespace; from this point on only the length of the text chunks is used, and therefore the structural ... considered the most reliable, these were used as the basis for the computation of recall and precision. For this reason, and because the human-judged set included only a sample of the full ... data from the European Corpus Initiative (ECI), available from the Linguistic Data Consortium (LDC). In a formal evaluation, STRAND with the new language identification stage was run for...

Scientific report: "Mining Parenthetical Translations from the Web by Word Alignment" potx

Scientific report

... our modified version of the competitive linking algorithm, the link score of a pair of words is the sum of the φ² scores of the words themselves, their prefixes and their suffixes. In addition ... pairs, where the translation of the in-parenthesis terms is a suffix of the pre-parenthesis text. The lengths and frequency counts of the suffixes have been used to determine what is the translation ... Whenever there is more than one translation, we randomly pick one as the answer key. For each Chinese and English word in the Wikipedia data, we first find whether there is a translation for the...

Scientific report: "Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs" pdf

Scientific report

... demonstrated how to scale up their algorithms for the web. Several techniques for semantic class induction have also been developed specifically for learning from the web. (Paşca, 2004) uses Hearst's ... hyponym patterns to extract class instances from the web and then evaluates them further by computing mutual information scores based on web queries. The work by (Widdows and Dorow, 2002) on lexical ... to instantiate the pattern. On the first iteration, the pattern is given to Google as a web query, and new class members are extracted from the retrieved text snippets. We wanted the system to...

Scientific report: "Extracting Hypernym Pairs from the Web" potx

Scientific report

... very large text corpora. Today, the web contains more data than the largest available text corpus. For this reason, we are interested in employing the web for the extraction of hypernym relations. ... interesting information. Most web search engines impose a limit on the number of results returned from a query (for example 1000), which limits the opportunities for assessing the performance of ... about whether the size of the web allows to achieve meaningful results with basic extraction techniques. In section two we introduce the task, hypernym extraction. Section three presents the results...