a database of lexical relations

Text mining tutorial pascal

Uploaded: 23/10/2014, 11:47
... AFTER, AGAIN, AGAINST, ALL, ALMOST, ALONE, ALONG, ALREADY, ALSO, Slovenian: A, AH, AHA, ALI, AMPAK, BAJE, BODISI, BOJDA, BRŽKONE, BRŽČAS, BREZ, CELO, DA, DO, Croatian: A, AH, AHA, ALI, AKO, BEZ, ... Visualization of a single document Why is visualization of a single document hard? Visualizing big text corpora is an easier task because of the large amount of information: statistics already starts ... INFORMATION RETRIEVAL (#2 INFORMATION RETRIEVAL) GROUP WordNet – a database of lexical relations WordNet is the most well-developed and widely used lexical database for English …it consists of databases...
Scientific report: "A Study on Automatically Extracted Keywords in Text Categorization"

Uploaded: 08/03/2014, 02:21
... evaluation of the usefulness of statistical phrases for automated text categorization In Text Databases and Document Management: Theory and Practice, pages 78–102 Rada Mihalcea and Paul Tarau ... the one and only truth The evaluation measures are precision (how many of the automatically assigned keywords are also manually assigned keywords) and recall (how many of the manually assigned ... document classification In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2005) Maria Fernanda Caropreso, Stan Matwin, and Fabrizio Sebastiani 2001 A learner-independent...
Document: Word Segmentation for Vietnamese Text Categorization: An Online Corpus Approach

Uploaded: 12/12/2013, 11:15
... Vietnamese text categorization until we have a good lexicon and/or a large and trusted training corpus Character-based approaches (syllable-based in the Vietnamese case) purely extract a certain number ... clustering of words for text categorization Proceedings of the 21st Annual ... [16] Yiming Yang, 1999 An evaluation of Statistical Approaches to Text Categorization Journal of Information Retrieval, Vol ... syllables, thus, we need a statistical measure of syllable associations Mutual information (MI), an important concept of information theory, has been used in natural language processing to capture...
An Investigation into the Effects of Brainstorming and Giving a Text as a Model on Phan Dinh Phung High School Students' Attitude and Writing Ability

Uploaded: 18/12/2013, 10:08
... students at elementary level had a writing lesson, they would need the teacher's assistance at early stages Using a text as a model has two sides, advantages and disadvantages To begin with its advantages, ... the approval of the application of giving a text as a model in the future Table 11: The reasons for the disapproval of the application of giving a text as a model in the future TABLE OF CONTENTS Page ... vocabulary and more relevant ideas than giving a text as a model Moreover, a majority of students have a positive attitude towards brainstorming, whereas a small number of students have a positive...
Scientific report: "Learning with Unlabeled Data for Text Categorization Using Bootstrapping and Feature Projection Techniques"

Uploaded: 20/02/2014, 16:20
... a Naive Bayes classifier can be built Since the Naive Bayes classifier can label all unlabeled documents for their category, we can finally obtain labeled training data (machine-labeled data) ... classifiers can be improved by augmenting a small number of labeled training data with a large pool of unlabeled data Bennett and Demiriz achieved small improvements on some UCI data sets using ... equal to c_j, the output value is Otherwise, the output value is Using a Feature Projection Technique for Handling Noisy Data of Machine-labeled Data We finally obtained labeled data of a...
Scientific report: "Fragments and Text Categorization"

Uploaded: 20/02/2014, 16:20
... Table 2) ... Table 2: Results for skip-tail and the Naïve Bayes (n=number of classification tasks, NB=average of error rates for full documents, stail=average of error rates for skip-tail, ... skip-tail and for all languages and for English and Czech in the case of fragments However, an increase of accuracy is observed even for 60% of the average length (see Fig 1) Moreover, for the average ... international ACM SIGIR conference on Research and development in information retrieval, pages 42–49 ACM Press Additional information, namely lexical or syntactic, may result in even higher accuracy of...
An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors

Uploaded: 07/03/2014, 14:20
... contains the database code and is shared among all database processes • The Program Global Area (PGA) contains per-process data, such as private stacks, local variables, and private session variables ... The Shared Global Area (SGA) contains the database buffer cache, the data dictionary (indices and other metadata), the shared SQL area (which allows multiple users to share a single copy of an SQL ... each transaction corresponds to a bank account deposit Each transaction is small, but updates several database tables (e.g., teller and branch) OLTP workloads are intrinsically parallel, and therefore...
Scientific report: "A Comparison and Semi-Quantitative Analysis of Words and Character-Bigrams as Features in Chinese Text Categorization"

Uploaded: 08/03/2014, 02:21
... it is significant to make a detailed comparison and analysis here on the relative value of words and bigrams as features in Text Categorization The organization of this paper is as follows: Section ... for “natural intraword bigrams” We can moderately distinguish these two kinds of bigrams by a division at -1.4 12 Overall Information Quantity of a Feature Space The performance limit of a classification ... classifications, F1-measure, precision, recall and accuracy (Baeza-Yates and Ribeiro-Neto, 1999; Sebastiani, 2002) have the same value by microaveraging, and are labeled with “performance” in the following...
Scientific report: "A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness"

Uploaded: 08/03/2014, 21:20
... that semantic information can create an individual space, leading to a dual representation of each document, namely, a vector with the document’s terms and another with semantic information Rationally, ... harmonic mean offers a lower upper bound than the average of depths and we think it is a more realistic estimation of the path’s depth SCM and SPE capture the two most important parameters of measuring ... same pattern as the human ratings, as closely as our measure of relatedness does (low y values for small x values and high y values for high x) The same pattern applies in the M&C and 353-C data...
Scientific report: "Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization"

Uploaded: 17/03/2014, 04:20
... on the availability of a multilingual lexical resource For languages with scarce resources a bilingual dictionary might not be easily available Secondly, an important requirement of such a resource ... in many NLP applications, such as Word Sense Disambiguation (Strapparava et al., 2004), Text Categorization and Term Categorization A Domain Model is composed of soft clusters of terms Each cluster ... lemmata # lemmata 22,704 26,404 43,384 5,724 Table 3: Number of lemmata in the training parts of the corpus Evaluation The CLTC task has been rarely attempted in the literature, and standard evaluation...
Scientific report: "Modeling Topic Dependencies in Hierarchical Text Categorization"

Uploaded: 23/03/2014, 14:20
... 2006b Making tree kernels practical for natural language learning In Proceedings of EACL’06 S Riezler and A Vasserman 2010 Incremental feature selection and l1 regularization for relaxed maximum-entropy ... hypotheses and 32k of training data for rerankers using STK data over all categories (103 or 363) Additionally, the F1 of some binary classifiers is reported 4.2 Classification Accuracy In the ... training set in two chunks of data: Train1 and Train2 The binary classifiers are trained on Train1 and tested on Train2 (and vice versa) to generate the hypotheses on Train2 (Train1) The union of...
Scientific report: "A Framework of Feature Selection Methods for Text Categorization"

Uploaded: 30/03/2014, 23:20
... probability distribution Both the left and right graphs have shadowed areas of the same size And the value of $A_i - B_i$ can be rewritten as the following: $A_i - B_i = \frac{A_i - B_i}{A_i} \cdot A_i = \left(1 - \frac{B_i}{A_i}\right) \cdot A_i$ ... documents as the training data and the remaining 10% as testing data The training data are used for training SVM classifiers, learning parameters in the WFO method and selecting "good" features for each ... measurements In the experiment, we consider the case when training data have equal class prior probabilities When training data are unbalanced, we need to change the forms of the two basic measurements...
Scientific report: "A Ranking Model of Proximal and Structural Text Retrieval Based on Region Algebra"

Uploaded: 31/03/2014, 03:20
... useschronic and acute asthma ([neoplastic] Ê (lung Q cancer)) ([therapeutic] Ê (radiation Q therapy)) lung cancer lung cancer, radiation therapy ([disease]Êpancytopenia) ([neoplastic]Ê(acuteQmegakaryocyticQleukemia)) ... female with anorexia/bulimia complications and management of anorexia and bulimia ([disease] Ê diabete) ([disease] Ê (peripheral Q neuropathy)) ([therapeutic] Ê pentoxifylline) 50 year old diabetic ... Salminen and Frank Tompa 1994 Pat expressions: an algebra for text search Acta Linguistica Hungarica, 41(1-4):277–306 Anja Theobald and Gerhard Weikum 2000 Adding relevance to XML In Proceedings of...
Scientific report: "Automatic Text Summarization Based on the Global "

Uploaded: 31/03/2014, 04:20
... came r e is an open-class attribute, potentially encompassing all the binary relations lexicalized in natural languages An exhaustive listing of thematic roles and rhetorical relations appears ... preferences of its reader Let us refer to the adaptation of the summarization process to a particular user as personalization GDA-based summarization can be easily personalized because our method ... its applications to automatic text summarization We are evaluating our summarization method using online Japanese articles with GDA tags We are also extending text summarization to that of hypertext...
Scientific report: "TEXTUAL EXPERTISE IN WORD EXPERTS: AN APPROACH TO TEXT PARSING BASED ON TOPIC/COMMENT MONITORING"

Uploaded: 31/03/2014, 17:20
... underlining of variables indicates that they have already been bound, i.e. the evaluation of the condition in which a variable occurs takes the value already assigned, otherwise a value assignment is made ... FALSE arc of #2, #3 also evaluates to FALSE as according to the current state of analysis context contains no information indicating that frame (Mikrocomputer) has a slot to which has been assigned...
Image denoising techniques to improve the performance on optical character recognition.

Uploaded: 12/04/2014, 15:39
... V.R.Vijaykumar, P.T.Vanathi, P.Kanagasabapathy Fast and Efficient Algorithm to Remove Gaussian Noise in Digital Images [3] Faisal Shafait, Daniel Keysers, Thomas M. Breuel Efficient Implementation of Local ... image is: $X_{ij} = O_{ij} + G_{ij}$ Each noise value G is drawn from a zero-mean Gaussian distribution Many Gaussian noise reduction algorithms demand the standard deviation and consider it as a measure of ... [5] T Kasar, J Kumar and A G Ramakrishnan Font and Background Color Independent Text Binarization [6] Julinda Gllavata, Ralph Ewerth and Bernd Freisleben FINDING TEXT IN IMAGES VIA LOCAL THRESHOLDING...
Project: Text Categorization (Chapter 16)

Uploaded: 27/06/2014, 11:55
... Faculty of Information Technology, Ho Chi Minh City University of Technology, Lecture Notes on Information Theory [3] Kostas Fragos, Yannis Maistros, Christos Skourlas, A Weighted Maximum Entropy Language Model for Text Classification [4] Kamal ... (∑ (⃗ )) (⃗ )) After computing the two probabilities, the class with the higher probability is the class assigned to the document References [1] Christopher D.Manning, Hinrich Schutze, Foundations of Statistical Natural Language Processing, ... must be divided into many classes; to evaluate the classifier over all classes as a whole, after compiling 10 statistics tables for the classes, two methods are applied for evaluation: micro-averaging and macro-averaging 2.6.1 Macro-Averaging This...
8 Ways to Great: Peak Performance on the Job and in Your Life

Uploaded: 29/07/2014, 03:20
... a lot of time trying to become a better manager But that would have meant taking away time from doing what he does best—analyzing the market and making smart trading decisions In any case, Darren ... good trader can become great, and any great trader can become elite In fact, every one of the qualities I mentioned above results from mastering a teachable skill that can be acquired and applied ... recognize patterns in the market May lose objectivity and think I see patterns that are not really relevant Highly analytical Able to objectively analyze data and not get distracted by size of trades...
Biology report: "The effect of improved reproductive performance on genetic gain and inbreeding in MOET breeding schemes for beef cattle"

Uploaded: 09/08/2014, 18:22
... (average age of parents when offspring are born) for males and females; and 5) variance of family sizes for male and female parents The latter was calculated as described in Villanueva et al (1994) ... coefficient of variation was calculated as CV = ... where MEAN is the overall mean of embryos collected per flush and per donor The estimates of σ²_B and σ²_W were obtained from an analysis of variance of ... simulated assuming an additive infinitesimal model with an initial heritability of 0.35 The nucleus was established with males and 18 females of 2, and years of age The number of animals in each age...