semisupervised simhash for efficient document similarity search

Scientific report: "Semi-Supervised SimHash for Efficient Document Similarity Search" pptx

Uploaded: 30/03/2014, 21:20
... Association for Computational Linguistics, pages 93–101, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics Semi-Supervised SimHash for Efficient Document Similarity Search Qixia ... are similar to a query document is an important component in modern information retrieval. Some existing hashing methods can be used for efficient document similarity search. However, unsupervised ... best performance. 1 Introduction Document Similarity Search (DSS) is to find similar documents to a query doc in a text corpus or on the web. It is an important component in modern information...
Document: Scientific report: "SPEECH OGLE: Indexing Uncertainty for Spoken Document Search" pptx

Uploaded: 20/02/2014, 15:20
... employed for performing this computation. The computation for the backward probability β_n stays unchanged (Rabiner, 1989) whereas during the forward pass one needs to split the forward ... section, position information is crucial for being able to evaluate proximity information when assigning a relevance score to a given document. In the spoken document case however, we are faced ... TREC evaluations. The PSPL lattices for each segment in the spoken document collection were indexed. In terms of relative size on disk, the uncompressed speech for the first 20 lectures uses 2.5GB,...
Document: Scientific report: "A Hybrid Hierarchical Model for Multi-Document Summarization" ppt

Uploaded: 20/02/2014, 04:20
... p z s n = p(z s n |c o m ) via Eq.(3). The similarity between the distributions is then measured with transformed IR ... Figure 3: Flow diagram for Hybrid Learning Algorithm for Multi-Document Summarization. 7 Conclusion In this paper, we presented a hybrid model for multi-document summarization. We demonstrated that ... Report MSR-TR-2005-101, Microsoft Research, Redwood, Washington, 2005. D.R. Radev, H. Jing, M. Stys, and D. Tam. Centroid-based summarization for multiple documents. In Int. Jrnl. Information Processing...
USE OF MINIMAL LEXICAL CONCEPTUAL STRUCTURES FOR SINGLE-DOCUMENT SUMMARIZATION doc

Uploaded: 07/03/2014, 11:20
... license for English and Spanish parsers. 2. Update the host/port in the files fdges-client.pl (for Spanish) and fdgen-client.pl (for English). The current values should look like this for fdges-client.pl: $remote ... capabilities useful for intelligence analysts, such as cross-lingual summarization and data mining. 6.3 CONTRIBUTIONS TO RESOURCES FOR RESEARCH This work provides an integral part for many NLP applications ... Generation of Informative Cross-Lingual Headlines for Text and Speech. Thesis Proposal, University of Maryland, 2003. 5.3 OTHER PRODUCTS 1. Trimmer: Trimmer generates a headline for a news story...
Leveraging User Comments for Aesthetic Aware Image Search Reranking pot

Uploaded: 07/03/2014, 17:20
... Retrieval]: Information Search and Retrieval; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems Keywords opinion mining, visual aesthetics modeling, image search reranking, ... aesthetic scores for ranking aesthetic-aware reranking. Intuitively, relevance and aesthetic quality are orthogonal dimensions and therefore convey complementary information about documents being ... Oliver Telefonica Research Barcelona, Spain nuriao@tid.es ABSTRACT The increasing number of images available online has created a growing need for efficient ways to search for relevant content....
Scientific report: "Lexically-Triggered Hidden Markov Models for Clinical Document Coding" pot

Uploaded: 07/03/2014, 22:20
... context and the document as a whole. For each candidate code, three types of features are generated: document features, ConText features, and code-semantics features (Table 1). Document: Document features ... Cherry Institute for Information Technology National Research Council Canada {Svetlana.Kiritchenko,Colin.Cherry}@nrc-cnrc.gc.ca Abstract The automatic coding of clinical documents is an important task for ... within a document may interact. It is an interesting combination of sentence and document-level processing. Formally, we define the document coding task as follows: given a set of documents...
Scientific report: "A Probabilistic Model for Fine-Grained Expert Search" pptx

Uploaded: 08/03/2014, 01:20
... CDD-based Formal Model for Expert Finding. In Proc. of CIKM 2007. Hertzum, M. and Pejtersen, A. M., 2000. The information-seeking practices of engineers: searching for documents as well as for ... of topics transformed from an original query will be obtained and then be used in the search for experts. Table 3 shows five forms of topic discovering from a given query. Forms Description ... 2005. Research on expert search at enterprise track of TREC 2005. In: Proc. of TREC 2005. Craswell, N., Hawking, D., Vercoustre, A. M. and Wilkins, P., 2001. P@NOPTIC Expert: searching for experts...
Scientific report: "A Bag of Useful Techniques for Efficient and Robust Parsing" ppt

Uploaded: 08/03/2014, 06:20
... paper is available as DFKI Research Report RR-94-37. Hans-Ulrich Krieger and Ulrich Schäfer. 1995. Efficient parameterizable type expansion for typed feature formalisms. In Proceedings of ... Ivan A. Sag. 1987. Information-Based Syntax and Semantics. Vol. I: Fundamentals. CSLI Lecture Notes, Number 13. Center for the Study of Language and Information, Stanford. Stuart M. Shieber. ... German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany. Also in Proc. MT Summit IV, 127-135, Kobe, Japan, July 1993. 480 A Bag of Useful Techniques for Efficient...
Scientific report: "Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction" doc

Uploaded: 17/03/2014, 04:20
... retrieval, document clustering, etc. For example, keywords of a document can be used for document indexing and thus benefit to improve the performance of document retrieval, and document summary ... basic forms based on WordNet before comparison. The precision p, recall r, F-measure (F=2pr/(p+r)) were obtained for each document and then the values were averaged over all documents for ... topics of the document without any additional clues and prior knowledge. In this paper, we focus on generic document summarization and keyword extraction for single documents. Document summarization...
Scientific report: "Packing of Feature Structures for Efficient Unification of Disjunctive Feature Structures" pptx

Uploaded: 17/03/2014, 07:20
... factor of 6.4 to 8.4. For realizing efficient NLP systems, I am currently building an efficient parser by integrating the packing method with the compilation method for HPSG (Torisawa and ... this system. For performance evaluation I measure the execution time for a part of application of grammar rules (i.e. schemata) of XHPSG. Table 1 shows the execution time for unifying ... Execution time for unification. Test data shows the word used for the experiment. # of LEs shows the number of lexical entries assigned to the word. Naive shows the time for unification...
Scientific report: "Topic Analysis for Psychiatric Document Retrieval" potx

Uploaded: 23/03/2014, 18:20
... consideration, all personal information has been removed. A total of 3,650 consultation documents were collected for evaluating the retrieval model, of which 20 documents were randomly selected ... 2000. IR Evaluation Methods for Retrieving Highly Relevant Documents. In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages ... similarities at the two lengths. To calculate the similarity at 1027 for relevance estimation. These functions consider word frequencies and document lengths for word weighting. Both the VSM and Okapi...
Scientific report: "TOWARDS AN INTEGRATED ENVIRONMENT FOR SPANISH DOCUMENT VERIFICATION AND COMPOSITION" pptx

Uploaded: 24/03/2014, 05:21
... marked. 3. There are additional marks for hyphenation points (for later use by a formatter performing automatic syllable partition), and several other for foreign and Latin words, geographical ... environment for document verification and composition. INTRODUCTION In the field of document processing many tools exist today which allow the user to introduce a text in storage, format it, ... bound to document composition: several other objectives are also foreseen for the dictionaries and the parser, a computer-assisted verb conjugation system has already been built for Spanish...
Scientific report: "A Bottom-up Approach to Sentence Ordering for Multi-document Summarization" ppt

Uploaded: 31/03/2014, 01:20
... 2006. © 2006 Association for Computational Linguistics A Bottom-up Approach to Sentence Ordering for Multi-document Summarization Danushka Bollegala Naoaki Okazaki ∗ Graduate School of Information Science ... Ishizuka Abstract Ordering information is a difficult but important task for applications generating natural-language text. We present a bottom-up approach to arranging sentences extracted for multi-document ... important for such MDS systems to determine a coherent arrangement of the textual segments extracted from multi-documents in order to reconstruct the text structure for summarization. Ordering information...
Scientific report: "Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction" potx

Uploaded: 31/03/2014, 03:20
... paper focuses on cross-document disambiguation of person names. Previous research for cross-document name disambiguation applies vector space model (VSM) for context similarity, only using ... the context similarity for a context pair is a vector of similarity features, e.g. {VSM_Similairty_equal_to_2, NE_Similarity_equal_to_1, Relationship_Conflicts_only, No_Sharing_for_Age, ... Conflict_for_Affiliation}. Besides the four categories of basic context similarity features defined above, we define induced context similarity features by combining basic context similarity...
Short for Portable Document Format, a file format developed by Adobe Systems

Uploaded: 15/04/2014, 14:26
... rotation, and conforms to the formula in the previous table. Transformations can be defined in terms of a transform matrix. Such a matrix is stored in a transform variable. For example: transform t ; ... familiar 'for loop' of all programming languages. for i=0 step 2 until 20 : draw (0,i) ; endfor ; As explained convincingly in Niklaus Wirth's book on algorithms and datastructures, the for ... for i=0 upto n-1 : p[i] endfor p[n] ; After seeing if in action, the following for loop will be no surprise: draw origin for i=0 step 10 until 100 : {down}(i,0) endfor ; This gives the zig zag...