Scientific report: "Distributional Similarity vs. PU Learning for Entity Set Expansion" doc

... 11-16 July 2010.c2010 Association for Computational LinguisticsDistributional Similarity vs. PU Learning for Entity Set Expansion Xiao-Li Li Institute for Infocomm Research, 1 Fusionopolis ... distributional similarity is a classic technique for entity set expansion, this paper showed that PU learning performs considerably better on our diverse corpora. In addition, PU learning also outperforms ... Bayesian Sets (Ghahramani and Heller, 2005), which was designed specifically for set expan-sion. It also does not perform as well as PU learning. We will explain why PU learning per-forms better...

Scientific report document: "Distributional Similarity Models: Clustering Neighbors" doc

... to train k. Then, the performance of each similarity-based model was evaluated on the test triples for a sequence of settings for k. We expected that clustering performance with respect to ... train-test splits. For each split, test triples were created from the held-out test set. Each model used the training set to calculate all basic quantities (e.g., p(v|n) for each verb and ... convergent iterative reestimation process for p(n|c), p(v|c) and p(c). These distributions form the model for the given β. It is easy to see that for β = 0, p(n|c) does not depend on...

Scientific report document: "A Maximum Expected Utility Framework for Binary Sequence Labeling" doc

... Association of Computational Linguistics, pages 736–743, Prague, Czech Republic, June 2007. © 2007 Association for Computational Linguistics A Maximum Expected Utility Framework for Binary Sequence ... affiliation: Google Inc. Former affiliation: Center of Computational Learning Systems, Columbia University. the prediction or decoding task. In general, decoding can be a hard computational problem ... Another reason is that the F-score is only meaningful for comparing two (multi)sets or two binary sequences, but the notation for multisets is slightly more awkward. All tasks considered here...

Scientific report document: "GEMINI: A NATURAL LANGUAGE SYSTEM FOR SPOKEN-LANGUAGE UNDERSTANDING*" doc

... constituent parser to populate a chart with edges containing syntactic, semantic, and logical form information. Then, a second utterance parser is used to apply a second set of syntactic and ... prediction for nongapped categories. This limited form of left-context constraint greatly reduces the total number of edges built for a very low overhead. In the 5875-utterance training set, the ... allows us to enforce a very coarse form of parse preferences (for example, preferring complete sentences to sentence fragments). These coarse preferences could also be enforced by the parse...

Scientific report: KCTD5, a putative substrate adaptor for cullin3 ubiquitin ligases docx

... domain. As the BTB domain is responsible for homo-oligomerization in BTB proteins [3], we addressed whether KCTD5 might form homo-oligomers. For this purpose, HEK293 cells were transfected with ... Journal compilation © 2008 FEBS KCTD5, a putative substrate adaptor for cullin3 ubiquitin ligases Yolanda Bayón, Antonio G. Trinidad, María L. de la Puerta, María del Carmen Rodríguez, Jori ... POZ/BTB protein that is a putative new substrate-specific adaptor for CUL3-based E3 ligases. Results and Discussion KCTD5 was identified in a yeast two-hybrid screening for the dual-specificity phosphatase...

Scientific report: "Domain Adaptation with Active Learning for Word Sense Disambiguation" pdf

... Association of Computational Linguistics, pages 49–56, Prague, Czech Republic, June 2007. © 2007 Association for Computational Linguistics Domain Adaptation with Active Learning for Word Sense ... active learning for domain adaptation for WSD. A similar work is the recent research by Chen et al. (2006), where active learning was used successfully to reduce the annotation effort for WSD ... (2004) employed active learning for another set of 5 verbs. However, their work only investigated the use of active learning to reduce the annotation effort necessary for WSD, but...

Scientific report: "SenseRelate::TargetWord – A Generalized Framework for Word Sense Disambiguation" doc

... disambiguation needs. 2.1 Format Filter The filter takes as input file(s) annotated in the SENSEVAL-2 lexical sample format, which is an XML-based format that has been used for both the SENSEVAL-2 ... text preprocessing modules, each of which perform a transformation on the input words. For example, the Compound Detection Module identifies sequences of tokens that form compound words that are known as ... underscore-connected form (“New_York_City”). Multiple preprocessing modules can be chained together, the output of one connected to the input of the next, to form a single preprocessing stage. For example,...

Scientific report: "Classifying Biological Full-Text Articles for Multi-Database Curation" doc

... representation to the set doesn’t improve the cross-validation performance. For classifying the documents with better features, we run the algorithm twice. We first start with an empty set and obtain ... performance, which implies that the inferior ones may contain important exclusive information. The cross-validation performance fairly predicts the performance on the test data, except for ... with an initial set containing some or zero representation. For each iteration, we add one representation to the set by picking the one that enhances the cross-validation performance the most....

Scientific report: "On-line Language Model Biasing for Statistical Machine Translation" docx

... Work Existing methods for target LM biasing for SMT rely on information retrieval to select a comparable subset from the training corpus. A foreground LM estimated from this subset is interpolated ... exists in LM adaptation for SMT. Snover et al. (2008) used a cross-lingual information retrieval (CLIR) system to select a subset of target documents “comparable” to the source document; bias LMs ... multiple (four) references for each test input. The Dari and Pashto monolingual corpora for LM training are a superset of target sentences from the parallel training corpus, consisting of additional...

Scientific report: "A Morphologically Sensitive Clustering Algorithm for Identifying Arabic Roots" docx

... shows the results of the Two-stage algorithm for our data sets. The maximally effective cut-off point for all sets lies closer. Figures for the first set have to be treated with caution. The perfect ... phenomena on which we wanted to concentrate. Table 8: Two-stage Algorithm Test Results. Data set: Set 1, Set 2, Set 3, Set 4, Set 5. Benchmark: Total Manual Clusters (A): 9, 267, 337, 151, 190. Multi-word (B): 9, 130 ... aim for a conservative gauge of how Adamson’s algorithm can yield more exact clusters from a full range of problematic data. Table 3: Adamson's Algorithm Test Results. Data set: Set 1, Set 2, Set...
