scaling distributional similarity to large corpora

Scientific report: "Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases"

Scientific report

... [Figure residue: suffixes of the example sentence "spain declined to confirm that spain declined to aid morocco" (e.g. "declined to confirm that spain declined to aid morocco", "to aid morocco", "aid morocco", "morocco"), listed with indices 0 1 2 3...]

Document: "More Than a Message: Framing Public Health Advocacy to Change Corporate Practices"

Sexual health

... 52) Similar to a frame around a painting, the news frame draws attention to a specific picture and separates told from untold pieces of the story. Elements in the story are said to be in the ... Publications time. A landscape story pulls back the lens to take a broader view. It may include people and events, but it connects them to the larger social and economic forces. News stories framed in such ... should strive to make stories about the landscape as vivid and interesting as the portrait. This is not easy to do but is crucial. The framing challenge for public health educators is to create...

Document: "A Comparison of Approaches to Large-Scale Data Analysis"

Databases

... required us to (1) write to a custom tuple object using Hadoop’s API, (2) modify our data loader program to transform records to compressed and serialized custom tuples, and (3) refactor each ... required DBMS-X to be running in order to adjust them, it was unfortunately easy to lock ourselves out with no failsafe mode to restore to a previous state. Vertica was relatively easy to install ... as stacked bars, where the bottom segment represents the time it took to execute the UDF/parser and load the data into the table and the top segment is the time to execute the actual query. DBMS-X...

Document: Scientific report: "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity"

Scientific report

... discarded. We refer to this by the term minimum row frequency. The cutoff is used to make the feature space manageable and to reduce noise in the data. 5.1 Distributional Similarity Based on Syntactic ... would like to make use of the distributional similarity score to set a threshold that will remove a lot of errors. The last thing that remains for future work is to find a more adequate way to combine ... Methodology 3.1 Measuring Distributional Similarity An increasingly popular method for acquiring semantically similar words is to extract distributionally similar words from large corpora. The underlying...

Document: Scientific report: "Measures of Distributional Similarity"

Scientific report

... test triple tokens in the set, and a tie results when both alternatives are deemed equally likely by the language model in question. To perform the evaluation, we incorporated each similarity ... similarity function into a decision rule as follows. For a given similarity measure f and neighborhood size k, let $S_{f,k}(n)$ denote the k most similar words to n according to f. We define the ... k most similar words according to f are on the whole better predictors than the k most similar words according to g; hence, f induces an inherently better similarity ranking for distance-weighted...

Document: Scientific report: "Distributional Similarity Models: Clustering Neighbors"

Scientific report

... belong mostly to the same cluster (dotted ellipse), the two nearest neighbors to A are not the nearest two neighbors to B. ... like to control the degree of compression of C relative to N, that ... previous two sections, we presented two complementary paradigms for incorporating distributional similarity information into cooccurrence probability estimates. Now, one cannot always draw ... large number of clusters in the distributional clustering case results in only the closest centroids contributing significantly to the cooccurrence probability estimate, whereas a large...

Document: Scientific report: "Finding Parts in Very Large Corpora"

Scientific report

... tempered to take into account the quantity of data that supports its conclusion. To put this another way, we want to pick (w, p) pairs that have two properties: $p(w \mid p)$ is high and $|w, p|$ is large. ... us to produce better lists, both because the statistics we are currently collecting would be more accurate, but also because larger numbers would allow us to find other reliable indicators. ... the machines at our disposal, so still larger corpora would not be out of the question. Finally, as noted above, Hearst [2] tried to find parts in corpora but did not achieve good results....

Scientific report: "Reducing semantic drift with bagging and distributional similarity"

Scientific report

... let $L_{1 \ldots n}$ correspond to the first n terms extracted into $L$, and $L_{(N-m) \ldots N}$ correspond to the last m terms added to $L_N$. In an iteration, let t be the next candidate term to be added to the lexicon. We ... semantic drift. We integrate a distributional similarity filter directly into WMEB (McIntosh and Curran, 2008). This filter judges whether a new term is more similar to the earlier or most recently ... lexical-syntactic patterns to label clusters of distributionally similar terms. Mirkin et al. (2006) used 11 patterns, and the distributional similarity score of each pair of terms, to construct features...

Scientific report: "Using lexical and relational similarity to classify semantic relations"

Scientific report

... describes two complementary approaches for using distributional information extracted from corpora to calculate noun pair similarity. The first model of pair similarity is based on standard methods for ... According to this lexical similarity model, word pairs (w1, w2) and (w3, w4) are judged similar if w1 is similar to w3 and w2 is similar to w4. Given a measure wsim of word-word similarity, ... context, a frequency cutoff to eliminate less common subsequences and the Gaussian kernel to compare vectors. While we cannot compare methods directly as we do not possess the large corpus of 5...

Scientific report: "Finding Word Substitutions Using a Distributional Similarity Baseline and Immediate Context Overlap"

Scientific report

... where antonyms are returned by the system; in those cases, a very high distributional similarity actually corresponds to opposite meanings. Producing an output ranked according to distributional ... Information Content to Evaluate Semantic Similarity in a Taxonomy. In Proceedings of IJCAI–95, 1995. Idan Szpektor, Hristo Tanev, Ido Dagan and Bonaventura Coppola. 2004. Scaling Web-Based Acquisition of ... probability that they appear together. PMI is known to have a bias towards less frequent events. In order to counterbalance that bias, we apply a simple logarithm function to the results as a discount: d...

Scientific report: "CSniper: Annotation-by-query for non-canonical constructions in large corpora"

Scientific report

... analysis of large corpora due to a relatively low frequency of instances and whose identification requires expert knowledge to distinguish them from other similar constructions. Our tool integrates ... expert knowledge to identify instances of linguistic phenomena that are hard to identify by means of existing automatic annotation tools. 1 Introduction Linguistic annotation by means of automatic procedures, ... knowledge to be annotated. We plan to integrate further automatic annotations and query possibilities to support such further use-cases. Acknowledgments We would like to thank Erik-Lân Do Dinh,...

Scientific report: "Syntax is from Mars while Semantics from Venus! Insights from Spectral Analysis of Distributional Similarity Networks"

Scientific report

... the other eigenvectors corresponding to the significantly high eigenvalues are important classificatory dimensions. Fig 2 shows the plot of the first eigenvector component (aka eigenvector centrality) ... initial attempt to answer this fundamental and intriguing question, whereby we construct the syntactic and semantic distributional similarity network (DSN) and analyze their spectrum to understand ... eigenvalue tells us to what extent the rows of the adjacency matrix are correlated and therefore, the corresponding eigenvector is not a dimension pointing to any classificatory basis of the words....

Scientific report: "Exploring Distributional Similarity Based Models for Query Spelling Correction"

Scientific report

... use the evidence of distributional similarity to achieve better spelling correction accuracy. We present two methods that are able to take advantage of distributional similarity information. ... that is able to leverage all available features, which could include (but not limited to) traditional character string-based typographical similarity, phonetic similarity and distributional ... output to drop by around 2%. The work of Ahmad and Kondrak (2005) tried to employ an unsupervised approach to error model estimation. They designed an EM (Expectation Maximization) algorithm to...

Scientific report: "Integrating Pattern-based and Distributional Similarity Methods for Lexical Entailment Acquisition"

Scientific report

... calculated relatively to the total number of correct entailment pairs acquired by both methods together. Method: P, R, F. Pattern-based: 0.44, 0.61, 0.51. Distributional Similarity: 0.33, 0.53 ... target noun a scored list of up to a few hundred words with positive distributional similarity scores. Next we need to determine an optimal threshold for the similarity score, considering ... investigate automatic acquisition of the lexical entailment relation. For the distributional similarity component we employ the similarity scheme of (Geffet and Dagan, 2004), which was shown to yield...

Scientific report: "Discovering Relations among Named Entities from Large Corpora"

Scientific report

... we have set a frequency threshold to remove those pairs. 3.4 Context similarity among NE pairs We adopt a vector space model and cosine similarity in order to calculate the similarities between ... richly annotated corpora which are tagged with relation instances. The biggest problem with this approach is that it takes a great deal of time and effort to prepare annotated corpora large enough to apply ... context vector is extremely small due to a lack of content words, the cosine similarity between the vector and others might be unreliable. So, we also define a norm threshold in advance to eliminate...