Scientific report: "Improving the IBM Alignment Models Using Variational Bayes"

Document: Scientific report: "Improving the Scalability of Semi-Markov Conditional Random Fields for Named Entity Recognition"

... compared the result of the recognizers with and without filtering using only 2000 sentences as the training data. Table 5 shows the result of the total system with different filtering thresholds. The ... “O”, which indicates a non-named entity. For 98.0% of the named entities in the training data of the shared task in the 2004 JNLPBA, the label of the preceding entity was “O”. In order to incorporate ... divide the label of “O” into “O-protein” and “O” so that they convey the information on the preceding named entity. Figure 1 shows an example of this conversion, in which the two labels for the...

Scientific report: "Improving the Use of Pseudo-Words for Evaluating Selectional Preferences"

... Given two nouns, the noun with the higher co-occurrence count with the verb is chosen. As with the other models, if the two nouns have the same counts, it randomly guesses. The smoothing model ... select the nearest neighbor, the noun with frequency closest to the original. These methods evaluate the range of choices used in previous work. Our experiments compare the three. 5 Models 5.1 ... data by using the web to first ‘see’ the data. They evaluated unseen pseudo-words by attempting to first observe them in a larger corpus (the Web). One modeling difference was to disambiguate the...

Scientific report: "Improving the Interpretation of Noun Phrases with Cross-linguistic Information"

... provided or they didn’t know what interpretation to give, they had to tag it as “OTHER-SR” and “OTHER-PP”, respectively. The details of the annotation task and the observations drawn from there ... corpora and the contribution the features exemplified in one baseline and six versions of the SVM model. The baseline is defined only for the English part of the NP feature set and measures the contribution ... by the definite article or as one of the genitival articles a/ai/ale. For example, the noun phrase the beauty of the girl is translated as frumusețea fetei (beauty-the girl-gen), and the...

Scientific report: "Improving the Accuracy of Subcategorizations Acquired from Corpora"

... lexicon of the two grammars into the training SCFs and the testing SCFs. The words in the testing SCFs were included in the acquired SCFs. When I apply my method to the acquired SCFs using the training ... cm. I then initialize the number of clusters k to the number of cm. I finally update the acquired SCFs using the obtained clusters and the confidence values of SCFs in this order. I call the ... expectations. Precision = (Correct SCFs for the words in the resulting SCFs) / (All SCFs for the words in the resulting SCFs). Recall = (Correct SCFs for the words in the resulting SCFs) / (All SCFs for the words in the test SCFs)...

Document: Scientific report: "Guiding Statistical Word Alignment Models With Prior Knowledge"

... associations among all competing hypotheses. The more reasonable constraints are imposed on this process, the easier the task would become. For instance, the most relaxed IBM Model-1, which assumes that ... Constrained Word Alignment Models The framework that we propose to incorporate statistical constraints into word alignment models is generic. It can be applied to complicated models such as IBM Model-4 ... as a “term”. The number of occurrences of the source word e in the document f is defined as the expected number of times that f generates e in the parallel corpus under the word alignment model....

Document: Scientific report: "ON THE SYNTACTIC-SEMANTIC OF BOUND ANAPHORA ANALYSIS"

... about the right predictions: It is not the antecedent which must c-command the pronoun, but the quantificational NP, the host operator of the antecedent's discourse referent. In (4), the ... Since the discourse referent provided by a book takes its place at the top level of the restriction part of the every-NP, the indefinite should count as a proper antecedent for the pronoun ... slightly different way. The two readings of (7) do not differ in the relative scope order of two quantifiers. Rather, the difference is that on the narrow scope reading, the discourse referent...

Scientific report: "Improving On-line Handwritten Recognition using Translation Models in Multimodal Interactive Machine Translation"

... direct IBM1 and IBM2 models, and inverse IBM1-inv and IBM2-inv models with the inverse dictionary from Eq. 9. However, a more interesting set up than using language models or translation models ... respect to IBM models. However, linear interpolated models perform the best. In the Spanish test set the result is not better than the IBM2 since the linear parameters are clearly over-fitted. Other ... This may be due to the fact that the IMT and the on-line HTR systems use the same language models (5-gram in the case of the IMT system). Hence, if the IMT has failed to predict the correct word...

Scientific report: "Determining the Specificity of Terms using Compositional and Contextual Information"

... database using the disease names as queries. Therefore, all the abstracts are related to some of the disease names. The set consists of about 170,000 abstracts (20,000,000 words). The abstracts ... information. The methods are formulated as information-theory-like measures. Because the methods don't use domain-specific information, they are easily adapted to terms of other domains. ... distribution of co-occurrence words of the terms, the distribution of predicates which have the terms as arguments, and the distribution of modifiers of the terms are contextual information....

Scientific report: "Improving data-driven dependency parsing using large-scale LFG grammars"

... with the complex label corresponding to the concatenation of the labels from the multiple head attachments (Complex). The converted dependency analysis in Figure 1 shows the f-structure and the ... from the output of the deep grammars we wish to capture as much of the precise, linguistic generalizations embodied in the grammars as possible, whilst keeping with the requirements posed by the ... (‘f-structure’). In the work described in this paper, we employ the XLE platform using the grammars available for English and German from the ParGram project (Butt et al., 2002). In order to increase the coverage...

Scientific report: "Resolving Personal Names in Email Using Context Expansion"

... from the whole collection and build the identity models. The first step in the resolution process is to determine the list of identity models that are viable candidates as the true referent. For the ... Figure 1. In the network, the observed mention l is distributed conditionally on both the identity c and the name-type t. p(c) is the prior probability of observing the identity c in the collection. ... ranks the candidates based on the estimated probability of having been mentioned. Formally, we seek to estimate the probability p(c|m) that a potential candidate c is the one referred to by the...
