Scientific report: "Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification" doc

Scientific report: "A Non-negative Matrix Tri-factorization Approach to Sentiment Classification with Lexical Prior Knowledge" potx

... methods have been widely used in sentiment analysis. In particular, the use of SVMs in (Pang et al., 2002) initially sparked interest in using machine learning methods for sentiment classification. ... approaches. Does a one-time effort in compiling a domain-independent dictionary and using it for different sentiment tasks pay off in comparison to simply using unsupervised methods? In our case, matrix tri-factorization ... domain in mind; which is further motivation for using training examples and unlabeled data to learn domain-specific connotations. Lexical knowledge in the form of the polarity of terms in this...

Scientific report: "A Novel Discourse Parser Based on Support Vector Machine Classification" docx

... also evaluated performance using the 52 doubly-annotated files present in the RST-DT as test set (see Table 3). In each case, the remaining 340–350 files are used for training. For each corpus evaluation, ... are obtained through automated grid search with n-fold cross-validation (Staelin, 2003) on the training corpus, while a separate test set is used for performance evaluation. In training mode, ... using unavailable documents for Marcu’s and a selection of 21 documents from the RST-DT (distinct from RST-DT’s test set) for LeThanh’s. We therefore retrained and evaluated our classifier, using...

Document: Scientific report: "Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information" doc

... easily. In this paper, we propose a novel approach for translation model adaptation by utilizing in-domain monolingual topic information instead of the in-domain bilingual corpora, which incorporates ... data, including 33 documents with 666 sentences, is our test set. To obtain various topic distributions for the out-of-domain training corpus and the in-domain monolingual corpora in the ... (word|topic) during training. Besides, HTMM derives inherent topics in sentences rather than in documents, so we can easily obtain the sentence-topic distribution P(topic|sentence) in training corpus....

Scientific report: "An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis" docx

... challenges by using wiki pages as input/output documents. For instance, by running the sentiment analysis component right from within the wiki, its output can be written back to the originating wiki ... highlighted in green. analysis by a link to a page providing related information such as evaluation datasets. Wikulu supports users in this tedious task by automatically suggesting links. Link ... drawbacks: (a) it is inconvenient to copy existing data into a custom input format which can be fed into the NLP system, and (b) the textual output does not allow presenting the results in a visually...

Scientific report: "Bypassed Alignment Graph for Learning Coordination in Japanese Sentences" doc

... listed above are concerned mainly with scope disambiguation, reflecting the fact that detecting the presence of coordinations in a sentence (Task 1) is straightforward in English. Indeed, nearly 100% precision ... only in the first word. Both contain a particle to, which is one of the most frequent coordination markers in Japanese, but only the first sentence contains a coordinate structure. Pattern matching ... structure in a sentence is represented by a complete path starting from the top-left (initial) node and arriving at the bottom-right (terminal) node in its alignment graph. Each arc in this...

Scientific report: "Statistical Machine Translation for Query Expansion in Answer Retrieval" pptx

... pLM(synI1)λLM For estimation of the feature weights λ defined in equation (4) we employed minimum error rate (MER) training under the BLEU measure (Och, 2003). Training data for MER training were ... practice imagination concentration information consciousness different meditation relaxation qa-translation (-): birth industrial induced induces paraphrasing (-): way workers inducing employment ... paraphrasing deploys large amounts of bilingual phrases as high-coverage information source for synonym finding. Furthermore, both approaches take the entire query context into account when proposing...

Scientific report: "Machine-learned contexts for linguistic operations in German sentence realization" doc

... complete, spanning parse: 85.14% of the sentences in the training and parameter tuning set, and 84.59% in the blind test set fall into that category. Most sentences yield more than one training case. ... a machine learning approach. (Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 25-32.) The linguistically informed ... clauses (e.g., in imperatives), the finite verb is in initial position. Verb-second sentences contain one constituent preceding the finite verb, in the so-called “pre-field”. The finite verb...

Scientific report: "A global model for joint lemmatization and part-of-speech prediction" doc

... English indicating present tense third person singular verb and A–FS-N for Bulgarian indicating a feminine singular adjective in indefinite form. In this work we predict only main POS tags for the ... encouraging re-using the same lemma for different words. An additional feature fires for every distinct lemma, in effect counting the number of assigned lemmas. 5.2 Training and inference. Since ... on lemmatization performance in two settings: (i) when no tags are used in training or testing by the transducer, and (ii) when correct tags are used in training and tags predicted by the tagging model are...

Scientific report: "ConsentCanvas: Automatic Texturing for Improved Readability in End-User License Agreements" pot

... command line interface. It then passes this document to four independent submodules for analysis. Each submodule stores the initial and final character positions of a string selected from within ... passed to our rendering system, which inserts the corresponding HTML5 tags at the positions in the original plaintext EULA. We append a header to the output document to include the linked stylesheet ... variable-length phrase finding module only incorporates a single correlation function. More will be added, drawing in particular from those documented by Kim and Chan (2004). Machine learning techniques...

Scientific report: "Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition" ppt

... call tuning), using the previously trained prior for regularization. If we are unable to find a match between features in the training and tuning datasets (for instance, if a word appears in the ... trained classifier to make predictions. In the paradigm of inductive learning, (Xtrain, Ytrain) are known, while both Xtest and Ytest are completely hidden during training time. In this ... cannot be intermingled indiscriminately. Line c shows the same combination of MUC6 and MUC7, only this time the datasets have been combined using the HIER prior. In this case, the performance...
