Chinese word segmentation with a maximum entropy approach

... another character; another tag for a character that occurs in the middle of a word; another tag for a character that ends a word; and another tag for a character that occurs as a single-character ... incorporate additional dictionary features based on an external word list, and to use extra training data annotated in other word segmentation standards. Corpora of different segmentation standards are ... training data of a different segmentation standard. Word segmentation accuracy (F-measure) for bakeoff data obtained from adding additional training data from another corpus of a different segmentation...
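The four-way character tagging described in the excerpt above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tag names `B`, `M`, `E`, `S` (begin, middle, end, single) and the helper names `words_to_tags` and `tags_to_words` are assumptions introduced here.

```python
def words_to_tags(words):
    """Map a list of segmented words to one boundary tag per character.

    Assumed tag names: B = begins a word, M = middle of a word,
    E = ends a word, S = single-character word.
    """
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend("M" * (len(word) - 2))  # zero or more middle tags
            tags.append("E")
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their boundary tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # these tags close the current word
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words
```

A classifier such as a maximum entropy model would predict one of these tags per character; `tags_to_words` then turns the predicted tag sequence back into a segmentation.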

Document: Scientific report: "Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach" pptx

... References: Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, David Purdy, Franz J. Och, Noah A. Smith, and David Yarowsky. 1999. Statistical machine translation, final report, ... to adaptive statistical language modeling. Computer, Speech and Language, 10:187–228. Christoph Tillmann and Hermann Ney. 2000. Word re-ordering and DP-based search in statistical machine translation ... 1997. A DP-based search using monotone alignments in statistical translation. In Proc. 35th Annual Conf. of the Association for Computational Linguistics, pages 289–296, Madrid, Spain, July. K. A. Papineni,...

Scientific report: "Chinese Segmentation with a Word-Based Perceptron Algorithm" docx

... bigram w1 w2; single-character word w; a word starting with character c and having length l; a word ending with character c and having length l; space-separated characters c1 and c2; character bigram ... each candidate in the source agenda and puts the generated candidates onto the target agenda. After each character is processed, the items in the target agenda are copied to the source agenda, ... that this lazy update method was significantly faster than the naive method. The Beam-Search Decoder: The decoder reads characters from the input sentence one at a time, and generates candidate segmentations...
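The source-agenda and target-agenda loop described in the excerpt above might look roughly like the sketch below. It is an illustration under stated assumptions: the paper scores candidates with a trained perceptron, whereas `make_lexicon_score` here is a hypothetical toy scorer, and the function names and the `beam_size` default are invented for this example.

```python
def make_lexicon_score(lexicon):
    """Hypothetical toy scorer: reward known words by length, penalise the rest."""
    def score(candidate):
        return sum(len(w) if w in lexicon else -1 for w in candidate)
    return score


def beam_search_segment(sentence, score, beam_size=8):
    """Segment `sentence` by reading one character at a time.

    Each candidate is a list of words. For every character, each candidate
    in the source agenda is expanded in two ways and pushed onto the target
    agenda; the best `beam_size` items then become the new source agenda.
    """
    source = [[]]  # start with a single empty candidate
    for ch in sentence:
        target = []
        for cand in source:
            # Expansion 1: the character starts a new word.
            target.append(cand + [ch])
            # Expansion 2: the character is appended to the last word.
            if cand:
                target.append(cand[:-1] + [cand[-1] + ch])
        target.sort(key=score, reverse=True)
        source = target[:beam_size]  # copy the pruned target to the source
    return source[0]
```

With a small beam the search is approximate: a candidate that scores poorly on a prefix can be pruned before it becomes the best full segmentation, which is the usual trade-off of beam search against exhaustive decoding.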

Scientific report: "Exploring Deterministic Constraints: From a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation" ppt

... In AAAI, pages 412–418. C. Kruengkrai, K. Uchimoto, J. Kazama, Y. Wang, K. Torisawa, and H. Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging ... on POS tagging. The proposed constrained taggers as described above can achieve near state-of-the-art POS tagging accuracy in a much more efficient manner. 5.4 Chinese word segmentation: Like other tagging ... 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL ’09, pages 513–521. Mitch Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz...

Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" pdf

... tagging (Collins, 2002), Chinese word segmentation (Ng and Low, 2004; Zhang and Clark, 2007) and so on. We trained a character-based perceptron for Chinese Joint S&T, and found that the perceptron ... the POS information and reported the F-measure on segmentation only, while the second performed Joint S&T using POS information and reported the F-measure both on segmentation and on Joint S&T ... higher-order word LM on a larger scale corpus. Finally, the word count penalty gives improvement to the cascaded model, 0.13 points on segmentation and 0.16 points on Joint S&T. In summary, the cascaded model...

Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" potx

... F-score performance on the test data. Conclusion and Future Work: This paper has described a stacked sub-word model for joint Chinese word segmentation and POS tagging. We defined a sub-word structure ... 2005). In this work, stacked learning is used to acquire extended training data for sub-word tagging. 3.1 Method Architecture: In our stacked sub-word model, joint word segmentation and POS tagging ... novel stacked sub-word model. Given multiple word segmentations of one sentence, we formally define a sub-word structure that maximizes the agreement of non-word-break positions. Based on the sub-word...

Scientific report: "Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation" doc

... (Section 5). The output of our parser incorporates word structures naturally. Evaluation shows that the model can learn much of the regularity of word structures, and also achieves reasonable accuracy ... treebank and check each of them manually. Words with non-trivial structures are thus annotated. Finally, we install these small trees of words into the original treebank. Whether a word has structures ... for the new Chinese word segmentation paradigm. Note that in the proposed output, all words are annotated with their part-of-speech tags. This is necessary since part-of-speech plays an important...

Scientific report: "Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study" potx

... two annotation standards are naturally denoted as source standard and target standard, while the classifiers following the two annotation standards are respectively named as source classifier and ... for Segmentation and Tagging. Table also lists the results of annotation adaptation experiments. For word segmentation, the model after annotation adaptation (row in upper sub-table) achieves an ... the classification results of several successive characters. We leave them as future research. Table 2: An example of basic features and guide features of standard-adaptation for word segmentation...

Scientific report: "Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection" docx

... purpose fast online training method, ADF. The proposed training method requires only a few passes to complete the training. • We propose a joint model for Chinese word segmentation and new word detection ... features + New word detection + ADF training (replacing SGD training). The results are shown in Table ... As we can see, the new features improved performance on both word segmentation and new word ... aiming to improve both segmentation and new word detection: detected new words are added to the word list lexicon in order to improve segmentation. Based on our CRF word segmentation system, we...

CUSTOMER SATISFACTION MEASUREMENT MODELS: GENERALISED MAXIMUM ENTROPY APPROACH

... estimation approach in solving the customer satisfaction models. A proposed method can be used to compute CSI based on statistical information about customer satisfaction measurements model. CUSTOMER SATISFACTION ... European customer satisfaction index model, which is an economic indicator, is represented in Figure 2. [Figure 2 labels: Perceived Quality, Customer Complaints, Perceived Value, Customer Expectation, Customer Satisfaction, Customer ...] REMARKS: In this article we proposed the generalised maximum entropy (GME) estimation approach to the customer satisfaction models, which provides a better approach as it is meant for situations...

Document: Scientific report: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification" pdf

... co-occurrence. Word-based model: In this model, statistical data about word boundary frequencies for each character is retrieved word-wise. For example, in the case of a monosyllabic word only two word boundaries ... contextual background providing information about the likelihood of whether each CB is also a wordbreak (WB). In other words, we model Chinese word segmentation as wordbreak (WB) identification which ... Ik is marked as word boundary B or N for intervals within words. When we consider a particular character c1 in W, there is a word boundary at index −1 and ... We store this information in a mapping...

Document: Scientific report: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data" pdf

... co-occurrence probability of x and y, and p(x), p(y) are the independent probabilities of x and y respectively. As claimed by Church (1991), the larger the mutual information between x and y, the higher the ... v, x, y and w: (1) ts_{v,y}(x) > 0, ts_{x,w}(y) < 0 (x tends to combine with y, and y tends to combine with x) ==> dts(x:y) > 0. In this case, x and y attract each other. The location between x and y should ... for Word Segmentation", Proc. of the 35th Annual Meeting of ACL and 8th Conference of the European Chapter of ACL, Madrid, 1997. [10] Sun M.S., Shen D.Y., Huang C.N., "CSeg&Tag1.0: A Practical Word...
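The mutual information mentioned in the excerpt above is commonly computed as pointwise mutual information, log of p(x, y) divided by p(x)p(y). The sketch below estimates it for adjacent character pairs from raw counts. It is an assumption-laden illustration: the function name `char_pmi`, the base-2 logarithm, and the simple relative-frequency estimates are choices made here, not details taken from the paper.

```python
import math
from collections import Counter


def char_pmi(text):
    """Pointwise mutual information for every adjacent character pair.

    p(x, y) is estimated from bigram counts and p(x), p(y) from
    unigram counts over the same text (no smoothing).
    """
    unigrams = Counter(text)
    bigrams = Counter(zip(text, text[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    pmi = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi
```

A lexicon-free segmenter can then treat a high score between two adjacent characters as evidence that they belong to the same word, and a low score as evidence for a boundary between them.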

Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling" doc

... so that word length will have a Poisson distribution whose parameter can now be estimated for a given language and word type. We describe this in detail in Section 4.3. Nested Pitman-Yor Language ... probabilities over words? If a lexicon is finite, we can use a uniform prior G0(w) = 1/|V| for every word w in lexicon V. However, with word segmentation every substring could be a word, thus the ... NPYLM. Each line is a word consisting of actual words. Conclusion: In this paper, we proposed a much more efficient and accurate model for fully unsupervised word segmentation. With a combination of...

Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging" docx

... word-character hybrid model for joint Chinese word segmentation and POS tagging. Our approach has two important advantages. The first is robust search space representation based on a hybrid model in which word-level ... levels of information about words and POS tags. Let us introduce some notation. We write w−1 and w0 for the surface forms of words, where subscripts −1 and 0 indicate the previous and current positions, ... unknown words and augment their representatives. As for search space representation, Ng and Low (2004) found that for Chinese, the character-based model yields better results than the word-based model...

Scientific report: "Discriminative Pruning of Language Models for Chinese Word Segmentation" ppt

... Stochastic Finite-state Word-segmentation Algorithm for Chinese. Computational Linguistics, 22(3): 377-404. Andreas Stolcke. 1998. Entropy-based Pruning of Backoff Language Models. In Proc. of DARPA News Transcription ... appears in Figure ... Performance Comparison of Combined Model and KLD Model. Conclusions and Future Work: A discriminative pruning criterion of n-gram language model for Chinese word segmentation was ... Figure ... Growing Algorithm for Language Model Pruning. 3.2 Discriminative Pruning Criterion: Given a Chinese character string S, a word segmentation system chooses a sequence of words W* as the segmentation...
