
Chinese word segmentation with a maximum entropy approach


... another character; another tag for a character that occurs in the middle of a word; another tag for a character that ends a word; and another tag for a character that occurs as a single-character ... incorporate additional dictionary features based on an external word list, and to use extra training data annotated in other word segmentation standards. Corpora of different segmentation standards are ... training data of a different segmentation standard. Word segmentation accuracy (F-measure) for bakeoff data obtained from adding additional training data from another corpus of a different segmentation...
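
The excerpt above describes segmentation as assigning one positional tag to each character. As a rough illustration only, the sketch below converts a segmented sentence into such tags and back. The tag names `B`, `M`, `E`, `S` and the function names `words_to_tags` and `tags_to_words` are assumptions made for this example; they are not given in the excerpt, and the maximum entropy classifier that would predict the tags is not shown.

```python
from typing import List

# Assumed tag names (hypothetical, not taken from the excerpt):
#   B = character that begins a multi-character word
#   M = character in the middle of a word
#   E = character that ends a word
#   S = character that forms a single-character word


def words_to_tags(words: List[str]) -> List[str]:
    """Turn a segmented sentence into one positional tag per character."""
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty strings
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their positional tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words
```

With this encoding, a tagger only has to classify each character in context, and the word boundaries follow from the predicted tag sequence.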

Document: Scientific report: "Refined Lexicon Models for Statistical Machine Translation using a Maximum Entropy Approach" pptx

... References: Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, David Purdy, Franz J. Och, Noah A. Smith, and David Yarowsky. 1999. Statistical machine translation, final report, ... to adaptive statistical language modeling. Computer, Speech and Language, 10:187–228. Christoph Tillmann and Hermann Ney. 2000. Word re-ordering and DP-based search in statistical machine translation ... 1997. A DP-based search using monotone alignments in statistical translation. In Proc. 35th Annual Conf. of the Association for Computational Linguistics, pages 289–296, Madrid, Spain, July. K. A. Papineni,...

Scientific report: "Chinese Segmentation with a Word-Based Perceptron Algorithm" docx

... bigram w1 w2; single-character word w; a word starting with character c and having length l; a word ending with character c and having length l; space-separated characters c1 and c2; character bigram ... each candidate in the source agenda and puts the generated candidates onto the target agenda. After each character is processed, the items in the target agenda are copied to the source agenda, ... that this lazy update method was significantly faster than the naive method. The Beam-Search Decoder: The decoder reads characters from the input sentence one at a time, and generates candidate segmentations...

Scientific report: "Exploring Deterministic Constraints: From a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation" ppt

... In AAAI, pages 412–418. C. Kruengkrai, K. Uchimoto, J. Kazama, Y. Wang, K. Torisawa, and H. Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging ... on POS tagging. The proposed constrained taggers as described above can achieve near state-of-the-art POS tagging accuracy in a much more efficient manner. 5.4 Chinese word segmentation: Like other tagging ... 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL ’09, pages 513–521. Mitch Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz...

Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" pdf

... tagging (Collins, 2002), Chinese word segmentation (Ng and Low, 2004; Zhang and Clark, 2007) and so on. We trained a character-based perceptron for Chinese Joint S&T, and found that the perceptron ... the POS information and reported the F-measure on segmentation only, while the second performed Joint S&T using POS information and reported the F-measure both on segmentation and on Joint S&T ... higher-order word LM on a larger scale corpus. Finally, the word count penalty gives improvement to the cascaded model, 0.13 points on segmentation and 0.16 points on Joint S&T. In summary, the cascaded model...

Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" potx

... F-score performance on the test data. Conclusion and Future Work: This paper has described a stacked sub-word model for joint Chinese word segmentation and POS tagging. We defined a sub-word structure ... 2005). In this work, stacked learning is used to acquire extended training data for sub-word tagging. 3.1 Method Architecture: In our stacked sub-word model, joint word segmentation and POS tagging ... novel stacked sub-word model. Given multiple word segmentations of one sentence, we formally define a sub-word structure that maximizes the agreement of non-word-break positions. Based on the sub-word...

Scientific report: "Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation" doc

... (Section 5). The output of our parser incorporates word structures naturally. Evaluation shows that the model can learn much of the regularity of word structures, and also achieves reasonable accuracy ... treebank and check each of them manually. Words with non-trivial structures are thus annotated. Finally, we install these small trees of words into the original treebank. Whether a word has structures ... for the new Chinese word segmentation paradigm. Note that in the proposed output, all words are annotated with their part-of-speech tags. This is necessary since part-of-speech plays an important...

Scientific report: "Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study" potx

... two annotation standards are naturally denoted as source standard and target standard, while the classifiers following the two annotation standards are respectively named as source classifier and ... for Segmentation and Tagging. Table also lists the results of annotation adaptation experiments. For word segmentation, the model after annotation adaptation (row in upper sub-table) achieves an ... the classification results of several successive characters. We leave them as future research. Table 2: An example of basic features and guide features of standard-adaptation for word segmentation...

Scientific report: "Fast Online Training with Frequency-Adaptive Learning Rates for Chinese Word Segmentation and New Word Detection" docx

... purpose fast online training method, ADF. The proposed training method requires only a few passes to complete the training. • We propose a joint model for Chinese word segmentation and new word detection ... features + New word detection + ADF training (replacing SGD training). The results are shown in Table 259 As we can see, the new features improved performance on both word segmentation and new word ... aiming to improve both segmentation and new word detection: detected new words are added to the word list lexicon in order to improve segmentation. Based on our CRF word segmentation system, we...
CUSTOMER SATISFACTION MEASUREMENT MODELS: GENERALISED MAXIMUM ENTROPY APPROACH


... estimation approach in solving the customer satisfaction models. A proposed method can be used to compute CSI based on statistical information about customer satisfaction measurements model. CUSTOMER SATISFACTION ... European customer satisfaction index model, which is an economic indicator, is represented in Figure 2: Perceived Quality, Customer Complaints, Perceived Value, Customer Expectation, Customer Satisfaction, Customer ... REMARKS: In this article we proposed the generalised maximum entropy (GME) estimation approach to the customer satisfaction models, which provides a better approach as it is meant for situations...

Document: Scientific report: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification" pdf

... co-occurrence. Word based model: In this model, statistical data about word boundary frequencies for each character is retrieved word-wise. For example, in the case of a monosyllabic word only two word boundaries ... contextual background providing information about the likelihood of whether each CB is also a wordbreak (WB). In other words, we model Chinese word segmentation as wordbreak (WB) identification which ... Ik is marked as word boundary B or N for intervals within words. When we consider a particular character c1 in W, there is a word boundary at index −1 and We store this information in a mapping...

Document: Scientific report: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data" pdf

... co-occurrence probability of x and y, and p(x), p(y) are the independent probabilities of x and y respectively. As claimed by Church (1991), the larger the mutual information between x and y, the higher the ... v, x, y and w: (1) ts_{v,y}(x) > 0, ts_{x,w}(y) < 0 (x tends to combine with y, and y tends to combine with x) ==> dts(x:y) > 0. In this case, x and y attract each other. The location between x and y should ... for Word Segmentation", Proc. of the 35th Annual Meeting of ACL and 8th Conference of the European Chapter of ACL, Madrid, 1997. [10] Sun M.S., Shen D.Y., Huang C.N., "CSeg&Tag1.0: A Practical Word...

Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling" doc

... so that word length will have a Poisson distribution whose parameter can now be estimated for a given language and word type. We describe this in detail in Section 4.3. Nested Pitman-Yor Language ... probabilities over words? If a lexicon is finite, we can use a uniform prior G0(w) = 1/|V| for every word w in lexicon V. However, with word segmentation every substring could be a word, thus the ... NPYLM. Each line is a word consisting of actual words. Conclusion: In this paper, we proposed a much more efficient and accurate model for fully unsupervised word segmentation. With a combination of...

Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging" docx

... word-character hybrid model for joint Chinese word segmentation and POS tagging. Our approach has two important advantages. The first is robust search space representation based on a hybrid model in which word-level ... levels of information about words and POS tags. Let us introduce some notation. We write w−1 and w0 for the surface forms of words, where subscripts −1 and 0 indicate the previous and current positions, ... unknown words and augment their representatives. As for search space representation, Ng and Low (2004) found that for Chinese, the character-based model yields better results than the word-based model...

Scientific report: "Discriminative Pruning of Language Models for Chinese Word Segmentation" ppt

... Stochastic Finite-state Word-segmentation Algorithm for Chinese. Computational Linguistics, 22(3): 377-404. Andreas Stolcke. 1998. Entropy-based Pruning of Backoff Language Models. In Proc. of DARPA News Transcription ... appears in Figure Performance Comparison of Combined Model and KLD Model. Conclusions and Future Work: A discriminative pruning criterion of n-gram language model for Chinese word segmentation was ... Figure Growing Algorithm for Language Model Pruning. 3.2 Discriminative Pruning Criterion: Given a Chinese character string S, a word segmentation system chooses a sequence of words W* as the segmentation...
