
Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging


Scientific report: "Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study"

... efficacy of this method in the context of Chinese word segmentation and part-of-speech tagging, where no segmentation and POS tagging standards are widely accepted due to the lack of morphology in Chinese. ... of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 522–530, Suntec, Singapore, 2-7 August 2009. © 2009 ACL and AFNLP Automatic Adaptation of Annotation Standards: Chinese ... method we choose Chinese word segmentation and part-of-speech tagging, where the problem of incompatible annotation standards is one of the most evident: so far no segmentation standard is widely...

Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging"

... Joint Chinese Word Segmentation and POS Tagging Canasai Kruengkrai†‡ and Kiyotaka Uchimoto‡ and Jun’ichi Kazama‡ Yiou Wang‡ and Kentaro Torisawa‡ and Hitoshi Isahara†‡ †Graduate School of ... discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. ... Liu, and Yajuan Lü. 2008a. A cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. In Proceedings of ACL. Wenbin Jiang, Haitao Mi, and Qun Liu. 2008b. Word lattice...

Document: Scientific report: "Joint Word Segmentation and POS Tagging using a Single Perceptron"

... word w; 11: the starting characters c1 and c2 of two consecutive words; 12: the ending characters c1 and c2 of two consecutive words; 13: a word of length l with previous word w; 14: a word of ... UK {yue.zhang,stephen.clark}@comlab.ox.ac.uk Abstract For Chinese POS tagging, word segmentation is a preliminary step. To avoid error propagation and improve segmentation by utilizing POS information, segmentation and tagging can be ... proposed a hybrid model for word segmentation and POS tagging using an HMM-based approach. Word information is used to process known-words, and character information is used for unknown words...

Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

... −l), and select for position i a N-best list of candidate results from all these candidates. When we derive a candidate result from a word-POS pair p and a candidate q at prior position of p, ... and joint segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on segmentation and 12% on joint segmentation and part-of-speech tagging ... that segmentation and POS tagging task is to divide a character sequence into several subsequences and label each of them a POS tag. It is a better idea to perform segmentation and POS tagging...

Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

... sequence of characters $c = (c_1, \ldots, c_{\#c})$, the task of word segmentation and POS tagging is to predict a sequence of word and POS tag pairs $y = (w_1, p_1, \ldots, w_{\#y}, p_{\#y})$, where $w_i$ is a word, ... model, joint word segmentation and POS tagging is decomposed into two steps: (1) coarse-grained word segmentation and tagging, and (2) fine-grained sub-word tagging. The workflow is shown in ... decoding for POS tagging over sub-words is efficient. Finally, the Chinese language is characterized by the lack of morphology that often provides important clues for POS tagging, and the POS tags...

Scientific report: "Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation"

... Liang Huang, and Qun Liu. 2009. Automatic adaptation of annotation standards: Chinese word segmentation and POS tagging – a case study. In Proceedings of the Joint Conference of the 47th Annual ... and Hitoshi Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of ... model for word structure parsing is integrated with constituent parsing. There has been many efforts to integrate Chinese word segmentation, part-of-speech tagging and parsing (Wu and Zixin,...

Scientific report: "Discriminative Pruning of Language Models for Chinese Word Segmentation"

... Comparison of Combined Model and KLD Model 5 Conclusions and Future Work A discriminative pruning criterion of n-gram language model for Chinese word segmentation was proposed in this paper, and ... and word segmentation performance is also discussed. 1 Introduction Chinese word segmentation is the initial stage of many Chinese language processing tasks, and has received a lot of attention ... of the 41st Annual Meeting of Association for Computational Linguistics (ACL-2003), pages 272-279. Jianfeng Gao, Mu Li, Andi Wu, and Chang-Ning Huang. 2005. Chinese Word Segmentation and...

Document: Scientific report: "Rethinking Chinese Word Segmentation: Tokenization, Character Classification, or Wordbreak Identification"

... J. and A. Wu and Mu Li and C. N. Huang and H. Li and X. Xia and H. Qin. 2004. Adaptive Chinese Word Segmentation. In Proceedings of ACL-2004. Meng, H. and C. W. Ip. 1999. An Analytical Study of Transformational ... N. 2003. Chinese Word Segmentation as Character Tagging. Computational Linguistics and Chinese Language Processing. 8(1): 29-48. Redington, M. and N. Chater and C. Huang and L. Chang and K. Chen. ... that Chinese word segmentation is the classification of a string of character-boundaries (CB’s) into either word-boundaries (WB’s) and non-word-boundaries. In Chinese, CB’s are delimited and...

Document: Scientific report: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data"

... "[garbled Chinese text]" should be separated and "[garbled Chinese text]" be combined ... dis(locmax, y:z) = dts(x:y) - dts(y:z) Definition 7 Suppose 'vxyzw' is a Chinese 1268 Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data Sun Maosong, Shen Dayang*, ... Any Chinese word is composed of either single or multiple characters. Chinese texts are explicitly concatenations of characters, words are not delimited by spaces as that in English. Chinese...

Scientific report: "Exploring Deterministic Constraints: From a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation"

... beginning of a word and I all other positions; and 2) BMES: where B, M and E represent the beginning, middle and end of a multi-character word respectively, and S tags a single-character word. ... decoding. 3 Chinese Word Segmentation (CWS) 3.1 Word segmentation as character tagging Considering the ambiguity problem that a Chinese character may appear in any relative position in a word and the ... $(c_{n-l_k+1} \ldots c_n)$ represents a segmentation of k words and the lengths of the first and last word are $l_1$ and $l_k$ respectively. In early work, rule-based models find words one by one based on heuristics...
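
The BMES scheme described in this snippet can be illustrated with a minimal Python sketch. The function names `words_to_bmes` and `bmes_to_words` and the sample sentence are assumptions made for illustration only; they do not come from the paper, which is excerpted here only in part.

```python
def words_to_bmes(words):
    """Map segmented words to one BMES tag per character.

    B/M/E mark the beginning, middle and end of a multi-character word;
    S marks a single-character word. (Hypothetical helper, not from the paper.)
    """
    tags = []
    for word in words:
        if not word:
            continue  # skip empty strings so they produce no tags
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their BMES tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # keep any unterminated trailing word
        words.append(current)
    return words


# Illustrative sentence (an assumption, chosen to show all four tags).
sample = ["我", "爱", "北京大学"]
sample_tags = words_to_bmes(sample)
```

Because each character carries exactly one tag, segmentation becomes a sequence labelling problem, and the original words can be recovered with `bmes_to_words("".join(sample), sample_tags)`.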

Scientific report: "Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level"

... ICTCLAS2009. NUS Chinese word segmenter (NUS): The NUS Chinese word segmenter uses a maximum entropy approach to Chinese word segmentation, which achieved the highest F-measure on three of the four ... into words: Translation: 多少_钱_的_伞_吗_? Reference: 这些_雨伞_多少_钱_? The word “伞” is a synonym for the word “雨伞”, and both words are translations of the English word “umbrella”. If a word-level ... the four corpora in the open track of the Second International Chinese Word Segmentation Bakeoff (Ng and Low, 2004; Low et al., 2005). The segmentation standard adopted in this paper is CTB...

Document: Scientific report: "Unsupervized Word Segmentation: the case for Mandarin Chinese"

... segmented Chinese text, most of the tokens are uni- and bigrams but most of the types are bi- and trigrams (as unigrams are often high frequency grammatical words and trigrams the result of more ... unambiguous cases of numbers and dates in Chinese script. From $h_{\rightarrow}(x_{0..n})$ and $h_{\rightarrow}(x_{0..n-1})$ on the one hand, and from $h_{\leftarrow}(x_{0..n})$ and $h_{\leftarrow}(x_{1..n})$ we estimate the Variation of Branching Entropy ... redefine the sentence segmentation problem as the maximization of the autonomy measure of its words. For a character sequence s, if we call Seg(s) the set of all the possible segmentations, then...

Document: Scientific report: "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity"

... synonyms and other types of semantically related words such as antonyms, (co)hyponyms and hypernyms. We present a method based on automatic word alignment of parallel corpora consisting of documents ... context and one using translational context based on word alignment and the combination of both. For both approaches, we used a cutoff n for each row in our word-by-context matrix. A word is ... word; P(W) is the probability of seeing the word; P(f) is the probability of seeing the feature; P(W,f) is the probability of seeing the word and the feature together. 3.3 Word Alignment The multilingual...