... is a hand-crafted Chinese grammar. For such a system, as probably for any parsing system that presupposes segmented (and tagged) input, the accuracy of the segmentation and POS tagging analyses ... rules learned in a TBL training run can straightforwardly be translated into a cascade of FST rules. 3.2.1 Transformation-Based Learning and µ-TBL. TBL is a machine learning approach that has been employed ... Grace Ngai, Yongsheng Yang, and Benfeng Chen. 2004. A maximum-entropy Chinese parser augmented by transformation-based learning. ACM Transactions on Asian Language Information Processing (TALIP),...
... to avoid error propagation and make use of POS information for word segmentation, segmentation and POS tagging can be viewed as a single task: given a raw Chinese input sentence, the joint POS ... used as training and test data for development. The standard F-scores are used to measure both the word segmentation accuracy and the overall segmentation and tagging accuracy, where the overall accuracy ... tags are assigned to each character to represent its segmentation and POS. For example, the tag “bNN” indicates a character at the beginning of a noun. Using this method, POS features are allowed...
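The F-score evaluation mentioned in the excerpt above can be illustrated with a minimal sketch. The excerpt truncates the definition of the overall accuracy, so the code assumes the common convention that a word counts as correct for segmentation when its character span matches the gold span, and as correct for joint segmentation and tagging only when the POS tag matches as well. The function names and the example sentence are hypothetical and do not come from any of the cited papers.

```python
def to_spans(words, with_pos=True):
    """Map a list of (word, pos) pairs to a set of character spans.

    Each span is (start, end, pos) when with_pos is True, else (start, end).
    """
    spans, start = set(), 0
    for word, pos in words:
        end = start + len(word)
        spans.add((start, end, pos) if with_pos else (start, end))
        start = end
    return spans


def f_score(gold, pred, with_pos=True):
    """Balanced F-score over word spans (assumed evaluation convention)."""
    gold_spans = to_spans(gold, with_pos)
    pred_spans = to_spans(pred, with_pos)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted analyses of the same five characters.
gold = [("我", "PN"), ("喜欢", "VV"), ("北京", "NR")]
pred = [("我", "PN"), ("喜", "VV"), ("欢", "VV"), ("北京", "NN")]

seg_f = f_score(gold, pred, with_pos=False)   # segmentation only
joint_f = f_score(gold, pred, with_pos=True)  # segmentation and tagging
```

In this example two of the four predicted words have correct boundaries, but only one also carries the correct tag, so the joint score is lower than the segmentation score.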
... at the same time, we expand boundary tags to include POS information by attaching a POS to the tail of a boundary tag as a postfix following Ng and Low (2004). As each tag is now composed of a ... is a better idea to perform segmentation and POS tagging jointly in a uniform framework. According to Ng and Low (2004), the segmentation task can be transformed to a tagging problem by assigning ... propose a cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. With a character-based perceptron as the core, combined with real-valued features such as language...
... information for each character. Each character can be assigned one of two possible boundary tags: “B” for a character that begins a word and “I” for a character that occurs in the middle of a word. ... on Natural Language Learning at HLT-NAACL 2003, pages 200–203. Nianwen Xue, Fei Xia, Fu-Dong Chiou, and Martha Palmer. 2005. The Penn Chinese Treebank: Phrase structure annotation of a large ... representation (Ramshaw and Marcus, 1995) and the Start/End representation (Kudo and Matsumoto, 2001) are popular. For example, the label B-NN indicates that a character is located at the beginning of a noun....
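The character-tagging scheme described in the excerpts above (a "B" or "I" boundary tag combined with a POS tag, as in B-NN) can be sketched as a pair of conversion routines. This is only an illustration under stated assumptions: the hyphen-joined tag format follows the B-NN example, while the function names and the sample sentence are hypothetical and not taken from any of the cited systems.

```python
def words_to_char_tags(words):
    """Expand (word, pos) pairs into per-character (char, tag) pairs.

    The first character of a word gets "B-<pos>", the rest get "I-<pos>".
    """
    tagged = []
    for word, pos in words:
        for i, ch in enumerate(word):
            boundary = "B" if i == 0 else "I"
            tagged.append((ch, f"{boundary}-{pos}"))
    return tagged


def char_tags_to_words(tagged):
    """Rebuild (word, pos) pairs from per-character tags.

    A "B" tag (or the very first character) opens a new word; an "I" tag
    extends the current word. The POS of a word is taken from its first
    character, which is an assumption made for this sketch.
    """
    words = []
    for ch, tag in tagged:
        boundary, pos = tag.split("-", 1)
        if boundary == "B" or not words:
            words.append([ch, pos])
        else:
            words[-1][0] += ch
    return [(word, pos) for word, pos in words]


# Hypothetical segmented and tagged sentence.
sentence = [("北京", "NR"), ("欢迎", "VV"), ("你", "PN")]
char_tags = words_to_char_tags(sentence)
restored = char_tags_to_words(char_tags)
```

Because every character carries both its boundary and its POS, a single sequence labeler over characters yields segmentation and tagging at the same time.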
... dictionary, have regular POS tags. Character-level nodes have special tags where position-of-character (POC) and POS tags are combined (Asahara, 2003; Nakagawa, 2004). POC tags indicate the ... Chinese Word Segmentation and POS Tagging Canasai Kruengkrai†‡ and Kiyotaka Uchimoto‡ and Jun’ichi Kazama‡ Yiou Wang‡ and Kentaro Torisawa‡ and Hitoshi Isahara†‡ †Graduate School of Engineering, ... 277–284. Tetsuji Nakagawa and Kiyotaka Uchimoto. 2007. A hybrid approach to word segmentation and POS tagging. In Proceedings of ACL Demo and Poster Sessions. Tetsuji Nakagawa. 2004. Chinese and Japanese...
... ACL and AFNLP Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study Wenbin Jiang† Liang Huang‡ Qun Liu† †Key Lab. of Intelligent Information ... and Representation: Bootstrapping Annotated Language Data. David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, pages 201–228. Michael Collins and Brian Roark. ... liuqun}@ict.ac.cn liang.huang.sh@gmail.com Abstract Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling there...
... classification. Journal of Machine Learning Research, 9:1871–1874, June. Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam ... dependency parsers). 3 Results. Table 3 tabulates efficiency and performance for all parsers; UAS and LAS are unlabeled and labeled attachment scores, respectively, the standard criteria for evaluating ... Computational Linguistics. Alexander M. Rush, David Sontag, Michael Collins, and Tommi Jaakkola. 2010. On dual decomposition and linear programming relaxations for natural language processing....
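The unlabeled and labeled attachment scores (UAS and LAS) named in the excerpt above can be computed as in the following minimal sketch. It assumes each token is represented as a (head index, dependency label) pair with 0 standing for the root, and it scores every token; real evaluations often exclude punctuation, which this sketch does not do. The function name and the example trees are hypothetical.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for two parses of the same sentence.

    gold and pred are equal-length lists of (head, label) pairs, one per
    token. UAS counts tokens whose head is correct; LAS additionally
    requires the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted parses differ in length")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    total = len(gold)
    heads_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return heads_ok / total, both_ok / total


# Hypothetical four-token sentence: one wrong head, one wrong label.
gold_parse = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
pred_parse = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "iobj")]

uas, las = attachment_scores(gold_parse, pred_parse)
```

Here three of four heads are correct, but only two tokens have both the correct head and the correct label, so LAS is lower than UAS.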
... GRAMMARS The grammars which are supported by the parser are a subset of those for Structure Unification Grammar. These grammars are for the most part lexicalized. Each lexicalized grammar ... Categorial Grammar, Lexical Functional Grammar, and Head-driven Phrase Structure Grammar. An SUG grammar is a set of partial descriptions of phrase structure trees. Each SUG grammar ... In Lawrence Davis, editor, Genetic Algorithms and Simulated Annealing, chapter 11, pages 141-154. Morgan Kaufmann Publishers, Los Altos, CA. Shastri, Lokendra and Ajjanagadde, Venkat (1990)....
... Kruengkrai, Kiyotaka Uchimoto, Jun’ichi Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. ... a sequence of POS tags. The joint approach to word segmentation and POS tagging has been reported to improve word segmentation and POS tagging accuracies by more than 1% in Chinese (Zhang and ... model is fundamentally a combination of the features used in the state-of-the-art joint segmentation and POS tagging model (Zhang and Clark, 2010) and dependency parser (Huang and Sagae, 2010),...
... Lamar* Division of Applied Mathematics Brown University Providence, RI, USA mlamar@dam.brown.edu Yariv Maron* Gonda Brain Research Center Bar-Ilan University Ramat-Gan, Israel syarivm@yahoo.com ... Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 296–305. Marina Meilă. 2003. Comparing clusterings by the variation of information. In Bernhard ... on accuracy. Evaluation was done against the POS-tag annotations of the 45-tag PTB tagset (hereafter PTB45), and against the Smith and Eisner (2005) coarse version of the PTB tagset (hereafter...
... of a morphological analysis program, and also with the single one of those tags that a statistical POS tagging program had predicted to be the correct tag (Hajič and Hladká, 1998). Table ... morphological analyzer. The PDT also contains machine-assigned tags and lemmas for each word (using a tagger described in (Hajič and Hladká, 1998)). For evaluation purposes, the PDT has been ... A Statistical Parser for Czech* Michael Collins AT&T Labs-Research, Shannon Laboratory, 180 Park Avenue, Florham Park, NJ 07932 mcollins@research.att.com Jan Hajič Institute...
... inserted. Let's take the question: 'Come si sale da Cervinia al Plateau Rosa?' 'How can one get on the Plateau Rosa from Cervinia?' and the grammar: Rule1 : PROD: ... FORM3 = la FORM4 = nota FORM5 = polemica CONSTITUENT5 recognizes 'la nota polemica' 'the polemic note' CONSTITUENT7 recognizes 'la ... on a SUN workstation, as the main component of a transportable Natural Language Interface (SAIL = Sistema per l'Analisi e l'Interpretazione del Linguaggio). Subsets of grammars...
... character tagging so that part-of-speech tagging approaches can be used for word segmentation. This approach was also called “LMR” (Xue and Shen, 2003) or “BIES” (Asahara et al., 2005) tagging. ... soon. References Masayuki Asahara, Kenta Fukuoka, Ai Azuma, Chooi-Ling Goh, Yotaro Watanabe, Yuji Matsumoto, and Takashi Tsuzuki. 2005. Combination of machine learning methods for optimum Chinese word ... test in Bakeoff 2005, some features, such as syntactic information and character encodings for numbers and alphabetical characters, are not allowed. Therefore, we used the features available only...
... morfologickém/AANS6 1A značkování/NNNS6 A (/Z: někdy/Db též/Db zvaném/AAI_S6 1A morfologicko/A2 -/Z: syntaktické/AAIP1 1A )/Z: jazyků/NNIP2 A s/RR 7 bohatou/AAFS7 1A flexí/NNFS7 ... form, its possible tags and the disambiguated tag. The lemmas are ignored for tagging purposes. 4 The tag from the "disambiguated tag" field as well as the tags from the "possible ... Naše/PSHS1-P1. metoda/NNFS1 A přitom/Db. využívá/VB-S 3P-AA- exponenciálního/AAIS2 1A pravděpodobnostního/AAIS2 1A modelu/NNIS2 A založeného/AAIS2 1A na/P~ 6 automaticky/Dg 1A...
... as it is calculating the update. Zhang and Clark compare aspects of transition-based and graph-based parsing, and end up using a transition-based parser with a combined transition-based/second-order graph-based ... a suitable amount of training data, the model can thus learn to make the correct decision. The dynamic-programming based graph-based parser is designed in such a way that any score calculation ... many situations, a transition-based parser is forced to make an attachment decision for a given input word at a point where no or only partial information about the word’s own dependents (and...