Fully unsupervised word segmentation with BVE and MDL

Scientific report: "Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling" doc

Uploaded: 17/03/2014, 01:20
... with respect to” actually function as a single word, and we often condense them into the virtual words “UK” and “w.r.t.”. In order to extract “words” from text streams, unsupervised word segmentation ... this paper, we proposed a much more efficient and accurate model for fully unsupervised word segmentation. With a combination of dynamic programming and an accurate spelling model from a Bayesian ... Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, pages 100–108, Suntec, Singapore, 2-7 August 2009. © 2009 ACL and AFNLP Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor...
  • 9
  • 238
  • 0
Scientific report: "Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese" potx

Uploaded: 07/03/2014, 18:20
... q_{-1} and q_{-2} respectively denote the last-shifted word and the word shifted before q_{-1}. q.w and q.t respectively denote the (root) word form and POS tag of a subtree (word) q, and q.b and q.e ... on CTB-6 and CTB-7 accuracies of POS tagging and dependency parsing were remarkably improved by 0.6% and 2.4%, respectively corresponding to 8.3% and 10.2% error reduction. For word segmentation, ... Gale, and Nancy Chang. 1996. A stochastic finite-state word-segmentation algorithm for Chinese. Computational Linguistics, 22. Weiwei Sun. 2011. A stacked sub-word model for joint Chinese word segmentation...
  • 9
  • 523
  • 0
Scientific report: "Unsupervised Word Alignment with Arbitrary Features" potx

Uploaded: 17/03/2014, 00:20
... Bouchard-Côté, J. DeNero, and D. Klein. 2010. Painless unsupervised learning with features. In Proc. of NAACL. P. Blunsom and T. Cohn. 2006. Discriminative word alignment with conditional random fields. In ... Collins, and T. Darrell. 2004. Conditional random fields for object recognition. In NIPS 17. H. Setiawan, C. Dyer, and P. Resnik. 2010. Discriminative word alignment with a function word reordering model. ... between pairs of source and target word types across sentence pairs (Dice, 1945), IBM Model 1 forward and reverse probabilities, and the geometric mean of the Model 1 forward and reverse probabilities....
  • 11
  • 292
  • 0
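
The word-alignment snippet above lists two association features: a Dice score between source and target word types across sentence pairs, and the geometric mean of the Model 1 forward and reverse probabilities. The sketch below shows one plausible way to compute both. It assumes that co-occurrence is counted once per sentence pair and per word type; the snippet does not show the paper's exact definition, and the names `dice_scores` and `geometric_mean` are hypothetical.

```python
import math
from collections import Counter


def dice_scores(sentence_pairs):
    """Dice coefficient for each (source type, target type) pair.

    Assumption: a type is counted once per sentence pair it occurs in,
    and a pair co-occurs when both types appear in the same sentence pair.
    """
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in sentence_pairs:
        src_types, tgt_types = set(src_tokens), set(tgt_tokens)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        joint_count.update((s, t) for s in src_types for t in tgt_types)
    # Dice = 2 * |X and Y| / (|X| + |Y|)
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint_count.items()
    }


def geometric_mean(p_forward, p_reverse):
    """Geometric mean of a forward and a reverse translation probability."""
    return math.sqrt(p_forward * p_reverse)


pairs = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
]
scores = dice_scores(pairs)
```

On this toy corpus, `("the", "das")` co-occurs in every sentence pair containing either word, so it scores higher than `("the", "haus")`, which co-occurs in only one of them.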
Scientific report document: "Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure" pdf

Uploaded: 20/02/2014, 04:20
... Linguistics Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure Minwoo Jeong and Ivan Titov Saarland University Saarbrücken, Germany {m.jeong|titov}@mmci.uni-saarland.de Abstract Documents ... story. Our model We evaluate our joint model of segmentation and alignment both with and without the split/merge moves. For the model without these moves, we set the desired number of segments ... user interfaces and improve the performance of summarization and information retrieval systems. Discourse segmentation of the documents composed of parallel parts is a novel and challenging...
  • 5
  • 376
  • 0
Scientific report document: "Joint Word Segmentation and POS Tagging using a Single Perceptron" docx

Uploaded: 20/02/2014, 09:20
... tag t with word w
2 tag bigram t_1 t_2
3 tag trigram t_1 t_2 t_3
4 tag t followed by word w
5 word w followed by tag t
6 word w with tag t and previous character c
7 word w with tag t and next ...
... a word starting with char c_0 and containing char c
13 tag t on a word ending with char c_0 and containing char c
14 tag t on a word containing repeated char cc
15 tag t on a word starting with ...
... sentence, and T is the size of the tag set (T = 1 for pure word segmentation). It worked well for word segmentation alone (Zhang and Clark, 2007), even with an agenda size as small as 8, and a simple...
  • 9
  • 576
  • 0
Scientific report document: "Learning Word Senses With Feature Selection and Order Identification Capabilities" pdf

Uploaded: 20/02/2014, 16:20
... (Pantel and Lin, 2002; Schütze, 1998), there are other related efforts on word sense discrimination (Dorow and Widdows, 2003; Fukumoto and Suzuki, 1999; Pedersen and Bruce, 1997). In (Pedersen and ... about derivation of feature vectors. A feature for target word here consists of a contextual content word and its grammatical relationship with target word. Acquisition of grammatical relationship depends ... case characters, ignoring all words that contain digits or non alpha-numeric characters, removing words from a stop word list, and filtering out low frequency words which appeared only once...
  • 8
  • 463
  • 0
Scientific report document: "Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data" pdf

Uploaded: 20/02/2014, 18:20
... 1268 Chinese Word Segmentation without Using Lexicon and Hand-crafted Training Data Sun Maosong, Shen Dayang*, Benjamin K Tsou** State Key Laboratory of Intelligent Technology and Systems, ... Chinese word segmentation developed so far, both statistical and rule-based, exploited two kinds of important resources, i.e., lexicon and hand-crafted linguistic resources (manually segmented and ... between mi and dts in depth; and (3) integrating it as a module with the existing Chinese segmenters so as to improve their performance (especially in ability to cope with unknown words and ability...
  • 7
  • 396
  • 0
Scientific report document: "POS Disambiguation and Unknown Word Guessing with Decision Trees" pot

Uploaded: 22/02/2014, 03:20
... of the tagger and the order of processing: [Figure residue; recoverable labels: Raw Text, words with one tag, Disambiguator & Guesser] ... words, which examines contextual features along with the word ending and capitalization and returns an open-class POS. 3 Training Sets For the study and resolution of lexical ambiguity in M. ... Proceedings of EACL '99 POS Disambiguation and Unknown Word Guessing with Decision Trees Giorgos S. Orphanos Computer Engineering & Informatics Dept. and Computer Technology Institute University...
  • 8
  • 326
  • 0
Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" pdf

Uploaded: 08/03/2014, 01:20
... p (position i − l), and select for position i a N-best list of candidate results from all these candidates. When we derive a candidate result from a word-POS pair p and a candidate q at prior ... sources effectively and obtain accuracy improvements on both segmentation and Joint S&T. 2 Segmentation and POS Tagging Given a Chinese character sequence: C_{1:n} = C_1 C_2 .. C_n the segmentation ... segmentation only and joint segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on segmentation and 12% on joint segmentation and part-of-speech...
  • 8
  • 445
  • 0
Scientific report: "Chinese Segmentation with a Word-Based Perceptron Algorithm" docx

Uploaded: 08/03/2014, 02:21
... experiments without such optimization.
1 word w
2 word bigram w_1 w_2
3 single-character word w
4 a word starting with character c and having length l
5 a word ending with character c and having length ...
... characters c_1 and c_2 of two consecutive words
12 the ending characters c_1 and c_2 of two consecutive words
13 a word of length l and the previous word w
14 a word of length l and the next word w
Table ... characters c_1 and c_2
7 character bigram c_1 c_2 in any word
8 the first and last characters c_1 and c_2 of any word
9 word w immediately before character c
10 character c immediately before word w
11 the...
  • 8
  • 380
  • 0
Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" potx

Uploaded: 17/03/2014, 00:20
... token and which class it belongs to. Solvers may use previously predicted words and their POS information as clues to find a new word. After one word is found and classified, solvers move on and ... 2010. Word-based and character-based word segmentation models: Comparison and combination. In Coling 2010: Posters, pages 1211–1219, Beijing, China, August. Coling 2010 Organizing Committee. André ... sub-word model, joint word segmentation and POS tagging is decomposed into two steps: (1) coarse-grained word segmentation and tagging, and (2) fine-grained sub-word tagging. The workflow is shown in...
  • 10
  • 412
  • 0
Scientific report: "An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging" docx

Uploaded: 17/03/2014, 01:20
... dealing with this issue. With this search space representation, we can consistently handle unknown words with character-level nodes. In other words, we use word-level nodes to identify known words and character-level ... ACL and AFNLP An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging Canasai Kruengkrai †‡ and Kiyotaka Uchimoto ‡ and Jun’ichi Kazama ‡ Yiou Wang ‡ and ... discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words....
  • 9
  • 338
  • 0
Scientific report: "Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study" potx

Uploaded: 17/03/2014, 01:20
... the word containing these characters. In addition, Ng and Low (2004) find that, compared with POS tagging after word segmentation, Joint S&T can achieve higher accuracy on both segmentation and ... representation of Ng and Low (2004). For word segmentation only, there are four boundary tags: • b: the beginning of the word • m: the middle of the word • e: the end of the word • s: a single-character word while ... the ACL and the 4th IJCNLP of the AFNLP, pages 522–530, Suntec, Singapore, 2-7 August 2009. © 2009 ACL and AFNLP Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS...
  • 9
  • 404
  • 0
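
The last snippet above describes four boundary tags for character-based word segmentation (b, m, e, s). As a rough illustration of that representation, the sketch below converts a segmented sentence into one tag per character and back. The function names and the example sentence are hypothetical and are not taken from the paper.

```python
def words_to_tags(words):
    """Assign one boundary tag per character: b, m, e, or s."""
    tags = []
    for word in words:
        if not word:
            continue  # skip empty strings rather than emit tags for them
        if len(word) == 1:
            tags.append("s")  # single-character word
        else:
            # first character, any middle characters, last character
            tags.extend(["b"] + ["m"] * (len(word) - 2) + ["e"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild the segmentation: a word closes at every 'e' or 's' tag."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("e", "s"):
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


segmented = ["我", "喜欢", "自然语言"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

Framing segmentation this way turns it into a per-character labeling problem, which is what allows a sequence tagger to predict word boundaries.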