... and applied naive Bayes and decision tree to it. Their accuracy results are worse than (Blaheta and Charniak, 2000). Neither (Blaheta and Charniak, 2000) nor (Lintean and Rus, 2007a; Lintean and ... binary annotations can again be treated as pseudo function tags and the proposed tree annotator can be readily applied to this problem. As an example, the top half of Figure 3 contains an Arabic ... Chinese TreeBank: Phrase structure annotation of a large corpus. Natural Language Engineering, 11(2):207–238. Kenji Yamada and Kevin Knight. 2001. A syntax-based statistical translation model....
... system learns this as a non-transliteration but it is wrongly annotated as a transliteration in the gold standard. Arabic nouns have an article “al” attached to them which is translated in English as ... International Language Resources and Evaluation (LREC’10), Valletta, Malta. Sittichai Jiampojamarn, Kenneth Dwyer, Shane Bergsma, Aditya Bhargava, Qing Dou, Mi-Young Kim, and Grzegorz Kondrak. ... uses Hidden Markov Models (Nabende, 2010; Darwish, 2010; Jiampojamarn et al., 2010), Finite State Automata (Noeman and Madkour, 2010) and Bayesian learning (Kahki et al., 2011) to learn transliteration pairs...
... interpretation was considered correct if it made sense in some imaginary context. Lapata and Lascarides (2003) extend Utiyama’s approach to interpretation of logical metonymies containing aspectual ... Thomson Avenue, Cambridge CB3 0FD, UK. Ekaterina.Shutova@cl.cam.ac.uk. Abstract: The use of figurative language is ubiquitous in natural language texts and it is a serious bottleneck in automatic text ... understanding. We address the problem of interpretation of logical metonymy, using a statistical method. Our approach originates from that of Lapata and Lascarides (2003), which generates a list...
... because usually a smaller sample may well include more 'central' terms. We may need further study concerning the status of the available terminological corpora. 5.2 Statistical ... frequencies and on related Markovian models of discourse." In: Jakobson, R. (ed.) Structure of Language and its Mathematical Aspects. Rhode Island: American Mathematical Society. ... incorporate the lognormal 'law' (Carrol, 1967), the inverse Gauss-Poisson 'law' (Sichel, 1986), Zipf's 'law' (Zipf, 1935) and Yule-Simon 'law' (Simon,...
... DOM tree alignments, there is substantial research focusing on syntactic tree alignment model for machine translation. For example, (Wu 1997; Alshawi, Bangalore, and Douglas, 2000; Yamada and ... grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3). Yamada K. and K. Knight. 2001. A Syntax Based Statistical Translation Model. In Proceedings of 39th Annual ... location holding more parallel data. This approach is based on our observation that parallel pages share similar structures holding parallel content, and parallel hyperlinks refer to new parallel...
... of a morphological analysis program, and also with the single one of those tags that a statistical POS tagging program had predicted to be the correct tag (Hajič and Hladká, 1998). Table ... A Statistical Parser for Czech* Michael Collins, AT&T Labs-Research, Shannon Laboratory, 180 Park Avenue, Florham Park, NJ 07932, mcollins@research.att.com; Jan Hajič, Institute ... Other Slavic languages (such as Polish, Russian, Slovak, Slovene, Serbo-Croatian, Ukrainian) also show these characteristics. Many European languages exhibit FWO and HI phenomena to a lesser...
... poems as outliers). 4 Selection of lexical and syntactic variables. Any text classification tasks require an object (here a text) to be parameterised into variables, whether qualitative or quantitative. ... suggests that there is no particular order to the CEFR levels. From a practical perspective, things are not so clear. Traditional approaches have usually viewed difficulty as an interval scale and applied ... correlation coefficient, prediction accuracy as defined by Tan et al. (2005), and adjacent accuracy. Adjacent accuracy is defined by Heilman et al. (2008) as “the proportion of predictions that...
... our knowledge such a combination of goals with different attribute values cannot be straightforwardly handled by comparable state-of-the-art statistical SDSs which appear in the literature. Crook and Lemon ... dialogue action (e.g. offer a restaurant, ask for clarification). Recent research in statistical SDSs has successfully addressed aspects of these problems through the application of Partially ... systems computationally tractable. Work in dialogue system evaluation, e.g. Walker et al. (2004) and Lemon et al. (2006), shows that real user goals are generally sets of items, rather than a single...
... CANADA: A STATISTICAL PORTRAIT 3431.4% higher than the Canadian average. The cancer death rate for Nova Scotia women is 13% above the national average. Nova Scotia and New Brunswick have the ... revealing. A particularly high percentage of Nova Scotia women record high blood pressure (more than one in five), 80% above the national average, and WOMENS HEALTH IN ATLANTIC CANADA: A STATISTICAL ... specifically in health, care-giving and social services, volunteering in WOMENS HEALTH IN ATLANTIC CANADA: A STATISTICAL PORTRAIT 20 prose literacy are also higher than those of males for all age...
... the situation; the math is the same. We assume that we are in an adversarial situation, where an adversary may have corrupted some of the objects. For example, the adversary might have tampered ... Stenger from the National Election Data Archive Project is now also available; there is also a nice associated audit size calculation utility on a web site [7]. Stanislevic [11] also examines the ... voting precincts may have different numbers of voters. This complicates matters considerably. Stanislevic [11] has a good approach to handling this situation. 13 On Estimating the Size of a Statistical Audit Ronald...
... co-occurrence analysis operate over large corpora, which are typically unavailable for a lost language. Finally, Knight and Yamada (1999) and Knight et al. (2006) describe a computational HMM-based ... Cunchillos, Juan-Pablo Vita, and José-Ángel Zamora. 2002. Ugaritic data bank. CD-ROM. Gregoria del Olo Lete and Joaquín Sanmartín. 2004. A Dictionary of the Ugaritic Language in the Alphabetic ... structural sparsity constraints on character-level mappings. We assume that an accurate alphabetic mapping between related languages will be sparse in the following way: each letter will map to a...
... the tagger with that in a theoretically more powerful model trained on the same data, such as an incremental statistical parser (Wang et al., 2004; Roark, 2001). In so doing we can find the places ... Schütze. Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, Massachusetts, 1999. B. Roark. Probabilistic top-down parsing and language modeling. Computational Linguistics, ... probability re-ranking. That is, the tagger initially favors the main-verb interpretation for the ambiguous -ed form, and later it makes a repair when the ambiguity is resolved as a past-participle. The...
... Table 3: Examples of common character bigrams for each part of speech in the infrequent words character type sequence kanji katakana katakana-kanji kanji-hiragana hiragana kanji-katakana ... kanji-hiragana hiragana kanji-katakana katakana-symbol-katakana number kanji-hiragana-kanji alphabet kanji-hiragana-kanji-hiragana hiragana-kanji percent 45.1% 11.4% 6.5% 5.6% ... different types of characters other than punctuation marks: kanji, hiragana, katakana, Roman alphabet, and Arabic numeral. Kanji which means 'Chinese character' is used for both...