... Ftrg(v)) – backward translation probability; it can be estimated from a parsed and aligned parallel corpus. To summarize: the task of tectogrammatical transfer can be formulated as revealing the values of ... The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics. Matthew Crouse, Robert Nowak, and Richard Baraniuk. 1998. Wavelet-based statistical signal ... tree-shaped analogy to the popular n-gram approaches to Statistical Machine Translation (e.g. (Koehn et al., 2003)), in which translation and language models are trainable separately too....
... the ACL Student Research Workshop, pages 19–24, Ann Arbor, Michigan. Zoubin Ghahramani and Michael I. Jordan. 1997. Factorial hidden Markov models. Machine Learning, 29:1–31. A. Haghighi and ... Ohio schuler@ling.osu.edu Abstract This paper presents a supervised pronoun anaphora resolution system based on factorial hidden Markov models (FHMMs). The basic idea is that the hidden states of FHMMs are an explicit ... morphological features of words trained from the corpus and the strings concatenated from the tree leaves are made. This method is about as accurate as the approach described by Klein and Manning...
... is part of the Lancaster Treebank corpus and contains 1473 sentences. Each sentence contains hand-labeled syntactic roles for natural language text. [Figure: panels A, B and C; x-axis ticks 200 to 1400; y-axis F, 0.86 to 0.94] Figure ... different model on the Lancaster Treebank data set. The models used in this evaluation were trained with observation data from the Lancaster Treebank training set. The training set and testing set are ... a modified hidden Markov model Lin-Yi Chou, University of Waikato, Hamilton, New Zealand, lc55@cs.waikato.ac.nz Abstract This paper explores techniques to take advantage of the fundamental difference...
... in archaea the top and bottom are represented by Haloarcula marismortui (146 proteins) and Nanoarchaeum equitans (five proteins). The genomes of Oryza sativa and Xenopus tropicalis have many ... within the same range as for other eukaryotes. There are four eukaryotic parasites (Plasmodium falciparum, Plasmodium yoelii, Leishmania major and Entamoeba histolytica) for which the ratio of ... 15%, respectively. The bacterial genome of Chlamydophila caviae also shows a dual sites proportion of 15%, while the archaeal genomes of Thermococcus kodakaraensis and Nanoarchaeum equitans show 17 and 20%, respectively....
... documents has a very heavy tail; that is, there are a few heavily-used codes and a large number of codes that are used only occasionally. An ideal approach will work well with both high-frequency and ... Ginter, S. Pyysalo, A. Airola, T. Pahikkala, S. Salanter, and T. Salakoski. 2008. Machine learning to automate the assignment of diagnosis codes to free-text radiology reports: a method description. ... Research Council Canada {Svetlana.Kiritchenko,Colin.Cherry}@nrc-cnrc.gc.ca Abstract The automatic coding of clinical documents is an important task for today’s healthcare providers. Though it can...
... International Conference on Machine Learning (ICML), pages 1063–1070, San Francisco, CA, USA. Marilyn A. Walker, Diane J. Litman, Candace A. Kamm, and Alicia Abella. 1997. PARADISE: A framework for ... have an average of 650 surface realisations, including syntactic and lexical variation, and decisions of granularity. We refer to the set of alternative realisations of a semantic form as ... approach performs better than greedy or random baselines. 1 Introduction Surface realisation decisions in a Natural Language Generation (NLG) system are often made according to a language model...
... DOM tree alignments, there is substantial research focusing on syntactic tree alignment models for machine translation. For example, (Wu 1997; Alshawi, Bangalore, and Douglas, 2000; Yamada and ... documents. Parallel hyperlinks are used to pinpoint new parallel data, and make parallel data mining a recursive process. Parallel text chunks are fed into a sentence aligner to extract parallel ... three features, the maximum entropy model is trained on 1,000 pairs of web pages manually labeled as parallel or non-parallel. The Iterative Scaling algorithm (Pietra, Pietra and Lafferty...
... characters in the alphabet. • Transition probabilities and initial probabilities are calculated from the language model. • Observations and observation probabilities are as before. • ... βk(i) What is Covered • Observable Markov Model • Hidden Markov Model • Evaluation problem • Decoding problem • We can construct a single HMM for all words. • Hidden states = all characters ... Baum-Welch (also known as forward-backward) algorithm and EM (Expectation Maximization) algorithm. HMM Assumptions • Markov assumption: the state transition depends only on the origin and destination • Output-independent...
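The evaluation problem listed in this snippet (computing the probability of an observation sequence under a given HMM) is usually solved with the forward recursion. The following is a minimal sketch, not taken from the slides: the function name `forward`, the two-state model, and all probabilities are made-up assumptions for illustration.

```python
def forward(obs, states, init, trans, emit):
    """Return P(obs) under an HMM using the forward recursion."""
    # alpha[s] = P(o_1..o_t, state at time t is s)
    alpha = {s: init[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Sum over all predecessor states r, then emit o from state s.
        alpha = {
            s: emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: two hidden states, two observable symbols.
states = ["a", "b"]
init = {"a": 0.6, "b": 0.4}
trans = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.4, "b": 0.6}}
emit = {"a": {"x": 0.9, "y": 0.1}, "b": {"x": 0.2, "y": 0.8}}

likelihood = forward(["x", "y"], states, init, trans, emit)
```

Because the recursion sums over predecessor states at each step, its cost grows linearly with the sequence length rather than with the number of possible state paths.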
... Computational Linguistics A Tree Transducer Model for Synchronous Tree-Adjoining Grammars Andreas Maletti, Universitat Rovira i Virgili, Avinguda de Catalunya 25, 43002 Tarragona, Spain. andreas.maletti@urv.cat Abstract A ... we assume that all adjunctions are mandatory; i.e., if an auxiliary tree can be adjoined, then we need to make an adjunction. Thus, a derivation starting from an initial tree to a derived tree ... auxiliary tree by a special marker. Traditionally, the root label A of an auxiliary tree is replaced by A∅ once adjoined. Since we assume that there are no auxiliary trees with such a root label,...
... Forest-to-String Statistical Translation Rules. ACL-07. 704-711. Daniel Marcu, W. Wang, A. Echihabi and K. Knight. 2006. SPMT: Statistical Machine Translation with Syntactified Target Language Phrases. ... decoding algorithm. It translates each span iteratively, from small spans to large ones (lines 1-2). This strategy can guarantee that when translating the current span, all spans smaller than the ... Brooke Cowan, Ivona Kucerova and Michael Collins. 2006. A discriminative model for tree-to-tree translation. EMNLP-06. 232-241. Yuan Ding and Martha Palmer. 2005. Machine translation using...
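The small-to-large span ordering mentioned in this snippet can be sketched as a generic bottom-up enumeration. This is an illustration only, not the paper's decoder: the function name `spans_small_to_large` and the half-open `(start, end)` convention are assumptions.

```python
def spans_small_to_large(n):
    """Yield (start, end) spans over n tokens, shortest first (end exclusive)."""
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            yield (start, start + length)


# For a three-token sentence, every sub-span is visited before any span containing it.
order = list(spans_small_to_large(3))
```

Visiting spans in this order means that results for all shorter spans are already available when a longer span is processed.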
... 1989. A tree-based statistical language model for natural language speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37:1001-1008. L. Breiman, J. Friedman, ... by taking a cut through the tree to obtain a set of subtrees. The reason for keeping a hierarchy instead of a fixed partition of the vocabulary is to be able to dynamically adjust the partition ... strong classes by looking at parts of speech and synonyms, it is hard to produce a full hierarchy of a large vocabulary. Perhaps a combination of the expert and data-driven approaches would...
... FOR THE MARKOV CHAIN MODEL Perhaps the most significant advantage of the Markov chain formulation is that one can calculate the number of examples needed to acquire a language. Recall it is ... pn@ai.mit.edu, berwick@ai.mit.edu Abstract This paper shows how to formally characterize language learning in a finite parameter space as a Markov structure. Important new language ... NSF grant 9217041-ASC and ARPA under the HPCC program. REFERENCES Clark, Robin and Roberts, Ian (1993). "A Computational Model of Language Learnability and Language Change."...
... probabilities, are encouraging. VARIABLE MEMORY MARKOV MODELS Markov models are a natural candidate for language modeling and temporal pattern recognition, mostly due to their mathematical simplicity. ... sections, any finite memory Markov model cannot capture the recursive nature of natural language. The VMM can accommodate longer statistical dependencies than a traditional full-order Markov model, ... context, based on Variable Memory Markov (VMM) models. In contrast to fixed-length Markov models, which predict based on fixed-length histories, variable memory Markov models dynamically adapt...
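The contrast between fixed-length and variable-length histories can be illustrated with a longest-suffix back-off predictor. This is a simplified sketch, not the VMM learning algorithm from the paper: the functions `train` and `predict`, the `min_count` threshold, and the toy string are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(sequence, max_order):
    """Count next-symbol frequencies for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, sym in enumerate(sequence):
        for k in range(0, max_order + 1):
            if i - k < 0:
                break
            context = tuple(sequence[i - k:i])
            counts[context][sym] += 1
    return counts


def predict(counts, history, max_order, min_count=2):
    """Use the longest suffix of history that was seen at least min_count times."""
    for k in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        seen = counts.get(context)
        if seen and sum(seen.values()) >= min_count:
            total = sum(seen.values())
            return context, {s: c / total for s, c in seen.items()}
    return (), {}


counts = train("abababac", max_order=2)
context_used, distribution = predict(counts, "ab", max_order=2)
```

When the two-symbol context has not been observed often enough, `predict` falls back to a shorter suffix, so the effective memory length varies with the history.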
... multicellular eukaryotes and mammals. Species distribution A closer look at the distribution among the classified SDRs in the domains Eukaryota, Bacteria and Archaea (Fig. 2) reveals that more than half ... families are only found among bacteria, where the ‘classical’ SDR type is most prominent. The HMM-based classification is used as a basis for a sustainable and expandable nomenclature system. Abbreviations AKR, ... we apply hidden Markov models (HMMs) to obtain a sequence-based subdivision of the SDR superfamily that allows for automatic classification of novel sequence data and provides the basis for a nomenclature...
... ACM International Conference on Multimedia (ACM-MM05), pages 6–11. Ryohei Sasano, Daisuke Kawahara, and Sadao Kurohashi. 2004. Automatic construction of nominal case frames and its application ... 366–369. Daisuke Kawahara and Sadao Kurohashi. 2002. Fertilization of case frame dictionary for robust Japanese case analysis. In Proceedings of 19th COLING (COLING02), pages 425–431. Daisuke Kawahara ... similar verb usages (Kawahara and Kurohashi, 2002). An example of the automatically constructed case frame is shown in Table 3. For example, “塩を入れる (add salt)” is assigned to ireru:1 (add) and...