... compact as it only stores the base form for each word together with its inflection class. Therefore, the complete morphological information for 324,000 word forms takes less than 2 Megabytes ... lemmata for each word form. Secondly, the tagger determines the grammatical categories of the word forms. If, for any of the lemmata, the inflected form corresponding to the word form in ... Table 3: Word forms with several lemmata. Conclusions In this paper, a freely available integrated tool for German morphological analysis, part-of-speech tagging and context-sensitive lemmatiza-...
... distributional context, we make use, where possible, of the sentiment label of a document: i.e. sentiment labels form part of our context features. This is what makes the distributional thesaurus sensitive ... relies upon the availability of unlabeled data for the construction of a sentiment sensitive thesaurus, we believe that this accounts for our lack of performance on the books domain. However, given ... achieve the maximum performance in most of the cases. To study the effect of source and target domain unlabeled data on the performance of our method, we create sentiment sensitive thesauri using...
... 505–512, Sydney, July 2006. © 2006 Association for Computational Linguistics. Creating a CCGbank and a wide-coverage CCG lexicon for German. Julia Hockenmaier, Institute for Research in Cognitive Science, University ... cannot have more than one forward and one backward extraposed element and one forward and one backward trace. It may be preferable to use list structures instead, especially for extraposition. ... pose greater challenges for syntactic theories (Rambow, 1994), and the richer inflectional morphology of these languages creates additional problems both for the coverage of lexicalized formalisms such as...
... Republic, June 2007. © 2007 Association for Computational Linguistics. Making Lexical Ontologies Functional and Context-Sensitive. Tony Veale, Computer Science and Informatics, University College Dublin, Ireland, tony.veale@ucd.ie; Yanfen ... to safely identify bona-fide similes. For this reason, the filtering task is performed by a human judge, who annotated 30,991 of these simile instances (for 12,259 unique adjective/noun pairings) as ... definition can give rise to each of these perspectives in the appropriate contexts. We therefore do not need a different category definition for each metaphoric use of Snake. To illustrate the high-level...
... dyMERGER is unity for an acquiring bank in the year before the merger and zero otherwise. In order to account for the unit root of RISK, all variables are first differenced, before applying the ... index for owner-occupied housing in West Germany 1985 to 1998 (Claudia Kurz, Johannes Hoffmann); 9/2004: The Inventory Cycle of the German Economy (Thomas A. Knetsch); 10/2004: Evaluating the German ... dyMERGER is unity for an acquiring bank in the year before the merger and zero otherwise. In order to account for the unit root of RISK, all variables are first first-differenced, before applying...
... Nahrung (German noun for: food). The Paraphrase column contains a description of a synset, e.g., for the selected synset the paraphrase is: der essbare Kern einer Nuss (German phrase for: the ... Tübingen, Germany. erhard.hinrichs@uni-tuebingen.de Abstract GernEdiT (short for: GermaNet Editing Tool) offers a graphical interface for the lexicographers and developers of GermaNet ... the main orthographic form prior to the Neue Deutsche Rechtschreibung. This means that Nuß was the correct spelling instead of Nuss before the German spelling reform. Old Orth Var contains...
... WebLicht's own data exchange format TCF. 5 The TCF Format The D-SPIN Text Corpus Format TCF (Heid et al., 2010) is used by WebLicht as an internal data exchange format. The TCF format allows the combination ... based data formats were developed beside the TCF format (for example, an encoding for lexicon-based data). In order to avoid any confusion of element names between these different formats, ... exchange format, which is preferably based on widely accepted formats already in use (UTF-8, XML). WebLicht uses the REST-style API and its own XML-based data exchange format (Text Corpus Format,...
... close link between restricted forms of non-projective dependency languages and mildly context-sensitive grammar formalisms provides a promising starting point for future work. On the practical ... languages correspond to different mildly context-sensitive grammar formalisms. Section 6 concludes the paper. 2 Preliminaries Throughout the paper, we write [n] for the set of all positive natural ... 2007. © 2007 Association for Computational Linguistics. Mildly Context-Sensitive Dependency Languages. Marco Kuhlmann, Programming Systems Lab, Saarland University, Saarbrücken, Germany, kuhlmann@ps.uni-sb.de; Mathias...
... dependencies improve parsing performance not only for NPs (which is well-known for English), but also for PPs, VPs, Ss, and coordinate categories. The best performance was obtained for a model that uses ... Results for Experiment 2: performance for models using split phrases and sister-head dependencies. CNP, etc.), a drop in performance of around 1% each is observed. A slight drop is observed also for ... improves parsing performance for these languages. As Experiment 1 showed, this cannot be taken for granted. 7 Conclusions We presented the first probabilistic full parsing model for German trained...
... emphasized. First, the entire WM-formalism for separable verbs has been implemented as described here. The rules for German have been formulated and a large dictionary for German (100'000 entries) ... word formation rule the lexicographer chooses for the definition of an individual entry. In the IRules, detachable prefixes are referred to as formatives in the formulae generating the word forms. ... and the same form functioning as part of a separable verb such as aufhören. Redundancies emerge between the two different entries for aufhören, one for the continuous and one for the discontinuous...
... features the - The lexicographer can search for a word form, for word forms beginning or ending with a specified string of graphemes or for word forms containing a specified string of graphemes ... source for each information item has to be retrievable to assist the lexicographer in the evaluation. The dictionary bank will be a valuable tool not only for the lexicographer but also for ... For all word forms, REFER will provide information on the relative and absolute frequency and the distribution over the texts of the corpus. - The lexicographer has a choice of options for...
... representations by distinguishing lower-bound performance (random choice of a parse) ... A Cascaded Finite-State Parser for German. Michael Schiehlen, Institute for Computational Linguistics, University ... more than 50% of the dependency structure correct. I am grateful to Helmut Schmid for discussion and to the reviewers for hints on literature. Thorsten Brants. 1999. Cascaded Markov Models. In Proceedings ... system for retrieval of captioned images. Journal of Natural Language Engineering, 7(2):117-142. Sandra Kübler and Heike Telljohann. 2002. Towards a Dependency-Oriented Evaluation for Partial...
... in Context-Free Grammar 3.1 Manually Constructed Context-Free Grammar for Myanmar Syllable Structure Context-free (CF) grammar refers to the grammar rules of languages which are formulated ... tree bank which contains evidence for rule expansions for syllable structure and such a resource does not yet exist for Myanmar. And also, the time and cost for constructing a corpus by ourselves ... Such production will be expanded for 33 consonants. X A # Such production will be expanded for 11 medials. X B # Such production will be expanded for 12 vowels. XC D X...
... and [Moortgat, 1987] for a discussion on this matter. 4 For more principled approaches see [Hoeksema, 1984; Moortgat, 1987] A Probabilistic Context-free Grammar for Disambiguation in ... always context-free [Magerman and a 2 For reasons I will not go into here, the newspaper and dictionary words did not comprise highly frequent words [Nunn and van Heuven, 1993]. 13 See for a ... done on context-free probabilistic grammars is done for syntax, and as I hope to have shown that a PCFG yields good results for morphology, it might be interesting to find out if, for one...
... in Fig. 1. This signal has the following form [5]: $V_{sig}\sin(\omega_r t + \theta_{sig})$, where $V_{sig}$ is the amplitude of the signal. The reference signal is of the form $V_L\sin(\omega_L t + \theta_{ref})$. The amplified ... device in the laboratories. Its functionality is adaptable for very low-level signals. The design characteristics can be easily modified for various kinds of experiments. The data processing is ... signal is multiplied by the reference signal of the form $\sin(\omega_r t + \theta_r + \pi/2)$. The outputs from these circuits are passed to the low-pass filter. This filter eliminates the AC...
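The lock-in excerpt above describes multiplying the input by two reference sinusoids a quarter period apart and low-pass filtering the products. The following is a minimal numerical sketch of that idea, not the circuit from the paper. It assumes the signal and reference share one frequency, and it replaces the low-pass filter with an average over whole periods. The function names, the sampling parameters, and the recovery step via `hypot` and `atan2` are illustrative assumptions.

```python
import math


def lock_in_outputs(v_sig, theta_sig, v_ref=1.0, theta_ref=0.0,
                    freq_hz=1000.0, periods=50, samples_per_period=200):
    """Return the averaged in-phase and quadrature mixer outputs.

    Averaging over an integer number of periods stands in for an ideal
    low-pass filter: it removes the component at twice the reference
    frequency and keeps only the DC term of each product.
    """
    omega = 2.0 * math.pi * freq_hz
    n = periods * samples_per_period
    dt = 1.0 / (freq_hz * samples_per_period)
    x_sum = 0.0
    y_sum = 0.0
    for k in range(n):
        t = k * dt
        signal = v_sig * math.sin(omega * t + theta_sig)
        # In-phase channel: reference sin(wt + theta_ref).
        x_sum += signal * v_ref * math.sin(omega * t + theta_ref)
        # Quadrature channel: reference shifted by pi/2.
        y_sum += signal * v_ref * math.sin(omega * t + theta_ref + math.pi / 2)
    return x_sum / n, y_sum / n


def recover(x, y, v_ref=1.0):
    """Recover signal amplitude and phase offset from the two DC outputs.

    Uses X = (v_sig * v_ref / 2) * cos(d) and Y = (v_sig * v_ref / 2) * sin(d),
    where d = theta_sig - theta_ref.
    """
    amplitude = 2.0 * math.hypot(x, y) / v_ref
    phase = math.atan2(y, x)
    return amplitude, phase


if __name__ == "__main__":
    x, y = lock_in_outputs(v_sig=0.002, theta_sig=0.3)
    print(recover(x, y))
```

Using two channels means the recovered amplitude does not depend on the unknown phase difference between signal and reference, which is presumably why the excerpt mentions the additional π/2 shift.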