... Ohio, USA, June 2008.
© 2008 Association for Computational Linguistics
Randomized Language Models via Perfect Hash Functions
David Talbot∗
School of Informatics
University of Edinburgh
2 Buccleuch ... on perfect hash functions. This scheme can be used to encode any standard n-gram model, which may first be processed using any conventional model reduction technique.
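The core idea, storing n-grams without keeping the n-gram keys themselves, can be illustrated with a simplified fingerprint table. This is only a sketch of the general randomized-storage idea, not the paper's actual perfect-hash construction: each n-gram is mapped to a bucket and a short fingerprint, so lookups can return a false positive with small probability in exchange for large memory savings. The class name and parameters below are invented for illustration.

```python
import hashlib

class FingerprintNgramTable:
    """Simplified randomized n-gram store: keep only a short
    fingerprint per n-gram instead of the key itself, trading a
    small false-positive rate for memory savings. Single-slot
    buckets, no collision resolution (a sketch, not a real
    perfect-hash scheme)."""

    def __init__(self, num_buckets, fp_bits=12):
        self.num_buckets = num_buckets
        self.fp_mask = (1 << fp_bits) - 1
        self.table = [None] * num_buckets  # each slot: (fingerprint, value)

    def _hash(self, ngram):
        # One wide hash split into a bucket index and a fingerprint.
        h = int(hashlib.md5(" ".join(ngram).encode()).hexdigest(), 16)
        return h % self.num_buckets, (h >> 64) & self.fp_mask

    def put(self, ngram, value):
        idx, fp = self._hash(ngram)
        self.table[idx] = (fp, value)

    def get(self, ngram):
        idx, fp = self._hash(ngram)
        entry = self.table[idx]
        if entry is not None and entry[0] == fp:
            return entry[1]  # may rarely be a false positive
        return None
```

A lookup of a stored n-gram always succeeds; an unseen n-gram is usually rejected because its fingerprint does not match, which is the source of the one-sided error such randomized models accept.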
3 Perfect...
... or even trillions of English words, huge language models are built in a distributed manner (Zhang et al., 2006; Brants et al., 2007). Such language models yield better translation results but at ... explore a dependency language model to improve translation quality. To some extent, these syntactically-informed language models are consistent with syntax-based translation mod...
... Substitution Grammars as a source of features for native language detection, the task of inferring an author’s native language from text in a different language. We compare two state-of-the-art methods ... and achieves a new best result at the task of native language detection.
2 Related Work
2.1 Native Language Detection
Work in automatic native language detection has
been m...
... and evaluate hidden understanding models, a
statistical learning approach to natural language
understanding. Given a string of words, hidden
understanding models determine the most likely meaning ... significantly more
internal structure than specialized sublanguage models.
This can be seen in the example in figure 4. The specialized
sublanguage representation requires only seven...
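The decoding step that hidden understanding models perform, finding the most likely hidden meaning sequence for a word string, can be sketched with standard Viterbi decoding over hidden states. The states, transition, and emission tables below are invented toy values for illustration; they are not the paper's actual meaning representation.

```python
def viterbi(words, states, trans, emit, start):
    """Viterbi decoding (sketch): the most likely sequence of hidden
    meaning states for a word string. `trans`, `emit`, and `start`
    are hypothetical probability tables assumed to be given."""
    # Initial column: start probability times emission of first word.
    V = [{s: start[s] * emit[s].get(words[0], 0.0) for s in states}]
    back = []
    for w in words[1:]:
        col, ptr = {}, {}
        for s in states:
            # Best predecessor state for s under the previous column.
            best_prev = max(states, key=lambda p: V[-1][p] * trans[p][s])
            col[s] = V[-1][best_prev] * trans[best_prev][s] * emit[s].get(w, 0.0)
            ptr[s] = best_prev
        V.append(col)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```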
... showing that our model outperforms competitive baselines and other neural language models.
1 Introduction
Vector-space models (VSM) represent word meanings with vectors that capture semantic ... inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is proble...
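The local-context construction the passage refers to can be sketched as a count-based distributional vector: each word is represented by the counts of words occurring in a small window around it, and similarity is measured by cosine. This is a generic illustration of the vector-space idea, not the specific model the fragment describes; the window size and function names are chosen here for the example.

```python
import math
from collections import Counter

def context_vector(corpus_sentences, target, window=2):
    """Count-based distributional vector: co-occurrence counts of
    words within +/- `window` positions of the target word."""
    vec = Counter()
    for sent in corpus_sentences:
        for i, w in enumerate(sent):
            if w == target:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        vec[sent[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

Because every occurrence of the target contributes to one shared vector, the representation conflates all senses of a word, which is exactly the single-representation limitation the passage raises.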
... between many resource-poor languages and resource-rich languages are ample, motivating recent interest in transferring linguistic resources from one language to another via parallel text. For ... other languages, there are large, well-annotated corpora with a variety of linguistic information ranging from named entity to discourse structure. Unfortunately, for the vast majority of l...
... to accomplish by employing source-language resources is to rank the alternative generated texts. This goal can be achieved by using context models on the source language prior to translation. This ... corpus.
Target-language scores On the target side we used either a standard 3-gram language model, denoted LMT, or the score assigned by the complete SMT log-linear model, which in...
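The ranking step described here, combining a source-side context-model score with a target-side language-model score, amounts to a weighted log-linear reranker over an n-best list. The sketch below assumes each candidate already carries its feature log-scores; the feature names (`src_ctx`, `tgt_lm`) and weights are hypothetical, not the paper's.

```python
def rerank(candidates, weights):
    """Rerank n-best candidates by a weighted sum of log-scores,
    e.g. a source context-model score and a target 3-gram LM score.
    `candidates` maps text -> dict of feature log-probs; feature
    names are assumed, for illustration only."""
    def total(feats):
        return sum(weights[k] * feats[k] for k in weights)
    # Highest combined score first.
    return sorted(candidates.items(), key=lambda kv: total(kv[1]), reverse=True)
```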
...
accuracy for cluster-based and nearest-neighbors distributional models of unseen events.
1 Introduction
In many statistical language-processing problems, it is necessary to estimate the ... may be different for other tasks.
2 Two models
We now survey the distributional clustering
(section 2.1) and nearest-neighbors averaging
(section 2.2) models. Section 2.3 examines the...
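The nearest-neighbors averaging idea mentioned above can be sketched as follows: the probability of an unseen event (w1, w2) is estimated by averaging the conditional distributions of the words most similar to w1. The function below is an illustrative simplification, assuming the distributions `dist` and a similarity function `sim` are given; it is not the paper's exact estimator.

```python
def nn_smoothed_prob(word, next_word, dist, sim, k=2):
    """Nearest-neighbors averaging (sketch): estimate the
    probability of an unseen (word, next_word) event as the mean of
    the distributions of the k words most similar to `word`.
    `dist[w]` maps next_word -> prob; `sim(a, b)` is an assumed
    similarity function."""
    if word in dist and next_word in dist[word]:
        return dist[word][next_word]  # seen event: use it directly
    neighbors = sorted((w for w in dist if w != word),
                       key=lambda w: sim(word, w), reverse=True)[:k]
    if not neighbors:
        return 0.0
    return sum(dist[w].get(next_word, 0.0) for w in neighbors) / len(neighbors)
```

Cluster-based models replace the neighbor set with a learned cluster of words; the averaging step is otherwise analogous.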
... RAISINS, SULTANAS, AND CURRANTS: LEXICAL CLASSIFICATION AND ABSTRACTION VIA CONTEXT PRIMING
David J. Hutches
Department of Computer Science and Engineering, Mail Code ... were single lexical items.
I. Motivation and Background
With respect to the processing of language, one of the tasks at which human beings seem relatively adept is the ability to determine ... whereby the sear...
...
natural of the natural-language modes. Hence, a fascination exists with machines that respond to spoken commands with synthetic speech responses to create a natural-language interactive discourse. ... preceding examples are very restricted in terms of the language used for the interaction with machines. The problem with unrestricted natural language for communication with machin...