... 2010).
One of the most fundamental parts of the linguistic pipeline is part-of-speech (POS) tagging, a basic form of syntactic analysis which has countless applications in NLP. Most POS taggers ... to test the efficacy of this feature set for part-of-speech tagging given limited training data. We randomly divided the set of 1,827 annotated tweets into a training set of 1,000 (14,542 tokens), ... standard parts of speech (noun, verb, etc.) as well as categories for token varieties seen mainly in social media: URLs and email addresses; emoticons; Twitter hashtags, of the form #tagname,...
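The social-media categories listed above are mostly defined by a token's shape, so a rule-based pre-pass can illustrate them. This is a minimal sketch, not the authors' tagger: the labels `URL`, `EMAIL`, `HASHTAG`, and `EMOTICON` and all regular expressions are hypothetical and deliberately simple (real URLs and emoticons are far more varied).

```python
import re

# Hypothetical, deliberately simple patterns for token varieties seen mainly
# in social media. They are tried in order; the first match wins.
PATTERNS = [
    ("URL", re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)),
    ("EMAIL", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("HASHTAG", re.compile(r"^#\w+$")),
    # Western-style emoticons such as :-) ;D =( (far from exhaustive).
    ("EMOTICON", re.compile(r"^[:;=8][-o^']?[()\[\]dDpP/\\|]$")),
]


def social_media_category(token):
    """Return a social-media category label for `token`, or None when the
    token looks like an ordinary word for the statistical tagger."""
    for label, pattern in PATTERNS:
        if pattern.match(token):
            return label
    return None


for tok in ["#tagname", "http://example.com", ":-)", "a@b.org", "noun"]:
    print(tok, social_media_category(tok))
```

A pre-pass like this would only settle unambiguous token shapes; ordinary words (for which it returns `None`) would still be left to the statistical tagger.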
... membership of the parts of speech within such blocks reflects the content load of the blocks, on the basis that open-class parts of speech are more content-bearing than closed-class parts of speech. ... Association for Computational Linguistics
Examining the Content Load of Part of Speech Blocks for Information Retrieval
Christina Lioma
Department of Computing Science
University of Glasgow
17 ... U.K.
xristina@dcs.gla.ac.uk
Iadh Ounis
Department of Computing Science
University of Glasgow
17 Lilybank Gardens
Scotland, U.K.
ounis@dcs.gla.ac.uk
Abstract
We investigate the connection between part of speech (POS) distribution...
... achieving accuracy of 97.98%, which is a significant improvement over the state of the art for Bulgarian.
1 Introduction
Part-of-speech (POS) tagging is the task of assigning each of the words in ... larger inventory of POS tags, e.g., the Penn Treebank (Marcus et al., 1993) uses 48 tags: 36 for parts of speech, and 12 for punctuation and currency symbols. This increase in the number of tags is partially ... four major types of ambiguity:
1. Between the wordforms of the same lexeme, i.e., in the paradigm. For example, , an inflected form of (‘sofa’, masculine), can mean (a) ‘the sofa’ (definite, singular,...
... values of a large number of (orthogonal) features, such as basic part of speech (i.e., noun, verb, and so on), voice, gender, number, information about the clitics, and so on.
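A tag that combines the values of many orthogonal features can be pictured as a basic part of speech plus a bundle of feature-value pairs. This is a minimal sketch under assumed names: `MorphTag`, the feature names, and the `|`-separated string encoding are hypothetical and do not reproduce any particular tagset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MorphTag:
    """A rich tag: a basic part of speech plus orthogonal feature values.

    Feature names are hypothetical; `features` holds sorted (name, value)
    pairs so that equal bundles compare and hash equally.
    """
    pos: str
    features: tuple = ()

    @classmethod
    def make(cls, pos, **feats):
        return cls(pos, tuple(sorted(feats.items())))

    def get(self, name, default=None):
        """Look up one feature value, e.g. tag.get("number")."""
        return dict(self.features).get(name, default)

    def encode(self):
        """Hypothetical compact string form: pos|name=value|..."""
        return "|".join([self.pos] + [f"{k}={v}" for k, v in self.features])


tag = MorphTag.make("noun", gender="masc", number="sg", definite="yes")
print(tag.encode())  # noun|definite=yes|gender=masc|number=sg
print(tag.get("gender"))  # masc
```

Because every combination of feature values is a distinct tag, the inventory grows multiplicatively with the number of features, which is why such tagsets become much larger than a coarse part-of-speech inventory.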
For Arabic, ... the best-performing morphological tagger for Arabic.
2 General Approach
Arabic words are often ambiguous in their morphological analysis. This is due to Arabic's rich system of affixation and ... (including part-of-speech tagging) are the same operation, which consists of three phases. First, we obtain from our morphological analyzer a list of all possible analyses for the words of a given sentence....
... Association for Computational Linguistics
Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-of-Speech Tagging
Ashish Vaswani¹ Adam Pauls² David Chiang¹
¹Information ... second-order partial derivatives are all zero, as are those of the equality constraints.
We perform this optimization for each instance of (15). These optimizations could easily be performed in ... HMM POS-taggers (when given a good start). In Proceedings of the ACL.
S. Goldwater and T. L. Griffiths. 2007. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings of the...
... $w_i$ of a supervised part-of-speech tagger, in our case SVMTool (Giménez and Màrquez, 2004) trained on Sect. 0–18, and $x^2_i$ is a prediction on $w_i$ from an unsupervised part-of-speech tagger ... $C'$ from the new data set which is a mixture of labeled and unlabeled data points. See Figure 4 for details.
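Each data point here is described by two predictions for a word, $x^1_i$ from the supervised tagger and $x^2_i$ from the unsupervised one, and the data is condensed into a set $C'$. As an illustration only, the sketch below applies Hart's condensed nearest neighbor rule to such prediction pairs; the mismatch distance, the tie-breaking, and the toy labels are assumptions, and the paper's exact procedure for mixing labeled and unlabeled points is elided in this excerpt.

```python
def mismatch(a, b):
    """Number of positions at which two feature tuples differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def nearest_label(point, condensed):
    """1-NN label from the condensed set; min() keeps the first of tied
    neighbors, so ties go to the earliest-added point."""
    _, label = min(condensed, key=lambda item: mismatch(point, item[0]))
    return label


def condense(data):
    """Hart's condensed nearest neighbor rule over (features, label) pairs.

    Here features is assumed to be the pair (x1_i, x2_i): the supervised and
    the unsupervised tagger's predictions for word w_i.
    """
    condensed = [data[0]]
    changed = True
    while changed:  # sweep until a full pass adds nothing
        changed = False
        for point, label in data:
            misclassified = nearest_label(point, condensed) != label
            # Skip exact duplicates so conflicting labels cannot loop forever.
            if misclassified and (point, label) not in condensed:
                condensed.append((point, label))
                changed = True
    return condensed


# Toy points: (supervised prediction, unsupervised cluster id) -> gold tag.
data = [
    (("NN", "3"), "NN"),
    (("NN", "3"), "NN"),
    (("VB", "7"), "VB"),
    (("NN", "7"), "NN"),
    (("VB", "3"), "VB"),
]
C_prime = condense(data)
print(len(C_prime), "of", len(data), "points kept")  # 3 of 5 points kept
```

The condensed set keeps only points needed for 1-NN to reproduce the labels of the full set, which is what makes it smaller than the original data.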
3 Part-of-speech tagging
Our part-of-speech tagging data set is the standard data ... semi-supervised part-of-speech tagging and present the best published result on the Wall Street Journal data set.
1 Introduction
Labeled data for natural language processing tasks such as part-of-speech...
... and Part-of-Speech Tagging
Wenbin Jiang† Liang Huang‡ Qun Liu† Yajuan Lü†
†Key Lab. of Intelligent Information Processing
‡Department of Computer & Information Science
Institute of ... segmentation and part-of-speech tagging. On the Penn Chinese Treebank 5.0, we obtain an error reduction of 18.5% on segmentation and 12% on joint segmentation and part-of-speech tagging over ... POS to the tail of a boundary tag as a postfix following Ng and Low (2004). As each tag is now composed of a boundary part and a POS part, the joint S&T problem is transformed into a uniform boundary-POS labelling...
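Attaching the POS tag to the boundary tag turns joint segmentation and tagging into a single character-labelling task. A minimal sketch, assuming the common four boundary tags `b` (begin), `m` (middle), `e` (end), and `s` (single-character word) and a hypothetical `-` separator; the helper names are also hypothetical.

```python
def to_boundary_pos(tagged_words):
    """Turn [(word, pos), ...] into characters with boundary-POS labels.

    Assumes boundary tags b/m/e/s; the POS tag is appended to the boundary
    tag as a postfix, joined with a hypothetical '-' separator.
    """
    chars, labels = [], []
    for word, pos in tagged_words:
        if len(word) == 1:
            boundary = ["s"]
        else:
            boundary = ["b"] + ["m"] * (len(word) - 2) + ["e"]
        for ch, b in zip(word, boundary):
            chars.append(ch)
            labels.append(f"{b}-{pos}")
    return chars, labels


def from_boundary_pos(chars, labels):
    """Decode character labels back into (word, pos) pairs."""
    words, current = [], ""
    for ch, label in zip(chars, labels):
        boundary, pos = label.split("-", 1)
        current += ch
        if boundary in ("e", "s"):  # this character closes a word
            words.append((current, pos))
            current = ""
    return words


sentence = [("中国", "NR"), ("的", "DEG"), ("上海市", "NR")]
chars, labels = to_boundary_pos(sentence)
print(labels)  # ['b-NR', 'e-NR', 's-DEG', 'b-NR', 'm-NR', 'e-NR']
```

A character tagger trained on such labels predicts segmentation and POS at once, and `from_boundary_pos` decodes its output back into tagged words.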
... 125-131.
H. Lim, J. Kim, and H. Rim. 1996. "A Korean Transformation-based Part-of-Speech Tagger with Lexical information of mistagged Eojeol". Korea-China Joint Symposium on Ori- ...
"A HMM Part-of-Speech Tagger for Korean with wordphrasal Relations". In Proceedings of Recent Advances in Natural Language Processing.
Figure 2: The Structure of Proposed ...
M.S. Thesis, McGill University,
School of Computer Science.
G. Lee and J. Lee. 1996. "Rule-based error correction for statistical part-of-speech tagging". Korea-China Joint Symposium...
... each tag consists of a letter code for the general classification of the word (e.g., noun or verb), and another for the sub-classification according to the particular context. For example, when ... Fluidity in Chinese and its Implications for Part-of-speech Tagging
Oi Yee Kwong
Benjamin K. Tsou
Language Information Sciences Research Centre
City University of Hong Kong, Kowloon, Hong Kong
{rlolivia, ... Applications. In
Proceedings of the ICCLC International Conference on Chinese Language Computing, Chicago, pages 233-238.
Xia, F. 2000. The Part-Of-Speech Tagging Guidelines for the Penn Chinese...
... Association for Computational
Linguistics.
Alexander Clark. 2003. Combining distributional and morphological information for part of speech induction. In Proceedings of the tenth Annual Meeting of the European ... systems.
The HMM ignores orthographic information, which is often highly indicative of a word's part of speech, particularly in morphologically rich languages. For this reason Clark (2003) extended
Brown ... USA. Association for Computational
Linguistics.
Sujith Ravi and Kevin Knight. 2009. Minimized models for unsupervised part-of-speech tagging. In Proceedings of the Joint Conference of the 47th Annual...
... 2011.
© 2011 Association for Computational Linguistics
A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
Weiwei Sun
Department of Computational Linguistics, ... the lack of morphology that often
provides important clues for POS tagging, and POS tags carry much syntactic information, so disambiguating them requires context within a large window. ... $s_k = \{c[i:j]\}$ denote the set of all segments of a partition. Given multiple partitions of a character sequence $S = \{s_k\}$, there is one and only one merged partition $s_S = \{c[i:j]\}$ s.t.
1....
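The conditions that define the merged partition $s_S$ are elided in this excerpt. Assuming it is the coarsest partition that refines every $s_k$ (each merged segment lies inside some segment of every input partition), it can be computed from the union of all segment boundaries, since any common refinement must cut at every boundary of every input. The helper name `merge_partitions` is hypothetical, and spans are written as half-open pairs $(i, j)$ for $c[i:j]$.

```python
def merge_partitions(partitions):
    """Coarsest common refinement of several partitions of one sequence.

    Each partition is a list of half-open spans (i, j) standing for c[i:j],
    and all partitions are assumed to cover the same character sequence.
    Cutting at the union of every partition's cut points yields segments
    that each lie inside one segment of every input partition.
    """
    cuts = set()
    for partition in partitions:
        for i, j in partition:
            cuts.update((i, j))
    points = sorted(cuts)
    return [(a, b) for a, b in zip(points, points[1:])]


chars = "中国人民"
seg1 = [(0, 2), (2, 4)]          # 中国 / 人民
seg2 = [(0, 1), (1, 3), (3, 4)]  # 中 / 国人 / 民
merged = merge_partitions([seg1, seg2])
print([chars[i:j] for i, j in merged])  # ['中', '国', '人', '民']
```

When the input segmentations agree on a segment, it survives intact in the merged partition; only disagreements force finer cuts.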
... pipelined approach, which predicts part-of-speech tags before lemmatization.
1 Introduction
The traditional problem of morphological analysis
is, given a word form, to predict the set of all of
its possible morphological ... top lemmas for word $w_i$ given tag $t$. An assignment of a tag-set and lemmas to a word $w_i$ consists of a choice of a tag-set, $ts_i$ (one of the possible $k$ tag-sets for the word) and, for each tag $t$ ... part-of-speech tag by appending it to each feature; thus, the context feature es → e may become es → e, VBZ. To enable communication between the various parts of speech, a universal set of...
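Conjoining a feature with the POS tag lets a model learn a separate weight for the same lemmatization pattern under each tag. A minimal sketch: `conjoin_with_pos` is a hypothetical helper, and its `keep_universal` option, which also keeps the unconjoined features so that tags can share evidence, is an assumption because the excerpt's description of the universal feature set is elided.

```python
def conjoin_with_pos(features, pos, keep_universal=False):
    """Append a POS tag to each feature, e.g. 'es → e' -> 'es → e, VBZ'.

    `keep_universal` (an assumption, not taken from the text) also keeps the
    unconjoined features so that evidence can be shared across tags.
    """
    conjoined = [f"{feat}, {pos}" for feat in features]
    return list(features) + conjoined if keep_universal else conjoined


print(conjoin_with_pos(["es → e"], "VBZ"))  # ['es → e, VBZ']
print(conjoin_with_pos(["es → e"], "VBZ", keep_universal=True))
# ['es → e', 'es → e, VBZ']
```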
... AFNLP
Minimized Models for Unsupervised Part-of-Speech Tagging
Sujith Ravi and Kevin Knight
University of Southern California
Information Sciences Institute
Marina del Rey, California 90292
{sravi,knight}@isi.edu
Abstract
We ... new methods for unsupervised part-of-speech tagging. We adopt the problem formulation of Merialdo (1994), in which we are given a raw word sequence and a dictionary of legal tags for each word ... In
Proceedings of the ACL.
K. Toutanova and M. Johnson. 2008. A Bayesian LDA-based model for semi-supervised part-of-speech tagging. In Proceedings of the Advances in Neural Information Processing...