... advantage of recent work in transducer
induction, we have chosen to represent rules as
subsequential finite-state transducers.
Subsequential finite-state transducers are a subtype of finite-state ... may, seemingly at random, insert or delete
sequences of four or five phonemes, something which is
Automatic Induction of Finite State Transducers for Simple Phonological Rules
Daniel Gildea ... destination state
After the process of merging states terminates, a decision
tree is induced at each state to classify the outgoing
arcs. Figure 9 shows a tree induced at the initial state of
the...
... the current state is a final state, we go
back to the start state with the remaining string as
the input.
5.1.1 Results
The performance of this system measured in terms
of the number of times ... character
of first word do
    S = next state from the start state on encountering X;
    Y = first character of the result of the rule;
    transition T = current state, S, Y, rule;
    Add T into the ... last character of the result of the rule.
for each transition in the FST transition table do
    if next state is a final state then
        for all rules where I is the last character of first word...
... in natural language engineering.
2 Transducers and Parameters
Finite-state machines, including finite-state automata
(FSAs) and transducers (FSTs), are a kind
of labeled directed multigraph. For ... Probabilistic Finite-State Transducers
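The multigraph view can be made concrete with a small sketch. The class and method names below (`Arc`, `FST`, `add_arc`, `transduce`) are hypothetical and chosen only for illustration; the sketch assumes an unweighted transducer with no epsilon input labels, which is narrower than the machines the text discusses.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    src: int   # source state
    dst: int   # destination state
    inp: str   # input label
    out: str   # output label


class FST:
    """A transducer stored as a labeled directed multigraph (illustrative only)."""

    def __init__(self, start, finals):
        self.start = start
        self.finals = set(finals)
        # state -> list of outgoing arcs; several arcs may join the same
        # pair of states, which is what makes this a multigraph
        self.arcs = defaultdict(list)

    def add_arc(self, src, dst, inp, out):
        self.arcs[src].append(Arc(src, dst, inp, out))

    def transduce(self, s):
        """Return every output string the machine relates to input s."""
        frontier = {(self.start, "")}
        for ch in s:
            frontier = {
                (arc.dst, out + arc.out)
                for state, out in frontier
                for arc in self.arcs[state]
                if arc.inp == ch
            }
        return {out for state, out in frontier if state in self.finals}


fst = FST(start=0, finals=[0])
fst.add_arc(0, 0, "a", "x")
fst.add_arc(0, 0, "a", "y")  # parallel arc between the same two states
fst.add_arc(0, 0, "b", "b")
```

With the two parallel `a` arcs, `fst.transduce("ab")` yields two outputs, which shows why a plain graph (at most one edge per state pair) would not be enough.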
Jason Eisner
Department of Computer Science
Johns Hopkins University
Baltimore, MD, USA 21218-2691
jason@cs.jhu.edu
Abstract
Weighted finite-state transducers ... models with finite state
supervision. In A. Kornai, ed., Extended Finite State
Models of Language. Cambridge University Press.
Emmanuel Roche and Yves Schabes, editors. 1997.
Finite-State Language...
... who read the book slept
These heuristics consist of morphological information,
such as the existence of a “PRESPART” morpheme
in (8), and the part of speech of the word. However,
there is still a problem ... relation, such as the use of a comma in coordination.
The label “Sentence” links the head of the
sentence to the punctuation mark or a conjunct in
case of coordination. So the head of the sentence
is ... in the lexicon induction process to avoid wrong
predicate argument structures (Section 3.5).
3 Algorithm
The lexicon induction procedure is recursive on the
arguments of the head of the main clause....
... languages, and
which is the focus of much of this paper. The stem
of a Semitic verb consists of a root, essentially
a sequence of consonants, and a pattern, a sort
of template which inserts other ... pattern
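As a rough illustration of root-and-pattern combination, the sketch below fills the consonant slots of a template with the consonants of a root. The function name `interdigitate`, the slot symbol `C`, and the example root and templates (the well-known Arabic root k-t-b) are assumptions made for illustration; they are not taken from the Tigrinya analysis described in the text.

```python
def interdigitate(root, template, slot="C"):
    """Fill each slot in the template with the next root consonant.

    Hypothetical helper: a real analyzer would do this with transducers
    and would also handle gemination, vowel changes, and so on.
    """
    if template.count(slot) != len(root):
        raise ValueError("template slots and root consonants do not match")
    consonants = iter(root)
    return "".join(next(consonants) if ch == slot else ch for ch in template)


stem_active = interdigitate("ktb", "CaCaC")
stem_passive = interdigitate("ktb", "CuCiC")
```

The same root yields different stems under different templates, which is the sense in which the pattern "inserts" material between the root consonants.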
Now consider the source of most of the complexity
of the Tigrinya verb, the stem. The stem may
be thought of as conveying three types of information:
lexical (the root of the verb), derivational,
and ... implementation
of Tigrinya verb morphology is described.
1 Introduction
1.1 Finite-state morphology
Morphological analysis is the segmentation of
words into their component morphemes and the
assignment of...
... composed of features
like POS, the number of complements, category
of each complement, and the position of complements.
In their view, structural disambiguation
is simply another type of lexical ... during the reading of relative
clause sentences. Journal of Verbal Learning
and Verbal Behavior, 20:417–430, 1981.
A. K. Joshi and B. Srinivas. Disambiguation of
super parts of speech (or supertags): ... issues, University of Geneva
University of Toronto:109–135, 2002.
J. King and M. A. Just. Individual differences in
syntactic processing: The role of working memory.
Journal of Memory and Language,...
... incorporation
of MT models and ASR models using
finite-state automata. We also propose
some transducers based on MT models for
rescoring the ASR word graphs.
1 Introduction
A desired feature of computer-assisted ... lexicon-based
transducer. But instead of a target word on each
arc, we have the target part of a phrase. The weight
of each arc is the negative logarithm of the phrase
translation probability.
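The choice of negative log probabilities as arc weights can be illustrated with a short sketch: summing weights along a path corresponds to multiplying the phrase translation probabilities, so the lowest-weight path is the most probable one. The function names and the probability values are hypothetical.

```python
import math


def arc_weight(prob):
    """Weight of an arc: negative logarithm of its phrase translation probability."""
    return -math.log(prob)


def path_weight(probs):
    """Weight of a path: the sum of its arc weights."""
    return sum(arc_weight(p) for p in probs)


# Two hypothetical arcs with probabilities 0.5 and 0.25.
w = path_weight([0.5, 0.25])
recovered = math.exp(-w)  # product of the arc probabilities
```

Working in the log domain also avoids numerical underflow when many small probabilities are combined along a long path.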
This ... model of
the ASR system can be characterized as follows:
• recognition vocabulary of 16716 words;
• 3-state HMM topology with skip;
• 2500 decision tree based generalized within-word
triphone states...
... edit
distance from state 0 to state j, and cost(i,j) is
the cost of insertion, deletion or substitution from
state j to state i. The equation means the minED of
state i can be computed ... the cost of substitutions
is less than that of insertions and deletions.
Here, we assume that the cost of substitutions is
based on the similarity of the two words. Then with
the help of different ... scoring Automatic
Speech Recognition (ASR) transcription
as they are error sensitive and unsuitable
for the characteristic of ASR transcription.
Therefore, we introduce a framework of
Finite State...
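The minimum edit distance over states described above can be sketched as a dynamic program. The equation itself is elided in this excerpt, so the recurrence used here, minED(i) = min over predecessors j of minED(j) + cost(i, j) with minED(0) = 0, is an assumed reading. The function name, the arc-list format, and the requirement that states be numbered in topological order are also assumptions.

```python
def min_edit_distance(num_states, arcs):
    """Compute an assumed minED for every state of an acyclic state graph.

    arcs: list of (j, i, cost) triples, meaning an edit operation of the
    given cost leads from state j to state i. States are assumed to be
    numbered so that j < i on every arc (topological order).
    """
    inf = float("inf")
    incoming = {i: [] for i in range(num_states)}
    for j, i, cost in arcs:
        if not 0 <= j < i < num_states:
            raise ValueError("arcs must go from a lower to a higher state")
        incoming[i].append((j, cost))

    min_ed = [inf] * num_states
    min_ed[0] = 0.0
    for i in range(1, num_states):
        for j, cost in incoming[i]:
            min_ed[i] = min(min_ed[i], min_ed[j] + cost)
    return min_ed


# Hypothetical costs: a substitution (0.5) is cheaper than an insertion
# or deletion (1.0), and a match costs 0.0.
example_arcs = [(0, 1, 1.0), (0, 2, 0.5), (1, 3, 0.0), (2, 3, 1.0)]
result = min_edit_distance(4, example_arcs)
```

Because each state is visited once and each arc is relaxed once, the cost is linear in the number of states and arcs.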
... problem
of automatic word sense induction. Proceedings
of ACL (Companion Volume), Barcelona, 195-198.
Schütze, Hinrich (1993). Part-of-speech induction from
scratch. Proceedings of ACL, Columbus, ...
found that for word sense induction the local clustering
of local vectors is more appropriate than the
global clustering of global vectors, for part-of-speech
induction our conclusion is ... Computed parts of speech for each word.
5 Summary and Conclusions
This work was inspired by previous work on word
sense induction. The results indicate that part-of-speech
induction is possible...
... consist of a sequence
of a start state, reading states, a crossover
state, prefinal states, and a final state. The exception
to this is a path accepting the empty string,
which has a start state, ... the same states,
start state and final states. Its configurations are
triples (s, a, w) of a state, a stack and an input
string. The stack is a sequence of pairs (s, X) of a
state and ... e-transition, there
is a sequence of G-transitions leading to the final
state [$' * S.]. Hence ~" has the following kinds of
states: the start state, the final state, states with
terminal transitions...
... node v of a
derivation tree is a finite set F of pairs of attribute
names and their values. F is called the f-structure of v.
An lfg G consists of a cfg G0 called the underlying cfg
of ... where (1) Q is a finite set of states, (2) Σ is an
input ranked alphabet, (3) Δ is an output alphabet,
(4) q0 ∈ Q is the initial state, and (5) R is a finite set of
rules of the form q[σ(x1, ... a finite set of attributes,
and (6) A_atm is a finite set of atoms.
An equation of the form ↑atr = ↓ (atr ∈ N_atr) is called an
S (structure synthesizing) schema, and an equation of...
...
The derivation of E2* manifests first-degree
center embedding of the category S*, as a result
of the treatment of S as both a prefix and a suffix
in G2*. However, no derivation of an affixed ...
pretation of the expression as a whole.
This completes our demonstration of the ability
of affixed strings to represent the structural
descriptions of the acceptable sentences of a natural
... North-Holland.
Jackendoff, Ray S. (1977) X-Bar Syntax. Cambridge,
Mass.: MIT Press.
Langendoen, D. Terence (1975) Finite-state parsing
of phrase-structure languages and the
status of readjustment...