... web mining
scheme for parallel data acquisition.
Based on the Document Object Model
(DOM), a web page is represented as a
DOM tree. Then a DOM tree alignment
model is proposed to identify the ... DOM and
Our Document Tree
Despite these minor differences, our document
tree is still referred to as a DOM tree throughout this
paper.
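As a rough illustration of representing a web page as a DOM-style tree, the following is a minimal sketch using Python's standard `html.parser`. The names `Node`, `TreeBuilder`, `build_dom_tree`, and `to_tuple` are hypothetical, and the handling of void tags and text nodes is an assumption made for the example, not the document tree defined in the paper.

```python
from html.parser import HTMLParser

# Assumption: these tags never take a closing tag, so they are kept as leaves.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """One node of the illustrative document tree."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a tree of element and text nodes while the HTML is parsed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag, if there is one.
        node = self.current
        while node is not self.root and node.label != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.current.children.append(Node("#text:" + text, self.current))


def build_dom_tree(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def to_tuple(node):
    """Nested (label, children) view, convenient for inspection."""
    return (node.label, [to_tuple(child) for child in node.children])


page = "<html><body><p>Hello<br>world</p><a href='x'>link</a></body></html>"
tree = build_dom_tree(page)
```

Calling `to_tuple(tree)` gives a nested structure in which `p` and `a` are siblings under `body`, which is the kind of structure a tree alignment model would compare across two pages.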
4.1 DOM Tree A...
... used, the SDB model gives a probability to the
span s covered by the rule, which estimates the
extent to which the span is bracketable. For the
unary SDB model, we only consider the features
from ... cross
boundaries of the lower VP on the right,
therefore CBMF is “VP-RC”.
3.3 The Integration of the SDB Model into
Phrase-Based SMT
We integrate the SDB model...
... by (Hale et al., 2006) works
because the information about the transition to an error
state is propagated up the tree, in the form of the
-UNF tags. As the parsing chart is filled in bottom
up, ... repair rule set, and so at the top of the tree
the EDITED hypothesis is much more likely. However,
this requires that several fluent speech rules
from the data set be m...
...
This is the basic function of the channel model
for the phrase-based SMS normalization model,
where we used the maximum approximation for
the sum over all segmentations. Then we further
decompose ... able to model the three transformations
through the normalization pair $(s_k, e_{a_k})$,
with its mapping probability. The following
show the scenari...
... whether the proposed GVSM can aid
the VSM performance, we executed the GVSM
on the same retrieved documents. The interpolated
precision-recall values at the 11 standard recall
points for these ... VSM, for the first 4 recall points.
For TRECs 4 and 6 we have done the same for the
first 9 and 8 recall points respectively.
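For readers unfamiliar with the measure, here is a minimal sketch of the usual 11-point interpolated precision computation for a single query. The function name `interpolated_precision_11pt` and the toy document ids are hypothetical; the sketch assumes the common definition in which interpolated precision at recall level r is the maximum precision observed at any recall greater than or equal to r.

```python
def interpolated_precision_11pt(ranking, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    ranking:  list of document ids in ranked order (best first)
    relevant: set of ids judged relevant for the query
    """
    if not relevant:
        raise ValueError("need at least one relevant document")

    # (recall, precision) measured each time a relevant document is retrieved.
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    result = []
    for i in range(11):
        level = i / 10
        # Small tolerance guards against floating-point noise at the level itself.
        candidates = [p for r, p in points if r >= level - 1e-12]
        result.append(max(candidates, default=0.0))
    return result


# Toy example: two relevant documents, found at ranks 1 and 4.
values = interpolated_precision_11pt(["d1", "d2", "d3", "d4", "d5"], {"d1", "d4"})
```

In the toy example the first six recall levels (0.0 to 0.5) get precision 1.0 and the remaining five get 0.5. Averaging such vectors over queries yields the kind of curve compared between two retrieval models.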
As shown in Figure 3, the proposed GVSM may...
... morphological information in the
target. These relations are best captured in a target-
side model because they are mostly unobserved (from
lexical clues) in the English source.
The agreement model scores ... available; the MT
system has yet to generate the rest of the translation
when the tagging features for a position are scored.
Therefore, we only define emission featu...
... standard. For the
whole task, both the boundaries and the POS tag
have to be correctly identified.
4.2 Performance of the Coarse-grained Solvers
Table 3 shows the performance on the development
data ... “C:±3 T:±1” model performs the same as the
“C:±3 T:±2” model. However, the sub-word classification
accuracy of the “C:±3 T:±1” model is
higher, so in the followi...
... in two trees, the forward tree and
the backward tree. Branches correspond to letters,
and nodes are annotated with the total corpus frequency
of the letter sequence from the root of the
tree ... The second step
is the affix acquisition step, during which a set of
morphemes is identified from the corpus data. The
third step uses these morphemes to segment words.
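The forward and backward letter trees described above can be sketched as frequency-annotated tries. This is a minimal illustration only: the names `TrieNode`, `build_letter_tree`, and `frequency`, as well as the toy corpus, are hypothetical, and it assumes that the backward tree is simply the same structure built over reversed words.

```python
from collections import Counter


class TrieNode:
    """A node whose freq is the corpus frequency of the path from the root."""

    def __init__(self):
        self.freq = 0
        self.children = {}  # letter -> TrieNode


def build_letter_tree(word_counts, reverse=False):
    """Build a forward tree, or a backward tree when reverse=True."""
    root = TrieNode()
    for word, count in word_counts.items():
        letters = reversed(word) if reverse else word
        node = root
        node.freq += count  # the root counts every token
        for letter in letters:
            node = node.children.setdefault(letter, TrieNode())
            node.freq += count
    return root


def frequency(root, sequence):
    """Total corpus frequency of words starting with the given letter sequence."""
    node = root
    for letter in sequence:
        node = node.children.get(letter)
        if node is None:
            return 0
    return node.freq


# Toy corpus; each token counts once.
counts = Counter("walk walks walked talk talked".split())
forward = build_letter_tree(counts)
backward = build_letter_tree(counts, reverse=True)
```

Here `frequency(forward, "walk")` is 3 (walk, walks, walked), and `frequency(backward, "de")` is 2, since two words end in "ed". Sharp changes in these counts along a branch are the sort of signal an affix acquisition step could exploit.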
3...
... some distortion
model;
3. translate each of the $\bar{e}_i$ into French phrases according
to a model $P(\bar{f} \mid \bar{e})$ estimated from the
training data.
Other phrase-based models model the joint distribution
... It translates the
above example almost exactly as we have shown, the
only error being that it omits the word ‘that’ from (6)
and therefore (8).
These hierarchical phra...
... omitted by the treebank. Also
shown is $F_1$ for the induced PCFG. The PCFG shows higher
accuracy on small spans, while the CCM is more even.
at random from the set of binary trees.⁴ This is
the unsupervised ... example is the experiments from (Carroll and
Charniak, 1992). They restricted the space of grammars
to those isomorphic to a dependency grammar
over the POS s...