... (1) The parameters of these models are the distortion probabilities $d(a_j \mid a_{j-1}, j)$ and the translation probabilities $t(f_j \mid e_{a_j})$. The three models differ in their estimation of d, but the ... concern us here. All three models, as well as IBM Models 3–5, share the same t. For further details of these models, the reader is referred to the original papers describing them (Brown et al., 1993; ... 1996).

Let θ stand for all the parameters of the model. The standard training procedure is to find the parameter values that maximize the likelihood, or, equivalently, minimize the negative...