... obtain automatic word classifications for large vocabularies (>1 million words) using such large training corpora (>30 billion tokens). The resulting clusterings are then used in training ... class-based language model as well as a word-based model as separate feature functions in the log-linear combination in Eq. (11). The weights are trained using minimum error rate training (Och, ... combined with word-based language models (Martin et al., 1998; Whittaker and Woodland, 2001). However, in the area of statistical machine translation, especially in the context of large training...