... self-learning (Celeux and Govaert 1992; Yarowsky 1995), co-training (Blum and Mitchell 1998), information-theoretic regularization (Corduneanu and Jaakkola 2006; Grandvalet and Bengio 2004), and ... sequences to be labeled, and $Y$ be a random variable over corresponding label sequences. All components, $Y_i$, of $Y$ are assumed to range over a finite label alphabet $\mathcal{Y}$. For example, $X$ might range over sentences and $Y$ over ... straightforward: since each feature is an indicator, we have that $f_k(x,y)^2 = f_k(x,y)$, and therefore the diagonal terms in the conditional covariance are just linear feature expectations as before. For the...