... that the value of is adaptively chosen according to the reduction of ExpLoss during training. The algorithm starts with a large initial , and then at each forward step the value of ... aggregates the decisions of local linear models via a dynamic program. In the CMM, the local linear models are trained independently, while in the CRF model, the local models are trained jointly. ... Following previous work (Ratnaparkhi, 1996), we assume that the tag of a word is independent of the tags of all preceding words given the tags of the previous two words (i.e., =2 in the...
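
The second-order independence assumption is what makes exact decoding by dynamic programming tractable: the search only has to remember the last two tags. The following is a minimal sketch of such a decoder, not the authors' implementation. The names `viterbi_second_order`, `local_score`, and the padding tag `START` are hypothetical, and `local_score` stands in for whatever local linear model supplies the per-position scores.

```python
START = "<s>"  # hypothetical padding tag for positions before the sentence


def viterbi_second_order(n, tags, local_score):
    """Return (best_score, best_tags) for a sentence of n words.

    local_score(i, t_prev2, t_prev1, t) is an assumed callable that returns
    the local model's score for tag t at position i, given the tags of the
    previous two words. Scores are summed along the sequence.
    """
    if n == 0:
        return 0.0, []

    # best[(t_prev1, t)]: best score of any prefix ending in tags (t_prev1, t)
    best = {(START, t): local_score(0, START, START, t) for t in tags}
    backpointers = []  # one dict per position 1..n-1

    for i in range(1, n):
        new_best, bp = {}, {}
        for (t2, t1), score in best.items():
            for t in tags:
                cand = score + local_score(i, t2, t1, t)
                if (t1, t) not in new_best or cand > new_best[(t1, t)]:
                    new_best[(t1, t)] = cand
                    bp[(t1, t)] = t2  # remember the tag that drops out of the state
        best = new_best
        backpointers.append(bp)

    # Pick the best final state, then walk the backpointers right to left.
    (t1, t), best_score = max(best.items(), key=lambda kv: kv[1])
    seq = [t1, t]
    for bp in reversed(backpointers):
        seq.insert(0, bp[(seq[0], seq[1])])

    # seq carries one leading START pad; keep only the n real tags.
    return best_score, seq[-n:]
```

With a tag set of size T, the decoder tracks at most T² states per position and tries T extensions of each, so decoding a sentence of n words costs O(n·T³) rather than the Tⁿ of exhaustive search.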