... aggregates the decisions of local linear models via a dynamic program. In the CMM, the local linear models are trained independently, while in the CRF model, the local models are trained jointly. ... training examples, while Andrew (2006) uses only features that occur in some positive training example. Second, we used the last 4K sentences of the training data to select the weight of the ... Following previous work (Ratnaparkhi, 1996), we assume that the tag of a word is independent of the tags of all preceding words given the tags of the previous two words (i.e., =2 in the...
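
To make the dynamic program concrete, the following is a minimal sketch of Viterbi decoding under the second-order Markov assumption described above, in which each tag depends only on the two preceding tags. It is an illustration under stated assumptions rather than the implementation used in the experiments: the function name `viterbi_second_order`, the padding tag `START`, and the callback `local_score` (standing in for a local linear model's score) are all hypothetical names introduced here.

```python
START = "<s>"  # hypothetical padding tag for the two positions before the sentence


def viterbi_second_order(n_words, tags, local_score):
    """Return a highest-scoring tag sequence for a sentence of n_words words.

    local_score(i, t2, t1, t) is a hypothetical callback: the local model's
    score for assigning tag t at position i, given tag t2 at position i-2
    and tag t1 at position i-1. Sequence scores are sums of local scores.
    """
    if n_words == 0:
        return []

    # best maps a state (tag at i-1, tag at i) to the best prefix score.
    best = {(START, START): 0.0}
    backpointers = []  # backpointers[i][(t1, t)] = best tag at position i-2

    for i in range(n_words):
        new_best = {}
        bp = {}
        for (t2, t1), prefix_score in best.items():
            for t in tags:
                score = prefix_score + local_score(i, t2, t1, t)
                state = (t1, t)
                if state not in new_best or score > new_best[state]:
                    new_best[state] = score
                    bp[state] = t2
        best = new_best
        backpointers.append(bp)

    # Pick the best final state, then follow backpointers right to left.
    t1, t = max(best, key=best.get)
    sequence = []
    for i in range(n_words - 1, -1, -1):
        sequence.append(t)
        t2 = backpointers[i][(t1, t)]
        t1, t = t2, t1
    sequence.reverse()
    return sequence
```

Because the state is a pair of tags, each position costs on the order of |T|^3 score evaluations for a tag set T. The sketch adds local scores, which matches log-probabilities from independently trained local models as well as unnormalized scores from a jointly trained model; only the source of `local_score` would differ.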