... (i) occur after "in the" in the validation set, (ii) did not occur after "in the" in the training set, and (iii) occurred at least 10 times in the training set. Because their training set unigram ... mechanisms (in addition to history length and classes) and incorporate them into the model, e.g., similarity and topic. Finally, training classes on unique events is an extreme way of highly weighting ... 8:1:1 into training, validation and test sets. The training set contains ... • Unique-event bigram clustering. The cluster corpus is the set of all sequences of two bigrams ∈ B2 that occur in the training...
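The selection criteria (i)-(iii) listed earlier in this passage amount to a simple filter over token sequences. The following is a minimal sketch of that filter, not the authors' code: the function name `unseen_continuations`, its parameters, and the representation of the corpora as flat token lists are all assumptions made for illustration. The threshold of 10 is taken from criterion (iii).

```python
from collections import Counter


def unseen_continuations(train_tokens, valid_tokens, history, min_count=10):
    """Return words meeting criteria (i)-(iii) for one history.

    `history` is a tuple of words (hypothetical interface, assumed here).
    """
    n = len(history)

    def followers(tokens):
        # Set of words that directly follow `history` in `tokens`.
        return {
            tokens[i + n]
            for i in range(len(tokens) - n)
            if tuple(tokens[i:i + n]) == history
        }

    train_unigrams = Counter(train_tokens)
    seen_in_train = followers(train_tokens)

    return {
        w
        for w in followers(valid_tokens)       # (i) follows history in validation
        if w not in seen_in_train              # (ii) never follows it in training
        and train_unigrams[w] >= min_count     # (iii) frequent enough in training
    }
```

Called with a training token list, a validation token list, and a history tuple, the function returns the set of words that are frequent in training yet never seen after that history there, which is the kind of event the passage singles out.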