... documents and find that the feature vectors corresponding to some of these documents (particularly the short ones) have all zeroes in them. In other words, none of the bigrams from the training set ... normalization. 7 Following Pang et al. (2002), we use frequency as presence. In other words, the ith element of the document vector is 1 if the corresponding unigram is present in the document and 0 otherwise. The ... available. Finally, previous work has also investigated features that do not fall into any of the above categories. For instance, instead of representing the polarity of a term using a binary value,...
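
The presence-based representation described in the footnote, and the way a short document can end up with an all-zero bigram vector, can be illustrated with a minimal sketch. The whitespace tokenization, lowercasing, toy documents, and the helper names `ngrams`, `build_vocabulary`, and `presence_vector` are assumptions made for illustration and are not taken from the original experiments.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(training_docs, n=1):
    """Map each n-gram seen in the training documents to a vector index."""
    vocab = {}
    for doc in training_docs:
        # Assumption: lowercase and split on whitespace as a stand-in tokenizer.
        for gram in ngrams(doc.lower().split(), n):
            vocab.setdefault(gram, len(vocab))
    return vocab


def presence_vector(doc, vocab, n=1):
    """Binary vector: element i is 1 if the i-th n-gram occurs in doc, else 0."""
    vec = [0] * len(vocab)
    for gram in ngrams(doc.lower().split(), n):
        idx = vocab.get(gram)
        if idx is not None:
            vec[idx] = 1  # presence only; repeated occurrences do not add up
    return vec


# Hypothetical toy data, for illustration only.
train = ["a great movie", "a dull plot"]
unigram_vocab = build_vocabulary(train, n=1)
bigram_vocab = build_vocabulary(train, n=2)

# "great" appears twice but still contributes a single 1.
unigram_vec = presence_vector("great great plot", unigram_vocab, n=1)

# The only bigram in this short document, ("great", "plot"), never occurs
# in the training set, so every element of the bigram vector is 0.
bigram_vec = presence_vector("great plot", bigram_vocab, n=2)
```

In this sketch the short test document is fully covered by the unigram vocabulary yet has no bigram in common with the training set, which is the situation the text describes for short documents.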