... occurrence in the training data. There are, of course, possibilities for further improvements. One possibility could be to combine the tokens in the headlines and keywords in the same way as the full-text ... that of the keywords. The outline of the paper is as follows: in Section 2, we present the algorithm used to automatically extract the keywords. In Section 3, we present the corpus, the learning ... different ways of representing keywords, either as unigrams or intact. In addition, we investigate the effect of using the headlines (represented as unigrams) as input, to compare their performance...