... clusters of size >=3 ...son dataset in the WePS corpus contains a total of 100 documents, and 10 of them belong to a British politician named James Patterson. The WePS-2 corpus contains a total of ... (affiliation, e-mail, n-grams of various sizes, etc.) we consider all candidates found in at least one document from the cluster, and pick the one that leads to the best harmonic mean (F, α = 0.5) of ... remove tokens and n-grams beginning or ending with a stopword. Using the Stanford Named Entity Recognition Tool³ we obtained the lists of persons, locations and organizations mentioned in each...
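
The candidate selection and the stopword filter described in the excerpt can be illustrated with a minimal sketch. It rests on several assumptions that the excerpt does not confirm: that the harmonic mean is taken over precision and recall of the documents containing a candidate, measured against the cluster; that F follows the usual weighted form 1 / (α/P + (1 - α)/R); and that documents are represented as sets of feature strings. The names `STOPWORDS`, `keep_ngram`, `f_alpha` and `best_feature` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical, deliberately tiny stopword list; the paper's list is not given.
STOPWORDS = frozenset({"the", "of", "a", "an", "in", "and", "to", "for"})


def keep_ngram(tokens: Sequence[str]) -> bool:
    """Return False for n-grams that begin or end with a stopword."""
    if not tokens:
        return False
    return (tokens[0].lower() not in STOPWORDS
            and tokens[-1].lower() not in STOPWORDS)


def f_alpha(precision: float, recall: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall (assumed form)."""
    if precision == 0.0 or recall == 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


def best_feature(docs: Dict[str, Set[str]],
                 cluster: Set[str],
                 alpha: float = 0.5) -> Tuple[Optional[str], float]:
    """Pick the candidate feature with the highest F score for one cluster.

    docs maps a document id to its set of features; cluster is the set of
    document ids (all present in docs) that refer to the same person.
    """
    # Candidates: every feature found in at least one document of the cluster.
    candidates: Set[str] = set()
    for doc_id in cluster:
        candidates |= docs[doc_id]

    best: Tuple[Optional[str], float] = (None, -1.0)
    for feat in sorted(candidates):  # sorted for deterministic tie-breaking
        retrieved = {d for d, feats in docs.items() if feat in feats}
        hits = len(retrieved & cluster)
        precision = hits / len(retrieved)  # non-empty: feat came from the cluster
        recall = hits / len(cluster)
        score = f_alpha(precision, recall, alpha)
        if score > best[1]:
            best = (feat, score)
    return best


if __name__ == "__main__":
    # Toy data invented for illustration only.
    toy_docs = {
        "d1": {"politician", "british"},
        "d2": {"politician"},
        "d3": {"author", "british"},
        "d4": {"author"},
    }
    print(best_feature(toy_docs, {"d1", "d2"}))
    print(keep_ngram(("house", "of", "commons")), keep_ngram(("the", "senate")))
```

With α = 0.5 the score reduces to the balanced F1 measure, so a candidate is rewarded equally for covering the cluster and for not appearing in documents outside it.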