... extracted several types of features from each document. First, we applied a simple preprocessing to the HTML documents in the corpus, converting them to plain text and tokenizing. Then, we extracted ... while avoiding confusion with other people sharing the same name. But the utility of clustering is more obvious for recall-oriented queries, where the goal is to mine the web for information ... person. In a typical hiring process, for instance, candidates are evaluated not only according to their CV, but also according to their web profile, i.e. information about them available in the...