... language or manually annotating a body of text to be used as a training set for a statistical engine or machine learning. In this paper, we focus on using the multilingual Wikipedia (wikipedia.org) ... which contains little content of its own but typically lists a number of entries, any of which might be what the user was seeking. For instance, the page “Franklin” contains 70 links, including the singer “Aretha ... prepare the text by removing external links, links to images, category and interlingual links, as well as some formatting. The main processing of each article takes place in several stages, whose...
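The cleanup step described above (removing external links, image links, category and interlingual links, and some formatting) can be illustrated with a minimal regex-based sketch. This is not the authors' pipeline. The function name `clean_wikitext` is hypothetical, the patterns assume English namespace names (`File:`, `Image:`, `Category:`) and lowercase language-code prefixes for interlingual links, and nested links inside image captions are not handled. Internal article links such as `[[Aretha Franklin]]` are deliberately kept, since they carry the entries a disambiguation page points to.

```python
import re


def clean_wikitext(text: str) -> str:
    """Hypothetical sketch: strip link types and markup that are not needed downstream."""
    # Image/file links, e.g. [[File:Foo.jpg|thumb]] (assumes no nested [[...]] in captions).
    text = re.sub(r"\[\[(?:File|Image):[^\[\]]*\]\]", "", text)
    # Category links, e.g. [[Category:Names]] (assumes the English namespace name).
    text = re.sub(r"\[\[Category:[^\[\]]*\]\]", "", text)
    # Interlingual links, e.g. [[fr:Franklin]]: a lowercase language code, then a colon.
    text = re.sub(r"\[\[[a-z]{2,3}(?:-[a-z]+)*:[^\[\]]*\]\]", "", text)
    # Bracketed external links, with or without a label: [http://... label]
    text = re.sub(r"\[https?://[^\s\]]+(?:\s[^\]]*)?\]", "", text)
    # Any remaining bare URLs.
    text = re.sub(r"https?://\S+", "", text)
    # Bold/italic markup ('' and ''').
    text = re.sub(r"'{2,}", "", text)
    # Collapse runs of spaces and tabs left behind by the removals.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    sample = (
        "'''Franklin''' may refer to [[Aretha Franklin]], a singer."
        "[[File:Ar.jpg|thumb]][http://example.org]\n"
        "[[Category:Names]][[fr:Franklin]]"
    )
    print(clean_wikitext(sample))
    # → Franklin may refer to [[Aretha Franklin]], a singer.
```

A production system would more likely use a proper wikitext parser and the per-language namespace names from the dump's site information, since regular expressions cannot handle arbitrarily nested links or templates.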