... In an effort to estimate the amount of bilingual data on the web, Ma and Liberman (1999) surveyed web pages in the .de (German website) domain, showing that of 150,000 websites in the .de ... improve sentence alignment performance on the web data, the similarity of the HTML tag structures between the parallel web documents should be leveraged properly in the sentence alignment model. ... that, using the new web mining scheme, the web mining throughput is increased by 32%; (ii) the quality of the mined data is improved. By leveraging the web pages’ HTML structures, the sentence...
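
To make the notion of HTML tag structure similarity between parallel web documents concrete, the following is a minimal sketch of one possible way to score it. It is not the model described in the text. It assumes that each page can be reduced to its sequence of opening and closing tags, and that a generic sequence-matching ratio from Python's standard library is an acceptable stand-in for a similarity measure. The names `TagCollector`, `tag_sequence`, and `tag_similarity` are hypothetical and introduced only for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        # Prefix closing tags so that <p> and </p> stay distinguishable.
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Return the list of opening/closing tag names found in an HTML string."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def tag_similarity(html_a, html_b):
    """Score in [0, 1]; 1.0 means the two tag sequences are identical.

    Assumption: a generic matching ratio stands in for whatever
    structural similarity a real alignment model would use.
    """
    seq_a, seq_b = tag_sequence(html_a), tag_sequence(html_b)
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


# Hypothetical page pair: same markup, different language.
english = "<html><body><h1>News</h1><p>Hello.</p><p>Bye.</p></body></html>"
german = "<html><body><h1>Nachrichten</h1><p>Hallo.</p><p>Tschüss.</p></body></html>"
# Hypothetical unrelated page with a different layout.
other = "<html><body><table><tr><td>x</td></tr></table></body></html>"

print(tag_similarity(english, german))
print(tag_similarity(english, other))
```

In this sketch the two translated pages share the same tag sequence and therefore receive the maximum score, while the table-based page scores lower. A sentence alignment model could, under the same assumptions, use such a score as one feature alongside the usual length-based and lexical evidence.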