Filippo Ricca, Paolo Tonella, Emanuele Pianta and Christian Girardi

Experimental Results on the Alignment of Multilingual Web Sites


Abstract

Institutions and companies based in countries where the main language is not English typically publish Web sites that offer the same information in at least the local language and English. However, evolving these Web sites can be troublesome when the same pages are replicated for every supported language, because each change has to be propagated to all translations of the modified page.

Algorithms that help ensure the consistency of multilingual Web pages exploit Natural Language Processing (NLP) methods to compare the content of the pages to be aligned. Since such methods are expensive, both in the linguistic resources they require and in computation time, a trade-off should be considered between the benefits of more advanced techniques and the costs of their implementation. In this paper, an empirical evaluation is conducted to establish which NLP methods, combined with structural comparison methods, are appropriate for Web page alignment.
