Restructuring Multilingual Web Sites

Abstract

Current Web site development practice does not explicitly address the problems of multilingual sites. A site is expected to provide the same information, navigation paths, page formatting, and organization regardless of the chosen language. This is typically ensured by adopting personal conventions for naming pages and placing them in the file system. Updates are then performed manually, and consistency depends on the programmers' ability not to miss any impact of a change.

In this paper, an extension to XHTML called MLHTML (MultiLingual XHTML) is proposed as the target representation of a restructuring process aimed at producing a maintainable and consistent multilingual Web site. MLHTML centralizes the language-dependent variants of a page in a single representation, in which shared parts are not duplicated. Existing sites can be migrated to MLHTML by means of the algorithms described in this paper. After the pages are classified according to their language, a page alignment technique is used to identify corresponding pages and to eliminate inconsistencies. Transformation into MLHTML can then be achieved automatically.
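
To make the alignment step concrete, the following is a minimal sketch and not the algorithm of the paper. It assumes a hypothetical naming convention of the form "<base>_<lang>.html" (for example "index_en.html"), groups pages that share a base name, and reports the bases for which some language variant is missing. The function names align_pages and missing_variants and the file names are illustrative assumptions only.

```python
import re
from collections import defaultdict

# Hypothetical naming convention (an assumption, not taken from the paper):
# "<base>_<lang>.html", e.g. "index_en.html" or "index_it.html".
PAGE_NAME = re.compile(r"^(?P<base>.+)_(?P<lang>[a-z]{2})\.html$")


def align_pages(paths):
    """Group page paths by base name, mapping each language code to its path."""
    groups = defaultdict(dict)
    for path in paths:
        match = PAGE_NAME.match(path)
        if match:  # pages that do not follow the convention are skipped
            groups[match.group("base")][match.group("lang")] = path
    return dict(groups)


def missing_variants(groups):
    """Return {base: sorted languages that have no page for that base}."""
    all_langs = set()
    for variants in groups.values():
        all_langs.update(variants)
    return {
        base: sorted(all_langs - set(variants))
        for base, variants in groups.items()
        if all_langs - set(variants)
    }


# Illustrative file names only.
pages = ["index_en.html", "index_it.html", "news_en.html"]
groups = align_pages(pages)
print(missing_variants(groups))  # {'news': ['it']}
```

A real site is unlikely to follow one convention uniformly, which is why the approach described above first classifies pages by language and then aligns them, rather than relying on file names alone.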

Postscript version of the paper.