MLHTML: Multi Lingual XHTML



  Quick reference:

      MLHTML is a simple extension of XHTML that supports multilingual Web pages by keeping the content for all languages in a single source file. MLHTML documents are XML compliant, so any content management tool able to process XML documents can handle them. XML parsers running on the server side can also generate HTML from MLHTML dynamically (for example, the XML parser available with the PHP language libraries). Finally, XML enabled browsers can accept MLHTML directly when it is accompanied by a suitable XSL file.

MLHTML adds just one new tag:

<ml lang="L">
to XHTML. The lang attribute is mandatory and gives the language identifier, L. The tag is matched by the associated closing tag:
</ml>
The text between the opening and closing tags is assumed to be in language L. For example, a multilingual page providing information in English (en) and Italian (it) may include the following lines, mixed with normal XHTML code:
<title>
<ml lang="en"> Publication List </ml>
<ml lang="it"> Lista delle pubblicazioni </ml>
</title>
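
To make the effect of the ml tag concrete, the following Python fragment is a minimal sketch of language selection: it copies an XML tree, unwraps every ml element whose lang matches the requested language, and drops the others. It is not the code of ml.xsl or ml.php; the names select_language and _append_text are hypothetical, and the sketch assumes the ml elements carry no XML namespace.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, text):
    """Append plain text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        # Text that follows a child element is stored as that child's tail.
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def select_language(element, lang):
    """Return a copy of element keeping only <ml> content in language lang.

    Hypothetical helper: assumes <ml> elements are not namespaced.
    """
    result = ET.Element(element.tag, dict(element.attrib))
    _append_text(result, element.text)
    for child in element:
        if child.tag == "ml":
            if child.get("lang") == lang:
                # Unwrap: splice the selected content into the parent.
                selected = select_language(child, lang)
                _append_text(result, selected.text)
                for grandchild in selected:
                    result.append(grandchild)
            # Content in any other language is dropped.
        else:
            result.append(select_language(child, lang))
        # Text after the child belongs to the parent in every case.
        _append_text(result, child.tail)
    return result
```

Calling select_language(ET.fromstring(source), "it") on the title example above yields a title element whose only content is the Italian text.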

For multilingual references to objects (other HTML pages, images, etc.), the following patterns can be used:

<ml lang="en"> <img align="top" src="books-en.gif"/> </ml>
<ml lang="it"> <img align="top" src="books-it.gif"/> </ml>
<ml lang="en"> <a href = "ref-page.ml?lang=en"> English description </a> </ml>
<ml lang="it"> <a href = "ref-page.ml?lang=it"> Descrizione in Italiano </a> </ml> 
where in the latter case another MLHTML page (with extension ml) is referenced and the language is selected through the parameter lang.
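
When many links have to point at a specific language version, the lang parameter can be set programmatically. The following Python fragment is a small sketch of that idea; the function name with_language is hypothetical, and only the parameter name lang comes from the pattern above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_language(href, lang):
    """Return href with its lang query parameter set to lang.

    Hypothetical helper: other query parameters and the fragment are kept.
    """
    parts = urlsplit(href)
    # Drop any existing lang parameter, then add the requested one.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "lang"]
    query.append(("lang", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For instance, with_language("ref-page.ml?lang=en", "it") gives "ref-page.ml?lang=it".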


  Migration to MLHTML:

       Given an existing site in which separate pages have been written for each supported language, it is possible to restructure it into MLHTML, provided that the pages in the different languages are aligned. The PageMerger tool performs this merge and produces an MLHTML page ready for publication on the Web, although the multilingual hyperlinks may need some minor manual edits.

The tool PageMerger is distributed both as Java source code (PageMerger.java) and as compiled bytecode (PageMerger.class).

An example of its execution is the following:

      java PageMerger page.en.html en page.it.html it > page.ml
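
The idea of merging aligned pages can be illustrated with a short Python sketch. This is not PageMerger's algorithm, which is not described here. It assumes "aligned" means that both pages have the same element structure, and all names (merge, _aligned and the other helpers) are hypothetical. Shared content is kept once, differing text is wrapped in a pair of ml elements, and elements that differ in tag, attributes or child count are wrapped whole.

```python
import copy
import xml.etree.ElementTree as ET


def _append_text(parent, text):
    """Append plain text at the current end of parent's content."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _ml(lang, text=None, child=None):
    """Build an <ml lang="..."> element holding either text or one child."""
    ml = ET.Element("ml", {"lang": lang})
    ml.text = text
    if child is not None:
        ml.append(child)
    return ml


def _add_text(parent, text_a, text_b, lang_a, lang_b):
    """Keep shared text as is; wrap language-specific text in an <ml> pair."""
    if (text_a or "").strip() == (text_b or "").strip():
        _append_text(parent, text_a)
    else:
        parent.append(_ml(lang_a, text=text_a))
        parent.append(_ml(lang_b, text=text_b))


def _aligned(a, b):
    """Shallow alignment test: same tag, attributes and child count."""
    return a.tag == b.tag and a.attrib == b.attrib and len(a) == len(b)


def _detached(element):
    """Deep copy of a subtree without its trailing text."""
    clone = copy.deepcopy(element)
    clone.tail = None
    return clone


def merge(a, b, lang_a, lang_b):
    """Merge two aligned XHTML elements into one MLHTML element."""
    if not _aligned(a, b):
        raise ValueError("pages are not aligned at <%s>" % a.tag)
    merged = ET.Element(a.tag, dict(a.attrib))
    _add_text(merged, a.text, b.text, lang_a, lang_b)
    for child_a, child_b in zip(a, b):
        if _aligned(child_a, child_b):
            merged.append(merge(child_a, child_b, lang_a, lang_b))
        else:
            # Differing elements (e.g. language-specific images) are wrapped whole.
            merged.append(_ml(lang_a, child=_detached(child_a)))
            merged.append(_ml(lang_b, child=_detached(child_b)))
        _add_text(merged, child_a.tail, child_b.tail, lang_a, lang_b)
    return merged
```

A sketch like this only handles pages whose root elements match. Misaligned roots raise an error, while misaligned subtrees are wrapped whole for each language.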

The input HTML files must be compliant with XHTML so that legal XML code can be generated from them, so HTML files that are not already in XHTML may first need to be converted. A tool that can be used for such a conversion is Tidy:

       tidy -asxml file.it.html > file.it.xhtml

Tidy can also be used to pretty-print the output of page merging:

       tidy -xml file.ml > newFile.ml
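
Before merging, it can help to check whether an input file already parses as XML and therefore whether a Tidy conversion is needed at all. The following Python fragment is a minimal sketch of such a check; the function name xml_problem is hypothetical. It tests well-formedness only, not XHTML validity, and named HTML entities such as &nbsp; may be reported as errors because the parser does not load the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def xml_problem(markup):
    """Return None if markup is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None
```

A file for which xml_problem returns a message is a candidate for the tidy -asxml conversion shown above.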





  Page generation from MLHTML:

       Once a multilingual Web page p.ml has been coded in MLHTML, HTML pages in any requested language can be produced in three different ways:

For the offline generation of static HTML in all supported languages, it is possible to use any of the available XSL processors (for example, Saxon or XT), providing them with the XSL stylesheet ml.xsl for language selection. With Saxon, the commands to issue, one per language, are:

java  com.icl.saxon.StyleSheet p.ml ml.xsl lang=it > p.it.html
java  com.icl.saxon.StyleSheet p.ml ml.xsl lang=en > p.en.html

For the dynamic generation of the HTML pages by the Web server, the PHP script ml.php can be run on p.ml. With the Apache Web server, it is possible to declare that every file with extension ml has to be preprocessed by ml.php, by inserting the following two lines into the file .htaccess:

AddType application/x-httpd-parse .ml
Action application/x-httpd-parse "/ml.php?file="

Finally, XML enabled browsers can directly process p.ml (possibly renamed p.xml), accompanied by the stylesheet ml.xsl.
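
Whichever route is used, something has to decide which language to emit for a given request. The following Python fragment sketches one possible policy for the dynamic case: list the languages present in the page, read the lang parameter from the query string, and fall back to a default when the request names a language the page does not offer. The fallback policy and the names available_languages and choose_language are assumptions for illustration; the behaviour of ml.php is not described here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs


def available_languages(root):
    """Return the lang values of all <ml> elements, in document order."""
    seen = []
    for ml in root.iter("ml"):
        lang = ml.get("lang")
        if lang and lang not in seen:
            seen.append(lang)
    return seen


def choose_language(query_string, available, default=None):
    """Pick the language to serve for a request (hypothetical policy)."""
    requested = parse_qs(query_string).get("lang", [None])[0]
    if requested in available:
        return requested
    if default in available:
        return default
    # Assumed last resort: the first language found in the page, if any.
    return available[0] if available else None
```

With a page offering en and it, a request for lang=fr would be served in English under this policy unless another default is given.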




  Try MLHTML!

Here is an example of an MLHTML page (books.ml) processed by ml.php:

Request English version of books.ml :

Request Italian version of books.ml :

To see the source code of this page, follow this link.




  Distribution:

The MLHTML software is distributed under the GNU General Public License.

To report a bug or submit a change, send an e-mail to: tonella@fbk.eu
Thanks very much!







Maintainer:  Paolo Tonella
Mail:        tonella@fbk.eu