Recovering Traceability Links in Multilingual Web Sites
Abstract
This paper investigates the problem of verifying consistency between the
portions of a Web site devoted to different languages. The purpose
is to support the activity of the site maintainer, who is responsible for
the alignment between different site versions. Anomalies that typically
occur in such situations include the absence of pages in some languages,
differences in page structure across languages, missing information, and
untranslated parts.
To recover traceability links, and thus simplify bringing the site back to
a consistent state, we propose an approach based on a mix of structural
and textual information extracted from the pages. The syntax trees of the
pages to be compared drive the page matching process. When structurally
corresponding nodes are encountered during the tree visit, their text
attributes are compared to determine whether they are translations of each
other.
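
The matching process described above can be illustrated with a minimal
sketch. It is not the implementation from the paper: the Node class, the
compare function, the positional pairing of children, and the toy
dictionary used to decide whether two texts are translations are all
assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical syntax-tree node: a tag, its text, and child nodes."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Assumption: a toy English-to-Italian dictionary stands in for whatever
# translation check a real tool would use.
TOY_DICTIONARY = {"home": "casa", "contacts": "contatti", "news": "notizie"}


def is_translation(src: str, dst: str) -> bool:
    """Toy check: each source word must map to the word at the same position."""
    src_words = src.lower().split()
    dst_words = dst.lower().split()
    if len(src_words) != len(dst_words):
        return False
    return all(TOY_DICTIONARY.get(s) == d for s, d in zip(src_words, dst_words))


def compare(a: Node, b: Node, path: str = "/",
            report: Optional[List[Tuple[str, str]]] = None) -> List[Tuple[str, str]]:
    """Visit two trees in parallel and report links and anomalies."""
    if report is None:
        report = []
    here = f"{path}{a.tag}"
    if a.tag != b.tag:
        # Nodes at the same position do not correspond structurally.
        report.append((here, "structure differs"))
        return report
    if a.text and not b.text:
        report.append((here, "missing information"))
    elif a.text and b.text:
        # Structurally corresponding nodes: compare their text attributes.
        status = "link" if is_translation(a.text, b.text) else "not translated"
        report.append((here, status))
    if len(a.children) != len(b.children):
        report.append((here, "structure differs"))
    # Children are paired by position; unpaired children are not visited.
    for child_a, child_b in zip(a.children, b.children):
        compare(child_a, child_b, f"{here}/", report)
    return report


english = Node("html", children=[
    Node("h1", "Home"), Node("p", "News"), Node("p", "Contacts")])
italian = Node("html", children=[
    Node("h1", "Casa"), Node("p", "News"), Node("p", "")])

for location, status in compare(english, italian):
    print(location, status)
```

In this sketch the heading pair is reported as a recovered link, the first
paragraph as not translated (its text is identical in both versions), and
the second paragraph as missing information. Pairing children by position
is the simplest possible choice; a real tool would likely need a more
tolerant tree alignment.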