P. Tonella, F. Ricca, E. Pianta, C. Girardi,
G. Di Lucca, A. R. Fasolino, P. Tramontana
Evaluation Methods for Web Application Clustering
Abstract
Clustering of the entities composing a Web application (static and dynamic
pages) can be used to support program understanding. However,
several alternative options are available when a clustering technique is
designed for Web applications. The entities to be clustered can be described
in different ways (e.g., by their structure, by their connectivity, or by
their content), different similarity measures are possible, and alternative
procedures can be used to form the clusters. The problem is how to evaluate
the competing clustering techniques in order to select the one that best
supports program understanding.
In this paper, two methods for clustering evaluation are considered: the
gold standard approach and the task-oriented approach. The advantages and
disadvantages of both are analyzed in detail.
Definition of a gold standard (reference clustering) is difficult and prone
to subjectivity. On the other hand, an evaluation based on the level of
support given to task execution is expensive and requires careful
experimental design. Guidelines and examples are provided for the
implementation of both methods.
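As an illustration of the gold standard approach, the following minimal sketch scores a candidate clustering against a reference clustering by counting the page pairs that both place in the same cluster. The pairwise precision/recall measure, the function names, and the page names are assumptions chosen for the example; the abstract does not state which agreement measure the paper uses.

```python
from itertools import combinations


def co_clustered_pairs(clustering):
    """Return the set of unordered page pairs that share a cluster."""
    pairs = set()
    for cluster in clustering:
        # Sorting makes each pair canonical, so (a, b) and (b, a) coincide.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(candidate, gold):
    """Pairwise precision, recall and F1 of a candidate vs. a gold standard."""
    cand = co_clustered_pairs(candidate)
    ref = co_clustered_pairs(gold)
    agreed = cand & ref
    precision = len(agreed) / len(cand) if cand else 1.0
    recall = len(agreed) / len(ref) if ref else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical pages of a small Web application (illustrative only).
gold = [
    {"index.html", "about.html"},
    {"cart.php", "checkout.php", "order.php"},
]
candidate = [
    {"index.html", "about.html", "cart.php"},
    {"checkout.php", "order.php"},
]

precision, recall, f1 = pairwise_scores(candidate, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

A score of this kind is only as reliable as the reference clustering itself, which is why the subjectivity of the gold standard matters.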