Variable Precision Reaching Definitions Analysis


During reverse engineering and reengineering of large legacy systems, reaching definitions computation is an important step, from which successive analyses, such as slicing and impact analysis, can produce useful views of the code linkages for the programmer. The involved activities are interactive, thus program analysis tools may be asked for fast answers by the maintainer. Therefore the control on the trade-off between accuracy and efficiency should be given to the user. Furthermore, real world programs (specially in languages like C) make much use of pointers, so that an efficient points-to analysis should be integrated with the data dependences computation.

This paper proposes three different approaches to interprocedural reaching definitions analysis based on different levels of increasing precision, dependently on the sensitivity to calling context and control flow. A lower precision degree produces an overestimate of the reaching definitions in a program. The result is conservative (all dependences that hold are definitely reported), and faster than their more accurate counterparts. To be applicable to real programs written in modern programming languages, these analyses need to efficiently handle pointers, and to integrate pointers analysis with reaching definitions computation. Results on a test suite show that an almost 2000:1 reduction factor can be gained in execution time by the less accurate analysis, but an average 41 \% extra dependences are added. The intermediate variant is much more precise than the less accurate one (2 \% extra dependences), and remains more than 30 times faster than the most accurate analysis. Therefore while on medium size systems the intermediate variant could be a good compromise, on large systems the less accurate variant becomes extremely valuable, being the only feasible.