Reverse Engineering 4.7 Million Lines of Code

Abstract

The ITC-Irst Reverse Engineering group was charged with analyzing a software application of approximately 4.7 million lines of C code. It was an old legacy system whose original structure had degraded over a long history of successive adaptive and corrective maintenance interventions. The company decided to re-engineer the software rather than replace it, because it could afford neither the complexity and cost of re-implementing the application from scratch nor the associated risk. Several problems were encountered during re-engineering, including identifying dependencies and detecting redundant functions that were no longer used.

To accomplish these goals, we adopted a conservative approach. Before performing any kind of analysis on the whole code, we carefully evaluated the expected costs. To this end, a small but representative sample of modules was analyzed first, and the costs and outcomes were extrapolated to obtain indications for the analysis of the whole system. Once the results on the sample modules had proved both useful and affordable to obtain for the entire system, the available resources were carefully distributed among the different reverse engineering tasks to meet the customer's deadline.

This paper summarizes that experience, discussing how we approached the problem, the way we managed the limited resources available in order to complete the task within the assigned deadlines, and the lessons we learned.