Reverse Engineering Legacy Applications
September 25, 2012
This article is in the Book Review chapter. Reviews are intended to provide you with information on books – both paid and free – that others consider useful and of value to developers. Read a good programming book? Write a review!
This review was first published by the author in the Sept 2012 issue of Software Developer’s Journal.
Object Oriented Reengineering Patterns, written by Serge Demeyer, Stéphane Ducasse, and Oscar Nierstrasz, is now out of print but is available as a free download from http://scg.unibe.ch/download/oorp/. The book addresses working with existing code. As Martin Fowler points out in the Foreword, software development from a clean slate is “…not the most common situation that people write code in.” Rather: “Most people have to make changes to an existing code base, even if it’s their own.”
This article is a review of the book, which I found contains gems of wisdom. I have also taken the liberty of expanding on the book's ideas with my own experience and of adding what I feel is not given sufficient coverage.
The book is divided into four sections:
- Introduction
- Reverse Engineering
- Reengineering
- Appendix
Each section has several subsections in which the reader will discover a topic called “Forces.” This is a very useful narrative of the factors that need to be considered within that section of engineering patterns. For example, forces might involve different stakeholder agendas, risk analysis, prioritization, cost, etc.
The “patterns” presented in the book are not what is typically encountered as coding patterns, such as “Abstract Factory”, “Singleton”, etc. Rather, each pattern addresses a common problem. Most of the book consists of such patterns, each presented as:
- A problem statement
- A solution
- Known Uses
- Related Patterns / What’s Next
The consistency of this approach allows the reader to easily scan the book to fit his/her particular set of problems and focus on the suggested solutions, as well as engage in a discussion of the tradeoffs (Pros and Cons), the rationale (History) of the pattern, where the pattern has been used before (Known Uses) and to navigate to related patterns for further investigation.
Reverse Engineering, Reengineering, and Forward Engineering
In the Introduction, a key distinction is made between three different activities. I found that this distinction is very useful as it identifies activities of analysis, corrective rework, and new work. By separating tasks into these three categories, a more complete picture of the legacy system can be developed, one which then provides critical information to be applied to decisions such as budgets and schedules and required expertise.
Figure 1: Setting Direction (pg. 20)
The authors provide a concise definition of reverse engineering: “reverse engineering is essentially concerned with trying to understand a system and how it ticks.” This section of the book has several high-level topics:
- Setting Direction
- First Contact
- Initial Understanding
- Detailed Model Capture
each of which provides a group of patterns to aid in solving problems in that section.
I found the discussion of Forces in each section to be valuable. Reverse engineering and reengineering require constant vigilance, and reading through the Forces sections brings the relevant concerns to mind. I would actually recommend that the Forces sections be re-read weekly by all team members and management at the beginning of any reverse engineering or reengineering effort. There are also some omissions in the discussion, which I will address next.
What About the QA Folks?
One of the most valuable sources of information that I have found when working with legacy applications is talking with the Quality Assurance folks. These are people who know the nuances of the application, things that even the developers don’t know. While this might be inferred from the sub-section “Chat with the Maintainers”, the focus there seems to be on the people maintaining the code, for example: “What was the easiest bug you had to fix during the last month?” This is, in my opinion, a significant omission from the book.
Regulations and Compliance Certification
One of the stumbling blocks I once encountered in a reengineering project was that the existing code had been certified as meeting certain compliance requirements. Re-certifying new code would be a costly and time-consuming process. This raises an issue that should not be ignored but which unfortunately the book completely omits: are there third-party certifications that the software must pass again once it has been reengineered or forward engineered? What are those certifications and how were they achieved in the past?
Reverse Engineering is About More Than Code
I found the book to be a bit too code-centric in this section. For example, under the section “Detailed Model Capture”, the subsections:
- Tie Code and Questions
- Refactor to Understand
- Step Through the Execution
- Look for the Contracts
are all very code-centric. Several discussions seem to be lacking:
- Tools available to aid in the reverse engineering process
- Reverse engineering the database
- Documenting user interface features
These points are critical to detailed reverse engineering. Tools that generate class and schema diagrams, or that reverse engineer code into UML diagrams, can be invaluable in a detailed capture of the application. Over time, the user interface has probably accumulated all sorts of shortcuts and interesting behaviors that were patched in as users made requests, and missing these behaviors will alienate users from any new application.
Tools that support the documentation process could also have been discussed. For example, I have set up in-house wikis for companies to provide a general repository for documenting legacy applications and a forum for discussing many of the excellent points the book raises regarding communication and problem analysis.
Another tool I have often found lacking in companies maintaining legacy applications is source control (I kid you not). Most legacy systems are still being updated, often with bug fixes, during the reverse engineering and reengineering effort. A source control system is critical because it lets developers ensure that any new implementation mirrors the changes made to the legacy system. It also provides a source of documentation: when a change is made to a legacy system, the developer can add comments that aid in the reverse engineering, such as why the change was made, what was discovered in making it, and how it was made.
Figure 2: Migration Strategies (pg. 182)
The authors define reengineering as follows: “Reengineering, on the other hand, is concerned with restructuring a system, generally to fix some real or perceived problems, but more specifically in preparation for further development and extension.” This section of the book has five sub-sections:
- Tests: Your Life Insurance!
- Migration Strategies
- Detecting Duplicate Code
- Redistribute Responsibilities
- Transform Conditionals to Polymorphism
again, each of which provides a group of patterns to aid in solving problems in that section.
While reengineering definitely involves testing and data migration, again I found this section to be overly code-centric. In the databases behind legacy applications I often encounter:
- non-normalized databases
- obsolete fields
- missing constraints (foreign keys, nullability)
- missing cascade operations, resulting in dead data
- fields with multiple meanings
- fields that no longer are used for the data that the field label describes
- repetitive fields, like “Alias”, “Alias1”, “Alias2”, etc., that were introduced because it was too expensive to create additional tables and support a many-to-one relationship
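The missing constraints and dead data in the list above can often be surfaced with a simple query before any reengineering starts. The following is a minimal sketch, assuming a SQLite database and hypothetical `person` and `address` tables (neither comes from the book); a real legacy system will have its own schema and database engine.

```python
import sqlite3


def find_orphans(conn, child, fk_column, parent, pk_column="id"):
    """Return child rows whose foreign-key value matches no parent row."""
    # Table and column names are interpolated into the SQL, so they must
    # come from trusted configuration, never from user input.
    query = (
        f"SELECT c.* FROM {child} AS c "
        f"LEFT JOIN {parent} AS p ON c.{fk_column} = p.{pk_column} "
        f"WHERE c.{fk_column} IS NOT NULL AND p.{pk_column} IS NULL"
    )
    return conn.execute(query).fetchall()


def demo():
    """Build a tiny hypothetical schema with one orphaned address row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
        -- No FOREIGN KEY clause: the constraint is missing on purpose.
        CREATE TABLE address (id INTEGER PRIMARY KEY, person_id INTEGER, city TEXT);
        INSERT INTO person VALUES (1, 'Ann');
        INSERT INTO address VALUES (10, 1, 'Bern'), (11, 2, 'Basel');
    """)
    return find_orphans(conn, "address", "person_id", "person")
```

Calling `demo()` returns the one address row that points at a person who no longer exists. Running a check like this per suspected relationship gives a rough count of the dead data that a migration would have to clean up or discard.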
Reengineering a database will break the legacy application but is absolutely required to move forward to supporting new features and requirements. Thus the pattern “Most Valuable First” (pg 29) in which it is stated:
“By concentrating first on a part of the system that is valuable to the client, you also maximize the commitment that you, your team members and your customers will have in the project. You furthermore increase your chances of having early positive results that demonstrate that the reengineering effort is worthwhile and necessary.”
can be very misleading. The part of the system that is valuable to the client often involves reengineering a badly designed or badly maintained database, and reengineering the database will take time. You will simply have to accept that early positive results are not achievable.
Lastly, the authors make a distinction between reengineering and new engineering work: “Forward Engineering is the traditional process of moving from high-level abstractions and logical, implementation-independent designs to the physical implementation of a system.” Forward engineering is the further development and extension of the application, once one has been adequately prepared by the reverse engineering and reengineering process.
The reader will notice that subsequent to the Introduction, there are two sections describing reverse engineering patterns and reengineering patterns, but there is no section describing forward engineering patterns. There is a very brief coverage of common coding patterns in the Appendix. Certainly there are enough books on forward engineering best practices, but in my experience, there is a significant step which I call “Supportive Engineering”, that often sits between reverse engineering and the reengineering / forward engineering process.
Figure 3: Bridging Legacy and Forward Engineered Products (author)
Reengineering of legacy applications often requires maintaining both old and new applications concurrently for a period of time. What I have termed “Supportive Engineering” is the code necessary to bridge data and processes during this concurrent phase of product support. Depending on the scope of the legacy system, this phase may take several years! But it should be realized that basically all of the “bridge” code written will eventually be thrown away as the legacy application is replaced.
Commercial and In-House Tools
Supportive engineering also includes the use of commercial tools and the in-house development of tools that support the reverse engineering and reengineering efforts. For example, a variety of unit test applications are readily available, but the developers must still write the specific unit tests (see the section “Tests: Your Life Insurance!” in the book). The legacy application may use a database, and it will probably not be normalized, so tools are required to migrate data from the legacy database to a properly normalized one.
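As an illustration of such an in-house migration tool, here is a minimal sketch that unpacks repeated alias columns into one row per alias. It assumes SQLite, a hypothetical legacy `person` table with `Alias`, `Alias1`, and `Alias2` columns, and a hypothetical target table named `person_alias`; none of these names come from the book.

```python
import sqlite3

# Hypothetical repeated columns in the legacy table.
ALIAS_COLUMNS = ("Alias", "Alias1", "Alias2")


def migrate_aliases(legacy, target):
    """Copy non-empty alias columns from legacy rows into one row per alias."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS person_alias ("
        "person_id INTEGER NOT NULL, alias TEXT NOT NULL, "
        "UNIQUE (person_id, alias))"
    )
    columns = ", ".join(ALIAS_COLUMNS)
    for row in legacy.execute(f"SELECT id, {columns} FROM person"):
        person_id, aliases = row[0], row[1:]
        for alias in aliases:
            # Skip NULLs and blank strings left behind in unused columns.
            if alias and alias.strip():
                # OR IGNORE drops duplicates caught by the UNIQUE constraint.
                target.execute(
                    "INSERT OR IGNORE INTO person_alias VALUES (?, ?)",
                    (person_id, alias.strip()),
                )
    target.commit()
```

Because the insert ignores duplicates, the function can be re-run safely while the legacy system is still in use, which matters during a long concurrent phase.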
Concurrent Database Support
Furthermore, during the concurrent phase, it may be necessary to maintain both the reengineered database and the legacy database. This doesn’t just involve migrating data (discussed in the book under the section “Make a Bridge to the New Town”). It may involve active synchronization between the legacy (often non-normalized) and reengineered (hopefully normalized) databases. Achieving this can itself be a significant development effort, especially as the legacy application probably lacks the architecture to create notifications of data changes, most likely requiring some kludge (PL/SQL triggers, timed sync, etc.) to keep the normalized database in sync.
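The trigger-plus-timed-sync kludge might look roughly like the sketch below. It uses SQLite triggers as a stand-in for PL/SQL, and the `person` table, the `change_log` table, and both function names are hypothetical. Deletes are deliberately left out to keep the example short.

```python
import sqlite3


def install_change_log(legacy):
    """Add a change-log table and triggers to a legacy 'person' table."""
    legacy.executescript("""
        CREATE TABLE IF NOT EXISTS change_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            person_id INTEGER NOT NULL
        );
        CREATE TRIGGER IF NOT EXISTS person_ins AFTER INSERT ON person
        BEGIN INSERT INTO change_log (person_id) VALUES (NEW.id); END;
        CREATE TRIGGER IF NOT EXISTS person_upd AFTER UPDATE ON person
        BEGIN INSERT INTO change_log (person_id) VALUES (NEW.id); END;
    """)


def sync_once(legacy, target, last_seq):
    """Apply changes logged after last_seq; return the new high-water mark."""
    rows = legacy.execute(
        "SELECT l.seq, p.id, p.name FROM change_log AS l "
        "JOIN person AS p ON p.id = l.person_id "
        "WHERE l.seq > ? ORDER BY l.seq",
        (last_seq,),
    ).fetchall()
    for seq, person_id, name in rows:
        # The target 'person' table is assumed to use id as its primary key,
        # so OR REPLACE overwrites the existing row for that person.
        target.execute(
            "INSERT OR REPLACE INTO person (id, name) VALUES (?, ?)",
            (person_id, name),
        )
        last_seq = seq
    target.commit()
    return last_seq
```

A scheduler would call `sync_once` at a fixed interval and persist the returned sequence number between runs. The legacy application itself is untouched; only its database gains the log table and triggers.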
Another issue that comes up when having to maintain both legacy and reengineered databases is one of data compatibility. Perhaps the purpose of the reengineering, besides normalization, is to provide the user with more complicated relationships between data. Or perhaps what was a single record entry form now supports multiple records – for example, the legacy application might allow the user to enter a single alias for a person, while the new record management database allows the user to enter multiple aliases. During the concurrent phase, it becomes a significant issue to determine how to handle “data loss” when synchronizing the legacy system with data changes in the new system, simply from the fact that the legacy system does not support the same flexibility as the new, normalized database.
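One possible policy for the alias example is to write the first alias back to the legacy field and report the rest as dropped, so that the loss is at least visible. The sketch below assumes that policy; the function name is hypothetical, and other policies (rejecting the change, or concatenating values) may suit a given system better.

```python
def to_legacy_alias(aliases):
    """Map a list of aliases onto the single legacy alias field.

    Returns (value_for_legacy, dropped) so the caller can log or flag
    the aliases that the legacy system cannot store.
    """
    cleaned = [a.strip() for a in aliases if a and a.strip()]
    if not cleaned:
        return None, []
    return cleaned[0], cleaned[1:]
```

Returning the dropped values rather than silently discarding them lets the synchronization code record exactly what the legacy system is missing, which is useful evidence when stakeholders ask why the two systems disagree.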
It Will All Be Thrown Away
Remember that the tools, tests, and software designed to bridge between reengineered and legacy applications during the concurrent support phase will become obsolete once the legacy application is completely phased out. The costs (time and money) should be clearly understood by, and communicated to, all stakeholders of the reengineering effort: management, developers, users, etc.
The book Object Oriented Reengineering Patterns offers some excellent food for thought. One of the most positive things about this book is that it will give you pause to think and hopefully to put together a realistic plan for reengineering and subsequently forward engineering a legacy application. For example, the advice in the section “Keep it Simple”:
“Flexibility is a double-edged sword. An important reengineering goal is to accommodate future change. But too much flexibility will make the new system so complex that you may actually impede future change.”
is useful as a constant reminder to developers, managers, and marketing folks. However, I think the book is too focused on the reengineering of code, leading to some gaps with regard to databases, documentation tools, and certification issues, to name a few. I don’t think that the concept of “reengineering patterns” was adequately presented. The book is more of a “leading thoughts” guide, and from that perspective it has real value.