Primarily Vinod's techno babble and secondarily his rants and hopefully some useful insights. A medium intended for learning by sharing experiences to hone programming skills and hopefully also network with like-minded enthusiasts.

Saturday, July 30, 2011

Approaches to refactoring

If you are a programmer in a software services company, chances are you will at some point maintain an app written years ago by a different team or company. And if you have accumulated bad karma like me, the code will be plain 'sloppy', to say the least. I am responsible for one such app from a technical perspective.
It was high time to do some refactoring to clean up the code base. Having made that decision, the next question was: how do we go about it? After some thought, the following approaches seemed to make some sense:
  1. attack the public, then protected, then package-private (default) and finally private methods, step by step. While at it, write unit tests to ensure the code behaves as it was intended to
  2. instrument the current codebase with Emma (or Cobertura), preferably during integration testing, to look at the code coverage and take up the methods that are called most often
  3. identify the critical methods involved in the typical workflows (get the workflows from the biz analysts). Then, for each workflow, trace the methods involved in the code and refactor them. Yup, it appears to be a tedious approach indeed.
  4. identify the classes that have undergone the most revisions in the source control system. This may give an idea of which parts of the app are buggy and need greater attention
While all of the above approaches to refactoring probably have some merit, they still felt inefficient and incomplete.

As I was investigating the code base, it dawned on me that the purpose of refactoring was to shrink the code base and make it more manageable and efficient. How could it be made smaller and more efficient? You already know the simple answer: by identifying patterns and writing abstractions, in the form of components and services, that can be re-used in each specific situation with little or no overhead.
If I had opted for the 1st approach, what is the point in writing unit tests for methods that may not even be needed? And another challenge was how would you write tests for methods that were more than a thousand lines long? It would be akin to trying to boil the entire ocean, and this ocean was murky and stinking too!
If I were to take the 2nd approach, I would have to wait for a round of integration testing to happen before I had any info. And even then, some code might not get covered.
The 3rd approach is obviously quite tedious. Though it would identify the pain points in the app, it would not offer many ideas about solutions to those problems.
The 4th approach was something that caught my interest, and I wrote a program to talk to SVN to get those statistics. And it really did help in identifying the problem classes. Another blog post on that with code snippets will follow soon! In fact, I am considering writing a bunch of such classes that talk to SVN and releasing them as open source, for others who might want to use them before undertaking their own refactoring effort.
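
In the meantime, here is a rough idea of what such a revision count could look like. This is only a minimal sketch and not the program mentioned above: it assumes the history has been exported with `svn log -v --xml`, that each changed file shows up as a `<path>` element inside a `<logentry>`, and that only `.java` files are of interest. The class name `SvnChurn` and its methods are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: counts how many revisions touched each .java path. */
final class SvnChurn {

    /** Reads the output of `svn log -v --xml` and returns path -> number of revisions. */
    static Map<String, Integer> countRevisions(InputStream svnLogXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(svnLogXml);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the svn log XML", e);
        }
        // Assumption: a file appears at most once per <logentry>, so counting
        // <path> elements is the same as counting revisions per file.
        NodeList paths = doc.getElementsByTagName("path");
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < paths.getLength(); i++) {
            String path = paths.item(i).getTextContent().trim();
            if (path.endsWith(".java")) {
                counts.merge(path, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns the entries with the highest counts first, ties broken by path name. */
    static List<Map.Entry<String, Integer>> mostRevised(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) {
        // Prints the ten most revised .java files, one "count<TAB>path" per line.
        for (Map.Entry<String, Integer> e : mostRevised(countRevisions(System.in), 10)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Once compiled, it could be fed by piping the log into it, e.g. `svn log -v --xml <repository-url> | java SvnChurn`. A high revision count is only a hint: a class may be changed often because it is buggy, or simply because it is central to the app.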

Coming back to the approaches to refactoring, the aim now is to identify areas of code re-use. It was some degree of fun trying to identify such areas, muddled and mired as they were under brain-numbing lines of code. It becomes far easier to remove the cruft once you have identified the code you really need. Then the code shrinks, unit testing becomes feasible, and a bug needs to be fixed only once, in one place. Eventually all of this leads to better quality and maintainability, thereby reducing long evenings@office and promoting better sleep at night!
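
To make the "fix it once, in one place" point concrete, here is a tiny made-up example; none of these classes come from the app in question. Assume two services each used to carry their own copy-pasted tax calculation. After pulling the calculation out into one small component, both services delegate to it, and a rounding bug has exactly one home.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical shared component extracted from copy-pasted code. */
final class Money {
    private Money() {}

    /** The single place where the validation and rounding rules live. */
    static BigDecimal applyTax(BigDecimal net, BigDecimal ratePercent) {
        if (net == null || net.signum() < 0) {
            throw new IllegalArgumentException("net must be non-negative");
        }
        BigDecimal tax = net.multiply(ratePercent)
                .divide(BigDecimal.valueOf(100), 2, RoundingMode.HALF_UP);
        return net.add(tax).setScale(2, RoundingMode.HALF_UP);
    }
}

/** Hypothetical caller: used to contain its own copy of the calculation. */
final class InvoiceService {
    BigDecimal total(BigDecimal net) {
        return Money.applyTax(net, new BigDecimal("12.5"));
    }
}

/** Hypothetical caller: used to contain a second, slightly different copy. */
final class QuoteService {
    BigDecimal estimate(BigDecimal net) {
        return Money.applyTax(net, new BigDecimal("12.5"));
    }
}
```

With the logic in one place, a single unit test on `Money.applyTax` covers what previously needed a test per copy.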

As always, I would be glad to hear your educated opinions and advice!