July 04, 2016

The WWWW Guide to Refactoring

If you are working on an ever-evolving codebase, refactorings are surely something you're familiar with. Martin Fowler’s Refactoring book is a must-read for any developer. Let's start with his definition of refactoring:
“Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code, yet improves its internal structure.” [1]
The book describes the process of refactoring and spends most of the time explaining how to do the various refactorings. In this post I will talk about the essential WWWW questions:


WHY should we refactor?
WHEN should we refactor?
WHO is involved in a refactoring?
WHAT should be refactored?


Refactoring is an important technique to improve understandability, maintainability and extensibility and to foster a good structure of source code. Therefore, new features and requirements can be implemented in a more efficient way with reduced costs and less difficulty. 
However, many managers are hesitant to allow time for refactoring because no new features are created in the process. The work seems to have no visible output and no visible benefit for the manager. On the other hand, developers may also hesitate because they fear introducing new bugs into a stable system. [2]

This problem can be solved by:
1.    Using automated refactoring tools, which perform refactoring tasks safely and reliably;
2.    Having an automated test suite in place before making any changes.
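
The second point can be illustrated with a small sketch in Python. The function `shipping_cost` and its pricing rules are hypothetical and exist only for this example. The idea is to record what the existing code does today, and then check that a refactored version returns exactly the same results:

```python
def shipping_cost(weight_kg, express):
    # Hypothetical legacy function: hard to read, but it works in production.
    if express:
        if weight_kg > 10:
            return 25.0 + (weight_kg - 10) * 2.0
        return 25.0
    if weight_kg > 10:
        return 10.0 + (weight_kg - 10) * 1.0
    return 10.0


def shipping_cost_refactored(weight_kg, express):
    # Same external behavior, clearer internal structure.
    base, per_extra_kg = (25.0, 2.0) if express else (10.0, 1.0)
    extra_kg = max(0, weight_kg - 10)
    return base + extra_kg * per_extra_kg


# Characterization checks: record what the code does today ...
CASES = [(5, False), (10, False), (12, False), (5, True), (12, True)]
EXPECTED = [shipping_cost(w, e) for w, e in CASES]


def behavior_is_preserved(candidate):
    # ... and verify that a candidate returns exactly the same results.
    return [candidate(w, e) for w, e in CASES] == EXPECTED
```

Calling `behavior_is_preserved(shipping_cost_refactored)` after every small step gives quick feedback on whether the external behavior is still the same. In a real project these checks would normally live in the team's test suite rather than next to the code.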

WHY

At the beginning, most code bases are small and well designed. Over time, the size and complexity increases and the code starts to “smell”. There are various types of code smells, which I will cover in more detail in the “WHAT” section of this post.

If developers find it difficult to maintain the code, the code base turns from an asset into a liability. Alternatively, the original developer may regret certain design decisions and now know a better way. But always bear in mind that existing code, which has been tested thoroughly in a production environment, has value, regardless of whether it needs refactoring. If your company does a lot of manual regression testing, all this work has to be repeated after a refactoring, and you run the risk of introducing new bugs, or reintroducing old, already fixed ones, into the old (bad) code base.

Well-structured code is less error-prone when it comes to extending it, but the real benefits come in the long term. These benefits are:

  • Reduced time that developers spend on debugging and maintenance work
  • Improved extensibility and robustness of the code
  • Reduced code duplication
  • Increased code reuse
  • Overall maintenance and development cost should come down
  • The team's implementation speed for change requests is improved

WHEN

We are more reluctant to refactor when it proves difficult and time-consuming than when refactoring operations are easy to apply. Ideally, however, refactoring is part of a continuous quality improvement process.

Refactoring code while adding new features increases the chance that management will approve, since it will not require an extra phase of testing. But when you have a deadline for a feature, it is probably not the best time to do it. In the worst case you don't even have time to create a clean design for your new code – then, at least, you shouldn't forget to come back later and fix the code in question when time permits it.

If a developer finds it difficult to understand the code, this is a good starting point to ask questions such as what is bad and what could be done better, to create more maintainable, readable and robust code. Or do you have a recurring task that you could speed up by refactoring from which your whole team could benefit over time? If so, it should definitely be done. In case you are unsure whether now is the right time for a refactoring, just ask yourself the following question: “If I pass on doing the refactoring now, how long would it take to do later?” [3]

If you can answer this question with “under a day” and doing it then won't take you longer than doing it right away, you can postpone the refactoring and do it later. However, refactor right away if the cost of refactoring is smaller than the cost of not refactoring.

Kent Beck [Fowler, p.60] notes that “refactoring adds to the value of any program that has at least one of the following shortcomings:


  • Programs that are hard to read are hard to modify.
  • Programs that have duplicate logic are hard to modify.
  • Programs that require additional behavior that requires you to change running code are hard to modify.
  • Programs with complex conditional logic are hard to modify.” [2]

WHO

Considering all the benefits of refactoring, managers should not only reward new features that have business value, but also pay close attention to code quality by introducing mandatory code reviews and setting up coding standards.

When a developer wants to refactor some badly structured code, his/her manager may oppose any attempt to modify working code based on the “If it is not broke, don't fix it.” philosophy. These concerns must be addressed by both developers and managers. Refactoring that takes no more than a few hours is just part of the job. It should be part of the everyday discipline of development.

Refactoring that takes several days or longer is not refactoring, it is redesigning. Don't be tempted to mix up these two distinct kinds of tasks. They have very different costs and risks. If you opt for large-scale refactoring, there should be a specific business case. [4]

WHAT

Code smells in your code base indicate what should be refactored. I will provide a short overview of code smells, grouping them into five general categories:
(For a more detailed explanation, have a look at the following resources: [5], [6], [7])

1. Bloaters - Represents code that has grown so large that it can’t be handled effectively anymore.

  • Long Method - more than 10 lines
  • Large Class - too many methods, fields...
  • Primitive Obsession - using primitives, constants... where small objects would fit better
  • Long Parameter List - more than 3 to 4 parameters per method
  • Data Clumps - values that belong together (e.g. address data) should be moved to a separate class
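
As an illustration of Long Parameter List and Data Clumps, here is a minimal Python sketch. The names `Address`, `format_label` and `format_label_before` are made up for this example. The address values that always travel together are moved into a small class of their own, which also shortens the parameter list:

```python
from dataclasses import dataclass


# Before: the same four address values travel together through every call.
def format_label_before(name, street, city, postal_code, country):
    return f"{name}\n{street}\n{postal_code} {city}\n{country}"


# After: the data clump becomes a small class of its own.
@dataclass(frozen=True)
class Address:
    street: str
    city: str
    postal_code: str
    country: str

    def as_lines(self):
        return f"{self.street}\n{self.postal_code} {self.city}\n{self.country}"


def format_label(name, address):
    # Two parameters instead of five; the address knows how to print itself.
    return f"{name}\n{address.as_lines()}"
```

Both versions produce the same label, so the external behavior does not change. Only the structure does.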

2. Object-Orientation Abusers - This category represents code which incorrectly or incompletely uses object orientation principles.

  • Switch Statements - like complex switch blocks or sequences of if statements
  • Temporary Field - classes should not contain lots of optional or unnecessary fields
  • Refused Bequest - inheriting from a class, but never or rarely using any of the inherited functionality
  • Alternative Classes with Different Interfaces - two classes that offer the same functionality under different method names could share a common interface
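
The Switch Statements smell is probably the easiest one to show in code. The following Python sketch uses hypothetical shape classes, invented for this example. It replaces a growing chain of conditions with classes that each carry their own behavior:

```python
import math


# Before: every new kind of shape means another branch here
# (and in every similar function elsewhere).
def area_before(kind, size):
    if kind == "circle":
        return math.pi * size ** 2
    elif kind == "square":
        return size ** 2
    raise ValueError(f"unknown shape: {kind}")


# After: each class knows how to compute its own area.
class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def total_area(shapes):
    # Callers no longer need to know which kinds of shapes exist.
    return sum(shape.area() for shape in shapes)
```

Adding a new shape now means adding one class instead of hunting down every conditional that checks `kind`.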

3. Change Preventers - These smells mean that a small change in one class forces you to change many other places and classes too.

  • Divergent Change - if many changes to unrelated methods are needed while making changes in a class
  • Shotgun Surgery - a single modification requires many small changes in many different classes
  • Parallel Inheritance Hierarchies - this means a duplicated class hierarchy
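
A small Python sketch of Shotgun Surgery, using made-up classes and a made-up VAT rate. As long as the same rule is repeated in several classes, changing it means touching all of them. Moving it into one place turns the change into a one-line edit:

```python
# Before: the VAT rule is repeated, so a rate change touches every class.
class InvoiceBefore:
    def total(self, net):
        return round(net * 1.20, 2)


class CartBefore:
    def total(self, net):
        return round(net * 1.20, 2)


# After: the rule lives in exactly one place.
VAT_RATE = 0.20  # hypothetical rate, chosen only for this example


def gross(net):
    return round(net * (1 + VAT_RATE), 2)


class Invoice:
    def total(self, net):
        return gross(net)


class Cart:
    def total(self, net):
        return gross(net)
```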

4. Dispensables - This type of code smell represents something unnecessary that should be removed to make the code cleaner and easier to understand.

  • Lazy Class - many classes increase the complexity of a project. If there are classes that effectively don't do much, they should be deleted.
  • Data Class - classes that only hold fields and accessors and have no behavior of their own; like lazy classes, they are not doing enough
  • Duplicate Code - two or more fragments doing the same thing
  • Dead Code - unused classes, methods and variables
  • Speculative Generality - unused classes, methods and variables written for “what if” cases which never arise in practice
  • Comments - generally: aim to write code that hardly needs comments
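
The Comments smell can be sketched like this in Python. The discount rule and all names are invented for this example. Instead of explaining a condition in a comment, the explanation moves into the name of a small function:

```python
# Before: a comment has to explain what the condition means.
def discount_before(customer_years, order_total):
    # loyal customers with big orders get 10% off
    if customer_years >= 3 and order_total > 500:
        return order_total * 0.10
    return 0.0


# After: the intention moves into a name, and the comment becomes unnecessary.
def qualifies_for_loyalty_discount(customer_years, order_total):
    return customer_years >= 3 and order_total > 500


def discount(customer_years, order_total):
    if qualifies_for_loyalty_discount(customer_years, order_total):
        return order_total * 0.10
    return 0.0
```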

5. Couplers - The smells in this group represent the problem of excessive coupling and excessive delegation.

  • Feature Envy - methods which make extensive use of another class. Consider moving them into that particular class
  • Inappropriate Intimacy - using the internal fields and methods of another class. Classes should know as little as possible about each other.
  • Message Chains - calling methods in a chain across different classes, coupling them together to get the needed data from the last class within the chain.
  • Middle Man - if a class is only a wrapper which delegates all of its work, there is no need for it to exist.
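
Message Chains are easy to spot in code. In this Python sketch the classes `Order`, `Customer` and `Address` are hypothetical. The caller first walks through three objects to get a city, then simply asks the order:

```python
class Address:
    def __init__(self, city):
        self.city = city


class Customer:
    def __init__(self, address):
        self.address = address

    def city(self):
        # The customer answers the question itself instead of exposing its internals.
        return self.address.city


class Order:
    def __init__(self, customer):
        self.customer = customer

    def shipping_city(self):
        return self.customer.city()


order = Order(Customer(Address("Vienna")))

# Before: the caller walks the whole chain and depends on three classes.
city_via_chain = order.customer.address.city

# After: the caller asks the order and depends on one class only.
city_via_order = order.shipping_city()
```

This is a trade-off, though. If a class ends up doing nothing but forwarding calls like this, you have created the Middle Man smell from the list above.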

CONCLUSION

Doing refactoring is not that hard, which is why it should be done every time it is necessary.
It is a well-defined process that improves the quality of code which has grown hard to maintain, without throwing away the existing source and starting again. When code becomes hard to maintain or difficult to understand, small steps of refactoring will already help. Code smells indicate what should be refactored and are a very good starting point. 

[1] Martin Fowler. Refactoring: Improving the Design of Existing Code. Addison-Wesley, Boston, MA, USA, 1999. ISBN 0-201-48567-2.
[7] Mäntylä, M. V. and Lassenius, C. "Subjective Evaluation of Software Evolvability Using Code Smells: An Empirical Study". Journal of Empirical Software Engineering, vol. 11, no. 3, 2006, pp. 395-431.
