Java’s Refactoring Tax
It’s harder to refactor a program in Java than it is in a “lightweight language” such as Lisp, JavaScript, Python, or Smalltalk. The reason is that Java’s feature set causes some common refactorings to end up more verbose, instead of less: this discourages refactoring (because it doesn’t buy you as much), and increases the maintenance cost of a refactored program.
Nested (inner) functions and inferred types make refactoring easier. Checked exceptions, manifest types, and the final restriction on the visibility of variables to inner classes make it harder.
These problems certainly don’t make refactoring impossible. Plenty of refactoring tools are available for Java. They just raise the bar on how bad a code smell has to get before it’s worth the cost of refactoring. This leads to programs that are more verbose, and have a higher degree of redundancy, in Java than they would in more easily refactorable languages.
An Example
Here’s an example that I ran into last week. This is in the implementation of an XML document processor. The processor contains two checks for the deprecated attribute names “init” and “constraint”. The actions triggered by these attributes are similar, but not identical. Here’s the original code:
void compile(Element element) {
...
if (printDeprecationWarnings && constraint != null) {
String message = "..."; // string that mentions the name "init" and "immediately"
if (!element.getParent().getName().equals("class")) {
message += "..."; // string that mentions the name "immediately"
env.addWarning(message, element);
}
value = constraint;
when = IMMEDIATELY;
}
if (printDeprecationWarnings && init != null) {
String message = "..."; // string that mentions the name "constraint" and "always"
// etc.
value = init;
when = ALWAYS;
}
}
This is code with a “bad smell": it’s a case of Duplicate Code.
Dry Run: Nested Functions
In a language with nested function definitions, I might factor the two if clauses into a nested function parameterized over the expressions that vary between them:
==
void compile(Element element) {
...
void adjustDeprecatedAttribute(attrValue, oldname, newname, evalTime
if (printDeprecatedWarnings && attrValue != null) {
...
value = attrValue;
when = evalTime;
}
}
adjustDeprecatedAttribute(init, “init”, “immediately”, IMMEDIATELY);
adjustDeprecatedAttribute(constraint, “constraint”, “always”, ALWAYS);
}
This refactoring comes with two bits of overhead. (1) Function and parameter names are necessary to implement the plumbing — to match up the single function definition against its two invocations. This is inherent in the use of functions for abstraction — and therefore, for refactoring — in the first place. (2) Punctuation distinguishes the new form as a function definition. This is the overhead, beyond the identifier names, of adding a new function definition.
This first draft works in a language with implicit types. (An implicitly typed language doesn’t require type declarations on its variables. The compiler may compute the types at compile-time like Haskell does, or the runtime may verify that the types of values match the operations that are applied to them like Python, JavaScript, and Smalltalk do.) Each new function definition adds a bit more overhead, in an explicitly typed language such as Java or C++:
==
void compile(Element element) {
...
void adjustDeprecatedAttribute(String attrValue, String oldname, String newname, EvalTime evalTime) {
if (printDeprecatedWarnings && value) {
...
}
}
adjustDeprecatedAttribute(init, “init”, “immediately”, IMMEDIATELY);
adjustDeprecatedAttribute(constraint, “constraint”, “always”, ALWAYS);
}
The explicit types make the new function a bit more verbose, and it’s verbosity with an price. The type of init is now declared twice, once in the type declaration for the variable (which I haven’t shown here), and once in the type declaration of the function that is applied to it. This duplication comes with an ongoing maintenance cost. If I change init from String to Object or AttributeValue, there’s now an additional type declaration that I have to track down and change. These changes aren’t relevant to the contract for adjustDeprecatedAttribute, which just cares that attrValue can be distinguished from null and that has a string representation.
This isn’t a big deal, but it pushes up the maintenance cost of the program marginally, to the point where I might want to wait until a block of duplicated code is two or three lines to refactor instead of one or two. It does this by counterbalancing the original motivation for refactoring the code: to eliminate duplication. In Java, the best I can do is substitute one kind of duplication for another.
Beyond Nested Functions
But wait, there’s more.
Java doesn’t have nested functions.
There’s three tacks I could try at this point. (1) I could make the new function a member function of the class — a sibling of compile, which calls it. (2) I could create a new class to hold the new function. Or (3) I could declare an interface and use it, together with an anonymous nested function, to fake a nested class.
Making a New Member
Making a New Class
Faking a Nested Function
More Radical Solutions
In Conclusion
———————————————————————————————-
In Java, I have these choices:
- Use conventional refactoring to create a new method: add a new private method
checkDeprecatedAttribute to the class (or to another class such as that of env).
- Use an anonymous inner class to host a nested function.
- Create a non-inner class with a single method.
- Use intermediate variables. Change the two clauses to set parameters that are used downstream from both of them to create the error messages.
- Make the code data-driven.
- Leave the code as is.
Each of these choices has problems. I ended up choosing the last of them — the one with the code smell — as the most maintainable alternatives. In Java, it’s the best of a bad lot.
Refactoring
void checkDeprecatedAttribute(Element element, String value, String oldname, String newname) {
if (printDeprecatedWarnings && value) {
...
}
}
void compile(Element element) {
...
checkDeprecatedAttribute(element, init, “init”, “immediately”);
...
checkDeprecatedAttribute(element, constraint, “constraint”, “always”);
}
Refactoring increases the complexity of the class by adding to its list of methods. The private keyword can indicate that checkDeprecatedAttribute isn’t part of the class’s public API, but there’s no way to indicate that this method is private to compile: that it’s used only by the compile method, and can (and should) be ignored by anyone reading or working on another part of the class implementation.
Refactoring also introduces a long-distance dependency between the call sites within compile and the definition of checkDeprecatedAttribute. Change a type in compile, and the change has to be propogated to checkDeprecatedAttribute. This is because refactoring increases the number of type declarations and disperses them; this is a dependency that isn’t present in the unfactored version. (It’s one of the penalties of manifest types: the type declarations metastasize on refactoring.) It’s compounded by the fact that there’s no way to keep the method definition textually close to (or better yet, inside of the function that contains) the method invocation.
It’s as though the only way to factor the common subexpression out of this:
function float f(float x) {
return x*x + sqrt(x*x);
}
were to introduce a new class member:
private float xsqr;
function float f(float x) {
xsqr = x * x;
return xsqr + sqrt(x*x);
}
Imagine how that would discourage the use of temporary variables!
Refactoring tools help create the long-distance dependency. They don’t help maintain it, and they don’t make the resulting program more readable.
Faking Nested Functions I
interface checker
void compile(Element element) {
}
Faking Nested Functions II
Intermediate Variables
Data-Driven Architecture
The Status Quo