How to refactor code and why you should

Blackboard with A and B written in chalk with wiggly arrow between the letters

(Note: there are code examples in this post. They're all in JavaScript, but none of the code is complicated and most of the points apply to most other programming languages too)

Refactoring is when you take existing, working code and rewrite all or part of it to improve it in some way that doesn't change its behaviour. Another way to phrase this is that it's about optimising the code for some given value. In this post we'll focus on optimising the code for readability, and generally giving our fellow developers the best possible time when dealing with it.

It's very easy to get new programmers on board with the idea that they should be refactoring code, but trying to convince them that they can is where you might meet some resistance. Fears about breaking something or upsetting another developer by changing their code, and not trusting ourselves to know what needs refactoring or how to improve it are all common and understandable reasons for this. But they're bad reasons!

If you don't feel confident that refactored code has the same behaviour as it had pre-refactor then that code is lacking tests. Write a couple of unit tests to capture the existing behaviour, make your refactor, then make sure the tests still pass. Easy!

If another developer ever gets upset that you've changed their code, that is a problem with them, not with you or what you've done. If you're a junior developer or even an experienced developer who's in a new team, knowing you're in the right doesn't necessarily help; you don't want your teammates to be angry with you, no matter what their reason. So, consider this: what reason would they have to be angry? You found some code which was unclear so you spent some time improving it for the next person to come along and work with it. They should be pleased about that! Heck, they might even be the person to benefit from your refactor when they come back to work on their spaghetti code and instead find your well-structured, easy-to-read refactor instead. As with the point about tests, if you really think it's unlikely that your teammates would share this outlook, it suggests a separate issue. This one isn't so easy to fix by yourself, but it's even more important. Talk to your manager; they will have a very clear motivation to encourage you and your teammates to keep a well-maintained codebase!

If your issue is that you lack confidence in your ability to recognise code that needs refactoring or to determine how to improve it, then I have great news for you: it's really easy. To become a confident refactorer, you just need to follow these steps:

Pick a broadly applicable rule of thumb about good, clean code
Apply it at every single opportunity you have; write it on a post-it stuck to your monitor if that helps to keep it at the forefront of your mind
Do not worry about over-applying the rule of thumb—at worst you'll make a neutral change, and by the time you're applying any given rule as part of your normal coding process, you'll have figured out the edge cases where it's not so useful anyway
Once you feel confident with one rule, add another!

Without further ado, let's learn some rules of thumb.

(The more experienced you are as a programmer, the more likely you are to read one of these rules of and think "Hey! That isn't true 100% of the time!" and frankly you'll be absolutely right. But they're each a great starting point because they will provide some value so much of the time, and will have an actual negative effect almost none of the time.)

Variables: the more the merrier

Variables are very powerful. Every time you declare a variable you're increasing the proportion of your code that describes your intent. The expression [{name: "one", numeralString: '1'}, {name: "two", numeralString: '2'}, {name: "three", numeralString: '3'}].map(obj => parseInt(obj.number, 10)) evaluates to an array of numbers. It's not hard to figure that out, but using that expression as the argument to a function certainly isn't as clear as spending one extra line of code to assign the expression to a variable called numbers and passing numbers into the function.

This is fairly intuitive for most types of variable, but it's easy to overlook for functions. In the above example we passed an anonymous function into the .map() call, but we could have done the following:

const toNumber = (obj) => parseInt(obj.number, 10)
const numbers = [
  { name: 'one', numeralString: '1' },
  { name: 'two', numeralString: '2' },
  { name: 'three', numeralString: '3' },
].map(toNumber)

Now our code tells us exactly what it's going to do to each member of the array: map to number. This is a really easy way of adding clarity to code.

Consider this example:

const checkUserProfilesComplete = async (userIds) =>
  (
    await Promise.all(
      userIds.map((id) => getUserService('afkHd98JUcdd6LiDgC').get(id))
    )
  ).reduce(
    (acc, curr) => acc && curr.hasName() && curr.hasAge() && curr.hasPicture(),
    true
  )

And now contrast it with this refactor:

const areUserProfilesComplete = async (userIds) => {
  const userService = getUserService(process.env.CONNECTION_KEY)
  const userPromises = userIds.map((id) => userService.get(id))
  const users = await Promise.all(userPromises)

  return users.reduce(areComplete, true)
}

const areComplete = (prevUpToDate, user) =>
  prevUpToDate && user.hasName() && user.hasAge() && user.hasPicture()

At the cost of two extra lines, we've gained a huge amount of clarity just by adding a few variables. That's a pretty good deal!

Hopefully you noticed that we swapped "afkHd98JUcdd6LiDgC" for process.env.CONNECTION_KEY. Not only is it a good idea to store secrets and configuration as environment variables for practical reasons like not having to commit secrets into source code, it lets us replace a magic value (one which conveys no information beyond its value) for a symbolic one (one which does provide more information).

The refactoring rule of thumb here is: if there's a lot of stuff going on and it's hard to keep track, adding more variables will often make it easier to follow.

You might have also spotted that we've renamed some variables that already existed. That brings us nicely onto our next rule of thumb!

Better variable names means more information and less confusion

This rule is very easy to implement because you probably already know it. In fact, it's one of the first things most beginner coding courses and tutorials teach:

Function names start with a verb
Non-function names are a noun (or noun phrase)
Except from booleans, which always start with "is", "has", "can" etc—anything you can rephrase as a question

We all know these rules, but we're also all imperfect and sometimes even a little lazy, so we don't follow them as well as we could. This makes it a great refactoring rule, because you'll find lots of code that breaks it and it's really easy to fix!

The other thing to bear in mind with variable names is that we should value brevity, but not over clarity. The most important thing is that a name gives all the information it needs to. Only make it as short as it can be while still doing so. And always be mindful that some conventions make very short names very clear in certain contexts. Inside a .map(), i is inescapably clear as the name for the index of the current member of the array being mapped, and that's only one character! Similarly, you can always shorten maximumSize to maxSize etc.

Finally, include units where applicable! It's so much easier to talk about hoursSpent and minsSpent than timeSpent, not least because two related functions might each refer to something they call timeSpent, while they're each dealing with different units of time entirely. Situations like that are just asking for trouble.

The refactoring rule of thumb here is: If a variable name is vague, ambiguous, misleading about its type or longer than it needs to be, change it!

Your choice of array method communicates your intent

JavaScript arrays have lots of methods that let you iterate through them and act on each member with a function you define. Each method will use the function differently and their return values will differ a lot, but because they all give you access to each array member, it's often possible to successfully operate on an array without using the correct method. But it's never a good idea.

As a quick example, let's say I have an array of arrays and each inner array contains a sequence of integers. I want to add the next integer to them each.

const numberArrays = [
  [1, 2],
  [1, 2, 3],
  [1, 2, 3, 4],
]
numberArrays.every((array) => array.push(array[array.length - 1] + 1))
console.log(numberArrays)
// output: [ [ 1, 2, 3 ], [ 1, 2, 3, 4 ], [ 1, 2, 3, 4, 5 ] ]

The MDN docs say .every() "...tests whether all elements in the array pass the test implemented by the provided function. It returns a Boolean value." Based on that description, you'd look at my code and think I wanted a boolean. Why else use .every(), right? But wait a second, I haven't actually captured the return value of .every() into a variable or done anything else with it. And wait another second; .push() returns the new length of an array, and because in my code the new length is always greater than 0, it will be truthy and .every() will coerce it into true (because .every() will always return a Boolean). Now you're thinking OK, this isn't the clearest way to find out whether all the arrays have a length greater than 0 (and you're right!), and thinking something must be wrong elsewhere in my code if the fact that I'm not using the return value hasn't caused an obvious problem. But no! I just wanted to add some numbers to some arrays. And I managed that. Just ... in the worst way possible.

Here's what I should have done:

const numberArrays = [
  [1, 2],
  [1, 2, 3],
  [1, 2, 3, 4],
]
const updatedNumberArrays = numberArrays.map((array) => {
  const lasttNumber = array[array.length - 1]
  return [...array, lastNumber + 1]
})
console.log(updatedNumberArrays)
// output: [ [ 1, 2, 3 ], [ 1, 2, 3, 4 ], [ 1, 2, 3, 4, 5 ] ]

MDN says .map() "... creates a new array with the results of calling a provided function on every element in the calling array." Perfect! This code is a lot easier to read and understand because it doesn't throw us any red herrings like the first version did.

I've also cheekily stopped using .push() because it mutates the original array, something always worth avoiding if you can because if we ever look at the value of numberArrays further down the line, it will have the value we can see in the declaration.

The refactoring rule here is: if a different array method fits the use case more specifically, change it.

Ternaries: no ifs or buts

We're all familiar with if-else statements. A ternary is like an if-else expression. Rather than executing one of two statements based on a condition you define, the whole ternary expression simply evaluates to one of two values based on the condition.

Example time. Here we're assigning a value to a variable based on a condition using if and else:

let state
if (5 > 6) {
  state = 'maths is broken'
} else {
  state = 'all good'
}

But here we're using a ternary:

const state = 5 > 6 ? 'maths is broken' : 'all good'

Reducing the amount of code we're writing without reducing clarity like this is great! But it's even better when you can reduce the number of return statements in a function:

const demonstrateIfs = () => {
  ...
  if (5 > 6) {
    return "Maths is broken"
  } else {
    return "All good"
  }
}

const demonstrateTernaries = () => {
  ...
  return 5 > 6
    ? "maths is broken"
    : "all good"
}

Multiple return statements in different branches of a function's logic can make it many times more difficult to understand what value it will return for any given input. Sometimes it can't be avoided, but often it can. Another benefit of this pattern is that you can use the term "returnary", which is fun to say.

Sometimes you'll see ternaries used unnecessarily, however.

const greaterThan = a > b ? true : false
const greaterThan = a > b

These two lines of code are equivalent, because the expression a > b itself evaluates to a Boolean.

The refactoring rule of thumb here is: If you can reduce the number of return statements or if/else blocks by using a ternary, do. If you can remove a ternary without changing the behaviour of the code, do that too.

And the bonus extra rule is: Use the phrase "returnary". (I'm really struggling to make it take off for some reason.)

A brief interlude: primitive types vs reference types

Before we talk about the next rule, let's make sure we're all comfortable with primite and reference types.

Booleans, numbers, strings, undefined and null are all primitive types, which means they contain values. Objects and arrays are reference types, which mean they contain references to a location in memory; that's where the actual value is.

This is why weird stuff like this works:

const a = {}
const b = a
a.foo = 'bar'
b.foo === 'bar' // true

a contains a reference to an object. b contains that same reference. So when you add a property to the object, the change will be reflected whether you access it using a or b.

With that out of the way, let's talk about pure and impure functions.

Pure functions are less confusing

A pure function is one which will always return the same value given the same input and which has no side effects, i.e. it has no effect on anything outside itself. All other functions are impure.

Consider this function:

const haveABirthdayImpure = async (user) => {
  user.age++
  user.canDrive = await canUserDrive(user.age)

  return user
}

The user argument is a copy of the value that was passed in when haveABirthdayImpure was called. That's how all function arguments work, regardless of their type. So if you pass the number 4 into a function then inside that function you have a different number 4. You can increment the one in the function and it won't affect the one outside the function. Great! But what if we pass an object into a function, like in haveABirthdayImpure? We get a copy of user, but user contains a reference to a location in memory, and two copies of a reference both point at the same memory location. Therefore, when we increment user.age inside our function, we're mutating an object outside the function.

And haveABirthdayImpure is async. Other things can be happening while it's running. What if they reference user too? They might get unexpected values because haveABirthdayImpure mutated them.

We avoid this problem with a pure function:

const haveABirthdayPure = async (user) => {
  const newAge = user.age + 1

  return {
    ...user,
    age: newAge,
    canDrive: await canUserDrive(newAge),
  }
}

In the pure version we never mutate user itself, so we don't affect anything outside the function.

The refactoring rule of thumb here is: If you see an impure function, make it pure.

Comments are a last resort

Lots of people have lots of feelings about comments. Some developers will tell you that you should avoid them at all cost. Some will tell you that you should comment everything. But we want a broadly applicable rule of thumb, so here you go: use comments! But only when they're needed. Pretty radical stuff, right?

Consider this code:

const name = getName(emailAddress) //take everything before the @, split on '.' and capitalise first letters

Is this a good comment? Is it needed? It offers some information. We might see a call to getName() and wonder how it works. But then wouldn't we just read the code? In fact in any decent editor we can probably click getName() and be taken right to the code in question. So why repeat the code again in English? It's a waste of time and space.

How about this one?

// get users with these IDs and check their profiles are complete
const areUsersUpToDate = async (userIds) =>
  (
    await Promise.all([
      userIds.map((id) => getUserService('afkHd98JUcdd6LiDgC').get(id)),
    ])
  ).reduce(
    (acc, curr) => acc && curr.hasName() && curr.hasAge() && curr.hasPicture(),
    true
  )

You might recognise this code from earlier. We already know it's confusing. But now we've got a comment describing what it does, which might allow us to save the time we'd have spent figuring it out for ourselves. And this time the comment is next to the function definition, which seems like an improvement on the last example because it's easy to get to the definition from anywhere the function is called. In the last example we might miss that comment if we came across getName() from a different place where it's called.

However, isn't our original response to this code a lot better? To refactor it and make it clear enough that it doesn't need a comment to explain it? (Yes, it is.)

So here's the refactoring rule of thumb for comments: If a comment is repeating code, delete the comment. If a comment is being used in place of clear code, rewrite the code (using the previous rules of thumb!) and then delete the comment.

Arrays aren't the only data structure

Arrays are pretty fundamental. You're going to use them a lot. But sometimes you'll come across code (or write code) where the fact that your data is stored in an array actually gets in your way a bit.

Consider a support desk application. There are a series types of support ticket that a user can raise. Hardware fault, Software fault, and Service outage. Every ticket has a type field, which contains the ID of one of the types. The UI of our application also displays each ticket's name in a specific colour acording to its type. So we keep a collection of types in an array like this:

const types = [
  { id: 1, name: 'Hardware fault', priority: 2, color: 'purple' },
  { id: 2, name: 'Software fault', priority: 3, color: 'orange' },
  { id: 3, name: 'Service outage', priority: 1, color: 'brown' },
]

And when we want to display a ticket, we do something like this:

<h1 style={{ color: types.find((type) => type.id === ticket.type).color }}>
  {ticket.title}
</h1>

Now let's ask ourselves: should this data be in an array, or in an object?

const typeMap = {
  1: { name: "Hardware fault", priority: 2, color: "purple" },
  2: { name: "Software fault", priority: 3, color: "orange" },
  3: { name: "Service outage", priority: 1, color: "brown" }
}

<h1 style={{ color: types[ticket.type].color }}>
  {ticket.title}
</h1>

We've made our code a little bit shorter, a little bit more readable, and a little bit more efficient. We've gained efficiency because now we don't need to iterate through our array with .find() every time we display a ticket title. As you can imagine, this makes a much bigger difference when you're dealing with bigger arrays!

The other benefit is more subtle, but it's important nonetheless. Arrays are ordered. Their members are accessed primarily by their index. Objects on the other hand aren't ordered. Their members are accessed by their key (in our case, the IDs). And as with our choice of array methods earlier, our choice of data structure here communicates something to the next person to read our code. When we use an object, we communicate that the order isn't important and that whatever we use as our keys are unique (if two ticket types had the same ID, one would overwrite the other in the object). Similarly, using an array might suggest it's the order that's important rather than a unique key.

You can write good code without paying attention to these more subtle bits of information that you're giving your fellow developers, but it's a heck of a lot easier to write great code if you do.

The refactoring rule of thumb here is: If the order of your data isn't important and you can access it by some sort of ID, use an object. If the order is important or there's no unique ID, use an array.

Let's recap. We've now got a list of "code smells" which we know suggest a refactor might be on the cards:

Few variables
Unclear names
Could use a more specific array method
Lots of return statements inside if/else statements
Ternaries for Booleans
Using an array for data when we care about an ID, not about the order
Impure functions
Lots of comments

And we know how to refactor code with any of those problems, too. That's pretty good going!

Remember, you can just pick one of these rules and really focus on it when you're writing your own code and reading other people's. Don't worry about the others until you feel like you're nailing that first one. Before long you'll be using all of these rules and a bunch you've figured out on your own too.