A few weeks ago Steve Yegge posted an article about code base size and its negative effect on projects. While I agree that his example of a 500k LOC project is horrid, I’d split some hairs here and say that lines of code (or bloat, as he refers to it) aren’t the problem. The problem is a horrible application architecture, and lines of code are just one of the symptoms. Other symptoms may include slow execution, high memory consumption, lack of encapsulation, security vulnerabilities or a host of other issues. I’m curious as to what political/managerial/architectural situation arose that allowed a single application code base to grow so large.

Do you have an application architect? Why didn’t the application get carved up into distinct, loosely coupled systems?

Do you have enough business representation to prevent the “just add it to application X” problem? How does the scope of an application grow so much with no oversight?

The application bloat I’ve seen typically takes three forms:

  1. Very little code reuse is taking place. How many separate logging infrastructures do you have? Are you using multiple ORM tools in the same project? How long does it take a developer to understand all that?
  2. Reinventing the proverbial wheel. This is loosely related to #1, but I’ll call it out here because it’s so important. If you wrote any of the following from scratch for your project, you need to seriously justify your decision IMHO: ORM, logging, web framework. If your language of choice doesn’t already include sufficient choices for these utilities, then you need to reconsider your platform choices. Avoid the “not built here so I don’t trust it” syndrome.
  3. Feature-itis. Knowing when to say “no” is a very valuable skill. Knowing when to say “yes” conditionally is also important. It’s OK to add that new feature, but require some refactoring time so it doesn’t just add to the collective mess. Think of architectural trade-offs as a karma-based system: for every shortcut you take now, you make things harder on yourself later. For example: I can add your feature in 10 days with 40% code duplication (plus 20 days of cleanup later, and every other coding task has to slow down to work around this ugliness), or I can add it in 20 days with 0% code duplication (and little to none of the other slowdowns). It’s acceptable to tell the business that the feature, if done properly, will take 20 days to implement. In fact, it’s your duty to recommend the slower approach. If you have a boss who doesn’t understand those trade-offs, you need to start looking for another job.
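
To make the arithmetic in #3 concrete, here is a rough sketch of how you might tally the two options before talking to the business. The 10, 20 and 20 day figures come from the example above; the `Option` class, the `drag_days_per_task` field and the half-day value are my own assumptions for illustration, since the real slowdown on other tasks is something you would have to estimate for your own code base.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One way of delivering a feature (hypothetical helper for illustration)."""
    name: str
    build_days: float
    cleanup_days: float = 0.0
    # Assumed extra days each later task costs while the shortcut is
    # still in the code base; this number is a guess, not a measurement.
    drag_days_per_task: float = 0.0

    def total_days(self, later_tasks: int = 0) -> float:
        """Build time, plus deferred cleanup, plus drag on later tasks."""
        return (self.build_days
                + self.cleanup_days
                + self.drag_days_per_task * later_tasks)


# Figures from the example in #3; the 0.5 days of drag is an assumption.
quick = Option("10 days, 40% duplication", build_days=10,
               cleanup_days=20, drag_days_per_task=0.5)
clean = Option("20 days, 0% duplication", build_days=20)

if __name__ == "__main__":
    for later_tasks in (0, 4, 10):
        print(later_tasks,
              quick.total_days(later_tasks),
              clean.total_days(later_tasks))
```

Even with no drag at all, the quick option comes to 30 days once the cleanup is counted, against 20 for the clean one, and the gap only widens as more tasks have to work around the duplication.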

I’ve seen (and worked at shops that made) all of these mistakes. They all contribute to code bloat, but that isn’t their biggest problem. Bringing talented new developers up to speed on the whole system is a monstrous adventure. No single developer understands enough of the entire codebase to effect significant change. The systems become so brittle over time that making small changes involves enormous regression testing challenges.

You are a developer. It’s your duty to protect the integrity of the code base you are working on. That’s part of what your company is paying you to do. Yes, there will be pressures to deliver faster. Yes, there will be pressures to do things that violate your developer principles. That’s part of life. Accept it. Now that you’ve taken a deep breath, what are you doing to protect your code base? Do you have a solid architectural plan that you can show to your PHB that lets him know how this feature fits into the greater system? Do you have at least some high-level design documents that allow you to justify your position? Are you prepared to defend your position to your stakeholders? “While it’s technically possible to add feature X in 10 days, that will have a negative net impact on the whole project, and here’s why…”

Also, don’t think that merely switching to or from a waterfall, agile, etc. shop will magically fix these issues. It won’t. I’ve seen both kinds of development shops make all of these mistakes.