Fixing Poor Cohesion

July 30, 2017

Introduction

In 2013-2016 I worked on a mid-size financial software, which had been quickly scrambled together at cost of maintainability. As the software was backbone of the business, it is what could be called a successful software project.

The software needed quality improvements. When I stepped in to the development, there was a lot of going back and forth with issues. The complexity of the software caused a situation where it was hard to see the full picture. So one part would be fixed, introducing issues to another part. Fortunately, quality got a first priority from management, and the ensuing several years of refactoring and bug-fixing did much to change the situation.

Cohesion: how tightly bound or related its internal elements are to one another

In retrospective, most of the issues could be narrowed down to being issues of poor cohesion, or to quote a colleague of mine back then after saying the word too many times: “every time you say ‘cohesion’, a kitten dies”. Edward Yourdon has throughly described levels of cohesion already in 1979.

Coincidential Cohesion

(Worst) Module elements are unrelated.

When starting from the poorest level of cohesion, the first one is coincidental cohesion.

As a simple, but real example, a get() method would suggest that it is used to fetch some data, but instead, that method would change state of some object. This can cause a situation, where it would be perfectly normal for a developer to assume a correct implementation:

$one = new Number(1);
$three = $one->add(1);

However, we could run into some unexpected situations, if implementation of add looked like this:

public function add($number) {
    $this->value / $number;
}

In this case, a division operation is grouped under an add function. There is only one relationship between name of the function and the operation contained herein, and it is the fact that the operations are part of the function definition. There is no logical connection.

A module with coincidental cohesion does not yield any benefits of modular design, in fact it is far worse than a procedural program would be. Any maintainer would have to read every function definition to understand what the code does. The benefits of modular design come from hiding complexity, but in this case, only more complexity is added without getting any benefits. My guess is that such code implies the author either not getting enough sleep and/or being overly stressed or distracted.

Logical Cohesion

Elements perform similar activities as selected from outside module; i.e. by a flag that selects operation to perform.

Several years ago, when MVC frameworks were a new thing in the PHP community, there were many examples of controllers that mingled displaying an input form, and processing of data from the form, such as this:

public function action() {
    $param1 = $this->getParam('param1');
    $param2 = $this->getParam('param2');
    if ($this->getMethod() === 'post' && $this->isValid()) {
        // Persist data
    }
}

In a small example, this is quite fine. However, this can become a problem when more actions are added to the same action. Let’s say we add a page that returns JSON data:

public function executeTermsAndConditions() {
    if ($this->method() === 'get') {
        // show one thing
    }
    if ($this->contentType() === 'application/json') {
        // show another thing
    }
    if ($this->method() === 'post') {
        // do some processing
    }
}

Each block is doing a different thing:

first shows the form,
second shows a JSON feed
third can show a form, it can also do processing and redirect to another page. Every blocks does a different thing, at a different time. This code has a low level of cohesion called logical cohesion, where each item is only logically related. A better way would be abstract common logic to models, have several thin controllers, and let the router decide which bit of logic to execute.

public function executeTermsAndConditionsForm() {
    // show one thing
}

public function executeTermsAndConditionsFeed() {
    // show another thing
}

public function executeTermsAndConditionsSubmit() {
    // do some processing
}

This has many advantages. First, it becomes clear what functionality there is. In a function of several hundred lines, it might take a long time to understand that a function can be called in several different contexts. Secondly, with less lines, there is a lower chance of having many different variables, such that it becomes difficult to keep track of each variable’s value. It is almost impossible to not introduce bugs to code that you cannot reason about. Additionally, it is much easier to test code that is responsible for one thing only.

Temporal Cohesion

Operations related only by general time performed (i.e. initialization() or FatalErrorShutdown())

Another common theme in controllers was to have huge initialization methods. In general, the controllers looked like this:

public function indexAction()
{
    public function executeLoan() {
        //
    }
    public function executeTermsAndConditions() {
        //
    }
    public function executeLoanApplication() {
        //
    }
}

Some of these actions had special pre-conditions:

public function __construct() 
{ 
    // setup
    if ($action === 'loan' || $action == 'termsAndConditions') {
        $this->executeCreditScoreVerification();
    }
}

Of course, the pre-conditions had much deeper, nested logic. Due to their complexity, they were usually ignored, unless something was broken.

This type of cohesion, which is grouped only by it’s time of execution, is called temporal cohesion. All the temporal bits of pieces need to be glued together to understand the final behavior.

A good solution is to lazy-fetch data when it is needed, instead of using initilization. In this case, the operations could be moved from the initialization module to each of the methods. Abstraction will be required in more complex cases to avoid code duplication.

public function indexAction()
{
    public function executeLoan() {
        $this->executeCreditScoreVerification();
        //
    }
    public function executeTermsAndConditions() {
        $this->executeCreditScoreVerification();
        //
    }
    public function executeLoanApplication() {
        //
    }
}

The main benefit of this change was better understandability, as all of the related code would be grouped together (cohesion!) Of course, it’s also more efficient to compute, because a lot of the code in the initialization dealt with determining the context: is it loan, terms and conditions, or loan applicatiion controller? In the action methods we already know the context.

Procedural Cohesion

Elements involved in different but sequential activities, each on on different data (usually could be trivially split into multiple modules along linear sequence boundaries).

Reproducing a bug is important to fix bugs. It helps when code is split to small, testable modules; if the bug can be isolated to a small module, it becomes easy to reason about it, and to create tests to verify the bug is actually fixed. Things become more complicated when components have some side effects.

Our team had a lovely case where the automated test suite would fail under some very specific circumstances, such as last Friday of the month, if the next Monday is a holiday.

To make things more interesting, it was not possible to execute this particular module without running a (mock) credit check, creating a loan payment schedule, and sending a welcome e-mail to the customer.

public function createLoan()
{
    $this->checkCreditScore();
    $this->createPaymentSchedule();
    $this->sendActivationMail();
}

Breaking up the module to different sub-modules made it dramatically easier to test each of the components. Not to mention, the required effort to reason about these modules became much smaller. One of the resulting components was open sourced, and can be found at https://github.com/cashongo/chronoShifter.

Summary

Cohesion, along with coupling is one of the guiding principles in object oriented design, and most other principles are applications of the concept of high cohesion and low coupling. Not following these concepts will not give any benefits of object oriented programming, but rather the opposite; object oriented programming does come at a cost, and when not following the principles correctly, you will get the cost, but not any benefits!