Gravid Banner

Post Office Horizon Scandal

Organisational Culture

It is nothing to do with me, but I have been thinking a lot in the last few days about the Post Office Horizon scandal, and the ability of large software systems to ruin people’s lives. As I write, Paula Vennells is finishing her wholly inadequate testimony to the independent inquiry.

I was never employed by the Post Office, but I did work for a former subsidiary company which had inherited much of its management culture. I can’t say I’m very surprised by her comments or performance. Nearly every office in our department had a photocopied sheet on the wall (The Plan) satirising the well-known pitfalls of communicating with senior management. “Promoted to their level of incompetence” was a phrase I often heard used to describe a manager or executive who was failing in their role. In the ranks, management was seen as more of an enemy than the competition was. I coined the catch-phrase “communication is our business, not our policy” to describe the curious cold war that existed in the business.

Managers made no effort to understand systems

I lost count of the times I travelled to significant meetings, at which IT systems were an important agenda item, only to sit through hours of drivel in which IT was never discussed. Such meetings ended with the phrase “we’ll leave the IT questions for the boffins”, requiring a second meeting to be convened with no non-IT staff present. Senior management made no secret of the fact that not only did they not understand IT, they had no desire to do so. The only time it interested them was when it was either costing them too much or taking too long.

The spineless middle manager

In fact, many senior managers seemed interested only in good news, or in how they could couch bad news as not-so-bad news. Where a senior executive is a bully, many if not most of their immediate reports become spineless yes-men.

I once had the privilege of walking around a field with two senior managers who both agreed with me that their boss’s plan was clearly bonkers and that someone needed to put an immediate stop to it. Not two hours later I was an observer at the board meeting (where I, as an underling, was forbidden to speak) and watched both of them fail to offer any challenge and meekly acquiesce to the bonkers scheme. Said scheme unravelled several years later at a not-insignificant cost in employee goodwill and lost talent.

Where are the accountants?

The biggest question I have about Horizon, however, is not about IT or management at all. It is about accountants, or rather the lack of them. Horizon was at its heart a finance system, and the scandal is a finance scandal. So where are the accountants? I have worked on finance systems, and on every one of them I have worked with accountants. I don’t understand accounts as well as they do, and they know less about software engineering than your average teenager, but we worked on the same team. In my experience, even accountants who have no clue about software are usually pretty sharp at tracking down suspicious transactions.

In the evidence we have seen to date there has been a wealth of emails and anecdotes, but precious few spreadsheets. Maybe no-one was analysing the underlying data, but I find that extremely hard to believe.

Bugs in the system

To say Horizon had no bugs is clearly wrong. No software of any size is free of bugs, but not all bugs are created equal. A glitch that makes a pixel green instead of red is clearly less serious than an error that subtracts a number instead of adding it. For every bug that affects the bottom line, there were probably many thousands of issues affecting the user experience or system performance.

As any good software engineer will know, bugs are usually conditional. A piece of code can be called hundreds of times without displaying any defect and still contain a branch that, in the right circumstances, will lead to an error.
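To make the idea concrete, here is a minimal Python sketch of a conditional bug. The function `post_transaction` and its three transaction kinds are hypothetical, invented purely for illustration, and the sign error in the `reversal` branch is planted deliberately. Hundreds of everyday deposits and withdrawals balance perfectly; only the rarely used branch goes wrong, and a simple invariant check ("a deposit followed by its reversal should change nothing") exposes it.

```python
from decimal import Decimal


def post_transaction(balance: Decimal, amount: Decimal, kind: str) -> Decimal:
    """Return the new balance after posting one transaction.

    Hypothetical routine for illustration: the common 'deposit' and
    'withdrawal' paths are correct, but the rarely used 'reversal' path
    contains a deliberately planted sign error.
    """
    if kind == "deposit":
        return balance + amount
    if kind == "withdrawal":
        return balance - amount
    if kind == "reversal":
        # Deliberate bug: reversing a deposit should subtract the amount,
        # but this branch adds it a second time.
        return balance + amount
    raise ValueError(f"unknown transaction kind: {kind!r}")


def reversal_restores_balance(balance: Decimal, amount: Decimal) -> bool:
    """Invariant check: a deposit followed by its reversal should change nothing."""
    after_deposit = post_transaction(balance, amount, "deposit")
    after_reversal = post_transaction(after_deposit, amount, "reversal")
    return after_reversal == balance


# Hundreds of everyday calls give no hint of a problem...
balance = Decimal("0.00")
for _ in range(250):
    balance = post_transaction(balance, Decimal("10.00"), "deposit")
    balance = post_transaction(balance, Decimal("10.00"), "withdrawal")
print(balance)  # 0.00

# ...until the rare branch is exercised.
print(reversal_restores_balance(Decimal("0.00"), Decimal("10.00")))  # False
```

Testing only the common paths would never find this; it takes either a user who happens to do reversals or a deliberate check of the invariant.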

In software we talk about a bug being reproducible. Just because I cannot reproduce a reported issue doesn’t mean it isn’t there; it means I don’t know the exact sequence of events that caused it. Users often find bugs that developers do not, simply by using the system in a different way. Two users can sit next to each other and experience different behaviour because their work practices are different.

I learned the hard way that users may whinge excessively and call you or your software rude names (you have to be quite thick-skinned as a lead developer), but when they report bugs they are generally telling the truth. The trick is not to dismiss bugs as user error but to track them down, or at the very least to fix any data corruption that has occurred as a result of them. In a finance system, that is where the accountants come in.

What kind of bug is this?

As the Horizon scandal has unfolded I have been racking my brains as to how it all went wrong in the first place. At the risk of sounding like Paula Vennells, I can’t recall ever working on a finance system where the software itself introduced erroneous transactions. But I have seen all sorts of weird behaviour, even in previously robust systems. I recall once being tasked with finding out why a messaging router, which had performed without obvious defect for many years, had intermittently stopped sending messages to key systems. It turned out a former colleague had hard-coded a limit of 64 addresses into the system, and addresses were randomly assigned on reboot. When a 65th system was added, whichever system ended up with address 65 would not receive any messages intended for it. On the occasional day when the router itself became address 65, nothing received any messages at all.
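The router failure is easy to model. The following is a simplified Python sketch, not the original system: it assumes each reboot shuffles the connected systems and hands out addresses 1 to N, and that only addresses within a fixed 64-entry table are registered. The names `assign_addresses`, `receivers` and `"router"` are hypothetical. With 64 systems everything works; add a 65th and after each reboot either one system silently receives nothing, or, if the router draws address 65, nobody does.

```python
import random

MAX_ADDRESSES = 64  # the hard-coded table size described above


def assign_addresses(systems: list[str], rng: random.Random) -> dict[str, int]:
    """Model a reboot: shuffle the systems and give them addresses 1..N."""
    order = systems[:]
    rng.shuffle(order)
    return {name: position for position, name in enumerate(order, start=1)}


def receivers(systems: list[str], addresses: dict[str, int],
              router: str = "router") -> set[str]:
    """Return the systems that will actually receive messages after a reboot."""
    # Only addresses that fit in the fixed-size table get registered.
    registered = {name for name in systems if addresses[name] <= MAX_ADDRESSES}
    if router not in registered:
        # The router has fallen off its own table, so nothing is delivered.
        return set()
    return registered - {router}


rng = random.Random(1)
systems = ["router"] + [f"system{i}" for i in range(1, 65)]  # 65 entries in total
for reboot in range(5):
    delivered = receivers(systems, assign_addresses(systems, rng))
    missing = set(systems) - {"router"} - delivered
    print(f"reboot {reboot}: {len(missing)} system(s) receiving nothing")
```

Because the victim changes on every reboot, the symptom looks intermittent and random, which is exactly why such a bug can hide for years and then surface only when one more system is connected.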

Whilst I accept that allowing super-users an unauditable back door into the system represents a potentially criminal risk, I find it difficult to believe that the many thousands of pounds of discrepancies were all the result of third-party interventions, unless there were a lot of prolifically negligent or outright malicious super-users.

I’ve seen finance systems exhibit issues caused by user error, such as putting a foreign currency amount into a sterling field, but these tend to be isolated and relatively easy to track down. So how can a massive multi-user system exhibit systemic data corruption for a subset of its user base? Another thing I notice is that all the incidents we have seen reported relate to shortfalls. Were there no instances where branches encountered the opposite, with more money in the safe than in the system?

For a financial transaction to enter a system (unless a rogue developer has written a script to create entries at random), there has to be a trigger: a manual entry or adjustment, an entry from a connected peripheral such as a card scanner, a request to reconcile an account, or a regular housekeeping routine. All sorts of bugs are possible, from putting the wrong sign on a transaction, recording a single transaction multiple times or applying an incorrect percentage adjustment (eg a VAT calculation), to crediting the transaction to the wrong account (eg to a different office) or putting a non-cash value (eg a date) into a cash field. The thing is, all such errors should be traceable with a bit of persistence and a decent audit log.
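As a rough illustration of what "a bit of persistence and a decent audit log" can buy you, here is a minimal Python sketch that scans log entries for three of the error types listed above: entries with no trigger, a single trigger recorded more than once, and amounts whose sign contradicts the kind of entry. The `Entry` record, the `EXPECTED_SIGN` rules and the assumption that each trigger reference identifies exactly one real-world event are all hypothetical; a real system would need its own rules, and checks for wrong-account postings or non-cash values would follow the same pattern.

```python
from collections import Counter
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical rules: which sign each kind of entry is expected to carry.
# Positive amounts are money in, negative amounts are money out.
EXPECTED_SIGN = {"sale": 1, "refund": -1}


@dataclass(frozen=True)
class Entry:
    entry_id: int
    kind: str               # e.g. "sale", "refund", "adjustment"
    amount: Decimal
    trigger: Optional[str]  # assumed unique reference to the event that caused the entry


def find_suspicious(entries: list[Entry]) -> list[tuple[int, str]]:
    """Flag audit-log entries that break simple consistency rules."""
    trigger_counts = Counter(e.trigger for e in entries if e.trigger is not None)
    findings = []
    for e in entries:
        if e.trigger is None:
            findings.append((e.entry_id, "no trigger recorded"))
        elif trigger_counts[e.trigger] > 1:
            findings.append((e.entry_id, "trigger recorded more than once"))
        sign = EXPECTED_SIGN.get(e.kind)
        if sign is not None and e.amount * sign < 0:
            findings.append((e.entry_id, "sign does not match entry kind"))
    return findings


# Invented sample log, one entry per kind of problem.
log = [
    Entry(1, "sale", Decimal("20.00"), "till-001"),
    Entry(2, "sale", Decimal("20.00"), "till-001"),   # same trigger posted twice
    Entry(3, "refund", Decimal("5.00"), "till-002"),  # refund with the wrong sign
    Entry(4, "adjustment", Decimal("-50.00"), None),  # nothing caused this entry
]
for entry_id, reason in find_suspicious(log):
    print(entry_id, reason)
```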

Software causes errors, but it is people who make judgements

Whilst the software may have been the source of the issues, their persistence is entirely down to human error. Most bugs, however well hidden, should have been traceable by walking through a balance report, although such reports seem to have been curiously absent. At the Post Office, the purpose of audit seems to have been not spotting and fixing errors in the accounts, but apportioning blame and pursuing fictional losses.
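Walking through a balance report need not be elaborate. The Python sketch below is a simplified model, not how Horizon actually reported balances: it assumes each day has a net transaction total from the system and a physically counted cash figure, and it starts each day from the previous day's counted cash so that any difference shown belongs to that day alone. It reports surpluses as well as shortfalls, since a genuine software fault has no particular reason to err in only one direction. `DayResult`, `walk_balance` and the sample figures are all invented for illustration.

```python
from decimal import Decimal
from typing import NamedTuple


class DayResult(NamedTuple):
    day: str
    expected: Decimal    # opening cash plus the day's net transactions
    counted: Decimal     # cash physically counted at close of day
    difference: Decimal  # counted minus expected
    status: str


def walk_balance(opening: Decimal,
                 days: list[tuple[str, Decimal, Decimal]]) -> list[DayResult]:
    """Walk a simple balance report one day at a time.

    Each day starts from the previous day's counted cash, so any
    difference shown arose on that day alone.
    """
    results = []
    start = opening
    for day, net_transactions, counted in days:
        expected = start + net_transactions
        difference = counted - expected
        if difference < 0:
            status = "shortfall"
        elif difference > 0:
            status = "surplus"
        else:
            status = "balanced"
        results.append(DayResult(day, expected, counted, difference, status))
        start = counted
    return results


# Invented figures: a shortfall appears on Wednesday, a surplus on Thursday.
week = [
    ("Mon", Decimal("100.00"), Decimal("1100.00")),
    ("Tue", Decimal("-50.00"), Decimal("1050.00")),
    ("Wed", Decimal("200.00"), Decimal("1150.00")),
    ("Thu", Decimal("0.00"), Decimal("1160.00")),
]
for row in walk_balance(Decimal("1000.00"), week):
    print(row.day, row.status, row.difference)
```

Once the day is pinned down, the next step is to go through that day's individual transactions, which is where an audit-log scan comes in.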

For all the denials of senior management, there seems to have been a systemic failing at middle-management level. How is it possible that people who repeatedly called helplines to point out issues were not involved in a routine inspection of a day’s transactions to identify where or how money was leaking from the system? My guess is that several people within both Fujitsu and the Post Office know the answer to that question, and they are gratefully hiding in the shadows whilst larger fish flounder in the spotlight. Middle managers and local-level accountants must have been involved in all of these cases. Choosing to hide behind ‘the software is correct’ might seem the easy option when one person is having issues that no-one else is seeing, but it becomes increasingly indefensible as the number of affected users grows.

Sadly, with so many years having passed, we will probably never know how the system came to be so unreliable, but hopefully the scandal will serve as an object lesson to other companies: no software system or management structure can ever be assumed to be unassailably right, and whistle-blowers and complainers are worth paying attention to.
