COBOL 2020 – why are we still using it?

The 2020 Covid-19 pandemic has brought to light just how old many of our government’s computer systems are. It is shocking to people to learn that today’s computer systems are running COBOL, a 60+ year old programming language, mostly on IBM mainframes.

What is more shocking is just how much many of the newspapers are getting wrong. But it isn’t entirely their fault, not when we have college professors stating:

“There’s really no good reason to learn COBOL today, and there was really no good reason to learn it 20 years ago,” says UCLA computer science professor Peter Reiher. “Most students today wouldn’t have ever even heard of COBOL.”

FastCompany Article: What is Cobol

The reality is that many companies still use COBOL. And while Java, JavaScript and Python are the “hot” languages right now, many, many Fortune 500 companies still run COBOL for a lot of their critical systems. Further, FastCompany was wrong to state that “[o]ne key reason for the migration is that mobile platforms use newer languages, and they rely on tight integration with underlying systems to work the way users expect.” The overall trend in technology has been to decouple your code, not to tightly couple it.

Further evidence of this is that most legacy airlines are also running equally legacy code, yet they still have performant Web 2.0 and mobile interfaces. They do what everyone else has been doing, which is layering modern technology on top of older systems using APIs. Currently we use fancy things like GraphQL and REST APIs, but the concept of an API is nothing new. SOAP interfaces have been around a long time (1998). Or how about POSIX (Portable Operating System Interface, aka IEEE 1003) from 1988?

Before I get started, let me say that I’ve been involved in technology since 1990 to one degree or another, and remember fondly the days of working on those ‘green screen dumb terminals’. I’ve personally done work on COBOL and other mainframe-style systems like the AS/400, some of which were written in a version called COBOL/400. I have experience with mainframe systems from airlines to manufacturing/ERP, as well as more modern operating systems from Microsoft Windows to Red Hat Linux, and the newer web development frameworks and stacks (PHP, JavaScript, etc).

Why do we continue to use COBOL? Because it is ‘relatively’ rock solid compared to most programs you see today. The uptime on these systems has been measured in years, not days or hours. We’re going well beyond things like 5-nines uptime (99.999%). And this isn’t using fancy cloud-based, fault-tolerant systems, but rather just one clunky old IBM mainframe. The software simply works, and works well. What it doesn’t do is scale all that well. And often what we’re seeing isn’t the failure of COBOL per se, but the failure of the modern interfaces that people have layered on top of COBOL.

Reliable technology is essential to businesses that expect to be operating for decades and that invest millions or billions into their software.

And it’s not just “old stuff” that is using older hardware/software. We can look at things being made brand new this year, such as the Boeing 737 MAX series, which is running hardware reportedly comparable in computing power to a 1990s-era Nintendo game console. The reason is that it is battle tested and extremely reliable. It isn’t broken and it has more than enough computing power for the task.

Forget about tech startups for a moment. If you were building a new system that needs to still be working 20 years from now without ‘patching bugs’, and that simply needs to keep performing exactly the same things, would you choose a system that is new/novel and may or may not be supported, or would you go with a system which literally has been supported for decades and is in part propped up by the fact that most Fortune 500 companies are in the same boat as you?

Perhaps now, it starts to make a lot of sense.

And for that reason it seems that Mr. Reiher is rather out of touch with reality. Yes, there isn’t a huge growth market for COBOL engineers – if anything the year-over-year need is probably shrinking – but COBOL programmers are retiring even faster. That creates not only a great need, but also a fantastic pay opportunity with nearly zero competition.

It is also a really simple language to learn, and since the COBOL 2002 standard it has supported object-oriented features, which most people should be familiar with from a lot of modern languages. The challenge for the emergent issues is the experience needed to understand and reverse engineer someone else’s code. A short hello world program is easy in just about any language, but of course, what is needed is mastery. Many, many businesses have tried to migrate away from old mainframe technologies, without success. There is just too much built-in business logic sitting there, unnoticed but extremely important. When they try to reverse engineer it and rewrite it in a more modern language, features always drop away.

And it isn’t just COBOL users who are stuck. Here are a few other examples:

  • Microsoft has tried, and failed, to get away from their “DLL Hell” since day one – and even the later Microsoft Windows 10 still has lingering legacy code harkening back to Windows 95.
  • Adobe tried to reinvent their products to be web based instead of purely installed applications – even after 5 years of development, products like Photoshop and Lightroom have ended up with only a small fraction of the legacy features – sure, there are some neat new things, but a lot of the old functionality is lost.
  • Airlines, which spend millions of dollars each year on licensing GDS (Global Distribution Systems) that also run legacy code, are trapped using ancient COBOL-like technology. It is the primary reason why in 2020 you are still limited to buying no more than 9 seats at a time – the underlying ticketing system can only accept a single-digit number.
  • State Farm Insurance has been built on COBOL – and when I was 16 years old I worked on their old green screen terminals. Over the last 30+ years they’ve been working to transition to a modern tech stack. For a period in the early 2000s what they did was bring PCs into the agent offices, where you had access to what was basically the mainframe system via a separate terminal window. In the 2010s they introduced a more modern web interface, but at the end of the day, not only was the COBOL system still the underlying database and still performing the business logic, there are still certain things that can only be performed by going back into the dumb terminal.

One way to look at it is this: for the last 20, 30, 40 years a company has been investing in feature enhancements and tweaks. That is a LOT of code and business logic that has been changed. It is muddled in with a lot of bad, legacy code that might not do anything anymore. Worse, over the course of time bad developers have come along and, instead of fixing or addressing an issue properly, written an obscure bit of code to work around something they didn’t understand.

Has anyone successfully migrated?

There is one company that comes to mind which did successfully rewrite their system completely – around 2000 Apple Computer completely replaced their operating system for the Mac. When it changed from the classic Mac OS (System 1, 2, 3, etc.) to OS X, it was never the same again. And along the way it broke just about everything. Apple changed both the hardware and software. Therefore older pre-OS X hardware couldn’t run OS X, and most software was not compatible either. It was basically a case of cutting your losses, of which there were many. And Apple hardly started from scratch; rather, OS X was based on Unix. So it wouldn’t count as a migration, but they did make the change.

Can’t we do that with our legacy unemployment systems?

Absolutely it is possible, but extremely expensive. Every state has different custom rules, so a software company that has a competitive alternative will not only charge a big price tag, but also an even larger cost to customize it to behave like your existing system. Often these costs add up to more than 10 years of the operating expenses of continuing to use COBOL.

What would I do if I was the Director of Technology for a company still using a COBOL based system?

As someone who has experience maintaining legacy code, as well as projects to completely rewrite a system, here is what I would do. To ensure the greatest possible uptime and reliability, I would first decide on the language and framework I’m going to use. It likely would be Objective-C or something similar, possibly Java (not to be confused with JavaScript) or maybe PHP. I would build out a decoupled system with a modern front-end framework (Vue, Angular, React, etc), and then use that to access my “modern” controller/model, which would start by just transparently passing requests through to the “legacy” system. I would then progressively move the business logic from the legacy system to the modern one, until eventually everything has moved over.
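To make the pass-through idea concrete, here is a minimal sketch in Python (chosen only for brevity, not because it is the language I would necessarily pick). Everything in it is hypothetical: the LegacyGateway class stands in for whatever bridge actually reaches the mainframe, the operation names are made up, and the eligibility rule is invented purely for illustration. It only shows the routing: every request goes to the legacy system until a modern handler for that specific operation has been registered.

```python
from typing import Callable, Dict

# A handler takes a request payload and returns a response payload.
Handler = Callable[[dict], dict]


class LegacyGateway:
    """Hypothetical stand-in for the bridge to the COBOL/mainframe system."""

    def call(self, operation: str, payload: dict) -> dict:
        # A real gateway would talk to the mainframe; this one only
        # reports that the legacy side handled the request.
        return {"handled_by": "legacy", "operation": operation}


class ModernController:
    """Routes each operation to modern code if migrated, else to legacy."""

    def __init__(self, legacy: LegacyGateway) -> None:
        self._legacy = legacy
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, operation: str, handler: Handler) -> None:
        # Register a modern replacement for one piece of business logic.
        self._migrated[operation] = handler

    def handle(self, operation: str, payload: dict) -> dict:
        handler = self._migrated.get(operation)
        if handler is None:
            # Not migrated yet: transparently pass through to legacy.
            return self._legacy.call(operation, payload)
        return handler(payload)


def modern_check_eligibility(payload: dict) -> dict:
    # Invented rule, for illustration only.
    eligible = payload.get("weeks_worked", 0) >= 20
    return {"handled_by": "modern", "eligible": eligible}


controller = ModernController(LegacyGateway())

# Day one: nothing is migrated, so the legacy system answers.
before = controller.handle("check_eligibility", {"weeks_worked": 26})

# Later: one operation is moved over; everything else is untouched.
controller.migrate("check_eligibility", modern_check_eligibility)
after = controller.handle("check_eligibility", {"weeks_worked": 26})
untouched = controller.handle("issue_payment", {"claim_id": "A1"})
```

The front end only ever talks to the controller, so each piece of business logic can be moved (and, if it misbehaves, moved back) without the users seeing a switch being flipped.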

This looks a lot like what I believe State Farm Insurance is doing currently. I would expect this project to easily be a decade-long process or longer. That is something no politician would like, and it wouldn’t win any popularity vote for being seen as ‘addressing the problem’. But IMHO it is the best route forward.

The alternative is to throw ungodly amounts of money at purchasing a new system outright and then customizing it, with a LOT of broken things along the way. I’d rather take years to move over each part of an unemployment system and get it right, versus trying to flip the switch on a new system and mess up people’s unemployment checks.

The end result is a more affordable, reliable and stable change – that takes time, versus another expensive quick-fix.

But what about the people suffering now?

What people are looking for is a reactionary measure instead of a response. The reality is that in a few days to weeks all of the backlog will be solved. Also realize that one of the biggest reasons for the backlog isn’t the technology but the staffing levels. Regardless of the various reasons, it will be worked out in days to weeks. However, as someone who has implemented large-scale systems serving millions of end users, I can say that something new cannot be implemented overnight – it would be a months-long project. Therefore, throwing money at the problem will not make a meaningful difference for individuals right now. The same goes if we onboarded double the number of COBOL programmers nationwide: you’d see only an incremental increase in the processing of claims. Rather, the focus today should be on how to respond to this situation, not react. What do I need to do so that 10, 20, 30 years from now the choices made today will ensure continued success?