Managing IT systems at VA is a massively challenging endeavor.

VA is the largest civilian organization in the US government. We’re also the largest integrated healthcare provider and one of the largest distributors of benefits in the US. As you can imagine, this creates huge complexity in our base infrastructure and networks, as well as the over one thousand systems that we operate. On top of that, we support over two thousand locations with IT services and over 700,000 end user devices. Keeping an enterprise like this operating well is incredibly complex. Add to this the many stakeholders we must satisfy, the ever-changing regulatory environment, and the rapidly evolving tech landscape, and the problems can appear insurmountable.

To tame the complexity, the single most important priority is that we be vision-driven, as I’ve discussed elsewhere. We break the problem down into pieces (in our case “portfolios”), we make sure we have a clear vision for each that is aligned to our stakeholders’, and we define roadmaps that we execute on incrementally, measuring our progress via well-defined OKRs (objectives and key results). Our feature roadmaps are relentlessly prioritized, creating 1-to-N lists and a cut line. The feature or capability above the cut line is “the last ‘yes,’” and the one below it is “the first ‘no.’”

We also focus relentlessly on operational excellence, looking at performance statistics for how our systems are operating and where we need to up our game, so that they function more reliably and effectively going forward. We create a culture of “embrace the red” in all we do, focusing not to blame but to improve as a team.

Despite this growing engineering rigor, we face a modernization challenge. It bears repeating: we have over a thousand systems in our inventory supporting “businesses” of massive scope. Our funding, although substantial, is lower than what’s typical in the commercial space, and lower as a percentage of discretionary spend than our peer federal agencies. We operate systems that can be quite dated, even including some mainframe systems still written in COBOL. As a result, we have a very large modernization debt that we must prioritize.  

We also live in an environment that undervalues investments in modernization. We have a huge backlog of feature and capability requirements. Modernization is perceived as a cost with little reward. But when ignored for too long, systems become so antiquated that the path forward seems unclear. We begin hearing the Siren’s call to “rip-and-replace,” feeling that new technology will be so much better than what’s in place and will deliver a massive amount of incremental benefit. Our IT industry promotes this both by extolling the virtues of the new technology and by setting an aggressive end-of-life timeline for existing technologies. So, if you wait long enough, you face the difficult choice of going without complete support for systems that, while not perfect, are at least meeting the needs of the customer.

“Rip-and-Replace,” or “Big Bang” as it’s sometimes called, is fraught with peril – so much so that I believe it’s at the heart of the majority of IT project failures. The benefits of the new system are oversold. The new capabilities are described as a perfect match. The interoperability complexities are minimized. The date for the delivery is way too optimistic. And the whole thing is promised at a price tag that’s well below the ultimate cost. On top of this, our stakeholders often don’t understand the uncertainties involved in software development. You never know the complete cost of a software project until you get deep into it and uncover all the intricacies of the business rules it embodies and the interoperability requirements with the rest of your system.

Because of these perils of rip-and-replace, we in OIT are working to avoid it wherever possible. We do this through several efforts:

Set goals on system retirement, especially as we transition systems to the cloud. Systems pile up and never go away. It takes conscious effort to do an inventory of systems to determine which ones can be removed or consolidated into other systems. We set explicit goals on the amount of system retirement we’ll accomplish each year. A migration to the cloud is a great time to do this inventory. Decide which systems will be migrated, which will be rewritten, and which will be end-of-lifed.

Modernize incrementally, creating evergreen systems. The best way to avoid a big bang modernization is to modernize incrementally over time. Choose the oldest part of the system and modernize it. If you do this with conscious forethought and modernize a portion each year, you’ll end up with a system that is continuously being re-envisioned and is up to date. Moreover, as the industry and technology change, the design you choose will change with it. This is in contrast to a big bang modernization where you’re making a bet on a solution at a single point in time, freezing many of your future technology choices.

When you must build a completely new system, start with a small deployment footprint to gain confidence. There are times when you have no choice but to build a new system. This is especially the case when you’ve avoided modernization for so long that the existing platform is being end-of-lifed, or the fundamental capabilities of the system no longer meet the most basic needs of stakeholders. Even in this case, it’s best not to go “all in” with the new solution until you’ve proven to a good degree of confidence that the new system meets the need. In such cases, we seek to define the “Minimum Viable Product (MVP)” and make sure that the first one or two milestones involve validating that the MVP can truly meet the need. If we’re purchasing a commercial off-the-shelf solution, we are starting to define those first milestones contractually to put pressure on the vendor to prove that the system can meet the needs. We’re using this approach more and more at VA when we do large system replacements.

Question whether you have the right investment level for all systems. With a thousand systems at VA, I’ve found a number of cases where a troubled system was categorized as being in “deep sustainment.” That means the system is just being kept operational. This is a dangerous place to be, particularly for a system that serves a critical function. Close cousins of such systems are ones where there’s only modest investment, and the team keeps slipping release timelines as a result. You ask if more resources could help, and the response may be “there are no other resources available.” In all cases, the resource investment for every system in the portfolio must be evaluated against the business strategy and continuously reassessed for sufficiency as conditions change. We are just now coming to grips with this in OIT. We need to dust off the assumptions behind even the most obscure system to see if the investment path for that system still makes sense.

Don’t ignore modernizing your core infrastructure. Investing in systems is exciting because it provides new functionality to end users and stakeholders. At the same time, if you ignore your underlying infrastructure, you’re building on a crumbling base. Operating systems go past their end-of-life, which often ends the provider’s commitment to supply essential security updates. Storage systems no longer meet capacity requirements. Data transport protocols become vulnerable to security exploits. Moreover, these are some of the hardest components to upgrade. They often have broad footprints, and everything above them in the stack introduces massive dependencies that can require extensive code rewrites and validation testing as part of the deployment process. Because this is some of the most insidious technical debt, we are renewing our commitment to proactively updating our core infrastructure, so that we get current and stay current, and the huge modernization overhang that we face now can be avoided in the future.


A final word on managing our vast IT infrastructure: as we break down the complexities of our systems into manageable portfolios, we must explicitly address our funding strategy. That includes earmarking resources for modernization initiatives every year, underpinned by a well-defined investment strategy; but we must tread carefully and recognize that modernization investments can impact our ability to fund new feature development.

And in our dynamic environment, where new mandates and feature requests often come without the funding to implement them, we must advocate for the necessary funding, or note where a funding gap will imperil our broader modernization strategy and then adjust the organization’s strategic expectations.

One of the ways we’re doing this is through our flexible 1-N priority list, a tool that allows us to continuously rank our priorities and then essentially move a cutline up or down based on the available funding. With new mandates and unfunded requirements, that cutline goes up and planned investments fall off. This is the relationship you must understand to pursue modernization — explicitly tying the budget to what you think you’re going to accomplish. It also helps immeasurably in communicating with stakeholders about the tradeoffs you’re making every day and how one more set of requirements pushes critical functionality “below the cutline.”
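To make the relationship between the ranked list, the budget, and the cutline concrete, here is a minimal Python sketch. It is an illustration only: the item names, costs, and budget figure are hypothetical and not drawn from VA's actual portfolio, and `apply_cutline` and `insert_at_top` are made-up helper names, not a VA tool. It also assumes a strict cutline, where everything ranked below the first item that cannot be funded falls off.

```python
from dataclasses import dataclass


@dataclass
class Item:
    rank: int   # position in the 1-N list (1 = highest priority)
    name: str   # hypothetical investment name
    cost: int   # hypothetical cost, in millions of dollars


def apply_cutline(items, budget):
    """Fund items in rank order until the next one no longer fits.

    Returns (funded, unfunded). The last funded item is "the last yes";
    the first unfunded item is "the first no".
    """
    ordered = sorted(items, key=lambda i: i.rank)
    funded = []
    remaining = budget
    for idx, item in enumerate(ordered):
        if item.cost > remaining:
            # Strict cutline: this item and everything ranked below it fall off.
            return funded, ordered[idx:]
        funded.append(item)
        remaining -= item.cost
    return funded, []


def insert_at_top(items, name, cost):
    """Add a new mandate as priority 1 and push every other item down one rank."""
    bumped = [Item(i.rank + 1, i.name, i.cost) for i in items]
    return [Item(1, name, cost)] + bumped


# Hypothetical portfolio and budget.
portfolio = [
    Item(1, "Operating system upgrade", 40),
    Item(2, "Claims system modernization", 30),
    Item(3, "New portal feature", 20),
    Item(4, "Reporting dashboard", 15),
]
BUDGET = 100

funded, unfunded = apply_cutline(portfolio, BUDGET)
print("Funded:", [i.name for i in funded])
print("Below the cutline:", [i.name for i in unfunded])

# A new mandate arrives without funding: same budget, longer list.
with_mandate = insert_at_top(portfolio, "Unfunded mandate", 25)
funded2, unfunded2 = apply_cutline(with_mandate, BUDGET)
print("Funded after mandate:", [i.name for i in funded2])
print("Below the cutline after mandate:", [i.name for i in unfunded2])
```

In this sketch, the budget stays at 100 while the mandate adds 25 of new cost at the top of the list, so "New portal feature" moves from being the last "yes" to being the first "no." That is the tradeoff worth showing stakeholders whenever one more set of requirements is added.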
