In the Office of Information and Technology (OIT), we often talk about the idea of delivering on VA’s digital transformation by getting “Back to Basics.”
That can sound like going backward rather than driving innovation forward. What we mean is that over the years, amid evolving mandates, shifting priorities, scope creep, and a host of other systemic issues that affect large, service-oriented IT organizations like ours, it can be all too easy to lose sight of our core mission: the thing that makes us the IT organization VA requires.
In OIT, our core mission is to deliver IT products and services to VA and our Nation’s Veterans. To accomplish this, especially in the constrained funding environment we find ourselves in, we must ensure that every ounce of energy we expend and every resource we deploy is focused on this essential mission. When we say we’re getting “back to basics,” we’re talking about pursuing excellence in organizational strategic planning, deploying a strategy to secure and protect VA’s vital infrastructure and our Veterans’ information, and examining how we engineer and maintain key IT products and services, avoid downtime, and service our VA customers.
I’d like to share some examples of this approach as it relates to how we’re getting back to basics here in OIT.
Daily Incident Review
The first is our daily incident review, an 8 a.m. standing meeting in which a small team gathers to review and triage any critical system failures that VA has experienced in the previous 24 hours. (Any sci-fi fans might think of it a bit like the “damage reports” on Star Trek.)
Running VA’s extensive IT infrastructure is a sprawling operation. We serve more than 20 million Veterans through health care and benefits systems, requiring nearly 2 million pieces of IT equipment. The scale and complexity of the job require our close attention, and they require that we embrace a culture of continuous improvement. That’s why we huddle each morning for a comprehensive review and diagnosis of all incidents from the previous day: network outages, responsiveness issues, facilities closed for unplanned maintenance, and virtually anything else you can imagine. This practice gives us a holistic view of systems health and helps us identify patterns early on that could indicate potential issues or instabilities.
Every critical or high-severity incident undergoes a root cause analysis, leading to the identification of preventive actions. These reports are now integrated into our daily operational status reviews, enhancing overall transparency and accountability.
I believe this practice has contributed significantly to our improved rigor and our recognition as an IT leader in the federal government. Moreover, these daily stand-up discussions reinforce the value we place on continuous improvement every day.
I’m proud of the results we have seen as we continue to scrutinize our systems and close any gaps in our security and reliability posture. For example, after excessive temperatures caused several outages and extensive downtime of our IT systems last year, we developed a solution that automatically alerts us when any device begins to overheat. This innovation has helped us avoid potential downtime more than 320 times over the past year.
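To make the overheating alert concrete, here is a minimal sketch of threshold-based alerting. It is an illustration only, not OIT's actual solution: the `Reading` type, the `overheating` function, the device names, and the 75 °C threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical warning threshold; real limits depend on the hardware
# and are not stated in this article.
WARN_CELSIUS = 75.0


@dataclass
class Reading:
    device_id: str   # hypothetical device name
    celsius: float   # temperature reported by the device


def overheating(readings, threshold=WARN_CELSIUS):
    """Return sorted IDs of devices whose latest reading meets or exceeds the threshold."""
    latest = {}
    for r in readings:
        # Readings are assumed to arrive in time order, so later ones overwrite earlier ones.
        latest[r.device_id] = r.celsius
    return sorted(d for d, c in latest.items() if c >= threshold)


readings = [
    Reading("rack-a", 62.0),
    Reading("rack-b", 78.5),
    Reading("rack-a", 80.1),
    Reading("rack-c", 70.0),
]
alerts = overheating(readings)
print(alerts)
```

In this sample data, `rack-a` and `rack-b` would be flagged, since the most recent reading for each is at or above the assumed threshold.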
We have also invested significant effort in improving how quickly we can detect and respond to IT incidents, putting in place an automated major incident workflow that has helped us proactively identify 19 major incidents. Our internal data also shows that we have become 40% faster in responding to major incidents compared to the previous year.
Code Yellow
The second example of how we’re getting back to basics is our “Code Yellow” process. We stood up this process to address defects in our critical IT services, focusing relentlessly on solving issues that span the VA organization and building additional continuous monitoring to catch such issues in the future.
We established clear and comprehensive exit criteria for declaring an end to a Code Yellow to ensure all issues are resolved. An example is our Code Yellow for recent glitches with claims processing on VA.gov, our “digital front door” for Veterans.
We had a few key goals under the Code Yellow that we stood up for VA.gov:
- That the most important applications and features on VA.gov that Veterans and their loved ones depend upon are monitored in real time for issues.
- That the health of these most important applications and features is accessible from a single dashboard to allow us to easily detect any issues or outages.
- And that a government employee is aware of and responding to significant issues within 24 hours of detection.
So far, we are moving in the right direction. We have created a unified “watch tower” in which all monitors are consolidated. We’ve also established a standard operating procedure to ensure that alerts are triaged as they take place.
As of December 2023, 80% of VA.gov’s most critical features are monitored. We’re also on track to complete automatic monitors on these top features early next year.
This approach gives OIT a comprehensive view of VA.gov’s health, so that any potential glitches are both identified and addressed promptly. Moving forward, we are committed to closely watching and tackling any challenges that threaten to serve as roadblocks for Veterans in need of services and benefits.
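The Code Yellow goals for VA.gov lend themselves to simple, checkable measures. The sketch below is one hypothetical way to compute two of them: the share of critical features that are monitored, and which issues missed the 24-hour response window. The feature names, issue names, timestamps, and function names are invented for illustration; only the 24-hour window comes from the goals above.

```python
from datetime import datetime, timedelta

ACK_WINDOW = timedelta(hours=24)  # response window taken from the third goal


def monitoring_coverage(features):
    """Fraction of critical features that have a real-time monitor."""
    if not features:
        return 0.0
    return sum(1 for monitored in features.values() if monitored) / len(features)


def overdue_issues(issues, now):
    """Names of issues not acknowledged within 24 hours of detection.

    Each issue is a (name, detected_at, acknowledged_at or None) tuple.
    """
    overdue = []
    for name, detected, acked in issues:
        deadline = detected + ACK_WINDOW
        never_acked_in_time = acked is None and now > deadline
        acked_late = acked is not None and acked > deadline
        if never_acked_in_time or acked_late:
            overdue.append(name)
    return overdue


# Hypothetical feature inventory: True means a monitor is in place.
features = {
    "claim-status": True,
    "appointments": True,
    "prescriptions": True,
    "messaging": True,
    "payments": False,
}

t0 = datetime(2023, 12, 1, 8, 0)
issues = [
    ("login-errors", t0, t0 + timedelta(hours=2)),   # acknowledged in time
    ("form-timeout", t0, None),                      # never acknowledged
    ("slow-search", t0, t0 + timedelta(hours=30)),   # acknowledged late
]

coverage = monitoring_coverage(features)
late = overdue_issues(issues, now=t0 + timedelta(hours=48))
print(f"coverage={coverage:.0%}, overdue={late}")
```

With four of five hypothetical features monitored, coverage works out to 80%, and the two issues that were not acknowledged within the window are reported as overdue.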
For us, getting “Back to Basics” is foremost a recognition that we are an engineering-focused organization. We build products and services focused on the needs of Veterans and our internal stakeholders, and we continuously seek to operate these with a high degree of rigor and precision. The daily standup and Code Yellows are two examples, but there are many others that span the entire product development lifecycle, including:
- Segmenting our work into understandable portfolios and product lines
- Defining and driving each against a clear vision, roadmap, and measures of success
- Focusing our work and measuring our success based on OKRs (Objectives and Key Results) that we refine each semester
- Ruthlessly prioritizing our work to focus it on VA’s highest priorities, maintaining 1-N lists of work that is “below the line” (not yet funded)
- Holding regular Operational Status (aka OpStat) reviews, in which we drill into the performance of our “Critical 100” systems, focusing on those not performing to standards and defining get-well plans
- Pushing forward the state of the art in how we work through our Engineering Excellence forum
- Ensuring that OIT team members have clear career ladders and training to enable them to build their skills and their careers.
These efforts collectively create an environment that helps us get better at what we do each day. They help us deliver services that are always available and accessible to our Veterans and members of the VA team, the largest civilian organization in the U.S. government and the country’s largest integrated health care provider. This massive responsibility dictates not only that we deliver the services we create with high reliability, but also that we keep improving how we do it.
That’s what our Veterans, their families, and their caregivers expect — and deserve.