A characteristic of virtually all great software engineering organizations I’ve experienced is a relentless focus on operational excellence. The software and hardware systems we create are remarkably complex. They are built by mere mortals, so many defects are unintentionally designed in. In addition, while software doesn’t age, hardware does, inducing even more failures. And even though software doesn’t age, over time our software systems will encounter scenarios that no one anticipated. All this means that the complex systems we all run are prone to failure. The only logical approach to combating this, and the one Spock would surely take, is to constantly worry that we are on the verge of the next failure, and to embrace a culture of continuously and relentlessly searching out vulnerabilities and remediating them. We exercise this approach at VA, constantly seeking to improve our operational excellence.
We do this through the following:
- We conduct standups every morning where we triage all critical system failures from the past 24 hours, ensuring they are being adequately addressed, seeking the underlying cause of each failure, and developing an approach to ensure it doesn’t happen again.
- We define OKRs (Objectives and Key Results) for the organization overall and for each of the teams within it. These OKRs are the critical improvement results we seek to accomplish in the next several months. They span from the high level (e.g., we will have a published vision document for all portfolios) to the very specific (e.g., we will reduce the incidence of change-based failures from X% to Y%).
- We strive to build systems that are highly resilient, and we track the key aspects of that resiliency. We establish explicit uptime goals linked to the system’s criticality. Higher criticality systems have higher uptime goals, and we work to ensure the architecture of these systems will support these goals.
- We constantly make priority-based tradeoffs, ensuring that we are focusing our efforts on the most impactful work we could be doing. We constantly have discussions like “Yes, we could fix this, but is the juice worth the squeeze?”
- We are equally focused on our resource allocation. We stack rank the work we are doing and the work we don’t have the resources to do, so that we’re always focused on the most important work of the organization to serve our Veteran stakeholders. When more resources become available, we fund the next project off the list.
- We create and maintain scorecards and dashboards that report the status of our systems and the work we do. Our scorecards are designed not to tell the wonderful story of the work we do, but rather to identify the places where there is more work to be done.
- We exercise operational excellence in our approach to cybersecurity. No landscape is more complex than the cybersecurity landscape of an organization like VA. We are guided by a “zero trust” north star, and we use it to define our priorities. We continuously improve our rigor in identifying and engineering out vulnerabilities and in monitoring and responding to threats.
- We’ve established an Engineering Excellence community of practice, where we can identify best practices and propagate them through the organization.
- In all we do, we practice a culture of “embrace the red.” We don’t seek to assign blame. We seek to solve problems.
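To make the idea of uptime goals linked to criticality concrete, here is a minimal sketch of how such a goal translates into a downtime budget. The tier names, the percentages, and the 30-day window are illustrative assumptions, not VA's actual targets or tooling.

```python
# Hypothetical criticality tiers and uptime goals (percent).
# Illustrative values only; the article does not state VA's real targets.
UPTIME_GOALS = {"high": 99.99, "medium": 99.9, "low": 99.0}

# Assumed reporting window: 30 days, expressed in minutes.
MINUTES_PER_30_DAYS = 30 * 24 * 60


def allowed_downtime_minutes(uptime_goal_pct: float,
                             period_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Return the downtime budget (in minutes) implied by an uptime goal."""
    if not 0 <= uptime_goal_pct <= 100:
        raise ValueError("uptime goal must be a percentage between 0 and 100")
    return period_minutes * (100 - uptime_goal_pct) / 100


def meets_goal(criticality: str, downtime_minutes: float) -> bool:
    """Check observed downtime against the budget for a criticality tier."""
    return downtime_minutes <= allowed_downtime_minutes(UPTIME_GOALS[criticality])


if __name__ == "__main__":
    for tier, goal in UPTIME_GOALS.items():
        budget = allowed_downtime_minutes(goal)
        print(f"{tier}: {goal}% uptime allows {budget:.2f} minutes of downtime")
```

Under these assumed numbers, a 99.9% goal leaves roughly 43 minutes of downtime in a 30-day window, which is why a higher-criticality system needs an architecture that can recover far faster than a lower-criticality one.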
Operational excellence and continuous improvement are at the heart of all we do. We haven’t achieved nirvana yet. Because of this, we also constantly seek to improve our approach to operational excellence itself.