PlayBook

Operational Excellence

Introduction

We reuse shared solutions, foster a learning culture of continuous improvement, and ensure security is built into every product by making sure teams have access to the tools and resources needed to deliver secure, reliable, and scalable solutions for Veterans.

By leveraging shared platforms and capabilities, we make it easier to access what you need to deliver efficiently. The VA Way isn’t about checking off tasks. It’s about continuously improving and finding smarter ways to meet Veterans’ needs.

Read the attributes below to access the detailed guidance, tools, and resources for each.


Operational Excellence Attributes:

2.1 We measure and continuously improve performance using transparent, common metrics across teams

Consistent performance tracking is integral not only to successful product delivery but also to the effective distribution of product team efforts. To ensure our products are operating at the highest standards for Veterans, their families, and their caregivers, every product team will track a common suite of performance metrics.

Ultimately, VA strives for all systems to achieve “three-nines” of uptime (i.e., the application or system is functional 99.9% of the time). Product teams must rigorously monitor their systems to catch issues in real time and conduct preventative maintenance.
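
To make the “three-nines” target concrete, the following minimal sketch converts an availability percentage into a downtime budget. The function names and the 30-day and 365-day reporting periods are assumptions chosen for illustration; they are not part of any VA tool or policy.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` at the given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return period * (1.0 - availability_pct / 100.0)


def measured_availability(downtime: timedelta, period: timedelta) -> float:
    """Return the availability percentage achieved given observed downtime."""
    return 100.0 * (1.0 - downtime / period)


# Hypothetical reporting periods, used only for illustration.
monthly = downtime_budget(99.9, timedelta(days=30))   # about 43.2 minutes
yearly = downtime_budget(99.9, timedelta(days=365))   # about 8.76 hours

print(f"30-day budget:  {monthly.total_seconds() / 60:.1f} minutes")
print(f"365-day budget: {yearly.total_seconds() / 3600:.2f} hours")
```

Under these assumptions, a system that is down for more than roughly 43 minutes in a 30-day period has missed the 99.9% target for that period.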

  • Best Practices

    • Measure operational performance using an approved application performance monitoring tool

    • Ensure you are collecting metrics that can indicate the operational health of the product, especially the four golden signals: latency, traffic, errors, and saturation

    • Regularly review metrics and incorporate them into daily workflows in forums like standup meetings or staff meetings to engage staff in meeting metrics
    • Use a Custom Development platform with out-of-the-box monitoring, or integrate approved application performance monitoring (APM) tools with applications
    • Develop robust incident management protocols for systems, including communication materials and templates to be shared with end-users 
    • Conduct thorough root cause analyses (RCAs) when errors are identified in production environments and communicate the results with end-users 
    • Continuously track indicators of reliability over time (e.g., CPU usage, web performance, error rate) as well as other performance metrics to quickly identify and resolve issues as they occur
    • Discuss metrics frequently and use results to drive the direction of future development work 
    • Transparently document where metrics are stored and how to access them
  • Guiding Questions

    • Is there a minimum bar for performance that you would expect within a particular metrics category?
    • How are other teams measuring performance of similar products? What metrics are being tracked for similar products in the Critical 100?
    • Is there a clear way for everyone on your team to access these metrics (e.g., dashboard, regular reporting cadence)?
    • What does performance in a particular metric tell you about the current distribution of resources?
    • Where else can you prioritize efforts to improve on metrics?
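
As an illustration of the four golden signals named in the best practices above, the sketch below summarizes them for one monitoring window. It is a simplified example, not the behavior of any approved APM tool: the `Request` record, the `golden_signals` function, the choice of 95th-percentile latency, the treatment of HTTP 5xx responses as errors, and the use of CPU utilization as the saturation measure are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    duration_ms: float
    status: int  # HTTP status code


def golden_signals(requests, window_seconds, cpu_utilization):
    """Summarize latency, traffic, errors, and saturation for one window.

    `cpu_utilization` (0.0 to 1.0) stands in for saturation; a real system
    might use memory, queue depth, or connection-pool usage instead.
    """
    n = len(requests)
    durations = sorted(r.duration_ms for r in requests)
    # Nearest-rank 95th percentile: the smallest value with at least 95% of samples at or below it.
    p95 = durations[math.ceil(0.95 * n) - 1] if n else 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": p95,
        "traffic_rps": n / window_seconds,
        "error_rate": errors / n if n else 0.0,
        "saturation": cpu_utilization,
    }


sample = [Request(100, 200), Request(200, 200), Request(300, 500), Request(400, 200)]
print(golden_signals(sample, window_seconds=2, cpu_utilization=0.6))
```

Publishing a summary like this to a shared dashboard is one way to make the same numbers visible to everyone on the team.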

2.2 We reuse solutions whenever possible, prioritizing efficiency

Reusing existing software enables OIT to deliver high-quality software faster, maintain existing solutions, and allocate resources more efficiently. Reuse can take several different approaches, like leveraging an existing SaaS or PaaS product, or adding a feature or capability to an existing Custom application where most of the code can be reused.

Teams should prioritize reuse first. If no existing product can be used or modified to support the requirement, teams should explore whether a SaaS solution could be purchased or whether custom development using a PaaS or custom software platform is appropriate.

2.3 We use shared platforms, capabilities, and common utilities as our default approach

OIT’s continuously improving suite of platforms, capabilities, and utilities supports efficient, high-quality software development. Collectively, these shared resources automate processes and provide features and solutions out-of-the-box.

Teams are expected to use shared resources as the default approach to:

  1. Reduce cost by avoiding duplicate work, and
  2. Ensure they are driving towards the best outcomes for end-users across reliability, security, and performance

2.4 We treat every mistake as a learning opportunity, fostering a blameless culture by embracing the red and collaborating across teams to prevent future problems

A culture of hiding mistakes can have a profound impact on an organization, including slower reaction times and reduced use of institutional knowledge to solve issues. OIT encourages teams to bring mistakes forward by creating a transparent culture from the top down.

  • Best Practices

    • If you make a mistake, own it and share it with the team as soon as possible
    • As a leader, if someone on a team flags a mistake they made, acknowledge their integrity in raising the error and transition to fixing the issue and preventing similar ones
    • Participate in OIT’s major incident process, proactively declaring major incidents when there is a significant issue with your product
    • If someone discovers a mistake made by someone else, shift the conversation to finding a solution rather than pointing blame
    • Immediately flag when products or features are or may become delayed so leadership can allocate resources as needed
    • Evaluate work based on final output relative to final cost, rather than mistakes made along the way
    • Product teams must develop robust incident management teams and protocols to ensure that critical product errors are resolved as fast as possible
  • Guiding Questions

    • How can you encourage a culture of “embracing the red” daily?
    • Does your team know how to participate in the Major Incident Process, including how to declare ongoing or recently addressed incidents?
    • Have you defined a protocol or process to make it easier for individuals to report mistakes?
    • Do opportunities exist to flag delays in product development? Where can you create more opportunities to do so?
  • Use Case

    In FY24, the Enterprise Command Center (ECC) began an aggressive continuous service improvement initiative to investigate 100% of Major Incidents within five business days to identify opportunities to improve monitoring instrumentation.

    A Critical-100 system experienced a Major Incident caused by a change in November 2024, leaving customers unable to access the system. The monitoring instrumentation was successful, and all alerting functioned as designed. Furthermore, upon learning of the change, ECC updated the components for the application’s monitoring instrumentation, which reduced the average number of alerts per day for failed connections by 45%. By reducing alert noise, System Owners and Event Management Eyes on Glass can quickly analyze and respond to the remaining alerts that are most critical to the business.

2.5 We treat security as part of the Veteran experience and a key outcome of the products we deliver

Security is the most important functionality teams can incorporate into their development plan. At VA, a cyberattack may compromise the reliability and privacy of the critical services upon which our nation’s Veterans, their families, and their caregivers depend. We should seek to treat Veterans’ identities with the same reverence we treat Veterans themselves.

VA is implementing a Zero Trust Architecture (ZTA) to stay ahead of ever-adapting cyber threats. There are four core ZTA principles:

  1. Never trust, always verify
  2. Enforce least privileged access
  3. Continuously and pervasively monitor
  4. Assume breaches

OIT expects that all product teams are familiar with ZTA and remain compliant and vigilant across all software development efforts.
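
The sketch below shows how the first three principles might look in application code: every call is verified, access is denied unless explicitly granted, and every decision is logged. It is an illustrative sketch only and does not represent VA's ZTA implementation. The `SECRET` key, the `PERMISSIONS` table, and the `sign` and `authorize` functions are hypothetical, and a real system would rely on an enterprise identity provider rather than a shared secret in code.

```python
import hashlib
import hmac
import logging

logger = logging.getLogger("access")

# Hypothetical demo values; never hard-code secrets in a real system.
SECRET = b"demo-only-secret"
PERMISSIONS = {
    "clerk": {("claims", "read")},
    "adjudicator": {("claims", "read"), ("claims", "update")},
}


def sign(user: str) -> str:
    """Issue a demo credential: an HMAC-SHA256 signature over the user name."""
    return hmac.new(SECRET, user.encode(), hashlib.sha256).hexdigest()


def authorize(user: str, signature: str, resource: str, action: str) -> bool:
    """Verify the caller on every request and allow only explicit grants."""
    # Never trust, always verify: re-check the credential on each call.
    verified = hmac.compare_digest(sign(user).encode(), signature.encode())
    # Least privilege: anything not explicitly granted is denied.
    allowed = verified and (resource, action) in PERMISSIONS.get(user, set())
    # Continuous monitoring: record every decision, allowed or denied.
    logger.info("user=%s resource=%s action=%s verified=%s allowed=%s",
                user, resource, action, verified, allowed)
    return allowed


print(authorize("clerk", sign("clerk"), "claims", "read"))    # True
print(authorize("clerk", sign("clerk"), "claims", "update"))  # False
print(authorize("clerk", "forged", "claims", "read"))         # False
```

Because the check runs on every call regardless of where the request originates, it also reflects the "assume breaches" mindset: being inside the network grants nothing by itself.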

2.6 We maintain a clear, shared, and up-to-date technical strategy for every product

Product teams should define transparent and modern technical strategies to ensure all systems are ready for continuous improvement. Teams should consider opportunities to coordinate with broader enterprise, product line, or portfolio strategy while also making technical roadmaps clear and transparent.

  • Best Practices

    • Review key Enterprise Technology Guidelines when planning technical strategy
    • Prioritize modernization of critical systems where possible
    • Coordinate with leadership to ensure technical strategy is consistent with broader strategies (e.g., across tooling, tech stack)
    • Share technical strategy for review and regularly update as needed
  • Guiding Questions

    • Is your broader technical strategy aligned to the overarching ambition to create more modern and modular systems? Where and why does it diverge?
    • How can you incorporate regular updates to your technical strategy on a day-to-day basis?
    • If you are a product line manager or portfolio lead, what does your technical strategy look like? Are your teams aware of and aligned to this strategy?
    • Should the systems under your purview be using the same technology stack? Is that possible?

2.7 We ensure our teams know how to find the tools, documentation, and information needed to deliver products successfully

Teams should leverage resources like Product VA, CODE VA, Design VA, and the Digital VA website to ensure everyone on the team has the most up-to-date guidance and tools.

The VA Way is made up of all the best work being done in OIT, and teams are encouraged to document their work thoroughly to continue to expand institutional knowledge and enable others to follow their example.

  • Best Practices

    • Help new team members onboard by centralizing system-specific resources in an accessible location
    • Create new-hire onboarding resources that identify “must reads” that will help new team members get up to speed quickly on key areas
    • For custom applications, ensure there is up-to-date documentation that can help a new developer begin productively contributing code to the project quickly
    • Contribute back to the OIT community by uploading and updating content or providing feedback to microsite administrators
  • Guiding Questions

    • What learnings, tools, design, software, etc. might be helpful to share with the broader community?
    • After tracking down a hard-to-find process or piece of information, what can you do to make that information easier for future team members to find?
    • What tools, documentation, and information might be needed for a new individual joining your product team? How might you store this information in an accessible way?
    • How can you solicit feedback around tooling and docs from your team day-to-day?
    • Where are there gaps for OIT to improve on the current-state tools, documentation, and/or product delivery information?
