RUN vs BUILD: How to Handle a Production Crisis Without Sacrificing the Roadmap
In the application lifecycle, IT teams must constantly juggle two often conflicting priorities:
- RUN: handling incidents, ensuring service continuity, keeping users satisfied in the short term.
- BUILD: developing new versions, pushing the roadmap forward, and creating long-term value.
When a crisis hits (multiple production incidents, a critical outage, users under heavy pressure), the balance becomes fragile. How can teams stay responsive without derailing ongoing development? How can they absorb the pressure without freezing the strategic roadmap?
The stakes in crisis management
- Immediate user satisfaction: downtime destroys trust; speed of response is critical.
- Preserving the roadmap: freezing all development may feel safe but can jeopardize product strategy.
- Avoiding team burnout: constant firefighting and interruptions drain morale and effectiveness.
Best practices to balance RUN and BUILD
1. Separate roles clearly
- Dedicated RUN squad (or temporary crisis cell): absorbs incidents, investigates, and delivers hotfixes.
- Protected BUILD team: continues roadmap development without being overwhelmed by interruptions.
- For smaller teams: establish a rotation between RUN and BUILD to avoid overloading the same people.
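For a small team, the rotation idea can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `build_rotation`, the weekly cadence, and the member names are hypothetical, and a real schedule would also account for holidays and skills.

```python
def build_rotation(members, run_size, weeks):
    """Return a list of (week_number, run_members, build_members) tuples.

    Assumption: the RUN duty moves through the team in fixed-size slices,
    one slice per week, so nobody stays on RUN two weeks in a row as long
    as the team is at least twice as large as the RUN slice.
    """
    if not 0 < run_size < len(members):
        raise ValueError("run_size must leave at least one person on BUILD")

    schedule = []
    n = len(members)
    for week in range(weeks):
        start = (week * run_size) % n
        # Take run_size consecutive members, wrapping around the list.
        run = [members[(start + i) % n] for i in range(run_size)]
        build = [m for m in members if m not in run]
        schedule.append((week + 1, run, build))
    return schedule


if __name__ == "__main__":
    team = ["Ana", "Ben", "Chloe", "Dev"]  # hypothetical team
    for week, run, build in build_rotation(team, run_size=1, weeks=4):
        print(f"Week {week}: RUN={run} BUILD={build}")
```

Publishing the schedule in advance gives BUILD members predictable focus time and makes the RUN load visible to everyone.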
2. Prioritize interventions
- Urgent hotfix: only if production is blocked or severely degraded.
- Deferred fix: if a workaround exists and the impact is acceptable.
- Planned consolidation: fix the root cause properly in the next release to avoid piling up fragile quick patches.
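The three intervention levels above can be written down as an explicit decision rule, which helps a team apply them consistently under pressure. The sketch below is one possible encoding, not a standard: the `Incident` fields and the `triage` function are hypothetical names, and the fallback for incidents that are neither blocking nor covered by a workaround is an assumption the article does not spell out.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    URGENT_HOTFIX = "urgent hotfix"
    DEFERRED_FIX = "deferred fix"
    PLANNED_CONSOLIDATION = "planned consolidation"


@dataclass
class Incident:
    production_blocked: bool
    severely_degraded: bool
    workaround_exists: bool
    impact_acceptable: bool


def triage(incident: Incident) -> Action:
    # Hotfix only if production is blocked or severely degraded.
    if incident.production_blocked or incident.severely_degraded:
        return Action.URGENT_HOTFIX
    # Defer if a workaround exists and the impact is acceptable.
    if incident.workaround_exists and incident.impact_acceptable:
        return Action.DEFERRED_FIX
    # Assumption: everything else is scheduled for a proper fix in the
    # next release rather than patched quickly.
    return Action.PLANNED_CONSOLIDATION


if __name__ == "__main__":
    outage = Incident(True, False, False, False)
    print(triage(outage).value)
```

Keeping the rule this small is deliberate: the fewer criteria there are, the easier it is to agree on them before the crisis rather than during it.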
3. Deliver in small, smart increments
- Isolated hotfixes: minimal, well-tested, delivered quickly.
- Regular releases: keep delivering planned features.
- Together, these keep the roadmap moving while the team stays responsive to user pain points.
4. Keep communication transparent
- With users: explain the issue, estimated timelines, and possible workarounds.
- With the team: share the workload, give visibility on priorities, and avoid overloading any single person.
5. Learn and improve post-crisis
- Run a post-mortem to identify root causes.
- Implement structural improvements (automated testing, monitoring, observability, robust CI/CD).
- Define clear decision criteria for when an issue deserves an immediate hotfix versus inclusion in the next release.
Suggested crisis-time organization
- Designated crisis leader (Tech Lead or Incident Manager): sets priorities, centralizes communication.
- RUN pair: investigates, delivers hotfixes.
- Remaining BUILD team: continues roadmap with minimal disruption.
- Product Owner / Business lead: manages user communication and business pressure.
Tips for struggling teams
- Don’t fall into permanent firefighting: rotate responsibilities and document incidents.
- Protect focus time for roadmap work.
- Automate wherever possible: tests, deployments, monitoring.
- Define internal SLAs to decide what requires immediate action versus what can wait.
- Maintain team morale: acknowledge the extra effort and avoid letting urgency dictate every priority.
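The internal SLAs mentioned in the tips above can be as simple as a table of response targets per severity. The following sketch shows one way to turn such a table into a "does this need action now?" check. The severity names, the target durations, and the two-hour `ACT_NOW_WINDOW` are all hypothetical values chosen for illustration; each team has to agree on its own.

```python
from datetime import datetime, timedelta

# Hypothetical response targets; adjust to what the team has agreed on.
SLA_TARGETS = {
    "critical": timedelta(hours=1),
    "major": timedelta(hours=8),
    "minor": timedelta(days=5),
}

# Hypothetical threshold: act now once the deadline is this close.
ACT_NOW_WINDOW = timedelta(hours=2)


def response_deadline(severity: str, reported_at: datetime) -> datetime:
    """Latest time by which the team has committed to respond."""
    return reported_at + SLA_TARGETS[severity]


def needs_immediate_action(severity: str, reported_at: datetime, now: datetime) -> bool:
    """True when the remaining time before the deadline is within the window."""
    remaining = response_deadline(severity, reported_at) - now
    return remaining <= ACT_NOW_WINDOW


if __name__ == "__main__":
    reported = datetime(2024, 1, 1, 9, 0)
    now = reported + timedelta(minutes=10)
    for severity in SLA_TARGETS:
        print(severity, needs_immediate_action(severity, reported, now))
```

With targets like these, a minor issue reported during a crisis can wait without anyone having to argue the case, which protects both the RUN pair and the BUILD team's focus time.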
Conclusion
Successful crisis management depends on clear organization and disciplined prioritization. A reactive RUN cell, a protected BUILD team, well-scoped hotfixes, and transparent communication make it possible to balance immediate user needs with long-term roadmap delivery.