Our Process for Shipping Fast Without Breaking Things
"Move fast and break things" was fine for Facebook in 2009. In 2024, users expect reliability. But that doesn't mean you have to move slowly. Here's how we balance velocity with stability.
The false dichotomy
Many teams believe there's an inherent tradeoff:
- Move fast → break things
- High quality → move slowly
We reject this premise. With the right systems and culture, you can ship quickly and maintain quality. Here's how.
Principle 1: Invest in automation upfront
Every hour spent on automation returns tenfold. Our baseline for any project:
CI/CD pipeline (day one)
```yaml
# Minimum viable pipeline (GitHub Actions; the npm scripts are placeholders for your own commands)
on: [push, pull_request]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run lint        # Catch style issues
      - run: npm run typecheck   # Catch type errors
      - run: npm test            # Catch logic errors
      - run: npm run build       # Ensure it builds

  deploy:
    needs: quality
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run deploy:staging   # Deploy to staging
```
This alone catches about 80% of issues before anyone reviews the code.
Automated previews
Every pull request gets a preview deployment. Reviewers can actually use the feature, not just read the code.
Database migrations
Automated and tested. No manual SQL scripts in production.
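What "automated" means in practice can be very simple. As a rough sketch (not our exact tooling), a minimal runner that applies ordered SQL files against Postgres might look like this; the `pg` client is real, but the file layout and `schema_migrations` table name are assumptions:

```typescript
// migrate.ts — minimal sketch of an automated migration runner (illustrative only).
// Assumes: Postgres reachable via DATABASE_URL, and a migrations/ directory of
// SQL files named 001_*.sql, 002_*.sql, ... applied in filename order.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";
import { Client } from "pg";

async function migrate(dir = "migrations") {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // Track which migrations have already run
    await client.query(
      "CREATE TABLE IF NOT EXISTS schema_migrations (name text PRIMARY KEY, run_at timestamptz DEFAULT now())"
    );
    const applied = new Set(
      (await client.query("SELECT name FROM schema_migrations")).rows.map(r => r.name)
    );

    // Apply pending migrations, each in its own transaction
    for (const file of readdirSync(dir).filter(f => f.endsWith(".sql")).sort()) {
      if (applied.has(file)) continue;
      const sql = readFileSync(join(dir, file), "utf8");
      try {
        await client.query("BEGIN");
        await client.query(sql);
        await client.query("INSERT INTO schema_migrations (name) VALUES ($1)", [file]);
        await client.query("COMMIT");
        console.log(`applied ${file}`);
      } catch (err) {
        await client.query("ROLLBACK");
        throw err; // stop on the first failure so CI goes red
      }
    }
  } finally {
    await client.end();
  }
}

migrate().catch(err => { console.error(err); process.exit(1); });
```

The same script runs in CI against a throwaway database first, so a broken migration never reaches production by hand.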
Principle 2: Small, frequent deployments
The data is clear: smaller deployments are safer deployments.
| Deployment size | Rollback rate | Mean time to recovery |
|---|---|---|
| Large (weekly) | 15-20% | 4+ hours |
| Medium (daily) | 5-10% | 1-2 hours |
| Small (continuous) | 1-3% | 5-15 minutes |
We deploy to production multiple times per day. Each deployment is small enough to understand and quick to roll back if needed.
How we make this work
- Feature flags — New features deploy dark and are enabled when ready (see the sketch after this list)
- Trunk-based development — Short-lived branches, frequent merges
- Automated testing — Confidence to deploy without fear
- Monitoring — Know immediately when something's wrong
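A feature flag doesn't have to mean a heavyweight platform. Here is a minimal sketch of the idea; the flag names, storage, and rollout logic are illustrative, not our production setup:

```typescript
// flags.ts — minimal feature-flag sketch (names and in-memory storage are assumptions).
// Flags let code ship to production "dark" and be switched on per user or per percentage later.

type FlagRule = { enabled: boolean; rolloutPercent?: number };

// In a real system this would come from a database or a flag service, not a constant.
const FLAGS: Record<string, FlagRule> = {
  "new-checkout-flow": { enabled: true, rolloutPercent: 10 }, // dark launch to 10% of users
  "bulk-export": { enabled: false },                          // deployed, not yet enabled
};

// Deterministic hash so a given user always gets the same answer for a given flag.
function bucket(flagName: string, userId: string): number {
  let h = 0;
  for (const ch of `${flagName}:${userId}`) h = (h * 31 + ch.charCodeAt(0)) % 100;
  return h;
}

export function isEnabled(flagName: string, userId: string): boolean {
  const rule = FLAGS[flagName];
  if (!rule || !rule.enabled) return false;
  if (rule.rolloutPercent === undefined) return true;
  return bucket(flagName, userId) < rule.rolloutPercent;
}

// Usage: the new code path ships alongside the old one and is switched at runtime.
// if (isEnabled("new-checkout-flow", user.id)) { renderNewCheckout(); } else { renderOldCheckout(); }
```

Because enabling or disabling a flag is a config change rather than a deployment, "rollback" is often just flipping it off.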
Principle 3: Testing strategy that makes sense
Not all tests are created equal. We follow the testing trophy, not the pyramid:
```
        ┌──────────────┐
        │  E2E Tests   │         Few, but cover critical paths
        └──────────────┘
┌──────────────────────────────┐
│      Integration Tests       │  Where most bugs hide
└──────────────────────────────┘
    ┌──────────────────────┐
    │      Unit Tests      │      Fast, isolated
    └──────────────────────┘
  ┌──────────────────────────┐
  │     Static Analysis      │    Types, linting, etc.
  └──────────────────────────┘
```
Our testing rules
- Write tests for things that would be embarrassing to break — auth, payments, core flows
- Integration over unit for complex logic — test the behavior, not the implementation (example after this list)
- E2E for critical user paths — signup, purchase, core workflow
- Static analysis catches the rest — TypeScript, ESLint, Prettier
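For example, a behavior-level test for a signup flow might look like this. The `accounts` module and its `signup`/`login`/`resetStore` functions are hypothetical stand-ins, and the test uses a Vitest-style API:

```typescript
// signup.test.ts — behavior-focused test sketch (the accounts module here is hypothetical).
// We assert on what the user observes, not on which internal helpers were called.
import { describe, it, expect, beforeEach } from "vitest";
import { signup, login, resetStore } from "./accounts"; // hypothetical module under test

describe("signup", () => {
  beforeEach(() => resetStore()); // start each test from a clean in-memory store

  it("lets a new user sign up and then log in", async () => {
    await signup({ email: "ada@example.com", password: "correct horse battery" });
    const session = await login("ada@example.com", "correct horse battery");
    expect(session.userId).toBeDefined();
  });

  it("rejects a duplicate email", async () => {
    await signup({ email: "ada@example.com", password: "pw-one" });
    await expect(
      signup({ email: "ada@example.com", password: "pw-two" })
    ).rejects.toThrow(/already registered/i);
  });
});
```

Neither test cares how passwords are hashed or which table they land in, so refactoring the internals doesn't break them.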
Principle 4: Code review that doesn't bottleneck
Code review is essential but often becomes a blocker. Our approach:
Size matters
| PR size | Review turnaround |
|---|---|
| Under 200 lines | Reviewed in hours |
| 200-400 lines | Reviewed within a day |
| Over 400 lines | Should probably be split |
We enforce this with tooling. Large PRs require justification.
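The tooling can be as simple as a CI script that counts changed lines and fails above a threshold unless the PR carries a justification label. A sketch of that idea (the threshold, label name, and environment variables are assumptions):

```typescript
// check-pr-size.ts — sketch of a CI gate on diff size (threshold and label are illustrative).
// Runs inside CI after checkout; compares the PR branch against its base branch.
import { execSync } from "node:child_process";

const LIMIT = 400;                                // lines changed before we ask for justification
const BASE = process.env.BASE_REF ?? "origin/main";
const hasJustification = process.env.PR_LABELS?.includes("size-exception") ?? false;

// `git diff --numstat` prints "added<TAB>deleted<TAB>path" per changed file.
const numstat = execSync(`git diff --numstat ${BASE}...HEAD`, { encoding: "utf8" });
const changed = numstat
  .trim()
  .split("\n")
  .filter(Boolean)
  .reduce((sum, line) => {
    const [added, deleted] = line.split("\t");
    // Binary files show "-" instead of a count; treat them as zero.
    return sum + (Number(added) || 0) + (Number(deleted) || 0);
  }, 0);

console.log(`PR changes ${changed} lines (limit ${LIMIT})`);
if (changed > LIMIT && !hasJustification) {
  console.error("PR is too large. Split it, or add the 'size-exception' label with a reason.");
  process.exit(1);
}
```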
Focus on what matters
Reviewers should focus on:
- ✅ Logic correctness
- ✅ Edge cases
- ✅ Security implications
- ✅ Architecture/patterns
- ✅ Missing tests for important paths
Reviewers should not focus on:
- ❌ Formatting (automated)
- ❌ Style nitpicks (configured in linter)
- ❌ Minor naming preferences (move on)
Two-hour SLA
We aim to provide initial review feedback within 2 hours during working hours. First review doesn't need to be complete — even "I'll look at this properly after lunch" is useful.
Principle 5: Incident response
Things will break. The question is how quickly you recover.
Our incident process
- Detect — Monitoring alerts us before users complain
- Triage — Is this affecting users? How many?
- Mitigate — Fix the symptoms first (rollback, feature flag, etc.)
- Communicate — Keep stakeholders informed
- Resolve — Fix the root cause
- Learn — Blameless retrospective
The blameless postmortem
Every significant incident gets a writeup:
```markdown
## Incident: [Title]

### Timeline
- 14:23 Alert fired
- 14:25 On-call acknowledged
- 14:32 Identified as [X]
- 14:35 Rolled back deployment
- 14:40 Confirmed resolution

### What happened
[Factual description]

### Contributing factors
[Not "who" but "what systems failed"]

### Action items
[Concrete improvements, assigned and timeboxed]
```
No blame. Just learning.
Principle 6: Technical debt is a tool
We deliberately take on technical debt when it makes sense:
Intentional debt
"We're using a simple solution now because we're not sure this feature will succeed. If it does, we'll refactor in Q2."
This is documented and scheduled for review.
Unintentional debt
"We didn't realize this would be a problem" — learn from it.
What we never compromise on
- Security
- Data integrity
- Core user experience
- Accessibility
These are never "we'll fix it later."
Measuring velocity and quality
You can't improve what you don't measure:
Velocity metrics
- Deployment frequency — How often we ship
- Lead time — Idea to production
- PR cycle time — Open to merged
Quality metrics
- Rollback rate — How often we revert
- Incident rate — Issues per deployment
- MTTR — Mean time to recovery
- Customer-reported bugs — Issues that slip through
We track these weekly and discuss trends monthly.
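These metrics come almost for free once deployments and incidents are logged as events. A rough sketch of computing a few of them; the event shapes are made up for illustration:

```typescript
// metrics.ts — sketch of computing deployment frequency, rollback rate, and MTTR
// (the Deployment/Incident event shapes are illustrative assumptions).

interface Deployment { deployedAt: Date; rolledBack: boolean }
interface Incident { startedAt: Date; resolvedAt: Date }

// Deployments per day over the window covered by the events.
export function deploymentFrequency(deploys: Deployment[]): number {
  if (deploys.length < 2) return deploys.length;
  const times = deploys.map(d => d.deployedAt.getTime()).sort((a, b) => a - b);
  const days = (times[times.length - 1] - times[0]) / 86_400_000;
  return deploys.length / Math.max(days, 1);
}

// Share of deployments that had to be reverted.
export function rollbackRate(deploys: Deployment[]): number {
  if (deploys.length === 0) return 0;
  return deploys.filter(d => d.rolledBack).length / deploys.length;
}

// Mean time to recovery, in minutes.
export function mttrMinutes(incidents: Incident[]): number {
  if (incidents.length === 0) return 0;
  const total = incidents.reduce(
    (sum, i) => sum + (i.resolvedAt.getTime() - i.startedAt.getTime()), 0);
  return total / incidents.length / 60_000;
}
```

The exact numbers matter less than the week-over-week trend.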
The culture underneath
All these processes only work with the right culture:
- Ownership — You build it, you run it
- Transparency — Everyone sees everything
- Psychological safety — Mistakes are learning opportunities
- Continuous improvement — Good enough isn't
- User focus — Everything traces back to user value
Key takeaways
- Invest in automation — CI/CD, testing, previews
- Deploy small and often — Smaller changes, lower risk
- Test strategically — Not everything needs a test, but critical paths do
- Review efficiently — Small PRs, focused feedback, quick turnaround
- Learn from incidents — Blameless postmortems, concrete actions
- Measure and improve — Track metrics, discuss trends
Speed and quality aren't opposites. With the right systems, they reinforce each other.
Want to build a development process that ships fast and stays stable? Let's work together to set up systems that support your team.
