
Engineering Team Health Metrics Beyond DORA (2026)

DORA metrics measure delivery speed. But delivery speed isn't team health. Here are the metrics that tell you how your engineering team is actually doing.

By Matthew

TL;DR: DORA metrics tell you how fast your team delivers. They don’t tell you whether your team is healthy. Review participation, comment quality trends, rework patterns, and AI adoption curves give you the full picture. Stop treating DORA as the only framework that matters.


Why aren’t DORA metrics enough?

DORA metrics are good. I’m not here to trash them. Deployment frequency, lead time for changes, change failure rate, and time to restore service — these four metrics do exactly what they claim: they measure software delivery performance.

The problem is that somewhere between Google’s DORA research and the average engineering manager’s dashboard, people started treating DORA as a complete picture of team health. It’s not. It was never designed to be.

A team can have excellent DORA scores and still be in trouble. I’ve seen it firsthand: a team shipping daily with a 2-day lead time and low change failure rate — textbook “elite” by DORA standards — where one senior engineer was reviewing 70% of all PRs, two developers hadn’t submitted a review in weeks, and the codebase was accumulating the same category of bug over and over because nobody was catching it in review.

By DORA standards, that team was performing. By any reasonable health assessment, that team was three months from a serious problem.

DORA measures delivery. Delivery speed is a symptom, not a diagnosis. To understand team health, you need to look at the processes and behaviors that create (or destroy) sustainable delivery.

How does review participation reveal team health?

Review participation is the percentage of your team actively participating in code review. Not just approving — actually reviewing, commenting, and engaging with the code.

Healthy teams have distributed review participation. If your team has 8 engineers and all 8 are reviewing code regularly, that’s a sign of shared ownership, knowledge distribution, and collective accountability. If 2 of those 8 engineers are doing 80% of the reviews, you have a problem that DORA won’t catch.

Uneven review participation signals three things:

Bottlenecks. When one person reviews everything, they become a single point of failure. They go on vacation, and PRs pile up. They get sick, and cycle time doubles. DORA will eventually catch this as a lead time spike, but by then the damage is done.

Disengagement. When team members stop reviewing code, it often means they’re checked out. They might still be writing code, so your throughput numbers look fine. But they’re not invested in the team’s collective output. This is a retention risk that no delivery metric will surface.

Knowledge silos. If only certain people review certain parts of the codebase, knowledge concentrates. When those people leave, the team loses context that takes months to rebuild. Review participation patterns are your early warning system for bus factor problems.

Track review participation weekly. If it’s uneven, don’t just redistribute review assignments — figure out why certain people aren’t reviewing. Is it workload? Is it comfort level with the codebase? Is it a signal they’re looking for the door?
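Both signals above (the share of the team reviewing at all, and how concentrated reviews are on one person) can be computed directly from raw review events. A minimal sketch, assuming you've already pulled one reviewer username per submitted review from the GitHub API:

```python
from collections import Counter

def participation_stats(review_events, team):
    """Summarize how review activity is spread across a team.

    review_events: one reviewer username per submitted review
    team: usernames of everyone on the team
    """
    counts = Counter(review_events)
    total = sum(counts.values())
    active = [member for member in team if counts[member] > 0]
    return {
        # share of the team that reviewed at all in the window
        "participation": len(active) / len(team) if team else 0.0,
        # share of all reviews done by the single busiest reviewer
        "top_reviewer_share": max(counts.values()) / total if total else 0.0,
    }

# The 8-engineer example from above: two people carry 80% of the load.
team = [f"eng{i}" for i in range(8)]
reviews = ["eng0"] * 45 + ["eng1"] * 35 + ["eng2"] * 10 + ["eng3"] * 10
stats = participation_stats(reviews, team)
# participation is 0.5 (4 of 8 reviewing); top_reviewer_share is 0.45
```

Run this weekly over a rolling 7-day window and both numbers become trendlines rather than snapshots.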

What do comment quality trends reveal?

The number of review comments is easy to count. What’s harder — and more important — is measuring whether those comments are any good.

A review with one comment that says “This will break pagination for users with more than 100 items — here’s why” is worth more than a review with 10 comments that say “nit: spacing” and “LGTM.” But most metrics tools treat these the same. A comment is a comment.
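Even a crude heuristic can separate the two classes of comment above. This sketch uses a stock-phrase pattern plus a word-count floor; it is an illustration of the distinction, not how a real scoring system (MergeScout's included) works:

```python
import re

# Phrases that usually indicate a rubber stamp rather than real feedback.
PERFUNCTORY = re.compile(
    r"^\s*(lgtm|\+1|ship it|looks good( to me)?!?|nit\b.*)\s*[.!]?\s*$",
    re.IGNORECASE,
)

def is_substantive(comment, min_words=8):
    """Crude proxy: not a stock phrase, and long enough to carry reasoning."""
    if PERFUNCTORY.match(comment.strip()):
        return False
    return len(comment.split()) >= min_words

def substantive_ratio(comments):
    """Fraction of comments that look like real technical feedback."""
    if not comments:
        return 0.0
    return sum(is_substantive(c) for c in comments) / len(comments)
```

The interesting number is not the ratio on any single PR but its direction over months: a falling ratio with steady comment counts means the checkbox is being checked while the quality function degrades.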

Comment quality trends tell you whether your team’s review process is getting stronger or weaker over time. When quality declines, it almost always means one of these:

Review fatigue. Your team is reviewing too many PRs, so they start cutting corners. Comments get shorter, less substantive, more perfunctory. The reviews are still happening — the checkbox is still checked — but the actual quality assurance function has degraded.

PR size creep. When PRs get bigger, reviews get worse. A 1,200-line PR doesn’t get a thorough review. It gets a skim. If comment quality is dropping, check whether average PR size is climbing at the same time.

Team composition changes. New team members may not know what “good” review comments look like. If you onboarded several people recently and comment quality dropped, that’s a training opportunity, not a crisis.

MergeScout is an AI-powered engineering metrics dashboard that watches your GitHub repos and delivers executive briefings in seconds. One of its unique capabilities is scoring comment quality at the individual comment level — distinguishing between rubber-stamp approvals and substantive technical feedback. This turns “are reviews happening” into “are reviews working.”

How do rework patterns differ from rework rate?

Rework rate tells you what percentage of PRs need follow-up fixes. That’s useful. But rework patterns tell you something deeper: are the same types of bugs recurring?

A 7% rework rate is one story if the bugs are all different (random noise in a complex system). It’s a completely different story if 60% of the rework is related to the same thing — say, missing null checks, broken pagination, or incorrect API error handling.

Pattern detection matters more than the rate itself. If the same category of bug keeps escaping review, you have a systemic issue — a gap in your review checklist, a missing linter rule, or a part of the codebase that’s undertested.

Here’s how to use rework patterns:

  • Track what gets reworked, not just that rework happened. Categorize follow-up fixes by type: logic error, missing edge case, API contract mismatch, UI regression.
  • Look for concentration. Is rework concentrated in one repo, one service, or one part of the stack? That points you to the specific area that needs attention.
  • Connect patterns to reviews. When you find a recurring rework pattern, go back to the original PRs and look at the reviews. Were comments left about the issue and ignored? Were the reviews too cursory? That feedback loop is where real improvement happens.
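Once follow-up fixes are categorized as described above, finding the concentration is a few lines. A sketch with hypothetical category labels:

```python
from collections import Counter

def rework_hotspots(fix_categories, threshold=0.25):
    """Return rework categories whose share of all rework exceeds `threshold`.

    fix_categories: one label per follow-up fix, e.g. "missing-null-check"
    """
    total = len(fix_categories)
    counts = Counter(fix_categories)
    return {
        category: count / total
        for category, count in counts.most_common()
        if count / total > threshold
    }

# 10 follow-up fixes; one category dominates.
fixes = (["missing-null-check"] * 6
         + ["pagination"] * 2
         + ["ui-regression"] * 2)
# Only "missing-null-check" crosses the 25% threshold, at a 0.6 share.
```

A non-empty result is your cue to go back to the original PRs and reviews for that category, per the feedback loop above.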

A team that ships with a 7% rework rate and knows exactly why — and is actively reducing it through targeted fixes — is healthier than a team with a 4% rework rate that has no idea what its patterns look like.

Is AI adoption a team health metric?

It is in 2026. Not because using AI makes your team healthy, but because how your team adopts new tools reveals a lot about culture.

Tracking AI adoption is really tracking adaptability. Teams that experiment with new tools, evaluate them honestly, and integrate what works are teams that adapt well. Teams that refuse to engage or adopt blindly without evaluation have a culture problem — either too rigid or too uncritical.

The more interesting question is whether AI-assisted work produces different outcomes. Specifically:

  • Do AI-assisted PRs have lower or higher rework rates? If AI-generated code has higher rework, developers might be trusting AI output without sufficient review. If it has lower rework, the tools are genuinely improving quality.
  • Does AI adoption change review behavior? Some teams review AI-generated code less carefully because they assume the AI got it right. That’s a dangerous pattern — and it shows up in comment quality metrics.
  • Is adoption evenly distributed? If half your team is using AI tools and half isn’t, that’s a conversation worth having. Not to force adoption, but to understand the resistance and share what’s working.

Track the AI adoption curve over time. Healthy teams show gradual, sustained adoption with measurable quality outcomes. Unhealthy patterns include forced adoption spikes (mandate from above with no evaluation) or complete stagnation (team refuses to experiment).
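The adoption-versus-quality question above reduces to splitting PRs by AI involvement and comparing rework rates. A minimal sketch; how you detect AI assistance (tool telemetry, PR labels) is an assumption left to your setup:

```python
def rework_rate(prs):
    """Fraction of PRs that needed a follow-up fix."""
    return sum(p["reworked"] for p in prs) / len(prs) if prs else 0.0

def ai_quality_split(prs):
    """Compare rework rates for AI-assisted vs. manual PRs.

    prs: dicts like {"ai_assisted": True, "reworked": False}
    """
    ai = [p for p in prs if p["ai_assisted"]]
    manual = [p for p in prs if not p["ai_assisted"]]
    return {
        "adoption": len(ai) / len(prs) if prs else 0.0,
        "ai_rework": rework_rate(ai),
        "manual_rework": rework_rate(manual),
    }

# Hypothetical month: 4 AI-assisted PRs (1 reworked), 6 manual (3 reworked).
prs = ([{"ai_assisted": True, "reworked": i == 0} for i in range(4)]
       + [{"ai_assisted": False, "reworked": i < 3} for i in range(6)])
split = ai_quality_split(prs)
# adoption 0.4; AI rework 0.25 vs. manual rework 0.5 in this sample
```

Plot `adoption` over time alongside the two rework rates and the healthy pattern (gradual adoption, equal or better quality) is easy to spot.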

What does a team health dashboard look like beyond DORA?

Here are the six metrics that give you a complete picture of engineering team health, beyond what DORA covers:

1. Review participation distribution — What percentage of the team is actively reviewing? Is participation balanced or concentrated? Target: 80%+ of the team reviewing weekly, no single reviewer handling more than 30% of all reviews.

2. Comment quality score — Are reviews substantive or rubber stamps? Track the ratio of meaningful feedback to perfunctory approvals. Target: trending stable or improving over time.

3. Rework rate by pattern — What percentage of PRs need follow-up fixes, and what types of fixes are recurring? Target: under 5% overall, no single pattern accounting for more than 25% of rework.

4. Review rounds per PR — How many back-and-forth cycles before code merges? Target: 1-2 rounds average. Consistently 3+ rounds means unclear requirements or misaligned standards. Read more about reducing review rounds on the blog.

5. AI adoption rate and quality impact — What percentage of PRs involve AI-assisted code, and does that code have different rework rates? Target: growing adoption with equal or better quality metrics.

6. Bus factor by area — For each major area of the codebase, how many people have reviewed code there in the last 30 days? Target: at least 2 active reviewers per critical area.
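The six targets above are concrete enough to encode as a pass/fail check. A sketch, assuming the metrics have already been computed; the field names are illustrative, not a real schema:

```python
def health_check(m):
    """Evaluate a team's metrics against the six targets listed above.

    m: dict of pre-computed team metrics (field names are hypothetical).
    """
    return {
        "participation":   m["weekly_reviewer_share"] >= 0.80,
        "concentration":   m["top_reviewer_share"] <= 0.30,
        "comment_quality": m["quality_trend"] >= 0,  # stable or improving
        "rework":          m["rework_rate"] < 0.05,
        "rework_pattern":  m["top_rework_pattern_share"] <= 0.25,
        "review_rounds":   m["avg_review_rounds"] <= 2,
        "ai_quality":      m["ai_rework_rate"] <= m["manual_rework_rate"],
        "bus_factor":      all(n >= 2 for n in m["reviewers_per_area"].values()),
    }

# A team that passes every target:
metrics = {
    "weekly_reviewer_share": 0.85,
    "top_reviewer_share": 0.25,
    "quality_trend": 0.02,
    "rework_rate": 0.04,
    "top_rework_pattern_share": 0.20,
    "avg_review_rounds": 1.8,
    "ai_rework_rate": 0.03,
    "manual_rework_rate": 0.05,
    "reviewers_per_area": {"api": 3, "web": 2},
}
```

The per-check booleans matter more than an overall score: a single failing check names the conversation to have, which is the point of the dashboard.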

Combine these with the four DORA metrics and you have a 10-metric framework that covers delivery performance and team health. That’s not 10 metrics for a dashboard that nobody reads — it’s 10 metrics that each answer a specific question an engineering manager actually asks.

You can track most of these in MergeScout. Review participation, comment quality, rework rate, review rounds, and AI adoption are all computed automatically from your GitHub data. No manual tagging, no Jira required.


Frequently Asked Questions

Should I stop tracking DORA metrics?

No. DORA metrics are valuable for measuring delivery performance. The point is that DORA alone doesn’t tell you about team health. Keep DORA and add metrics for review quality, participation, and rework patterns. Think of DORA as the “what” (delivery speed) and team health metrics as the “how” (sustainable practices).

How often should I review team health metrics?

Weekly for the core metrics (review participation, comment quality, rework rate). Monthly for trends and patterns. Quarterly for a full health assessment that you discuss with the team. Daily is too noisy — you’ll chase every fluctuation.

Can a team have great DORA scores and poor team health?

Absolutely. A team can ship fast by concentrating all reviews on one person, skipping thorough reviews, and accumulating technical debt. The DORA numbers look elite until the bottleneck person burns out or leaves. Team health metrics catch these structural risks before they become delivery problems.

What’s the single most important team health metric beyond DORA?

Review participation distribution. It’s the earliest warning signal for bottlenecks, disengagement, and knowledge silos — three problems that eventually destroy delivery performance. If you only add one metric beyond DORA, make it this one.

How do I introduce team health metrics without making my team feel surveilled?

Share the metrics with the team, not just leadership. Make them team-level, not individual. Frame them as process improvement tools: “We want to know if our review process is working, not whether individuals are performing.” When the team sees the metrics helping them (fewer bottlenecks, better reviews, less rework), the surveillance concern fades.