---
id: "action-measure-review-burden"
type: "action-item"
source_timestamps: ["00:17:48", "00:17:55"]
tags: ["evaluation", "metrics"]
related: ["framework-agent-evaluation", "concept-negative-lift"]
speakers: ["Nate B. Jones"]
action: "Measure the time humans spend reviewing the agent's draft against the time saved."
outcome: "Objective data to determine if the agent is actually driving productivity or causing negative lift."
sources: ["s06-openai-free-employee"]
sourceVaultSlug: "s06-openai-free-employee"
originDay: 6
---
# Strictly Measure Review Burden

## Action

**Measure the time humans spend reviewing the agent's draft against the time saved.**

## Expected Outcome

Objective data to determine if the agent is actually driving productivity or causing [[concept-negative-lift|negative lift]].

## Detail

After deploying an agent, do **not** judge its success by the quality of the demo. Instead, rigorously track:

- Time previously spent executing the workflow manually
- Time now spent reading, second-guessing, and editing the agent's draft

If review time exceeds the time saved by not having to write from scratch, you have negative lift and should kill or heavily refactor the agent. See [[framework-agent-evaluation]] for the full step-by-step.
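A minimal sketch of that arithmetic, assuming you log per-task timings in minutes (the field names and sample numbers here are hypothetical, not from the source):

```python
from dataclasses import dataclass

@dataclass
class WorkflowSample:
    """One completed task, timed in minutes."""
    manual_baseline: float   # time the task took before the agent existed
    review_and_edit: float   # time spent reading/correcting the agent's draft

def net_lift(samples: list[WorkflowSample]) -> float:
    """Net productivity: (time saved) - (review/correction time).

    Positive means the agent is paying for itself;
    zero or negative means negative lift.
    """
    return sum(s.manual_baseline - s.review_and_edit for s in samples)

# Hypothetical week of data: three tasks that used to take 30 min each.
week = [
    WorkflowSample(manual_baseline=30, review_and_edit=12),
    WorkflowSample(manual_baseline=30, review_and_edit=35),  # draft needed a full rewrite
    WorkflowSample(manual_baseline=30, review_and_edit=20),
]

print(f"Net lift this week: {net_lift(week):+.0f} min")  # -> +23 min
```

The point of logging per task rather than eyeballing a demo: a single flattering run hides the rewrites. If the sum trends at or below zero across a representative sample, the kill-or-refactor rule above applies.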

## Why This Matters

McKinsey's net-productivity formula captures the same test: `(time saved) − (review/correction time) > 0`. Validators report that roughly 74% of failed AI projects collapse on precisely this gap.
