Lab 01 · Verification & Trust

Fast AI.
Slow finish.

The AI is fast. The finish isn't — because every output still gets re-checked by hand. That re-checking is the Verification Tax, and it's quietly eating your AI payback.

Field test · 1–20 July 2026 · ~2.5 h per tester
live · the verification tax
Task: One-page cover letter for a €68k proposal
① AI generates
② Human verifies
  • Facts checked
  • Tone fixed
  • Numbers corrected
  • Cleared to send
9 sec to write. 15 min to finish.
The Problem

AI didn't erase effort. It relocated it.

AI doesn't take away your tasks — it just changes them. Instead of doing the work yourself, you spend time reviewing and correcting what the AI produces. This hidden effort rarely gets tracked, so the true cost goes unnoticed.

Real task: A one-page cover letter opening a €68k proposal pack (Sales)
Without AI
One senior AE · 20 min, stopwatch-measured
With AI
Still ~15 minutes — almost all of it checking
  • Manual baseline: 20 min, stopwatch-measured
  • AI writes it: 9 seconds
  • A human checks it: ≈15 minutes
“Still faster?” Sure, by about five minutes, not by the order of magnitude that “9 seconds” promised. And you just trusted a draft you didn't write, on a €68,000 deal. In the chart, the sliver of blue is the pitch; the wall of orange is the Verification Tax.
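The arithmetic behind that five-minute figure fits in a few lines. This is a minimal sketch using only the three timings quoted above; the function name `net_saving_minutes` is illustrative and not part of any Lab 01 tool.

```python
def net_saving_minutes(manual_min: float, generate_sec: float, verify_min: float) -> float:
    """Minutes saved per task once human checking time is counted."""
    with_ai_min = generate_sec / 60 + verify_min
    return manual_min - with_ai_min


manual_min = 20.0    # manual baseline, stopwatch-measured
generate_sec = 9.0   # time the AI needs to write the draft
verify_min = 15.0    # approximate time a human spends checking it

saving = net_saving_minutes(manual_min, generate_sec, verify_min)
checking_share = verify_min / (generate_sec / 60 + verify_min)

print(f"saved per letter: {saving:.2f} min")
print(f"share of AI-assisted time spent checking: {checking_share:.0%}")
```

With these inputs the saving comes to just under five minutes, and checking accounts for roughly 99% of the AI-assisted time.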
So which tasks should actually use AI?

Nobody can tell you yet — real business cases have never been tested with senior professionals doing their actual work. The split below is the industry's best guess. Lab 01 turns that guessing into measured fact — by domain, and by task.

Assumed: AI wins · structured, low-stakes work, thought to be quick to check
  • Summarising a meeting or a long email thread
  • First draft of a routine email or reply
  • Reformatting, tidying or restructuring text
  • Standard status updates and briefings
Assumed: the tax bites · unstructured, high-judgment work, thought to be checking-heavy
  • A client-facing proposal or pitch
  • Financial commentary and numbers that must be right
  • Sensitive, nuanced or negotiated messages
  • Anything where one wrong detail is costly

Think it's a small tax? Let's see your total.

Interactive · Put in your numbers. Watch the tax add up.
[Calculator: three inputs (shown values 80, 8, 60) → hours every week, spent only on checking · verification cost per year (€)]
A rough estimate from your own inputs. Lab 01's ROI Calculator does this properly — with real per-domain coefficients measured in the field test.
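For a sense of what sits behind such an estimate, here is a rough sketch. It assumes the three inputs are AI outputs checked per week, minutes of checking per output, and hourly cost in euros, and that a working year has 46 weeks. All four are assumptions made for illustration, not Lab 01's coefficients. The values 80, 8 and 60 are the numbers shown in the calculator, but which input each one belongs to is a guess.

```python
def verification_tax(outputs_per_week: float,
                     check_min_per_output: float,
                     hourly_cost_eur: float,
                     weeks_per_year: int = 46) -> tuple[float, float]:
    """Return (hours per week spent checking, checking cost per year in EUR).

    Every parameter meaning here is a hypothetical reading of the calculator.
    """
    minutes_per_week = outputs_per_week * check_min_per_output
    hours_per_week = minutes_per_week / 60
    annual_cost_eur = minutes_per_week * weeks_per_year * hourly_cost_eur / 60
    return hours_per_week, annual_cost_eur


hours, cost = verification_tax(outputs_per_week=80,
                               check_min_per_output=8,
                               hourly_cost_eur=60)
print(f"{hours:.1f} hours every week, spent only on checking")
print(f"€{cost:,.0f} verification cost per year")
```

Under a different reading of the inputs the totals change, which is why measured per-domain coefficients matter more than the formula itself.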
The Knowledge Gap

Why nobody can hand you this number yet

Most productivity studies on AI are built on easy-mode setups: free models, student testers, cherry-picked scenarios, and testers who know which brand they are judging. That's why nobody has the numbers that matter in your business. Lab 01 breaks the cycle: real frontier AIs, real senior experts, real-world tasks, double-blind. Finally, you get the truth about where the time really goes.

Most studies vs Lab 01

Freebie models, not the real deal

They test free or local models because frontier models are expensive to run. We run Claude Sonnet 4.6, Haiku 4.5, Gemini 2.5 Pro & Flash, the models you'd actually deploy, in a 2×2 design.

Rookies, not experts

Most “AI research” is run on undergrads who've never closed a quarter — their “verification tax” is a fantasy. You care about risk with real money on the line: we test senior pros with a decade+ in the trenches.

Only easy scenarios, not the messy realities

They cherry-pick: experts prompting, experts checking — the best case. But real life isn't that clean. We run all four combos: who prompts × who verifies. You see the messy truth, not the brochure version.
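The four combinations can be listed explicitly. This is a minimal sketch, assuming each role has two levels, "expert" and "non-expert". The second label is an assumption, since the text above only names the expert case.

```python
from itertools import product

# "non-expert" is an assumed label for the second level of each factor.
LEVELS = ("expert", "non-expert")

# Cross the two factors: who prompts x who verifies.
conditions = [
    {"prompts": prompter, "verifies": verifier}
    for prompter, verifier in product(LEVELS, repeat=2)
]

for number, cond in enumerate(conditions, start=1):
    print(f"condition {number}: {cond['prompts']} prompts, {cond['verifies']} verifies")
```

Only the first condition, expert prompting with expert checking, is the best case; the other three are the ones a best-case study leaves out.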

Brand bias, not pure judgment

Testers know the brand behind every output — and of course “Claude is smarter, right?” We keep it double-blind start to finish: no one knows which AI wrote what. The result? Judgment, not brand loyalty.

Stop guessing. Start measuring what actually matters for your bottom line.

Our Hypotheses

The bets behind Lab 01 — and why the tools aren't optional

We don't make wild guesses. Every hypothesis is pre-registered on OSF and built to be broken, not just confirmed. Track each bet: where it comes from, what it's really claiming, and why — if none can be disproven — these tools stop being optional and become mission-critical.

Our Methodology

Lab 01, Unfiltered: From Idea to Proof

We show our work: every stage, every decision. Follow the full story from the first hypothesis to the final tool, and know exactly how Lab 01 delivers proof you can trust.

The Question We Answer

Who really wins with GenAI — and where are you just spinning your wheels? We put Sales, Marketing, Finance, and Project Management head-to-head so you know exactly where to double down (and where to hold your fire).

Moderator · Model Tier · Frontier (Sonnet, Pro) vs Cheap (Haiku, Flash)
Is premium AI really worth the price — or are you just burning budget? Model Tier is the wild card: does shelling out for Sonnet or Pro actually deliver better results in your department, or does the bargain-bin model do the job just as well?
Predictor · Functional Domain · Sales, Marketing, Finance, Project Mgmt
Outcome · Net Time Savings · per department, hours / week
Mediator · Output Quality
Direct effect: Functional Domain → Net Time Savings
Indirect effect: Functional Domain → Output Quality → Net Time Savings
Output quality is the mediator — the in-between that explains why GenAI pays off big in one department and barely budges the dial in another.
This is the headline relationship. The full set — every independent and dependent variable, all the moderators and the mediation model — is pre-registered and open: osf.io/tznf8.
Pre-registered on OSF · Double-blind
Field test · 1–20 July 2026 · 15 partner seats

Stop paying a tax
you never agreed to.

Put your team in the only field study that exposes the real cost of trusting AI — and walk away armed to slash it.

15 partner seats · 5–20 testers per partner · ~2.5 h per tester