Done is a claim, not a fact.

Coding agents report completion with total confidence and mixed accuracy. This post is about why that happens, and how Loom re-checks work before a slot counts as finished.

Why agents declare victory early

An agent's report of its own work is written by the same process that did the work. The summary that says everything passed is just more output, produced with the same fluency as the code, and it reads identically whether the task is finished or merely sounds finished. There is no tone of voice that gives away an unfinished job.

In a single terminal this is survivable, because you are the verification. You read the scrollback, you run the tests, you decide. Across six parallel Claude Code sessions that habit does not scale. If the system takes done at face value, a fleet is simply a faster way to accumulate unfinished work, and you discover it all at review time.

The fix

Separate the claim from the count.

Re-checked before done

When a session reports completion, the Conductor re-checks the work before the task counts. Claiming done and being done are treated as different events.

Finished slots detected

Loom detects when a slot has genuinely finished and marks it, so the mission view reflects verified reality rather than the most optimistic line in a scrollback.

Premature done caught

A session that announces completion too early is caught, and the mission does not move on over it. The task stays open until the work is actually there.
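One way to picture the split between claiming done and being done is a small state machine. The sketch below is illustrative only and is not Loom's actual code: `Slot`, `SlotState`, `claim_done`, and `recheck` are hypothetical names, and the `check` callable stands in for whatever evidence the Conductor really gathers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class SlotState(Enum):
    RUNNING = "running"    # task is open, work in progress
    CLAIMED = "claimed"    # the session says it is done (unverified)
    FINISHED = "finished"  # an independent check has passed


@dataclass
class Slot:
    """Hypothetical model of one session slot; not Loom's real data structure."""

    number: int
    state: SlotState = SlotState.RUNNING

    def claim_done(self) -> None:
        # Record the session's report. A claim alone never finishes the slot.
        if self.state is SlotState.RUNNING:
            self.state = SlotState.CLAIMED

    def recheck(self, check: Callable[[], bool]) -> SlotState:
        # Settle a claimed slot: finished if the check passes,
        # otherwise back to running so the task stays open.
        if self.state is SlotState.CLAIMED:
            self.state = SlotState.FINISHED if check() else SlotState.RUNNING
        return self.state
```

In this sketch, calling `claim_done()` and then `recheck(lambda: False)` puts the slot back in `RUNNING`, which is the premature-done case. Only a passing check moves it to `FINISHED`.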

In practice

Progress you can plan on.

Because nothing counts until it is checked, the mission DAG and live activity strips show verified progress, not self-reported progress. Smart steering keeps sessions moving; verification decides whether what they produced counts. It is a clean split of duties. Final judgment is still yours: the source control panel, with its full git graph, is built for reviewing what the fleet shipped before you push.

slot 3 of 6

# slot 3 announces completion
slot 3: tests green, task complete
# Conductor re-checks before it counts
# verified, slot 3 marked finished
# the DAG advances on evidence, not on tone
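
The same rule can be stated for the DAG as a whole: a task becomes ready only when every dependency is verified, and a claimed dependency unlocks nothing. The sketch below assumes a made-up representation in which each task has a status of "pending", "claimed", or "verified". The function name `ready_tasks` and the task names are hypothetical, not taken from Loom.

```python
from typing import Dict, List


def ready_tasks(deps: Dict[str, List[str]], status: Dict[str, str]) -> List[str]:
    """Return pending tasks whose dependencies are all verified.

    deps maps each task to the tasks it depends on.
    status maps a task to "pending", "claimed", or "verified";
    tasks missing from status are treated as "pending".
    """
    return sorted(
        task
        for task, needs in deps.items()
        if status.get(task, "pending") == "pending"
        # A "claimed" dependency does not satisfy this test.
        and all(status.get(dep, "pending") == "verified" for dep in needs)
    )


# Hypothetical three-task mission: ui depends on api, api depends on schema.
mission = {"schema": [], "api": ["schema"], "ui": ["api"]}

blocked = ready_tasks(mission, {"schema": "claimed"})
unblocked = ready_tasks(mission, {"schema": "verified"})
```

Here `blocked` is empty, because a claimed `schema` unlocks nothing, while `unblocked` contains `api` once `schema` has been verified.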

The other half of autonomy

Verification is the closing bracket of the same idea that opens with auto-accept. Pressing yes safely is what lets six sessions run without you, and re-checking done is what makes the result worth coming back to. An unattended fleet needs both, or it is only unattended in one direction.

Hand it the work.
Walk away.

macOS, Linux, and Windows. Around 13 MB. Free and open source.
