Blog
Dead sessions don't stay dead.
Run six Claude Code sessions long enough and some will stall, exit, or hit a limit. Loom's watchdog notices, relaunches the CLI, and re-sends the task, with cooldowns so recovery never turns into thrash.
Failure modes
Three ways a session goes quiet.
The silent stall
The process is alive but the terminal has stopped producing meaningful output. Output is the heartbeat, and a flat line earns the session a nudge.
The dead process
The CLI crashed or exited outright. OS-level signals catch this immediately, no guessing from a frozen screen, and Loom relaunches the CLI in the same slot.
The usage limit
Rate limits are detected and recovered, and with pooled Claude accounts the limited account is parked while the slot rotates to the next. Details in account rotation.
Two signals, one verdict
Watch the output, trust the OS.
Each fleet slot is a real CLI in a real native PTY, so the watchdog gets two independent feeds: the terminal output stream, which tells it whether the session is making progress, and OS-level process signals, which tell it whether the session is alive at all. Output catches the subtle failures, signals catch the fatal ones, and neither can mask the other. When a session is judged stalled or gone, the response is mechanical: relaunch the CLI if needed, then re-send the task it was working on.
# for every fleet slot, continuously
$ read terminal output, read process signals
# alive but silent: stalled
$ nudge the session
# process gone: exited
$ relaunch the CLI, re-send the task
# still failing: cool down, then retryWhy cooldowns matter more than retries
A naive watchdog relaunches a failing session in a tight loop and turns one failure into a hundred. Loom's recovery is paced on purpose: cooldowns space the attempts out, and retries are bounded, so a slot that keeps dying gets progressively more patience instead of progressively more punishment. If the cause was transient, a relaunch and a re-sent task is all it takes and the mission barely notices. If the cause is a usage limit, recovery routes through rotation rather than blind retry.
The payoff is that unattended time becomes real working time. The whole premise of a fleet is that you hand over a goal and walk away, and that promise is only as good as the slowest unnoticed failure. With the watchdog, a session that dies at 2 a.m. is relaunched and back on task without you, which is what makes overnight runs a practical habit instead of a gamble. Recovered sessions still go through the same gate as everything else: work is re-checked before it counts as done, as covered in verification. And if you want the wider story of how missions flow around these recoveries, read Anatomy of a mission.
Hand it the work.
Walk away.
macOS, Linux, and Windows. Around 13 MB. Free and open source.