durability · 08

Durable state & crash recovery

Every turn is written to disk before anything else happens. Close the laptop, lose the network, kill the model — then attach and resume.

maps to as-built §4, §7, §15

walk away for real · every turn is on disk before it matters
Figure: Written every turn, recovered from any death. Left panel (written every turn, durable by default): state.json via an atomic temp file, fsync, then rename, so it is never half-written; a full working-tree snapshot under snapshots/turn-N/; and the append-only spend.jsonl, traces.jsonl, provenance.jsonl, and events.jsonl logs. Right panel (when it dies: a closed laptop, a lost network, a killed model): the process holding the run vanishes, the task lock goes stale because its PID is no longer alive, the next caller reclaims it, and attach or resume from another terminal replays from the last completed turn. The worst case is redoing a single turn.
state.json is never half-written: it is staged in a temp file and renamed into place. The worst case after a crash is redoing one turn.

A long run is only useful if it survives the real world: a closed laptop, a dropped connection, a model that hangs. deadreckon earns that by writing everything to disk as it goes, so no single failure costs you the run.

Every turn, before anything else happens, the run saves its state. The state.json file is written the safe way: deadreckon writes a temporary file, flushes it to disk, then atomically renames it into place, so a crash can never leave it half-written. Alongside it, the run writes a full snapshot of the working tree under snapshots/turn-N/, plus append-only logs for spend, traces, file provenance, and events.
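The temp-file, flush, rename sequence described above can be sketched in Python. This is a minimal illustration of the general technique, not deadreckon's actual code: the function name atomic_write_json and the temp-file naming are assumptions made for the example.

```python
import json
import os
import tempfile


def atomic_write_json(path, data):
    """Write `data` as JSON to `path` so readers see the old or the new file, never a partial one.

    Hypothetical helper for illustration; not taken from deadreckon.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Stage the new contents in the same directory, so the final rename
    # stays on one filesystem (a cross-filesystem rename is not atomic).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to push it to the disk
        # Atomically swap the staged file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the staged file and leave `path` untouched.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    if os.name == "posix":
        # Also flush the directory entry so the rename itself survives power loss.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

If the process dies before os.replace runs, the previous state.json is still intact and only a stray temp file is left behind. If it dies afterwards, the new file is already complete.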

When the process holding a run dies, its task lock goes stale: the lock records a process id, and a quick liveness check shows that process is gone. The next caller reclaims the lock cleanly. You can then run deadreckon resume from any terminal, and the loop replays from the last completed turn, reconstructing history from traces.jsonl if needed.
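A PID-based stale-lock check of this kind might look like the sketch below. It is POSIX-only and rests on assumptions: the lock is taken to be a small file holding the owner's PID as text, and the names pid_is_alive, read_lock_pid, and acquire_lock are hypothetical. The source does not describe deadreckon's actual lock format.

```python
import os


def pid_is_alive(pid):
    """POSIX liveness probe: signal 0 checks that the process exists without signalling it.

    Do not use this on Windows, where os.kill behaves differently.
    """
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the holder is gone
    except PermissionError:
        return True   # the process exists but belongs to another user
    return True


def read_lock_pid(lock_path):
    """Return the PID recorded in the lock file, or None if it is missing or unreadable."""
    try:
        with open(lock_path, encoding="utf-8") as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None


def acquire_lock(lock_path):
    """Try to take the lock; reclaim it if the recorded holder is dead. Returns True on success."""
    flags = os.O_CREAT | os.O_EXCL | os.O_WRONLY
    try:
        fd = os.open(lock_path, flags)  # fails if the lock file already exists
    except FileExistsError:
        holder = read_lock_pid(lock_path)
        if holder is not None and pid_is_alive(holder):
            return False  # a live process still owns the run
        # Stale lock (assumption: an unreadable lock is treated as stale too).
        try:
            os.unlink(lock_path)
            fd = os.open(lock_path, flags)
        except (FileNotFoundError, FileExistsError):
            return False  # another caller reclaimed it first
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(str(os.getpid()))
    return True
```

A production lock would need to close the small window between removing the stale file and recreating it, and to account for PID reuse. The sketch only shows the liveness idea.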

Once completed, a run is final. A run that has already reached Completed can't be resumed back into the loop; it can only be re-promoted (which is safe to repeat) or replaced by a new run.
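The resume rules can be summarised as a small decision function: refuse a run that is already Completed, otherwise continue from the turn after the last completed one. The sketch below is illustrative only. The field names status, turn, and completed are hypothetical, since the source does not document the schema of state.json or traces.jsonl.

```python
import json


class RunAlreadyCompleted(Exception):
    """Raised when a caller tries to resume a run that has already finished."""


def last_completed_turn(trace_lines):
    """Return the highest turn number marked completed in the trace log, or 0 if none.

    Assumes one JSON object per line with hypothetical `turn` and `completed` fields.
    """
    last = 0
    for line in trace_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A crash in the middle of an append can leave a torn line; skip it.
            continue
        if not isinstance(record, dict):
            continue
        turn = record.get("turn")
        if record.get("completed") and isinstance(turn, int):
            last = max(last, turn)
    return last


def resume_point(state, trace_lines):
    """Return the turn number to run next, or raise if the run is final."""
    if state.get("status") == "Completed":
        raise RunAlreadyCompleted("run is final: re-promote it or start a new run")
    # Everything after the last completed turn is redone, which is at most one turn.
    return last_completed_turn(trace_lines) + 1
```

For example, with turns 1 and 2 recorded as completed and turn 3 interrupted, resume_point returns 3, so only the interrupted turn is repeated.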
