Beta feature. The audit ships as beta while we collect early feedback.
The detector catalog and report format may change before the next stable
cut. Please open an issue if anything looks off.
Results appear on the /audit dashboard page: your agent’s archetype, a 0–100 score, and exactly which policies would have caught what.
Run it
Three ways in, all landing on the same /audit report.
No install
`npx -y failproofai audit` fetches failproofai, runs the scan, and opens the dashboard for you, with nothing to install first.

From the CLI
`failproofai audit` runs the scan in your terminal, then opens localhost:8020/audit automatically when it finishes.

From the dashboard
Run `failproofai` and click Audit in the navbar (between Policies and Projects), or open /audit directly.

The scan flags patterns such as `cd <cwd>` prefixes, sleep-polling loops, re-reading files just edited, and more.
For each transcript, every tool-use event is replayed through the 39 builtin policies and through 8 audit-only detectors that catch patterns not yet covered by runtime policies. Counts are aggregated per policy / detector across all sessions.
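
The replay-and-aggregate step can be pictured roughly as below. This is a minimal sketch, not failproofai's actual code: the `ToolEvent` and `Check` shapes, the `aggregate` function, and the two toy regular expressions are all hypothetical and only illustrate counting hits per policy or detector across sessions.

```typescript
// Hypothetical shapes: the real transcript and policy types are not shown in this document.
interface ToolEvent {
  tool: string; // e.g. "Bash", "Read", "Edit"
  input: { command?: string; filePath?: string };
}

interface Check {
  name: string; // policy or detector slug
  matches: (event: ToolEvent) => boolean; // true if this check would have fired
}

// Replay every event of every session through every check,
// counting hits per check across all sessions.
function aggregate(sessions: ToolEvent[][], checks: Check[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const events of sessions) {
    for (const event of events) {
      for (const check of checks) {
        if (check.matches(event)) {
          counts.set(check.name, (counts.get(check.name) ?? 0) + 1);
        }
      }
    }
  }
  return counts;
}

// Toy checks standing in for builtin policies and audit-only detectors.
// The regular expressions are illustrative assumptions only.
const demoChecks: Check[] = [
  {
    name: "git-commit-no-verify",
    matches: (e) => e.tool === "Bash" && /\bgit\s+commit\b.*--no-verify/.test(e.input.command ?? ""),
  },
  {
    name: "find-from-root",
    matches: (e) => e.tool === "Bash" && /^find\s+\/(\s|$)/.test(e.input.command ?? ""),
  },
];

// Two made-up sessions.
const demoSessions: ToolEvent[][] = [
  [
    { tool: "Bash", input: { command: "git commit -m wip --no-verify" } },
    { tool: "Bash", input: { command: "find / -name '*.ts'" } },
  ],
  [{ tool: "Bash", input: { command: "git commit --no-verify -m fix" } }],
];

const demoCounts = aggregate(demoSessions, demoChecks);
```

Keeping each check as a pure predicate over a single event is one way to let the same replay loop serve both policies and audit-only detectors; whether failproofai structures it this way is not stated here.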
What you get
The /audit page is a single-screen, shareable poster followed by four below-the-fold sections:
- Poster. Your agent’s identity at a glance: its archetype (one of 8: `optimist`, `cowboy`, `explorer`, `goldfish`, `paranoid architect`, `precision builder`, `hammer`, `ghost`), its persona keywords, how rare that archetype is, and a 0–100 score with a tier band (`S` down to `bottom tier`). Built to share: post to X or LinkedIn, or download it as a PNG.
- `// strengths`. What your agent already does well, as real numbers from the scan (e.g. clean-tool-call %, `0` push-to-main attempts), shown only where the relevant policy has a clean record.
- `// quirks`. What slipped through: a ranked table of behaviors failproofai would have caught, showing when it last happened, what slipped (and the builtin that would have blocked it), its severity, and how often it was seen (`new` / `recurring` / `N× seen`).
- `// how to improve`. The prescribed fix list: one row per policy with a copy-paste `failproofai policy add <slug>`, plus an `install all` button that enables every recommendation at once and shows your projected score if you did.
- `// come back better`. Build the habit: set a re-audit email reminder (`3d` / `7d` / `14d` / `30d`) or re-audit now, and invite a friend to run their own audit (sent from failproof.ai, Cc’d to you). Reminders and invites require sign-in; see `failproofai auth`.
Audit-only detectors
These detect “stupid behavior” patterns not (yet) enforced in real time. They run only during the audit and never block a live tool call.

| Detector | What it counts |
|---|---|
| `redundant-cd-cwd` | Bash commands starting with `cd <cwd> && …` even though commands already run in cwd. |
| `prefer-edit-over-read-cat` | `cat`/`head`/`tail`/`less`/`more` on a single source file; use the Read tool. |
| `prefer-edit-over-sed-awk` | `sed -i` / `awk … > file` in-place edits; use the Edit tool. |
| `prefer-write-over-heredoc` | Heredoc / multi-line `echo > file` writing files; use the Write tool. |
| `sleep-polling-loop` | Long `sleep N` (≥ 30s) or `while …; sleep …; done` polling loops. |
| `find-from-root` | `find /`, `find /home`, `find /usr`, etc.; scope to cwd instead. |
| `git-commit-no-verify` | `git commit … --no-verify` / `-n`, skipping hooks. |
| `reread-after-edit` | Read of a file that was just Edit/Write in the same session. |
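
To make two of these rows concrete, here is a rough sketch of how `redundant-cd-cwd` and `sleep-polling-loop` might be written as predicates over a Bash command string. The function names, the regular expressions, and the edge-case handling are assumptions for illustration; the real detectors may match differently.

```typescript
// Hypothetical detector sketches; the real matching rules are not documented here.

// redundant-cd-cwd: the command starts with `cd <cwd> && …` although it already runs in cwd.
function isRedundantCd(command: string, cwd: string): boolean {
  const prefix = `cd ${cwd}`;
  const trimmed = command.trimStart();
  if (!trimmed.startsWith(prefix)) return false;
  // Require `&&` right after the path, so `cd /repo/sub && …` is not flagged for cwd `/repo`.
  return /^\s*&&/.test(trimmed.slice(prefix.length));
}

// sleep-polling-loop: a long `sleep N` (N >= 30 seconds) or a `while …; sleep …; done` loop.
// This sketch only understands plain integer seconds, not suffixes such as `5m`.
function isSleepPolling(command: string): boolean {
  if (/\bwhile\b[\s\S]*\bsleep\b[\s\S]*\bdone\b/.test(command)) return true;
  const sleepRe = /\bsleep\s+(\d+)\b/g;
  let m: RegExpExecArray | null;
  while ((m = sleepRe.exec(command)) !== null) {
    if (Number(m[1]) >= 30) return true;
  }
  return false;
}
```

For example, `isRedundantCd("cd /repo && npm test", "/repo")` is true under these rules, while a short one-off `sleep 2` is not counted as polling.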
Caches
- Per-transcript cache at `~/.failproofai/cache/audit/<sha1>.json`, keyed by `(mtime, size, engineVersion, detectorVersion)`. It invalidates automatically when the transcript or the policy/detector code changes. Each entry also stores a `cachedAt` timestamp as TTL metadata (not part of the cache key); entries older than 7 days are rejected on read so long-lived results don’t outlive evolving detector intent.
- Whole-result cache at `~/.failproofai/audit-dashboard.json` (mode 0600). Lets the dashboard render instantly on navigation without re-running. Also rejected on read past the 7-day TTL, in which case /audit falls through to its empty state and prompts a fresh run. Click `[ re-audit now ]` near the bottom of the report to refresh. Re-audit sends `noCache: true`, so it bypasses the per-transcript cache and re-scans every transcript instead of returning the cached result. The run streams progress via a sticky top strip and swaps the result in place on success (no page reload; a failed re-audit keeps the previous report).
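
The per-transcript rules above amount to two checks on read: the key must match, and the entry must be no older than 7 days. The sketch below assumes a hypothetical `CacheEntry` shape and `isUsable` helper. Only the key fields, `cachedAt`, the 7-day TTL, and the `noCache` bypass come from the description; the field types and the use of epoch milliseconds are assumptions.

```typescript
// Hypothetical types; field types are assumed, field names follow the text above.
interface CacheKey {
  mtime: number; // transcript modification time
  size: number; // transcript size in bytes
  engineVersion: string;
  detectorVersion: string;
}

interface CacheEntry {
  key: CacheKey;
  cachedAt: number; // assumed to be epoch milliseconds; not part of the key
  result: unknown;
}

const TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// Returns true if a cached entry may be reused for the transcript's current state.
function isUsable(entry: CacheEntry, current: CacheKey, now: number, noCache = false): boolean {
  if (noCache) return false; // re-audit bypasses the cache entirely
  const sameKey =
    entry.key.mtime === current.mtime &&
    entry.key.size === current.size &&
    entry.key.engineVersion === current.engineVersion &&
    entry.key.detectorVersion === current.detectorVersion;
  return sameKey && now - entry.cachedAt <= TTL_MS;
}

// Made-up values for illustration.
const DAY_MS = 24 * 60 * 60 * 1000;
const demoKey: CacheKey = { mtime: 1000, size: 2048, engineVersion: "1", detectorVersion: "1" };
const demoEntry: CacheEntry = { key: demoKey, cachedAt: 0, result: {} };
```

Keeping `cachedAt` outside the key means an old entry is still found by the lookup and then rejected by age, rather than silently missed.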
Notes
- No mutation. The audit replays in read-only mode. `warn-repeated-tool-calls` is skipped because its per-session sidecar would otherwise be modified.
- Workflow policies skipped. `require-*-before-stop` policies fire only on `Stop` events and run `execSync` against the live git state, so they have no meaningful “what would have happened in 2025” interpretation and don’t appear in audit counts.
- Custom policies skipped. User-supplied custom hooks are not replayed (they may have changed since the original session).
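
Taken together, these notes describe a filter over which policies take part in the replay. A hedged sketch follows; the `PolicyInfo` shape and the `isReplayedInAudit` function are hypothetical, and only `warn-repeated-tool-calls`, the `require-*-before-stop` pattern, and the custom-policy rule come from the notes above.

```typescript
// Hypothetical shape describing a policy for the purpose of this sketch.
interface PolicyInfo {
  name: string;
  custom: boolean; // true for user-supplied hooks
}

// Decide whether a policy is replayed during the audit.
function isReplayedInAudit(policy: PolicyInfo): boolean {
  // Custom hooks may have changed since the original session.
  if (policy.custom) return false;
  // Would otherwise modify its per-session sidecar.
  if (policy.name === "warn-repeated-tool-calls") return false;
  // Workflow policies depend on Stop events and the live git state.
  if (/^require-.+-before-stop$/.test(policy.name)) return false;
  return true;
}

// Made-up policy names, used only to exercise the filter.
const demoPolicies: PolicyInfo[] = [
  { name: "some-builtin-policy", custom: false },
  { name: "warn-repeated-tool-calls", custom: false },
  { name: "require-example-before-stop", custom: false },
  { name: "my-team-hook", custom: true },
];

const demoReplayed = demoPolicies.filter(isReplayedInAudit).map((p) => p.name);
```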

