Path Z: the CI gate
The whatifd CLI is a wedge. The destination is something larger:
An engineer sees a failing whatifd check in a PR and thinks “I’m not merging until this is green.”
That’s the moment a tool crosses from “useful” to “infrastructure.” Path Z is the trajectory toward it.
Today (the wedge)
whatifd fork is interactive. You run it from your laptop when you want to test a fix. The output is a Markdown verdict report you read yourself.
v0.2 (CI-ready, shipped)
Same CLI, with affordances that make CI integration trivial. All shipped in v0.2:
whatifd.config.yaml, so the command-line argument soup lives in version control.
Deterministic outputs (sorted JSON keys, stable hashes, seeded baseline sampling, per-field x-deterministic annotations with cross-platform byte-equality enforcement).
The whatifd-fork composite GitHub Action that wraps the CLI and posts the Markdown verdict as a PR comment.
The CLI doesn’t change. Wiring it into PR checks is a one-step uses: reference.
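The deterministic-output properties listed above (sorted JSON keys, stable hashes, seeded baseline sampling) can be illustrated with a short Python sketch. This is not whatifd's implementation: the function names `canonical_json`, `stable_hash`, and `sample_baseline` are hypothetical, and the choice of SHA-256 is an assumption made for the example.

```python
import hashlib
import json
import random


def canonical_json(payload: dict) -> str:
    """Serialize with sorted keys and fixed separators so equal data yields equal text."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def stable_hash(payload: dict) -> str:
    """Hash the canonical form, so the digest ignores key insertion order.

    SHA-256 is an assumption for this sketch, not a documented whatifd choice.
    """
    return hashlib.sha256(canonical_json(payload).encode("utf-8")).hexdigest()


def sample_baseline(trace_ids: list[str], k: int, seed: int) -> list[str]:
    """Draw a reproducible sample: sort the IDs, then sample with a private seeded RNG."""
    rng = random.Random(seed)  # private instance, unaffected by global random state
    return rng.sample(sorted(trace_ids), k)
```

Sorting the trace IDs before sampling matters as much as the seed: without it, the same seed would pick different traces whenever the upstream query returned rows in a different order.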
v1.0 (the gate)
Path Z fully materialized:
whatifd runs on every PR that touches a prompt, model, or tool definition.
It pulls real production failures from the last N days plus a representative baseline.
It replays the proposed change against both cohorts.
It blocks the merge if the verdict fails the configured policy.
The verdict report is posted as a PR comment, with evidence visible inline.
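The blocking step in the list above can be sketched as a small policy function. This is a minimal illustration, not whatifd's actual policy engine: the `CohortResult` and `Policy` types, the threshold names and their defaults, and the "Ship" label are all hypothetical. Only the "Don't Ship" verdict appears in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortResult:
    """Replay outcome for one cohort (hypothetical shape)."""
    total: int
    improved: int
    regressed: int


@dataclass(frozen=True)
class Policy:
    """Hypothetical thresholds; not whatifd's real configuration keys."""
    min_failure_improved_rate: float = 0.5
    max_baseline_regressed_rate: float = 0.1


def decide(failures: CohortResult, baseline: CohortResult, policy: Policy) -> str:
    """Return "Ship" only if enough failures improve and the baseline holds."""
    if failures.total <= 0 or baseline.total <= 0:
        raise ValueError("both cohorts must contain at least one trace")
    improved_rate = failures.improved / failures.total
    regressed_rate = baseline.regressed / baseline.total
    if improved_rate < policy.min_failure_improved_rate:
        return "Don't Ship"  # the change does not fix enough known failures
    if regressed_rate > policy.max_baseline_regressed_rate:
        return "Don't Ship"  # the change breaks too many previously good traces
    return "Ship"
```

Under these assumed thresholds, a change that improves 14 of 20 failures but regresses 6 of 20 baseline traces is rejected: the 30% baseline regression rate exceeds the 10% limit, even though the failure cohort looks good on its own.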
The reviewer’s experience:
✗ whatifd / experiment-runner - Verdict: Don't Ship
Failures: 14/20 improved, but baseline regressed 6/20.
Median baseline Δ: -0.18 (CI: [-0.24, -0.12])
See verdict report → /artifacts/whatifd-report.md
Top regression: trace t_492af ("agent now refuses requests it
previously handled correctly")
That output, on a PR, is the moment.
Why this matters
Today’s PR review for an LLM project doesn’t include a check that says:
This prompt change won’t regress your last 50 production failures, or your last 50 baseline successes.
Tomorrow’s should. whatifd is positioned to be that check.
What v0.1 deliberately deferred
v0.1 deferred the CI integration itself, but shipped a CLI built so that the integration would be a thin wrapper around something that already works:
Machine-readable output (JSON).
Exit codes that reflect the configured decision policy (0 / 1 / 2).
Configuration in a file, not arguments.
Deterministic-ish outputs.
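To show why policy-driven exit codes make the wrapper thin, here is a sketch of how a CI step might interpret them. The source only lists the values 0 / 1 / 2; the meanings assigned below (pass, fail, error) are an assumption for illustration, not whatifd's documented contract, and `ExitCode`, `merge_allowed`, and `describe` are hypothetical names.

```python
import enum


class ExitCode(enum.IntEnum):
    # ASSUMPTION: the document gives only the values 0 / 1 / 2.
    # These meanings are illustrative, not whatifd's documented contract.
    PASS = 0
    FAIL = 1
    ERROR = 2


def merge_allowed(exit_code: int) -> bool:
    """A gate treats anything other than a clean pass as blocking."""
    return exit_code == ExitCode.PASS


def describe(exit_code: int) -> str:
    """Map a raw process exit code to a label, tolerating unexpected values."""
    try:
        return ExitCode(exit_code).name
    except ValueError:
        return "UNKNOWN"
```

Because most CI systems already fail a step on any non-zero exit code, a wrapper built on this convention needs no verdict parsing to block a merge. The JSON output is only required for rendering the report.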
v0.2 then shipped the whatifd-fork GitHub Action — proof the architecture supports the wrapper without any CLI change. v1.0 is the destination: the gate runs automatically, blocks the merge on Don’t Ship, and posts the rendered verdict inline.
Why the trajectory matters now
Naming Path Z shapes the architecture well before the gate itself is built. Every v0.1 design decision was tested against one question:
Will this still work when the CLI is invoked by a GitHub Action on a PR?
That filter is what produced the JSON output, the exit-code policy, the deterministic-output discipline, and the configuration-in-a-file convention. None of them are needed for interactive use. All are needed for the gate.