A framework for running design past a human before it ships.
You have already seen the slop. The sameness. The confident wrongness of generated output that no one reviewed. AHD does not lecture about this problem; it gives you a toolchain to prevent it in your own workflow.
The system compiles a design brief into a structured evaluation, runs it against frontier vision models with explicit rubrics, and returns scored criticism you can act on. Not scores for vanity. Scores that block merge until they are addressed.
$ ahd eval-live swiss-editorial --brief briefs/landing.yml --models <specs> --n 10 # Compiles brief → runs 28-rule lint → dispatches to vision critic → returns scored report
OUTPUT
A directory of evaluated frames with per-rule scores, a deduplicated issue list ranked by severity, and a structured brief you can check in and diff. The vision critic runs with a frozen system prompt and token-constrained output, so results are reproducible across runs and models.
What ships
What is gated
forgejo mirror available on request