AHD · Positioning
What AHD is
AHD is a guardrail and evaluation layer for AI-generated design: web UI, graphic design, illustration and image generation. It is not a generator itself; it sits beside any generator and measures whether the output exhibits the specific, repeated failure modes that mark AI-generated design as AI-generated.
Four pieces, one purpose
- A named taxonomy of AI design slop. Thirty-nine concrete tells across web, graphic and typographic surfaces. Enforced today by thirty-five HTML and CSS rules, three SVG rules, and fourteen vision-critic rules on rendered pixels. The rule count is higher than the taxonomy count because several entries are covered by more than one rule.
- Style tokens as promptable design direction. Ten curated bundles spanning web, editorial, identity, illustration and image-generation surfaces. Each declares grid or composition, type, palette, forbidden list, required quirks, reference lineage and per-model prompt fragments.
- A brief compiler. Turns a structured intent into constrained model instructions for any surface, with a final mode for single-shot output and a draft mode for human-in-the-loop exploration.
- An empirical eval loop. A controlled raw-vs-compiled comparison across any set of text or image generators, scored against the taxonomy, with attempted-vs-scored counts, canonical model identifiers, per-model deltas and per-tell frequency. Vision critique on rendered pixels via a multimodal critic.
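To make the style-token and brief-compiler pieces concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not AHD's implementation: the `StyleToken` and `Intent` shapes, the `compile_brief` function, the wording of the emitted instructions and every example value are hypothetical. Only the field list (grid or composition, type, palette, forbidden list, required quirks, per-model prompt fragments) and the two modes come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StyleToken:
    """Hypothetical shape for one style bundle; field names follow the prose, not a real schema."""
    name: str
    grid: str
    type_rules: str
    palette: list[str]
    forbidden: list[str]
    required_quirks: list[str]
    # Extra instructions keyed by model identifier (assumed layout).
    prompt_fragments: dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Intent:
    surface: str  # e.g. "web" or "image"
    subject: str


def compile_brief(intent: Intent, token: StyleToken, model: str, mode: str = "final") -> str:
    """Turn a structured intent plus a token into constrained instructions."""
    if mode not in ("final", "draft"):
        raise ValueError(f"unknown mode: {mode!r}")
    lines = [
        f"Surface: {intent.surface}",
        f"Subject: {intent.subject}",
        f"Grid or composition: {token.grid}",
        f"Type: {token.type_rules}",
        "Palette: " + ", ".join(token.palette),
        "Must include: " + "; ".join(token.required_quirks),
        "Must not include: " + "; ".join(token.forbidden),
    ]
    fragment = token.prompt_fragments.get(model)
    if fragment:
        lines.append(fragment)
    if mode == "final":
        lines.append("Produce one finished result.")
    else:
        # Draft mode asks for options so a human can choose between them.
        lines.append("Produce three distinct variations for human review.")
    return "\n".join(lines)


# Invented example values, for illustration only.
EXAMPLE_TOKEN = StyleToken(
    name="swiss-editorial",
    grid="12-column grid, asymmetric margins",
    type_rules="one grotesque sans, two weights",
    palette=["#111111", "#f4f1ea", "#d62c1a"],
    forbidden=["gradient text", "centred hero with three feature cards"],
    required_quirks=["visible baseline grid"],
    prompt_fragments={"model-a": "Return a single HTML file with inline CSS."},
)

brief = compile_brief(
    Intent("web", "landing page for a print studio"),
    EXAMPLE_TOKEN,
    model="model-a",
    mode="draft",
)
print(brief)
```

Keeping the forbidden list and the per-model fragment as data rather than prose is what would let the same token be compiled for several generators and then scored under identical constraints.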
What AHD is not
Not a prompt pack. Prompt packs sell style recipes. AHD's value is the reproducible scoring that tells you whether any recipe, ours or yours, actually moves a given model off its median.
Not a canvas product. Galileo, v0, Lovable, Bolt, Magic Patterns, Subframe optimise "prompt to shipped UI." Midjourney, Krea, Lovart optimise "prompt to image." AHD sits beside any of them as an enforcement layer.
Not a design system. Design systems prescribe components. AHD prescribes the thirty-nine patterns a page or image must not exhibit, and measures compliance.
What makes this defensible
The moat is not the prompts. The moat is the taxonomy plus reproducible scoring. A prompt anyone can rewrite; a named, versioned taxonomy with deterministic lint rules and a vision critic is an artefact that compounds with use. A style token anyone can fork; a per-release eval harness that publishes attempted counts, extraction failures, exact model identifiers, confidence intervals and negative results is a cultural commitment competitors rarely match.
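As one illustration of what publishing confidence intervals alongside counts can look like, the sketch below computes a Wilson score interval for the share of scored samples that exhibit a given tell. The choice of the Wilson interval is an assumption made for this example; the text above does not say which interval AHD uses, and the function name is hypothetical.

```python
import math


def wilson_interval(hits: int, scored: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if scored <= 0:
        raise ValueError("scored must be positive")
    if not 0 <= hits <= scored:
        raise ValueError("hits must lie between 0 and scored")
    p = hits / scored
    denom = 1 + z * z / scored
    centre = (p + z * z / (2 * scored)) / denom
    half = z * math.sqrt(p * (1 - p) / scored + z * z / (4 * scored * scored)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# Zero hits in ten scored samples still leaves a wide upper bound.
low, high = wilson_interval(hits=0, scored=10)
print(f"{low:.3f} to {high:.3f}")
```

The point of reporting the interval rather than the bare rate is that "zero tells in ten samples" and "zero tells in a thousand samples" are very different claims, and only the interval shows it.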
Prior art
Pieces of AHD exist in the wild. This combination does not.
Prompt libraries for AI UI generation (uiprompt.io, Promter, GenDesigns, WebGardens) encode style direction; they do not carry a taxonomy or an eval.
Design-token linters (@lapidist/design-lint, stylelint-design-tokens-plugin) enforce token consistency in source. AHD's rules target AI-generated anti-patterns, not adherence to an internal design system.
Figma-era audit tools (DesignLint AI) audit design files against token rules. AHD audits rendered output and source, not design files.
AI UI benchmarks (UI Bench) score generated HTML on engineering quality: axe, Lighthouse, semantics. AHD rates a page's slop fingerprint under a paired raw-vs-compiled control.
What nobody else bundles: a named AI-slop taxonomy spanning web and image, a token-driven brief compiler, a deterministic linter for source-checkable tells, a vision critic for rendered tells, and a raw-vs-compiled empirical eval loop, all in one reproducible project.
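A deterministic linter for source-checkable tells can be sketched as a set of pattern rules that return the same findings for the same input every time. The rule below is hypothetical: its id, its regular expression and the idea that a gradient background counts as a tell are invented for illustration and are not taken from AHD's rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule_id: str
    line: int
    snippet: str


# Hypothetical rule: a gradient used as a background fill.
GRADIENT_BACKGROUND = re.compile(
    r"background(?:-image)?\s*:\s*[^;]*\b(?:linear|radial|conic)-gradient\(",
    re.IGNORECASE,
)


def lint_css(source: str) -> list[Finding]:
    """Scan CSS line by line; declarations split across lines are not handled in this sketch."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        if GRADIENT_BACKGROUND.search(line):
            findings.append(Finding("example/gradient-background", number, line.strip()))
    return findings


SAMPLE_CSS = """\
.hero { color: #111; }
.hero { background: linear-gradient(90deg, #7c3aed, #2563eb); }
.card { background-color: #fff; }
"""

for finding in lint_css(SAMPLE_CSS):
    print(finding.rule_id, finding.line, finding.snippet)
```

A rule of this kind can only decide what is visible in source. Anything that depends on how the page actually renders belongs to the vision critic instead.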
What we promise and what we don't
We promise an honest, versioned taxonomy spanning web, graphic and illustration. We promise a deterministic source-level linter covering every taxonomy entry that can be decided from code. We promise a vision-critic pipeline that works on any rendered image. We promise an eval harness that publishes attempted, extracted and scored counts, canonical model identifiers, and per-model deltas including negative results.
We do not promise the compiled brief beats the raw brief for every model. It does not. The measured run shows Claude Opus 4.7 dropping tells to zero, Qwen 2.5 Coder unmoved, Llama 3.3 70B regressing under the compiled prompt, and SDXL Lightning ignoring the image negative entirely. The framework exposes these differences; it does not paper over them.
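The paired comparison behind statements like these can be sketched as follows. This is a simplified illustration, not AHD's harness: the `Sample` record, the `summarise` function, the model names and all of the data are invented, and extraction and scoring are collapsed into a single `scored` flag. A negative delta means fewer tells under the compiled prompt; a positive delta is a regression and is reported like any other result.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    model: str                  # canonical model identifier
    condition: str              # "raw" or "compiled"
    scored: bool                # False when no usable output could be extracted
    tells: tuple[str, ...] = () # tell ids found; only meaningful when scored


def summarise(samples: list[Sample]):
    counts = defaultdict(lambda: {"attempted": 0, "scored": 0})
    tell_totals = defaultdict(list)
    tell_frequency = Counter()
    for s in samples:
        key = (s.model, s.condition)
        counts[key]["attempted"] += 1
        if not s.scored:
            continue  # failures stay in the attempted count but add no tells
        counts[key]["scored"] += 1
        tell_totals[key].append(len(s.tells))
        tell_frequency.update(set(s.tells))  # number of samples showing each tell
    deltas = {}
    for model in sorted({s.model for s in samples}):
        raw = tell_totals.get((model, "raw"))
        compiled = tell_totals.get((model, "compiled"))
        if raw and compiled:
            deltas[model] = mean(compiled) - mean(raw)
    return dict(counts), deltas, tell_frequency


# Invented data: model-a improves, model-b regresses and has one failed attempt.
SAMPLES = [
    Sample("model-a", "raw", True, ("t1", "t2")),
    Sample("model-a", "raw", True, ("t1", "t2", "t3")),
    Sample("model-a", "compiled", True, ()),
    Sample("model-a", "compiled", True, ("t1",)),
    Sample("model-b", "raw", True, ("t2",)),
    Sample("model-b", "compiled", True, ("t2", "t3")),
    Sample("model-b", "compiled", False),
]

counts, deltas, tell_frequency = summarise(SAMPLES)
for model, delta in deltas.items():
    print(model, f"{delta:+.1f} tells per scored sample")
```

Keeping attempted and scored counts separate matters because a model that fails to produce extractable output half the time would otherwise look cleaner than it is.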
We do not promise aesthetic judgement. The linter catches tells, not taste. A page or image can pass every rule and still be bad design. AHD narrows the output; a human still picks.
Shelf life
AHD's core premise, that AI generators default to a set of named failure modes, is time-bound. Two to three years from now frontier models may produce credible swiss-editorial output with no compiled prompt layer in front of them, and the "compiled beats raw" claim softens against the best models of that moment. We treat this as a feature of the framing, not a bug.
Two things survive a world where the frontier catches up: the taxonomy, as a historical record of how AI-generated design looked when the distribution was still sloppy, and the eval methodology, as a template for the next distributional-failure problem. The vehicles that may not survive are the source linter and the compiler, both of which exist to correct output the frontier eventually learns to produce on its own.
The framework remains useful for OSS and open-weight models, which lag frontier by eighteen to twenty-four months on the distributional axes AHD tracks. AHD is shipped today as a working guardrail; it is also published as a research artefact, because we think reading it in three years is more valuable than pretending it will age into irrelevance.
The raw measured data that underwrites every claim on this site is on the cross-provider page, the narrow-roster five-model page, and the per-run manifest links inside each. Results appear in full, including the models and runs where the compiled prompt lost.
Adjacent reading: the thirty-nine-tell taxonomy, how we measure, how to use AHD in production, install AHD.