What is llms.txt?
llms.txt is a markdown file at /llms.txt that summarises a website for AI crawlers and LLM-based agents. It tells them what the site is, who it's for, and which pages are authoritative — similar in role to robots.txt for traditional search engines, but with curated editorial content instead of crawl rules.
llms.txt is an emerging web standard proposed by Jeremy Howard (Answer.AI) and codified at llmstxt.org. It's a plain-text markdown file served at the root of a domain (e.g. yoursite.com/llms.txt) that gives AI agents a fast, structured summary of the site: title, one-paragraph description, and a curated list of authoritative pages grouped by category. It is designed for LLM consumption first, though it stays perfectly readable for humans.
What an llms.txt file looks like
The minimal spec is four sections. Here's a working example from our own llms.txt (abbreviated):
    # monitoraeo

    > AI Answer Engine Optimisation (AEO) and Generative Engine Optimisation (GEO) audits.
    > We measure how often Claude, ChatGPT, Perplexity, Gemini and Google AI Overviews
    > name a brand, cite its domain, and recommend it in buyer-facing answers.

    ## Key concepts

    - [What is AEO?](https://www.monitoraeo.com/what-is-aeo): Answer Engine Optimisation — the practice of getting your brand named...
    - [What is GEO?](https://www.monitoraeo.com/what-is-geo): Generative Engine Optimisation — the technical layer...

    ## Product

    - [How it works](https://www.monitoraeo.com/how-it-works): A monitoraeo audit takes a domain...
    - [Audit product](https://www.monitoraeo.com/product/audit): One-off diagnostic across all 5 AI engines.

    ## Pricing

    - [Free preview](https://www.monitoraeo.com/#preview): 8 buyer-facing questions...
    - [Two Engine Audit](https://www.monitoraeo.com/pricing): $29 one-off...
The four pieces, in order (the spec at llmstxt.org strictly requires only the H1):
- H1 — the site name (one line, no description)
- Blockquote — a one-paragraph description of what the site is and who it's for
- H2 sections grouping links — typically "Key concepts", "Product", "Pricing", "Documentation", etc.
- An H2 titled "Optional" — for less-critical pages an LLM might want but can skip when it needs a shorter context
That's the whole spec. There's no XML schema to validate against, no rigid metadata fields. The discipline is editorial: pick the 10–30 pages that actually represent your site, write a one-line description of each, group them by purpose.
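The structure above is simple enough to read with a few lines of code. The following is a minimal sketch, not a reference parser: the `LlmsTxt` class, the `parse_llms_txt` function and the example.com sample are all hypothetical names made up for illustration, and the sketch assumes each link sits on one line in the `- [title](url): note` form shown in the example.

```python
import re
from dataclasses import dataclass, field

# One link per list item: "- [Title](https://...): optional note"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # section name -> list of (title, url, note)
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    current = None          # name of the H2 section being filled
    summary_lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the description.
            summary_lines.append(line[1:].strip())
        elif current is not None:
            m = LINK_RE.match(line)
            if m:
                doc.sections[current].append(
                    (m["title"], m["url"], m["note"] or "")
                )
    doc.summary = " ".join(summary_lines)
    return doc


SAMPLE = """# Example Co
> Example Co sells widgets.
> Built for hobbyists.

## Docs
- [Quickstart](https://example.com/start): Install and run.
- [API](https://example.com/api)

## Optional
- [Changelog](https://example.com/changes): Release history.
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title, list(doc.sections))
```

A consumer that needs a shorter context could drop `doc.sections["Optional"]` and keep the rest, which is the role the "Optional" heading plays in the spec.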
llms.txt vs robots.txt vs sitemap.xml
All three are root-level files that talk to crawlers. They solve different problems and should all be published — they don't replace each other.
| What | robots.txt | sitemap.xml | llms.txt |
|---|---|---|---|
| Format | Plain text rules | XML URL list | Markdown editorial |
| Audience | All crawlers | Search engine crawlers | LLM-based agents + AI crawlers |
| Content style | Allow/disallow rules | Every URL + lastmod | Curated summary + key links |
| Optimises for | Access control | Discoverability | Comprehension |
| Typical length | 10–30 lines | 100–10,000+ URLs | 30–100 lines |
| Should you publish? | Yes | Yes | Yes |
Do AI engines actually read it?
Honest answer: mixed adoption, but trending up fast. Status as of mid-2026:
- Anthropic — confirmed reading llms.txt in their web crawler (anthropic-ai/ClaudeBot)
- Perplexity — confirmed parsing it as part of their indexing pipeline
- RAG-as-a-service vendors (Mendable, Helicone Web, etc.) — many natively support fetching llms.txt as the entry point for site ingestion
- OpenAI / ChatGPT — no formal announcement, but their search-augmented models behave as though they read it (anecdotal, not confirmed)
- Google AI products — not honouring it as of mid-2026; they rely on sitemap.xml plus Google-Extended robots.txt rules
- Smaller AI tools (browser extensions, summarisers) — growing rapidly
Publishing takes about 10 minutes up front, and upkeep is a short review whenever the site structure changes. The downside is negligible and the upside compounds as adoption grows, so there is little reason not to publish one.
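Adoption claims are easier to judge against your own traffic. The sketch below counts which user agents requested /llms.txt in a web server access log. It assumes the common "combined" log format used by default in nginx and Apache; the function name `llms_txt_fetchers` and the log path in the usage note are hypothetical, and user-agent strings can be spoofed, so treat the counts as a rough signal only.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields
# of a combined-format access log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def llms_txt_fetchers(log_lines):
    """Count requests for /llms.txt per user-agent string."""
    counts = Counter()
    for line in log_lines:
        m = LOG_RE.search(line)
        # Ignore any query string when comparing the path.
        if m and m["path"].split("?")[0] == "/llms.txt":
            counts[m["agent"]] += 1
    return counts
```

Usage would look something like `llms_txt_fetchers(open("/var/log/nginx/access.log"))`, with the path adjusted to wherever your server writes its logs.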
How to publish your llms.txt
Three steps:
1. Write the file. Start from the spec at llmstxt.org or use our example above as a template. Keep it under 100 lines. Include only your authoritative pages — the ones you'd want quoted by an AI summarising your site. Skip blog posts, support FAQs, anything ephemeral.
2. Serve it at the root. Publish at yoursite.com/llms.txt with content-type text/markdown; charset=utf-8 (preferred) or text/plain. With static site generators you drop the file into the static assets directory (public/ for Astro and Next.js, static/ for Hugo). For dynamic sites, expose a single route that returns the file as-is; it's just text, so no templating is needed.
3. Reference it from robots.txt. Add a comment line so that people and tools reading the raw robots.txt can find your llms.txt. Parsers ignore comments, so treat this as a hint, not a directive:
    # AI summary: https://yoursite.com/llms.txt
    Sitemap: https://yoursite.com/sitemap.xml
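For step 2 on a dynamic site, the "single route" can be very small. Here is a minimal sketch using only Python's standard-library `wsgiref`. The names `make_app` and `serve`, the default file location and port 8000 are assumptions for illustration; a production site would wire the same response into whatever framework it already runs.

```python
from pathlib import Path
from wsgiref.simple_server import make_server


def make_app(llms_path: Path):
    """Build a WSGI app that serves one file at /llms.txt and 404s elsewhere."""

    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/llms.txt" and llms_path.is_file():
            body = llms_path.read_bytes()
            status = "200 OK"
            # Preferred type from step 2; text/plain is the fallback.
            content_type = "text/markdown; charset=utf-8"
        else:
            body = b"Not found\n"
            status = "404 Not Found"
            content_type = "text/plain; charset=utf-8"
        start_response(status, [
            ("Content-Type", content_type),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


def serve(llms_path: Path = Path("llms.txt"), port: int = 8000) -> None:
    """Run a local development server; stop it with Ctrl+C."""
    with make_server("127.0.0.1", port, make_app(llms_path)) as server:
        server.serve_forever()
```

Calling `serve()` from the directory that holds llms.txt and then requesting `http://127.0.0.1:8000/llms.txt` should show the markdown content type in the response headers.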
Common mistakes
- Including every URL on the site — defeats the purpose. llms.txt is a curated summary, not a sitemap. 10–30 links is the right ballpark for most sites.
- Skipping the blockquote description — the one-paragraph description is the most-quoted part of the file. LLMs use it as the canonical site summary. Write it carefully.
- Wrong content-type — serving it as text/html means tools that sniff type may skip it. Use text/markdown or text/plain.
- Letting it go stale — if your /about page moved to /company but llms.txt still points to /about, AI crawlers cite broken links and you look unmaintained. Review quarterly.
- Treating it as a marketing brochure — keep the prose terse and factual. LLMs are excellent at detecting promotional fluff and weight it lower.
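Several of these mistakes can be caught mechanically before publishing. The following is a rough sketch of such a check, covering the link count, the blockquote and the content-type. The function `lint_llms_txt` and its 30-link default are assumptions taken from the ballpark figures in this article, not part of any spec, and it does not fetch the linked URLs, so stale links still need a separate check.

```python
import re

# Markdown list items of the form "- [Title](url)".
LINK_RE = re.compile(r"^\s*-\s*\[[^\]]+\]\(([^)\s]+)\)", re.MULTILINE)
ALLOWED_TYPES = {"text/markdown", "text/plain"}


def lint_llms_txt(text: str, content_type: str, max_links: int = 30) -> list[str]:
    """Return human-readable warnings; an empty list means no issue was found."""
    warnings = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    if not lines or not lines[0].startswith("# "):
        warnings.append("first non-blank line should be an H1 with the site name")
    if not any(ln.startswith(">") for ln in lines):
        warnings.append("missing blockquote description")

    links = LINK_RE.findall(text)
    if len(links) > max_links:
        warnings.append(
            f"{len(links)} links; curated files usually stay at or below {max_links}"
        )

    # Compare only the media type, ignoring parameters such as charset.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        warnings.append(
            f"content-type {media_type!r} should be text/markdown or text/plain"
        )
    return warnings
```

Pass it the file body and the `Content-Type` header your server actually returns for /llms.txt; running it in CI is one way to keep the file from drifting.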
How monitoraeo uses llms.txt
We publish our own at /llms.txt and check whether yours exists as one of the 15 technical foundations in every paid audit. Sites that publish a well-formed llms.txt consistently see higher visibility scores in Claude and Perplexity, the engines that confirm parsing it, within 2–4 weeks of publishing. See the full methodology →
Related concepts
- What is GEO? — Generative Engine Optimisation; llms.txt is one of the 6 GEO foundations.
- What is AEO? — Answer Engine Optimisation, the discipline that this serves.
- Glossary — every AI search term defined.
Frequently asked about llms.txt
Do AI engines actually read llms.txt?
Mixed adoption but trending up. Anthropic and Perplexity have confirmed reading it. OpenAI hasn't formally announced support. Google AI products don't fetch it as of mid-2026. Publishing takes about 10 minutes and upkeep is light, so it is worth publishing today.
How is llms.txt different from robots.txt?
robots.txt gates crawler access via allow/disallow rules. llms.txt provides editorial summary content for LLM-based agents. Different jobs — both should be published.
How is llms.txt different from sitemap.xml?
Sitemap = every URL in machine-readable XML (completeness). llms.txt = 10–30 curated pages with editorial descriptions (comprehension). Both should exist.
What goes in an llms.txt file?
Four sections: H1 with site name, blockquote with one-paragraph description, one or more H2-grouped link sections, optional "Optional" H2 for less-critical pages. Each link is a markdown list item with URL + short description. Typically 30–100 lines total.
Where do I publish llms.txt?
At yoursite.com/llms.txt with content-type text/markdown; charset=utf-8 (preferred) or text/plain. Reference it from robots.txt with a comment line as well; parsers ignore comments, so this is a hint for people and tools that read the raw file.
How often should I update llms.txt?
When site structure changes — new top-level sections, renamed pages, deprecated features. Not every blog post. Monthly review + quarterly updates is the right cadence for most sites.