Question 1

What does the AI Crawler Access Auditor check?

Accepted Answer

Two things at once. First, it parses your robots.txt and reports, per bot, which AI crawlers you allow or block — across the full reference list of answer engines, search indexes, user-action fetchers, and model-training crawlers. Second, it checks whether your content is actually in the raw HTML those crawlers receive, because most AI crawlers don’t execute JavaScript.
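As a rough illustration of the first check, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not the Auditor's actual implementation: the sample robots.txt, the short `AI_BOTS` list, and the `audit_robots` helper are all assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample; a real audit would fetch your site's own /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# A small subset of the user-agent tokens named in this FAQ, not the full list.
AI_BOTS = ["GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"]


def audit_robots(robots_txt: str, path: str = "/", bots=AI_BOTS) -> dict[str, str]:
    """Return an 'allowed' or 'blocked' verdict per bot for one path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        bot: "allowed" if parser.can_fetch(bot, path) else "blocked"
        for bot in bots
    }


verdicts = audit_robots(SAMPLE_ROBOTS_TXT)
for bot, verdict in verdicts.items():
    print(f"{bot}: {verdict}")
```

Note that `urllib.robotparser` matches user-agent tokens by substring, so treat its verdicts as an approximation of how any given crawler reads your file.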

Question 2

Why does “allowed but empty” matter?

Accepted Answer

It’s the trap almost no checker catches. You can welcome every answer engine in robots.txt and still be invisible: if your page renders its content with client-side JavaScript, the crawlers arrive, fetch the raw HTML, and find an empty shell. Allowing a bot and being readable by it are two separate things — this tool checks both and flags the gap.
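To make the gap concrete, the sketch below counts the words a non-JavaScript crawler could read in the raw HTML, skipping anything inside `<script>`, `<style>`, or `<template>`. It is an assumption-laden illustration, not how the tool itself works: the `looks_like_empty_shell` helper, the 10-word threshold, and both sample pages are invented for this example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that appears outside script, style, and template tags."""

    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def looks_like_empty_shell(raw_html: str, min_words: int = 10) -> bool:
    """True if the raw HTML carries fewer than min_words readable words.

    The threshold is an arbitrary assumption; tune it for your pages.
    """
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    return len(" ".join(parser.chunks).split()) < min_words


# Hypothetical client-rendered page: the content only exists after JS runs.
CLIENT_RENDERED = (
    '<html><head><title>Shop</title></head><body><div id="root"></div>'
    "<script>window.__DATA__ = {\"text\": \"lots of words a crawler never reads\"}"
    "</script></body></html>"
)

# Hypothetical server-rendered page: the content is in the raw HTML.
SERVER_RENDERED = (
    "<html><body><h1>Pricing</h1><p>Our starter plan costs ten dollars per "
    "month and includes three seats, email support, and a shared workspace "
    "for your whole team.</p></body></html>"
)

print(looks_like_empty_shell(CLIENT_RENDERED))
print(looks_like_empty_shell(SERVER_RENDERED))
```

A page that is allowed in robots.txt but fails a check like this is the "allowed but empty" case.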

Question 3

Is blocking model-training crawlers bad for AI search?

Accepted Answer

No. Training controls (GPTBot, ClaudeBot, CCBot, and Google-Extended, which is a robots.txt token rather than a separate crawler) are separate from the answer-engine and search bots that decide citations (OAI-SearchBot, Claude-SearchBot, PerplexityBot, Googlebot, Bingbot). You can block training while staying fully citable; see the copy-paste robots.txt snippets on our AI crawler reference page.
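One way to sanity-check a policy like this before deploying it is sketched below. It builds a robots.txt that disallows the training tokens listed above and confirms, with Python's standard `urllib.robotparser`, that the citation bots stay allowed. The `build_robots_txt` and `is_allowed` helpers are hypothetical, and the generated file is an illustration, not the snippet from the reference page.

```python
from urllib.robotparser import RobotFileParser

# Tokens taken from the answer above.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]
CITATION_BOTS = [
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Googlebot", "Bingbot",
]


def build_robots_txt(blocked=TRAINING_BOTS) -> str:
    """Disallow each blocked token site-wide and allow everyone else."""
    groups = [f"User-agent: {bot}\nDisallow: /\n" for bot in blocked]
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)  # the blank line between groups separates them


def is_allowed(robots_txt: str, bot: str, path: str = "/") -> bool:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


robots_txt = build_robots_txt()
print(robots_txt)
for bot in TRAINING_BOTS + CITATION_BOTS:
    print(bot, "allowed" if is_allowed(robots_txt, bot) else "blocked")
```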

Question 4

Does “allowed” guarantee a crawler will obey it?

Accepted Answer

Not always. The verdict reflects what your robots.txt instructs under the Robots Exclusion Protocol (RFC 9309), and compliance with that protocol is voluntary. Some crawlers (Bytespider is the common example) are reported to ignore robots.txt in the wild, and user-initiated fetchers (ChatGPT-User, Perplexity-User) generally do not treat it as binding, because they fetch a page only when a person asks for it. The results table notes each bot’s stated policy so you can verify against your own server logs.
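If you want to check your own logs, a minimal sketch follows. It assumes your server writes the common "combined" access-log format, and it counts requests from bots you have blocked to any path other than /robots.txt. The `hits_by_blocked_bot` helper, the sample log lines, and the simplified user-agent strings are all made up for illustration. User-agent strings can also be spoofed, so treat the counts as a starting point.

```python
import re
from collections import Counter

# Combined log format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def hits_by_blocked_bot(log_lines, blocked_bots) -> Counter:
    """Count requests from blocked bots, ignoring fetches of /robots.txt itself."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or match.group("path") == "/robots.txt":
            continue
        user_agent = match.group("ua").lower()
        for bot in blocked_bots:
            if bot.lower() in user_agent:
                counts[bot] += 1
    return counts


# Made-up log lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.6 - - [01/Jan/2025:00:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:03 +0000] "GET /blog HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.7 - - [01/Jan/2025:00:00:04 +0000] "GET / HTTP/1.1" 200 300 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = hits_by_blocked_bot(SAMPLE_LOG, ["Bytespider", "GPTBot"])
for bot, count in hits.items():
    print(f"{bot}: {count} request(s) despite being blocked")
```

Any non-zero count for a bot you have disallowed suggests it is not honoring your robots.txt.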