Frequently asked questions
What does the AI Crawler Access Auditor check?
Two things at once. First, it parses your robots.txt and reports, per bot, which AI crawlers you allow or block — across the full reference list of answer engines, search indexes, user-action fetchers, and model-training crawlers. Second, it checks whether your content is actually in the raw HTML those crawlers receive, because most AI crawlers don’t execute JavaScript.
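The robots.txt half of that check can be approximated with Python's standard library. This is a minimal sketch, not the auditor's actual implementation: `audit_robots`, the category labels, and the short bot list are illustrative assumptions, and the real reference list is longer.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset only; the grouping into categories is an assumption.
BOTS_BY_CATEGORY = {
    "answer engine": ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot"],
    "search index": ["Googlebot", "Bingbot"],
    "user-action fetcher": ["ChatGPT-User", "Perplexity-User"],
    "model training": ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"],
}


def audit_robots(robots_txt, url="https://example.com/"):
    """Return {bot: (category, verdict)} for one URL under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    report = {}
    for category, bots in BOTS_BY_CATEGORY.items():
        for bot in bots:
            verdict = "allowed" if parser.can_fetch(bot, url) else "blocked"
            report[bot] = (category, verdict)
    return report


# Hypothetical robots.txt: one training crawler blocked, /admin/ closed to all.
SAMPLE_ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

report = audit_robots(SAMPLE_ROBOTS_TXT)
for bot, (category, verdict) in report.items():
    print(f"{bot:18} {category:20} {verdict}")
```

Note that the standard-library parser applies rules in file order, which may differ from how an individual crawler resolves overlapping Allow and Disallow rules.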
Why does “allowed but empty” matter?
It’s the trap almost no checker catches. You can welcome every answer engine in robots.txt and still be invisible: if your page renders its content with client-side JavaScript, the crawlers arrive, fetch the raw HTML, and find an empty shell. Allowing a bot and being readable by it are two separate things — this tool checks both and flags the gap.
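One rough way to spot an empty shell is to count the visible words in the HTML exactly as it is served, before any script runs. The sketch below assumes a simple word-count heuristic; `visible_word_count`, `looks_empty`, and the 20-word threshold are hypothetical choices, not the tool's own rule.

```python
from html.parser import HTMLParser

# Text inside these tags is not readable page content for a non-rendering crawler.
SKIPPED_TAGS = {"script", "style", "noscript", "template"}


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes that sit outside the skipped tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(raw_html):
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return len(" ".join(parser.chunks).split())


def looks_empty(raw_html, min_words=20):
    # Assumed threshold: fewer than min_words visible words counts as a shell.
    return visible_word_count(raw_html) < min_words


# A typical client-rendered shell: a mount point plus a script, no content.
SPA_SHELL = (
    '<html><head><title>App</title></head><body><div id="root"></div>'
    '<script>window.__DATA__ = {"a": 1};</script></body></html>'
)
# A server-rendered page with the content already in the markup.
RENDERED = "<html><body><h1>Pricing</h1><p>" + "word " * 30 + "</p></body></html>"

print(looks_empty(SPA_SHELL), looks_empty(RENDERED))
```

Feed it the response body from a plain HTTP GET, not the DOM from a browser, since the point is to see what a crawler that skips JavaScript would see.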
Is blocking model-training crawlers bad for AI search?
No. Training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot) are separate from the answer-engine and search bots that decide citations (OAI-SearchBot, Claude-SearchBot, PerplexityBot, Googlebot, Bingbot). You can block training while staying fully citable — see the copy-paste robots.txt snippets on our AI crawler reference page.
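As an illustration of that split, here is a sketch of one possible policy, embedded in Python so it can be checked with the standard-library parser. The robots.txt text is an example written for this answer, not one of the snippets from the reference page, and the variable names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Example policy: opt out of training, leave everything else open.
ROBOTS_TXT = """\
# Opt out of model training
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /

# Everyone else, including answer-engine and search bots
User-agent: *
Allow: /
"""

TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]
CITATION_BOTS = [
    "OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Googlebot", "Bingbot",
]
URL = "https://example.com/any-page"  # placeholder URL

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# True when every training crawler is blocked from the URL.
training_blocked = all(not parser.can_fetch(bot, URL) for bot in TRAINING_BOTS)
# True when every citation-deciding bot may still fetch it.
citation_allowed = all(parser.can_fetch(bot, URL) for bot in CITATION_BOTS)

print(training_blocked, citation_allowed)
```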
Does an “allowed” or “blocked” verdict guarantee a crawler will follow your robots.txt?
No. The verdict reflects what your robots.txt instructs under the Robots Exclusion Protocol (RFC 9309), not what a given crawler actually does. Some crawlers (Bytespider is the common example) are reported to ignore robots.txt in the wild, and user-initiated fetchers (ChatGPT-User, Perplexity-User) generally do not treat it as binding, because a person rather than a crawl schedule triggers each request. The results table notes each bot’s stated policy so you can verify against your own server logs.
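For the server-log check, a minimal sketch might look like the following. It assumes the common "combined" access-log layout; `hits_from_blocked_bots` is a hypothetical helper, and the sample log lines and user-agent strings are made up for illustration.

```python
import re
from collections import defaultdict

# Assumes the "combined" log layout:
# ... "METHOD /path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def hits_from_blocked_bots(log_lines, blocked_bots):
    """Map each blocked bot to the paths it requested anyway."""
    hits = defaultdict(list)
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # line is not in the assumed format
        path = match["path"]
        if path == "/robots.txt":
            continue  # reading robots.txt itself is expected behaviour
        user_agent = match["ua"].lower()
        for bot in blocked_bots:
            if bot.lower() in user_agent:
                hits[bot].append(path)
    return dict(hits)


# Illustrative log lines, not real traffic.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2025:13:55:36 +0000] "GET /pricing HTTP/1.1" '
    '200 5123 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '203.0.113.9 - - [10/Oct/2025:13:56:01 +0000] "GET /robots.txt HTTP/1.1" '
    '200 212 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [10/Oct/2025:13:57:12 +0000] "GET / HTTP/1.1" '
    '200 8841 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

hits = hits_from_blocked_bots(SAMPLE_LOG, ["Bytespider", "GPTBot"])
print(hits)
```

User-agent strings can be spoofed, so treat a match as a lead to investigate rather than proof of which operator sent the request.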