Your robots.txt has governed for years which search engine crawlers can access your website. But beyond Googlebot, there are now half a dozen AI crawlers reading your content and using it in their responses. If you don't know which crawlers visit your site and how to manage them, you surrender control over your AI visibility. This guide shows which AI crawlers exist, how to configure them, and what the new llms.txt standard means.
GPTBot is OpenAI's crawler, collecting data for ChatGPT. ClaudeBot belongs to Anthropic and feeds the Claude AI assistant. PerplexityBot searches the web for Perplexity AI, an AI search engine that cites sources with links. Google-Extended is Google's crawler for Gemini and other AI products — separate from the regular Googlebot. Bingbot is also used for Microsoft's Copilot. Each crawler has its own user-agent string and can be separately controlled in your robots.txt. The decision about which crawlers you allow directly influences which AI systems your content appears in.
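Because each crawler announces its own user-agent string, you can treat them individually in robots.txt. A minimal sketch of separate per-crawler rules (the policy shown is only an example):

```
# Allow OpenAI's crawler site-wide
User-agent: GPTBot
Allow: /

# Keep Google's AI products out; the regular Googlebot is unaffected
User-agent: Google-Extended
Disallow: /
```

Note that Google-Extended is a control token rather than a separate fetcher: blocking it keeps your content out of Gemini and related AI products without touching your normal Google search indexing.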
The configuration follows the same pattern as for Googlebot. With "User-agent: GPTBot" and "Allow: /" you grant OpenAI full access. With "Disallow: /internal/" you block specific directories. For maximum AI visibility, we recommend: Allow GPTBot, ClaudeBot, and PerplexityBot access to public content. Only block sensitive areas like admin panels or internal documents. Important: If you completely block AI crawlers, your content will no longer be cited in ChatGPT, Claude, and Perplexity. That may be desired for some content — for your products and services, it's a competitive disadvantage.
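Put together, the recommendation above could look like this. The `/admin/` and `/internal/` paths are placeholders; substitute your own sensitive directories:

```
# AI crawlers: public content allowed, sensitive areas blocked
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Allow: /
Disallow: /admin/
Disallow: /internal/
```

Grouping several User-agent lines over one rule set is valid robots.txt syntax, so the three AI crawlers share the same policy without repetition.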
Beyond robots.txt, there's a new standard: llms.txt. This file sits in your website's root directory and provides AI systems with a compact summary of your business. While robots.txt only controls access, llms.txt actively delivers information: Who are you? What do you offer? What are your core products? AI systems can use this file to better understand your business and recommend it more accurately. Luminara AI generates your llms.txt automatically from your product data and Schema.org information. Combined with Luminara's integration guide, you can set up robots.txt and llms.txt in minutes — no developer or technical expertise needed.
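The llms.txt proposal uses plain Markdown: a title, a short blockquote summary, and linked sections. A minimal sketch with placeholder business details and URLs:

```markdown
# Example Store

> Example Store is an online shop for handmade oak furniture,
> shipping across Europe.

## Products
- [Dining tables](https://example.com/tables): solid oak, made to order
- [Chairs](https://example.com/chairs): matching seating collections

## Company
- [About us](https://example.com/about): workshop, team, and materials
```

Like robots.txt, the file lives in the root directory, so it is served at https://your-domain.com/llms.txt.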
robots.txt and llms.txt are the foundation of your AI visibility. Properly configuring both files lets you actively determine which AI systems your content appears in. Luminara AI helps you with this — from automatic llms.txt generation to the complete integration guide.
Get started with Luminara AI now and optimize your presence in AI search engines.