Robots.txt Generator — Create & Customize robots.txt
Generate a robots.txt file with user-agent rules, allow/disallow paths, and sitemap URL.
Prevent AI training bots from scraping your content.
# robots.txt
# Generated by Devryo
User-agent: *
Allow: /
About Robots.txt Generator — Create & Customize robots.txt
Robots.txt Generator creates a valid robots.txt file for your website in seconds. Block AI training crawlers (GPTBot, ClaudeBot, Google-Extended, Bytespider), protect admin and private pages with Disallow rules, and add your Sitemap URL — all through a visual UI. No syntax knowledge required. Download ready-to-upload or copy instantly.
How to Use
1. Choose a preset: "Allow All" for standard sites, "Block All" to stop all crawlers, or "Custom" to set specific rules.
2. In Custom mode, tick the paths you want to disallow from a categorized checklist (admin, private, user areas, etc.), or enter your own paths.
3. To prevent AI bots from scraping your content, check the AI crawlers you want to block, or hit "Block all AI bots" to select all 15 at once.
4. Optionally enter your Sitemap URL. It will be appended automatically as a Sitemap: directive.
5. The preview updates in real time. Hit "Copy" or "Download robots.txt" when you're ready.
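The steps above can be sketched in code. This is a minimal illustration of what a generator of this kind might do, not the tool's actual implementation; the function name `build_robots_txt` and its parameters are hypothetical.

```python
def build_robots_txt(disallow_paths=(), blocked_bots=(), sitemap_url=None):
    """Assemble robots.txt text from a few options (hypothetical helper)."""
    lines = ["User-agent: *"]
    if disallow_paths:
        lines += [f"Disallow: {path}" for path in disallow_paths]
    else:
        # No restrictions: equivalent to the "Allow All" preset.
        lines.append("Allow: /")

    # One group per blocked crawler, each separated by a blank line.
    for bot in blocked_bots:
        lines += ["", f"User-agent: {bot}", "Disallow: /"]

    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(build_robots_txt(
    disallow_paths=["/admin/", "/cart/"],
    blocked_bots=["GPTBot", "ClaudeBot"],
    sitemap_url="https://example.com/sitemap.xml",
))
```

Calling `build_robots_txt()` with no arguments yields the same two-rule file as the "Allow All" preset.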
Features
- No need to memorize robots.txt syntax — the visual UI generates it for you
- Block 15 AI training crawlers (GPTBot, ClaudeBot, Google-Extended, Bytespider, and more)
- Pick common Disallow paths from a checklist — /admin/, /wp-admin/, /cart/, and more
- Download as a ready-to-upload robots.txt file — no reformatting needed
- Free, no sign-up, runs entirely in your browser. Works with WordPress, Shopify, Next.js and any platform.
What is robots.txt and Why Does It Matter for SEO?
A robots.txt file is a plain text file that lives at the root of your website (https://yourdomain.com/robots.txt). It uses the Robots Exclusion Protocol to communicate instructions to web crawlers — telling them which pages they may visit and which they should skip. Every major search engine, including Google, Bing, and Yahoo, reads this file before crawling your site.
How Crawlers Use robots.txt
When a search engine bot visits your site, it first checks for robots.txt at your domain root. If the file exists, the bot reads the rules and adjusts its crawl behavior accordingly. This happens before any other page is fetched. A missing or misconfigured robots.txt can waste your crawl budget on unimportant pages, slow down indexing of your key content, or — in worst cases — accidentally block your entire site from search engines.
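To make the "domain root" lookup concrete, here is a small sketch of how a crawler might derive the robots.txt location from any page URL. The helper name `robots_url_for` is hypothetical; it uses only the Python standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://yourdomain.com/blog/post-1?ref=home"))
# https://yourdomain.com/robots.txt
```

Because the host is kept as-is, a subdomain such as shop.yourdomain.com resolves to its own robots.txt rather than the one on the main domain.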
robots.txt vs. noindex: What's the Difference?
These two tools solve different problems. robots.txt controls whether a page gets crawled (visited by the bot). The noindex meta tag controls whether a visited page gets added to the search index. A page blocked in robots.txt will not be crawled, but its URL can still appear in search results if other sites link to it. To keep a page out of search results, use a noindex tag and leave the page crawlable: a crawler that is blocked by robots.txt never fetches the page, so it never sees the noindex tag.
Crawl Budget and Why It Matters
Googlebot crawls only a limited number of URLs on a site in a given period, commonly called your "crawl budget." On large sites, if Googlebot spends this budget on duplicate URLs, paginated archives, or low-value parameters, important pages may go uncrawled. Using robots.txt to block low-value sections like /tag/, /search/, and parameterized URLs helps concentrate your crawl budget on the pages that matter most.
robots.txt Directives and Disallow / Allow Syntax
A robots.txt file is built from a small set of directives. Understanding each one lets you write precise rules, control which paths are blocked, and avoid the configuration errors that silently hurt rankings.
User-agent — Target Specific Crawlers
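User-agent specifies which crawler the following rules apply to. Use User-agent: * to target all crawlers. Use a specific name like User-agent: Googlebot to target only Google's crawler, or User-agent: GPTBot to target OpenAI's training bot. Each User-agent line starts a group, and the rules beneath it apply until the next group begins; separate groups with a blank line, which some older parsers require. More specific groups take precedence over the wildcard group: a crawler that finds a group naming it follows only that group and ignores the rules under User-agent: *.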
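Group selection can be checked with Python's standard-library `urllib.robotparser`. The sketch below uses made-up rules for three groups; it is one parser's view and other crawlers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /

User-agent: Bingbot
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot matches its own group, so the whole site is off limits.
print(parser.can_fetch("GPTBot", "/blog/post-1"))     # False
# Crawlers without a group of their own fall back to the wildcard group.
print(parser.can_fetch("Googlebot", "/admin/login"))  # False
# Bingbot follows only its own group; the wildcard rules are not merged in.
print(parser.can_fetch("Bingbot", "/admin/login"))    # True
```

The last line is the common surprise: once a crawler has its own group, any wildcard rules you still want it to obey must be repeated inside that group.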
Disallow and Allow — Control Which Paths Are Crawled
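Disallow: /path/ tells the crawler not to visit any URL starting with that path. Allow: /path/ explicitly permits a path even when a broader Disallow rule would block it. Key patterns:
- "Disallow: /" blocks your entire site (dangerous).
- "Disallow: /admin/" blocks the admin directory.
- "Disallow: " (empty) allows everything.
- "Disallow: /private/" plus "Allow: /private/public.html" blocks all of /private/ except one page.
Path matching is prefix-based: Disallow: /blog blocks /blog, /blog/, and /blog/post-1, and also unrelated paths such as /blogging-tips. Use Disallow: /blog/ to limit the rule to the contents of the directory. When an Allow rule and a Disallow rule both match a URL, Google and other crawlers that follow RFC 9309 apply the rule with the longest matching path.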
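The following sketch shows how prefix matching and an Allow exception can be evaluated, assuming the longest-match rule described in RFC 9309 (the longest matching path wins, and Allow wins a tie). The function `is_allowed` is hypothetical and ignores the * and $ wildcards.

```python
def is_allowed(rules, path):
    """Apply Allow/Disallow rules using longest-prefix matching.

    `rules` is a list of (directive, prefix) pairs for one user-agent group.
    Wildcards (* and $) are not handled in this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty value matches nothing
        is_allow = directive.lower() == "allow"
        # Longer prefixes win; on a tie, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/private/"), ("Allow", "/private/public.html")]
print(is_allowed(rules, "/private/public.html"))              # True
print(is_allowed(rules, "/private/notes.html"))               # False
print(is_allowed([("Disallow", "/blog")], "/blogging-tips"))  # False
```

Some parsers, including Python's `urllib.robotparser`, instead apply the first matching rule in file order, so placing the Allow line before the broader Disallow line is the more portable habit.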
Sitemap — Help Crawlers Find Your Content
The Sitemap directive points crawlers to your XML sitemap. Example: Sitemap: https://example.com/sitemap.xml. This is not a restriction — it's a discovery hint that helps bots find your content faster. You can include multiple Sitemap lines for separate sitemaps. This generator automatically appends this directive when you enter a sitemap URL.
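As an illustration of how a crawler might read these lines, Python's standard-library parser exposes them through `site_maps()` (available in Python 3.8 and later). The second sitemap URL below is a made-up example.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-posts.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns the listed URLs in file order, or None if there are none.
for url in parser.site_maps() or []:
    print(url)
```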
Crawl-delay — Reduce Server Load
Crawl-delay: 10 asks supporting bots to wait 10 seconds between page requests, reducing server load from aggressive crawlers. Note: Googlebot ignores this directive entirely, and the crawl rate setting that Google Search Console used to offer has been retired; Googlebot adjusts its crawl rate automatically based on how your server responds. Bingbot honors Crawl-delay, but support among other crawlers varies, so check each crawler's documentation.
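From the crawler's side, honoring the directive might look like the sketch below, which reads the value with the standard-library `crawl_delay()` method (Python 3.6 and later). The helper `delay_for` and its one-second fallback are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: bingbot
Crawl-delay: 10

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def delay_for(agent, default=1.0):
    """Seconds to wait between requests for this agent (hypothetical helper)."""
    delay = parser.crawl_delay(agent)
    return default if delay is None else float(delay)


print(delay_for("bingbot"))       # 10.0
print(delay_for("SomeOtherBot"))  # 1.0
```

A polite crawler would sleep for this many seconds between requests to the same host.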
Common robots.txt Mistakes to Avoid
Robots.txt errors can have serious SEO consequences. These are the most frequent mistakes webmasters make when configuring their robots.txt file.
Accidentally Blocking Your Entire Site
The single most damaging mistake: Disallow: / under User-agent: * blocks all compliant crawlers from your entire site. Search engines can no longer read your pages, so they drop out of results or linger as bare URLs without descriptions. Always double-check this rule before uploading. If your site suddenly disappears from search results, check robots.txt first.
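A quick pre-upload check for this mistake can be automated. The sketch below is a simplified linter, not a full parser: the function `blocks_entire_site` is hypothetical, and it only looks for a bare "Disallow: /" in a group that includes User-agent: *, without considering Allow rules that might partly override it.

```python
def blocks_entire_site(robots_txt: str) -> bool:
    """Return True if the wildcard group contains a bare 'Disallow: /'."""
    in_wildcard_group = False
    reading_agents = False  # True while consecutive User-agent lines are read
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                in_wildcard_group = False  # a new group starts here
            reading_agents = True
            in_wildcard_group = in_wildcard_group or value == "*"
        else:
            reading_agents = False
            if field == "disallow" and value == "/" and in_wildcard_group:
                return True
    return False


print(blocks_entire_site("User-agent: *\nDisallow: /\n"))        # True
print(blocks_entire_site("User-agent: GPTBot\nDisallow: /\n"))   # False
print(blocks_entire_site("User-agent: *\nDisallow: /admin/\n"))  # False
```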
Blocking CSS and JavaScript
Blocking /wp-content/ or other asset folders prevents Googlebot from loading your pages' styles and scripts. Google needs to render pages like a browser to evaluate their content and user experience. Blocked assets can result in lower rankings because Google cannot see what your pages look like to real users.
Using robots.txt as a Security Measure
Robots.txt is publicly visible at yourdomain.com/robots.txt. Any content listed under Disallow: is broadcast to the world, including malicious bots that don't follow the protocol. Never use robots.txt to "hide" sensitive content — it actually advertises the path to anyone looking. Use proper authentication and server-level access controls for genuinely private content.
How to Add robots.txt to Your Website
Once you've generated your robots.txt file with this tool, the installation process depends on your platform. The file must always be placed at the root of your domain, accessible at https://yourdomain.com/robots.txt.
WordPress
WordPress auto-generates a basic robots.txt if none exists. To use your own: upload the downloaded robots.txt file to the root directory of your WordPress installation via FTP or your hosting file manager. Alternatively, use an SEO plugin: in Yoast SEO go to Tools → File editor; in Rank Math go to General Settings → Edit robots.txt. Both let you paste your custom rules directly without FTP access.
Shopify
Since 2021, Shopify stores support a customizable robots.txt.liquid template. Go to your Shopify Admin → Online Store → Themes → Actions → Edit code. Find robots.txt.liquid in the Templates folder; if it is not there yet, add it as a new template. Replace the content with your custom rules or add sections as needed. Changes take effect immediately, although crawlers may take some time to pick them up.
Static Sites and Other Platforms
For static sites (Next.js, Astro, Hugo, plain HTML), place the robots.txt file in your public/ or static/ root directory so it is served at the domain root. For other CMS platforms like Wix or Squarespace, check your platform's SEO settings panel — most modern platforms include a dedicated robots.txt editor under their SEO or advanced settings.
FAQ
- What does Disallow: / mean in robots.txt?
- Disallow: / (with a forward slash) blocks the crawler from visiting every page on your site. Combined with User-agent: *, this blocks all compliant crawlers from your entire site, which is the most common accidental mistake in robots.txt configuration. If your site disappears from Google, check this rule first. Disallow: (empty, no value) means the opposite: allow everything.
- How do I block only certain pages, not the whole site?
- Use Disallow followed by the specific path. Example: "Disallow: /admin/" blocks everything under /admin/. "Disallow: /cart" blocks /cart and all sub-paths. Leave "Disallow:" with an empty value to allow all pages. You can combine multiple Disallow lines in one User-agent block to block several paths at once.
- Do GPTBot and ClaudeBot actually respect robots.txt?
- Yes, according to their operators. OpenAI (GPTBot), Anthropic (ClaudeBot), Google (Google-Extended), and other major AI companies have publicly committed to honoring robots.txt rules. Blocking them using Disallow: / in a User-agent block is the standard, widely accepted method to opt out of AI training data collection. Compliance is voluntary, though, so robots.txt cannot stop a crawler that chooses to ignore it.
- What is the difference between robots.txt and noindex?
- robots.txt controls whether a crawler visits a URL at all. The noindex meta tag controls whether a visited page appears in search results. A critical difference: a page blocked in robots.txt can still appear in Google search if another site links to it, because Google knows the URL exists even though it hasn't crawled the content. To keep a page out of search results, use noindex and do not block the page in robots.txt, since a blocked crawler never sees the noindex tag.
- Should I block CSS and JavaScript files?
- No. Blocking /wp-content/ or asset folders prevents Googlebot from loading your pages' styles and scripts. Google needs to render pages like a real browser to evaluate content and user experience. Blocked CSS/JS can cause lower rankings because Google cannot see what your pages look like. Only block assets that are genuinely private.
- How do I install robots.txt on WordPress, Shopify, or Next.js?
- WordPress: upload the file to your WordPress root via FTP, or use Yoast SEO → Tools → File editor. Shopify: go to Online Store → Themes → Edit code → robots.txt.liquid and paste your rules. Next.js / Astro / Hugo: place robots.txt in the public/ or static/ directory — it will be served at the domain root automatically after build.
- How do I test my robots.txt file?
- Open https://yourdomain.com/robots.txt in a browser to confirm the file is live and correct. For deeper testing, use our robots.txt Checker to validate syntax, detect errors, and audit AI crawler blocks, or use Google Search Console → Settings → robots.txt report (which replaced the older robots.txt tester) for Googlebot-specific verification.