ToolMint
SEO Tools · 5 min read · May 17, 2026

How to Write a robots.txt File – Rules, Examples & Common Mistakes

Robots.txt is one of the simplest files on a website — a plain text file with a handful of rules — but it is also one of the easiest to get wrong. A misplaced 'Disallow: /' has caused entire sites to disappear from Google overnight. This guide covers the correct syntax, what to block and what not to block, and the most common mistakes that damage new sites.

Robots.txt Syntax: The Basics

A robots.txt file consists of one or more blocks, each targeting a specific crawler (User-agent) and listing the paths it should or should not access.

    User-agent: *
    Disallow: /admin/
    Disallow: /private/
    Allow: /
    Sitemap: https://example.com/sitemap.xml

• User-agent: * means the rule applies to all crawlers. Use Googlebot for Google-specific rules.
• Disallow: tells a crawler not to access paths starting with the specified string.
• Allow: overrides a Disallow for a specific sub-path.
• Sitemap: tells all crawlers where to find your sitemap. Always include this.
• Lines beginning with # are comments.
• A blank line separates different User-agent blocks.
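One way to sanity-check the example rules is to feed them to Python's standard-library urllib.robotparser and ask which URLs a crawler may fetch. This is a minimal sketch, not a reproduction of any search engine's matcher: Python's parser applies the first rule that matches, while Google picks the most specific one, so the two can disagree on files with overlapping rules. The URLs under example.com are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this section, one directive per line.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /
Sitemap: https://example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = load_rules(EXAMPLE_RULES)

# can_fetch(user_agent, url) answers: may this crawler request this URL?
admin_allowed = parser.can_fetch("Googlebot", "https://example.com/admin/users")
blog_allowed = parser.can_fetch("Googlebot", "https://example.com/blog/first-post")

# site_maps() returns the Sitemap: URLs as a list, or None if there are none.
sitemaps = parser.site_maps()

print(admin_allowed, blog_allowed, sitemaps)
```

Because the block is addressed to User-agent: *, the same answers come back for any crawler name passed to can_fetch.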

What to Block in robots.txt

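Block paths that crawlers have no reason to fetch:
• Admin and login pages: /admin/, /wp-admin/, /login/, /dashboard/
• Internal search result pages: /search/, /?s=, /?q=
• URL parameter variants that create duplicate content: /?sort=, /?filter=, /?ref=
• Staging or development subdirectories: /staging/, /dev/
• Private API endpoints: /api/internal/, /api/private/
• Shopping cart and checkout pages (e-commerce): /cart/, /checkout/
• Thank-you and confirmation pages: /thank-you/, /order-confirmation/

Keep in mind that robots.txt stops crawling, not indexing: a blocked URL can still show up in results if other pages link to it. Pages that must stay out of the index need a noindex directive (on a page crawlers can reach) or authentication.

For WordPress specifically: block /wp-admin/ (but explicitly Allow /wp-admin/admin-ajax.php for AJAX to function) and /?s= (site search). Older templates also block /wp-includes/, but that directory holds scripts and styles that front-end pages load, so blocking it conflicts with the rendering advice in the next section.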
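The WordPress rules above can be generated rather than typed by hand, which makes the Allow exception harder to forget. The sketch below assumes a hypothetical helper, build_robots_txt, and a placeholder domain; it writes Allow lines before Disallow lines so that parsers which stop at the first matching rule still see the exception.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallow, allow=(), sitemap=None, user_agent="*"):
    """Render one User-agent block as robots.txt text.

    Allow lines are written first so that first-match parsers see the
    exception before the broader Disallow rule.
    """
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Allow: {path}" for path in allow]
    lines += [f"Disallow: {path}" for path in disallow]
    if sitemap:
        # A blank line closes the block; Sitemap applies to all crawlers.
        lines += ["", f"Sitemap: {sitemap}"]
    return "\n".join(lines) + "\n"


wordpress_rules = build_robots_txt(
    disallow=["/wp-admin/", "/?s=", "/cart/", "/checkout/"],
    allow=["/wp-admin/admin-ajax.php"],
    sitemap="https://example.com/sitemap.xml",  # placeholder domain
)
print(wordpress_rules)

# Re-parse the generated text to confirm the exception survives.
checker = RobotFileParser()
checker.parse(wordpress_rules.splitlines())
ajax_allowed = checker.can_fetch("*", "https://example.com/wp-admin/admin-ajax.php")
admin_allowed = checker.can_fetch("*", "https://example.com/wp-admin/options.php")
```

Re-parsing the output is a cheap guard against a template change that silently drops the admin-ajax.php exception.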

What NOT to Block (Common Mistakes)

The most dangerous mistake is blocking the entire site. Under User-agent: *, the single line Disallow: / tells every compliant crawler to stay away from everything on your site. It is the single most common catastrophic robots.txt error.

Do not block CSS and JavaScript files. Old SEO advice said to block /wp-content/plugins/ and /wp-content/themes/, but this is wrong. Google needs to render your pages to understand them; blocking CSS and JS prevents proper rendering and can hurt rankings.

Do not block your sitemap URL.

Do not use robots.txt for security. The file is public, and any crawler (including malicious ones) can read it and ignore it. For truly private content, use server-side authentication or .htaccess password protection.

Do not block pages you want indexed. This sounds obvious, but it happens when someone copies a robots.txt template that blocks a path their own important content sits under.
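Two of these mistakes are mechanical enough to catch with a small script before a file goes live. The following is a rough sketch of such a check; find_risky_rules is a hypothetical helper, and its patterns (a blanket Disallow: / for all crawlers, and rules touching CSS, JS, or the WordPress theme and plugin directories) are illustrative rather than exhaustive.

```python
def find_risky_rules(robots_txt: str) -> list[str]:
    """Return human-readable warnings for a few well-known robots.txt mistakes."""
    warnings = []
    agents = []        # user agents named by the block currently being read
    in_rules = False   # True once the block has an Allow/Disallow line

    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new block
                agents, in_rules = [], False
            agents.append(value)
        elif field == "allow":
            in_rules = True
        elif field == "disallow":
            in_rules = True
            lowered = value.lower()
            if value == "/" and "*" in agents:
                warnings.append(f"line {number}: blocks the whole site for all crawlers")
            if (
                lowered.endswith((".css", ".js"))
                or "/wp-content/themes" in lowered
                or "/wp-content/plugins" in lowered
            ):
                warnings.append(f"line {number}: blocks CSS/JS needed for rendering")
    return warnings


sample = """\
User-agent: *
Disallow: /
Disallow: /wp-content/themes/
"""
for warning in find_risky_rules(sample):
    print(warning)
```

A blanket Disallow: / aimed at one named crawler is not flagged, since blocking a single bot that way is a legitimate use.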

Testing Your robots.txt Before Going Live

Google Search Console has a robots.txt report (under Settings → robots.txt) that shows the robots.txt files Google found for your site, when each was last crawled, and any warnings or errors. It replaced the older robots.txt Tester. To see whether a specific URL is blocked, run it through the URL Inspection tool. Check both after making any changes. Also verify that your live robots.txt is accessible at https://yourdomain.com/robots.txt and returns a 200 status code, not a redirect or an error. A 404 on robots.txt means all crawlers will proceed with no restrictions, which is usually fine but leaves any sensitive paths you meant to block open to crawling.
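The status-code check can be scripted with the Python standard library. The sketch below is one possible approach: fetch_status and describe_status are hypothetical helper names, redirects are deliberately not followed so that a 3xx response stays visible, and the interpretation strings simply restate the guidance in this section.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so a 3xx status stays visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError instead


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 3xx, 4xx and 5xx responses arrive here


def describe_status(code: int) -> str:
    """Translate a robots.txt status code into the guidance from this guide."""
    if code == 200:
        return "ok: the file is served directly"
    if 300 <= code < 400:
        return "redirect: serve the file at /robots.txt itself"
    if code == 404:
        return "missing: crawlers proceed with no restrictions"
    return "error: fix the server response before relying on the file"


# Example (needs network access; replace the placeholder domain):
# print(describe_status(fetch_status("https://yourdomain.com/robots.txt")))
```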


Frequently Asked Questions

Does robots.txt affect SEO?
Yes, significantly. Blocking important pages prevents them from being crawled, so search engines cannot rank them on their content. Allowing too many low-value pages wastes crawl budget. A correct robots.txt improves crawl efficiency for large sites and keeps crawlers away from paths that have no place in search results.
Can I block a specific search engine with robots.txt?
Yes. Use a specific User-agent instead of *. For example, a block with User-agent: AhrefsBot followed by Disallow: / blocks the Ahrefs crawler while leaving Google unaffected. Note that bad actors can ignore this; it only works for crawlers that follow the convention.
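As a quick illustration, the standard-library parser in Python shows the per-crawler effect of such a block. The URL is a placeholder, and the result only reflects how a well-behaved crawler would interpret the file.

```python
from urllib.robotparser import RobotFileParser

# One block that targets a single crawler by name.
rules = """\
User-agent: AhrefsBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# The named crawler is blocked; crawlers with no matching block are not.
ahrefs_allowed = parser.can_fetch("AhrefsBot", "https://example.com/pricing")
google_allowed = parser.can_fetch("Googlebot", "https://example.com/pricing")
print(ahrefs_allowed, google_allowed)
```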
What is the difference between Disallow: / and an empty Disallow:?
Disallow: with nothing after it means 'disallow nothing', which allows everything. Disallow: / means 'disallow the entire site.' These look similar but have opposite effects. Always double-check the trailing slash.
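The difference is easy to confirm with a short check using Python's standard-library parser. In this sketch, is_allowed is a hypothetical helper and the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"   # empty value: nothing is blocked
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"  # slash: the whole site is blocked

page = "https://example.com/any/page"
print(is_allowed(ALLOW_EVERYTHING, page), is_allowed(BLOCK_EVERYTHING, page))
```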
