If you want search engines to crawl your website correctly, the robots.txt file is one of the first things you must set up. In SEO, we use the robots.txt file to tell Google and other search engines which pages they can access and which ones they should avoid.
If you run a website, you need to know that every unnecessary crawl wastes time and crawl budget. That's exactly where robots.txt helps.
It controls what search engines can explore, keeps crawlers away from private sections like admin folders, and ensures Google reads your site efficiently. A simple text file can completely change how your site appears in search results.
What is a Robots.txt File?
A robots.txt file is a plain text file placed in your website’s root folder. It guides search engine bots on which pages they can crawl and which they must ignore. From an SEO point of view, it helps you manage crawl budget, prevent indexing of unwanted URLs, and keep private or duplicate sections out of search results.
Robots.txt is not a security tool. It only offers instructions to bots. But when you use it correctly, you give search engines a clean path to follow, improving your website’s technical SEO, crawl efficiency, and overall online visibility.
Robots.txt File Sample
This is a sample robots.txt file:
User-agent: *
Disallow: /admin/
Allow: /
Sitemap: https://example.com/sitemap.xml
These are the components of a robots.txt file and what they mean:
| Component | Description |
| --- | --- |
| User-agent | Defines which bot the rule applies to (e.g., Googlebot, Bingbot, or all bots using *). |
| Disallow | Tells bots which pages or directories they must not crawl. |
| Allow | Specifies exceptions inside a disallowed section that bots can still crawl. |
| Sitemap | Provides the link to the XML sitemap for better crawling and indexing. |
| Wildcard (*) | Matches any sequence of characters, so one rule can target multiple URLs or patterns. |
| $ Symbol | Marks the end of a URL, which is useful for blocking specific file types. |
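For example, here is a short illustrative snippet (the paths and parameters are placeholders) that combines the Wildcard and $ components from the table above:
User-agent: *
# Block any URL containing a ?sort= parameter
Disallow: /*?sort=
# Block only URLs that end exactly in .pdf
Disallow: /*.pdf$
Here, * matches any string of characters, and $ anchors the rule to the end of the URL.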
How Does a Robots.txt File Work?
1. Search engine bots visit your domain first
Before crawling anything, Googlebot or Bingbot visits yourwebsite.com/robots.txt. This is the first file they read. If it exists, bots follow the rules; if not, they assume everything is allowed.
2. Bots identify who the rule is for (User-Agent)
The file starts with a "User-agent" line, which specifies which bot the rules that follow apply to.
Example:
User-agent: Googlebot controls only Google’s main crawler.
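To illustrate (the paths here are made up), when a robots.txt file contains both a general group and a Googlebot-specific group, Googlebot obeys only the group addressed to it and ignores the * group:
# All bots: stay out of /private/
User-agent: *
Disallow: /private/

# Googlebot follows only this group
User-agent: Googlebot
Disallow: /drafts/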
3. Bots read what is blocked (Disallow)
The “Disallow” directive tells bots which pages or folders they should not crawl.
Example:
Disallow: /admin/ blocks all admin pages from crawling.
4. Bots read what is allowed (Allow)
If there are exceptions inside a blocked folder, the “Allow” directive clarifies what bots can crawl.
Example:
Disallow: /admin/
Allow: /admin/help-page/
5. Bots follow hints about site structure (Sitemap)
A sitemap URL placed in robots.txt helps bots quickly find all important pages to crawl.
Example:
Sitemap: https://example.com/sitemap.xml
6. Bots follow rules but are not forced
Robots.txt is a guideline, not a law. Good bots follow it; harmful or unknown bots may ignore it. For SEO purposes, though, Google and all major search engine crawlers respect it.

Importance of the Robots.txt File
These are the benefits of a robots.txt file for a website:
1. Controls What Search Engines Can Access
Robots.txt helps you decide which sections of your site search engines can crawl and which should stay off-limits. This includes admin areas, login pages, filter URLs, backend paths, and incomplete content that doesn't need to be crawled.
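As a simple sketch (the folder names here are only examples), such rules usually look like this:
User-agent: *
Disallow: /admin/
Disallow: /login/
Disallow: /drafts/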
2. Saves Crawl Budget on Large Websites
If your site has thousands of URLs, Google may not crawl everything. Robots.txt ensures bots spend their crawl time only on important pages, improving indexing speed and SEO performance.
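For instance, a large store might keep bots away from parameter-heavy URLs so crawl budget goes to real product and category pages; the parameter names below are placeholders:
User-agent: *
Disallow: /*?sort=
Disallow: /*?color=
Disallow: /*?sessionid=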
3. Keeps Duplicate, Thin, or Irrelevant Pages Out of the Crawl
E-commerce filters, internal search pages, tags, and archives can create messy duplicate URLs. Robots.txt lets you block them easily so crawlers stay focused on clean, valuable pages.
4. Helps With Site Organization and Clear Crawl Paths
When bots see a well-structured robots.txt and sitemap inclusion, crawling becomes faster, smoother, and more accurate.
5. Protects Sensitive Sections (Not Security, but Instructions)
While it doesn’t secure a folder, it instructs bots to avoid crawling backend resources, confidential areas, and temporary pages.
Robots.txt Examples
These are some examples of robots.txt files for different websites:
WordPress Robots.txt
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /trackback/
Sitemap: https://example.com/sitemap_index.xml
E-Commerce Store Robots.txt
User-agent: *
Disallow: /checkout/
Disallow: /cart/
Disallow: /account/
Disallow: /search/
Allow: /
Sitemap: https://store.com/sitemap.xml
Robots.txt Blocking Staging or Development Sites
User-agent: *
Disallow: /
Robots.txt for a Blog With Categories & Tags
User-agent: *
Disallow: /tag/
Disallow: /category/
Allow: /
Sitemap: https://blogsite.com/sitemap.xml
Advanced Robots.txt Example With Wildcards
User-agent: *
Disallow: /*?ref=
Disallow: /*.pdf$
Allow: /
Sitemap: https://example.com/sitemap.xml
Robots.txt Example for Large News Website
User-agent: *
Disallow: /private/
Disallow: /internal/
Disallow: /drafts/
Allow: /
Sitemap: https://newsportal.com/news-sitemap.xml
Sitemap: https://newsportal.com/video-sitemap.xml
How to Create a Robots.txt File for a Website?
This is how to create a robots.txt file:
Step 1: Decide What You Want Search Engines to Crawl (Planning)
Before you create a robots.txt file, be clear about what you want search engines to see and what they should skip.
At WsCube Tech, we allow all important pages (courses, tutorials, blogs, etc.) and block sections like 404 pages, admin area, portfolio, internal campaigns, and PDFs that don’t need indexing.
Step 2: Open a Plain Text Editor or SEO Plugin
To create a robots.txt file, open any plain text editor like Notepad (Windows), TextEdit (Mac), or a code editor such as VS Code.
If your site runs on WordPress, you can also manage robots.txt directly through SEO plugins like Yoast or Rank Math, or edit the file via your hosting file manager (cPanel or a similar panel).
Step 3: Start With User-Agent and Basic Allow Rule
The first line of a robots.txt file defines which crawlers the rules apply to. For WsCube Tech, we want the rules to apply to all bots, so we use:
User-agent: *
Allow: /
Allow: /blog/wp-admin/admin-ajax.php
Here,
- User-agent: * → all bots (Google, Bing, etc.).
- Allow: / → let them crawl the full site.
- Allow: /blog/wp-admin/admin-ajax.php → specifically allow this important AJAX file even if admin is blocked later.
Step 4: Add Disallow Rules for Unwanted Sections
Next, we tell bots what not to crawl. At WsCube Tech, we don't want URLs such as the 404 page, portfolio, certain event pages, refer-and-earn, or plugin folders to be crawled:
Disallow: /404
Disallow: /portfolio
Disallow: /blog/wp-admin/
Disallow: /blog/wp-content/plugins/
Disallow: /events/
Disallow: /refer-and-earn
Disallow: /challenges
Disallow: /?ref
Disallow: /*.pdf$
These rules stop bots from crawling non-SEO pages, internal campaign URLs, and all .pdf files.
/*.pdf$ blocks all URLs ending with .pdf, which is useful when we don’t want PDFs indexed.
Step 5: Add Sitemap URLs for Better Crawling
To make crawling easier, we list all important sitemaps in the robots.txt file. WsCube Tech has multiple sitemaps for different sections of the website:
# Sitemaps
Sitemap: https://www.wscubetech.com/sitemap.xml
Sitemap: https://www.wscubetech.com/categories-sitemap.xml
Sitemap: https://www.wscubetech.com/courses-sitemap.xml
Sitemap: https://www.wscubetech.com/tutorials-sitemap.xml
Sitemap: https://www.wscubetech.com/programs-sitemap.xml
Sitemap: https://www.wscubetech.com/quizzes-sitemap.xml
Sitemap: https://www.wscubetech.com/compiler-sitemap.xml
Sitemap: https://www.wscubetech.com/free-courses-sitemap.xml
Sitemap: https://www.wscubetech.com/blog/sitemap_index.xml
This helps search engines quickly discover all key pages: courses, tutorials, programs, quizzes, blog articles, and more.
Step 6: Save the File as robots.txt
Once you add all rules, save the file as: robots.txt (not .doc, .rtf, or anything else — it must be a plain .txt file).
Make sure there are no extra formatting styles or hidden characters. It should be a clean text file only.
Step 7: Upload Robots.txt to the Root Folder of Your Domain
Now you upload this file to your website’s root directory (public_html or root of the domain) via:
- cPanel or hosting file manager
- FTP/SFTP (FileZilla, WinSCP, etc.)
- Or directly via your server config if you’re on a VPS/cloud setup
For WsCube Tech, the final URL is:
https://www.wscubetech.com/robots.txt
This is where every search engine bot will look for it.
Step 8: Check Your Robots.txt File in the Browser
To make sure everything is correct, open your robots.txt in any browser:
Visit: https://www.wscubetech.com/robots.txt
If you can see the file content clearly, it means the file is uploaded correctly and publicly accessible to bots.
You should do the same for your own domain once your robots.txt is ready.
Final Version: WsCube Tech Robots.txt
User-agent: *
Allow: /
Allow: /blog/wp-admin/admin-ajax.php
Disallow: /404
Disallow: /portfolio
Disallow: /blog/wp-admin/
Disallow: /blog/wp-content/plugins/
Disallow: /events/
Disallow: /refer-and-earn
Disallow: /challenges
Disallow: /?ref
Disallow: /*.pdf$
# Host
Host: https://www.wscubetech.com/
# Sitemaps
Sitemap: https://www.wscubetech.com/sitemap.xml
Sitemap: https://www.wscubetech.com/categories-sitemap.xml
Sitemap: https://www.wscubetech.com/courses-sitemap.xml
Sitemap: https://www.wscubetech.com/tutorials-sitemap.xml
Sitemap: https://www.wscubetech.com/programs-sitemap.xml
Sitemap: https://www.wscubetech.com/quizzes-sitemap.xml
Sitemap: https://www.wscubetech.com/compiler-sitemap.xml
Sitemap: https://www.wscubetech.com/free-courses-sitemap.xml
Sitemap: https://www.wscubetech.com/blog/sitemap_index.xml
Most Common User Agents in Robots.txt
When you use the User-agent: * directive in your robots.txt file, it means the rules apply to all crawlers by default. If you add specific user agents after that (like Googlebot or Bingbot), those customized rules will override the general “*” rule for those particular bots.
| User-Agent | Company / Platform |
| --- | --- |
| Googlebot | Google |
| Googlebot-Mobile | Google |
| Bingbot | Microsoft Bing |
| Slurp | Yahoo |
| DuckDuckBot | DuckDuckGo |
| Baiduspider | Baidu |
| YandexBot | Yandex |
| GPTBot | OpenAI |
| CCBot | Common Crawl |
| ClaudeBot | Anthropic |
Example using multiple bots in a robots.txt file:
# Allow Google full access
User-agent: Googlebot
Allow: /
# Block Bing completely
User-agent: Bingbot
Disallow: /
# Block AI crawlers
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
Where to Upload Robots.txt File?
You must upload the robots.txt file to the root directory of your website — not inside any folder. The correct path is:
https://yourwebsite.com/robots.txt
This means:
- Place the file in public_html (if using shared hosting)
- Or at the root of your domain (if using VPS/cloud)
Search engine bots always check this exact location. If the file is in any other folder, it will not work.
How to Check Robots.txt of a Website?
Checking a website’s robots.txt file is extremely simple:
1. Type the URL + /robots.txt
Example:
- https://www.wscubetech.com/robots.txt
- https://www.google.com/robots.txt
- https://www.amazon.in/robots.txt
2. Use Google Search Operators
Type:
site:example.com robots.txt
3. Use SEO Tools or GSC Tester
Tools like Ahrefs, Screaming Frog, and the robots.txt report in Google Search Console can analyse it.
Advanced Robots.txt Tips for SEO
Block Parameter-Based URLs to Avoid Duplicate Content
URLs generated by filters, sorting, or tracking parameters can create thousands of duplicate pages. Use patterns like /*?ref= to block them.
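A quick sketch of such a block (the parameter names are illustrative and should be adapted to your own URLs):
User-agent: *
Disallow: /*?ref=
Disallow: /*?fbclid=
Disallow: /*utm_medium=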
Allow Essential JS, CSS, and AJAX Files
Google needs these to properly render your website. Never block theme files, layout CSS, or important scripts.
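For example, if a blocked folder also contains stylesheets or scripts that pages need for rendering, you can carve out exceptions (the paths below are hypothetical):
User-agent: *
Disallow: /assets/private/
Allow: /assets/private/*.css$
Allow: /assets/private/*.js$
Because the Allow rules are more specific (longer) than the Disallow rule, Google lets the CSS and JS files through.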
Use “$” to Block Specific File Types
For example, blocking all PDFs:
Disallow: /*.pdf$
This ensures only full URLs ending with .pdf are blocked.
Add Multiple Sitemaps for Large Websites
If your site has blogs, courses, tutorials, or categories, separate sitemaps improve crawling.
Use Wildcards Smartly to Clean URL Patterns
Wildcards help block groups of URLs without listing each manually:
Disallow: /*utm_source=
Disallow: /*?sort=
Robots.txt Best Practices for SEO
1. Keep the File Simple and Clean: Avoid overly complex rules. Search engines understand clarity, not confusion.
2. Always Add the Sitemap Link: It speeds up discovery of important URLs and improves indexing efficiency.
3. Never Block Important Pages Accidentally: Check twice before disallowing folders like /wp-content/, /blog/, or /courses/.
4. Test the File After Updating It: Even a small typo can block your entire website from crawling.
5. Use Disallow Only When Necessary: Sometimes canonical tags or noindex tags may be better solutions, depending on the situation.
Common Mistakes to Avoid in Robots.txt
1. Blocking the Entire Website Accidentally
A small mistake like this can remove your entire site from Google:
Disallow: /
2. Blocking Essential CSS and JS Files
This harms rendering and can drop search rankings.
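As a hypothetical example of the kind of rule to avoid, blocking theme folders or all JavaScript can stop Google from rendering your pages correctly:
# Avoid rules like these unless you are sure Google does not need the files
User-agent: *
Disallow: /wp-content/themes/
Disallow: /*.js$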
3. Using Robots.txt for Security
It only asks well-behaved bots not to crawl certain paths. It cannot protect sensitive folders, and because the file is publicly readable, it can even reveal the paths you list in it.
4. Incorrect Use of Wildcards
Bad wildcard usage can block many pages unintentionally.
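For example (using a made-up store URL structure), a wildcard that is too broad can catch far more than intended:
# Too broad: blocks every URL that contains "cart" anywhere in its path
Disallow: /*cart
# Intended: block only the cart section
Disallow: /cart/
The first pattern would also block URLs like /shopping-cart-guide/ or /cartoon-tutorial/, which is rarely what you want.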
5. Not Adding Sitemap URLs
Without sitemap hints, Google may take longer to discover key pages.
6. Placing Robots.txt in the Wrong Folder
If the file is not in the domain root, bots will not read it.
7. Using Uppercase/Lowercase Incorrectly
Robots.txt is case-sensitive. /Blog/ is different from /blog/.

FAQs About Robots.txt
1. Is a robots.txt file mandatory for every website?
No, but it's strongly recommended for SEO and crawl control.
2. Does robots.txt stop a page from being indexed?
It blocks crawling, not indexing. A blocked page can still appear in Google if another site links to it.
3. Is robots.txt case-sensitive?
Yes. /Blog/ and /blog/ are treated differently.
4. Does robots.txt improve SEO rankings?
Not directly, but it improves crawl efficiency, which indirectly boosts SEO health.
5. What does the Disallow directive do?
It tells bots which folders or pages they must not crawl.
6. What does the Allow directive do?
It specifies which URLs bots can crawl even inside blocked sections.
7. What happens if a website has no robots.txt file?
Search engines assume your entire website is open for crawling.
8. Is there a size limit for robots.txt?
Google enforces a size limit of 500 KB, so keep the file under that.
9. Can I add comments in a robots.txt file?
Yes. Anything starting with # is a comment.
10. Should I block admin or backend pages in robots.txt?
Yes, you should block backend pages to avoid unnecessary crawling.