What is robots.txt? A simple guide for website owners

If you’ve ever wondered how Google and other search engines know what to crawl (or not crawl) on your website, the answer often lies in a little text file called robots.txt.

This file plays a crucial role in guiding web crawlers and ensuring that your site is indexed correctly. In this post, we’ll break down what robots.txt is, how it works, and why it’s important for your website.

 

What Is robots.txt?

The robots.txt file is a plain text file placed at the root of your website (e.g. https://josefzacek.com/robots.txt). Its purpose is to give instructions to web crawlers, also known as bots, about which parts of your site they’re allowed to visit and index.

Think of it as a set of house rules for bots.

 

Why Is robots.txt Important?

Search engines like Google, Bing, and others send out crawlers to explore your site. The robots.txt file can:

✅ Block bots from crawling private or unnecessary pages

✅ Save server resources by limiting which pages bots access

✅ Keep duplicate or staging content from being crawled

❌ But it doesn’t hide your content from the public (it’s not for security)

Basic Example of robots.txt

  User-agent: *
  Disallow: /404.html

  Sitemap: https://josefzacek.com/sitemap.xml

Explanation:

User-agent: * - Applies to all bots.

Disallow: /404.html - Don’t crawl the 404 error page.

Sitemap: - Provides the location of your sitemap for better indexing. This is optional, but it helps search engines find all your important pages.

The example above tells crawlers to:

  • Crawl everything except the 404 error page
  • Use the sitemap at the listed URL to discover pages for indexing
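
You can sanity-check rules like these programmatically. Here’s a minimal sketch using Python’s standard-library robots.txt parser; it feeds in the rules above (the domain is just the example one from this post) and asks whether two URLs may be crawled:

  from urllib import robotparser

  # Rules copied from the example above; parsed locally, no download needed
  rules = [
      "User-agent: *",
      "Disallow: /404.html",
  ]

  rp = robotparser.RobotFileParser()
  rp.parse(rules)

  # Ask whether any bot ("*") may fetch a given URL
  print(rp.can_fetch("*", "https://josefzacek.com/404.html"))  # False - blocked
  print(rp.can_fetch("*", "https://josefzacek.com/blog/"))     # True - everything else is allowed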

 

Example Using Disallow and Allow Directives in robots.txt

  User-agent: *
  Disallow: /private/
  Allow: /private/public-info.html

  Sitemap: https://josefzacek.com/sitemap.xml

Explanation:

User-agent: * - Applies to all bots.

Disallow: /private/ - Don’t crawl anything in the /private/ directory.

Allow: /private/public-info.html - Allows crawling of the public-info.html page located in the /private/ directory.

Sitemap: - Same as above: provides the location of your sitemap for better indexing.

The example above tells crawlers to:

  • Skip everything in the /private/ directory
  • But allow crawling of public-info.html within that directory
  • It also provides the location of the sitemap for better indexing

If you’re not using any Disallow rules that need exceptions, you don’t need Allow at all.
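
You can test this pair of rules the same way. One caveat worth knowing: Python’s built-in parser applies rules in file order (the first match wins), while Google follows the most specific, i.e. longest, matching path. Listing the Allow line first, as in this sketch, gives the same result under both interpretations:

  from urllib import robotparser

  # Allow listed first so the exception wins under first-match parsers too;
  # Googlebot would pick the longer /private/public-info.html rule either way
  rules = [
      "User-agent: *",
      "Allow: /private/public-info.html",
      "Disallow: /private/",
  ]

  rp = robotparser.RobotFileParser()
  rp.parse(rules)

  print(rp.can_fetch("*", "https://josefzacek.com/private/secret.html"))       # False - inside /private/
  print(rp.can_fetch("*", "https://josefzacek.com/private/public-info.html"))  # True - the Allow exception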

 

How Do You Create robots.txt?

1. Open a plain text editor like Notepad or VS Code

2. Write your rules

3. Save it as robots.txt

4. Upload it to your site’s root directory (often public_html or the web root /)
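
Once it’s uploaded, confirm the file is actually reachable, for example by opening https://josefzacek.com/robots.txt in a browser. Here’s a small sketch that fetches and parses the live file with Python’s standard library (swap in your own domain):

  from urllib import robotparser

  # Fetch and parse the live file; replace the domain with your own
  rp = robotparser.RobotFileParser("https://josefzacek.com/robots.txt")
  rp.read()  # downloads the file over HTTP(S) and parses it

  # site_maps() needs Python 3.8+; it returns None if no Sitemap line was found
  print(rp.site_maps())
  print(rp.can_fetch("*", "https://josefzacek.com/"))  # True if the homepage isn't disallowed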

 

Best Practices

Put robots.txt in the root folder only.

Always double-check your syntax; a single mistake can block important pages!

Check your file with a robots.txt validator to ensure it’s correct.

 

Common Mistakes

  • Blocking important pages by accident (like /blog/)

  • Assuming it makes content private (it doesn’t)

  • Blocking CSS and JavaScript files, which can keep search engines from rendering your pages properly and may hurt indexing (see the example below)
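
For instance, a blanket rule like the one below blocks every file under /assets/, including any CSS and JavaScript stored there (the /assets/ path is just an illustration, not a recommendation):

  User-agent: *
  Disallow: /assets/

If your stylesheets and scripts live in that directory, search engines may only see an unstyled, partially working version of your pages.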

 

Final Thoughts

robots.txt might seem technical, but it’s an essential part of controlling how your site appears in search engines. With a few lines of text, you can shape how your content is discovered (or ignored) online.

If you’re running a website, whether it’s a blog, an eCommerce site, or a portfolio, it’s worth getting familiar with this little file!

I recommend checking out Google Search Console to help you manage your site’s presence in Google search results.

If you have any questions or need help setting up your robots.txt, feel free to reach out!
