
How to Block Bad Bots and Spiders using .htaccess

Is your website struggling with spam comments, content scrapers, bandwidth leeches, and other unwanted bots? These bad bots can consume valuable hosting resources and negatively impact your site’s performance.

In this guide, we’ll show you how to block bad bots with minimal effort using .htaccess. Let’s get started!

Automatic Bot Blocking for ChemiCloud Customers

If you’re a ChemiCloud customer, you’re already protected! We have custom security rules that automatically block known resource-draining bots, including:

  • PetalBot
  • MJ12bot
  • DotBot
  • SeznamBot
  • 8LEGS
  • Nimbostratus-Bot
  • Semrush
  • Ahrefs
  • AspiegelBot
  • AhrefsBot
  • MauiBot
  • BLEXBot
  • Sogou

If you actively use services like Ahrefs and need access, our support team can disable the relevant rule for your account. Just reach out—we’re happy to assist!

Identifying Bad Bots

Before blocking bots, it’s important to identify them. You can do this by analyzing your website’s log files. While interpreting logs takes some practice, you can also use log-parsing software to simplify the process.
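On a typical Linux hosting account you can get a quick overview straight from the shell. The sketch below creates a tiny made-up sample log so it runs anywhere; in practice, point the awk pipeline at your real access log (on cPanel hosts these usually live under ~/access-logs/ or /var/log/apache2/):

```shell
# Create a tiny sample log in Apache "combined" format for demonstration.
# The entries and the /tmp path are placeholders — use your real access log.
cat > /tmp/sample_access.log <<'EOF'
192.0.2.10 - - [01/Mar/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "MJ12bot/v1.4.8"
192.0.2.10 - - [01/Mar/2025:10:00:01 +0000] "GET /blog/ HTTP/1.1" 200 1024 "-" "MJ12bot/v1.4.8"
198.51.100.7 - - [01/Mar/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF

# In the combined log format the user agent is the 6th double-quote-delimited
# field; count occurrences and list the noisiest agents first.
awk -F'"' '{print $6}' /tmp/sample_access.log | sort | uniq -c | sort -rn
```

Agents with suspiciously high request counts in your real log are good candidates for closer inspection and, if warranted, blocking.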

A quick Google search can provide tools to help analyze logs, or you can use Excel for manual filtering based on patterns in requests. Once you identify the problematic bots, you can block them using different methods:

  • Blocking via Request URI
  • Blocking via User-Agent
  • Blocking via Referrer
  • Blocking via IP Address

Before applying these methods, make sure to research the bot in question. A simple search can reveal whether it’s harmful or useful.


Blocking Bad Bots with .htaccess

Blocking via Request URI

Suppose your logs show suspicious request patterns such as:

https://www.example.com/asdf-crawl/request/?scanx=123
https://www.example2.net/sflkjfglkj-crawl/request/?scanx=123445

These requests often arrive with different user agents, IPs, and referrers, so the most reliable approach is to block them based on recurring patterns in the URL. Common elements in the examples above include:

  • crawl
  • scanx

To block such requests, add this to your .htaccess file:

# Block via Request URI
<IfModule mod_alias.c>
    # Unanchored regex: any URL path containing "crawl" returns 403
    RedirectMatch 403 crawl
</IfModule>

To block multiple patterns, use:

# Block via Request URI
<IfModule mod_alias.c>
    RedirectMatch 403 (crawl|scanx)
</IfModule>

If the pattern appears in the query string (after the ? symbol), use mod_rewrite instead:

# Block via Query String
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{QUERY_STRING} (crawl|scanx) [NC]
    RewriteRule (.*) - [F,L]
</IfModule>

Always test your site after applying these changes!
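You can also dry-run a pattern against your access log with grep before deploying it, to see exactly which requests it would catch. A minimal sketch (the sample log below is made up — substitute your real log path):

```shell
# Made-up sample log lines to test the pattern against.
cat > /tmp/uri_test.log <<'EOF'
192.0.2.10 - - [01/Mar/2025:10:00:00 +0000] "GET /asdf-crawl/request/ HTTP/1.1" 200 512 "-" "-"
198.51.100.7 - - [01/Mar/2025:10:00:01 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "-"
EOF

# Print only the requests the (crawl|scanx) pattern would match.
grep -E '"(GET|POST|HEAD) [^"]*(crawl|scanx)' /tmp/uri_test.log
```

If legitimate URLs show up in the output, tighten the pattern before adding it to .htaccess.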


Blocking via User-Agent

If a bot repeatedly accesses your site under a specific user agent, block it with:

# Block via User-Agent
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (EvilBotHere|SpamSpewer|SecretAgentAgent) [NC]
    RewriteRule (.*) - [F,L]
</IfModule>

To add more bots, use a pipe (|) separator:

RewriteCond %{HTTP_USER_AGENT} (EvilBotHere|SpamSpewer|AnotherOne|YetAnother) [NC]

To test, use online tools like “Bots vs Browsers.”
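You can also sanity-check the pattern locally: the [NC] flag makes the RewriteCond case-insensitive, which a case-insensitive grep approximates well. A quick sketch using the hypothetical agents from the rule above:

```shell
# The RewriteCond pattern, tested case-insensitively against sample agents.
pattern='(EvilBotHere|SpamSpewer|SecretAgentAgent)'

for ua in "Mozilla/5.0 (compatible; evilbothere/1.0)" "Mozilla/5.0 (Windows NT 10.0)"; do
    if printf '%s\n' "$ua" | grep -qiE "$pattern"; then
        echo "BLOCKED: $ua"
    else
        echo "allowed: $ua"
    fi
done
```

This confirms the alternation catches the bot regardless of letter case while leaving normal browser agents alone.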


Blocking via Referrer

If spammers or scrapers access your site through certain referrers, block them with:

# Block via Referrer
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_REFERER} ^https?://(.*)spamreferrer1\.org [NC,OR]
    RewriteCond %{HTTP_REFERER} ^https?://(.*)bandwidthleech\.com [NC,OR]
    RewriteCond %{HTTP_REFERER} ^https?://(.*)contentthieves\.ru [NC]
    RewriteRule (.*) - [F,L]
</IfModule>

The last RewriteCond should not include [OR] to properly terminate the condition.


Blocking via IP Address

Blocking by IP is useful in specific cases, though many bots use rotating IPs. To block a single IP:

# Block via IP Address
<IfModule mod_rewrite.c>
    RewriteEngine On
    # 192.0.2.1 is a placeholder (documentation-range) address — use the bot's real IP
    RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.1$
    RewriteRule (.*) - [F,L]
</IfModule>

To block multiple IPs:

# Block multiple IPs
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.1$ [OR]
    RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.25$ [OR]
    RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.99$
    RewriteRule (.*) - [F,L]
</IfModule>

For blocking a range of IPs:

# Block a range of IPs
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{REMOTE_ADDR} ^192\. [OR]
    RewriteCond %{REMOTE_ADDR} ^198\.51\. [OR]
    RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
    RewriteRule (.*) - [F,L]
</IfModule>

This example (using placeholder documentation-range addresses) blocks:

  • All IPs starting with 192.
  • All IPs starting with 198.51.
  • All IPs starting with 203.0.113.

Note the trailing escaped dot in each pattern: it keeps ^192\. from also matching addresses like 1921.x or 19.2.x.
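If your server runs Apache 2.4 or later, the Require directive from mod_authz_core is a cleaner alternative that supports CIDR notation directly, so you don't need regex prefixes at all. A sketch with placeholder addresses:

```apache
# Apache 2.4+ alternative: block single IPs or whole ranges with CIDR notation
<IfModule mod_authz_core.c>
    <RequireAll>
        Require all granted
        Require not ip 192.0.2.1
        Require not ip 198.51.100.0/24
    </RequireAll>
</IfModule>
```

This requires that your host allows AuthConfig overrides in .htaccess; if you're unsure, ask your hosting provider.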

Final Thoughts

Blocking bad bots helps protect your website’s resources, improve performance, and prevent unwanted activity. While ChemiCloud provides automated protection, you can fine-tune bot blocking based on your specific needs using the methods above.

If you have any questions or need assistance, our support team is here to help!

Updated on March 2, 2025