Is your website struggling with spam comments, content scrapers, bandwidth leeches, and other unwanted bots? These bad bots can consume valuable hosting resources and negatively impact your site’s performance.
In this guide, we’ll show you how to block bad bots with minimal effort using .htaccess. Let’s get started!
Automatic Bot Blocking for ChemiCloud Customers
If you’re a ChemiCloud customer, you’re already protected! We have custom security rules that automatically block known resource-draining bots, including:
- PetalBot
- MJ12bot
- DotBot
- SeznamBot
- 8LEGS
- Nimbostratus-Bot
- Semrush
- Ahrefs
- AspiegelBot
- AhrefsBot
- MauiBot
- BLEXBot
- Sogou
If you actively use services like Ahrefs and need access, our support team can disable the relevant rule for your account. Just reach out—we’re happy to assist!
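If your site is hosted elsewhere, or you'd like an explicit rule of your own, here's a minimal sketch using the user-agent technique covered later in this guide. The bot names come straight from the list above; trim or extend them to suit your needs:
# Block known resource-draining bots by User-Agent
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (PetalBot|MJ12bot|DotBot|SeznamBot|AspiegelBot|AhrefsBot|MauiBot|BLEXBot|Sogou) [NC]
RewriteRule (.*) - [F,L]
</IfModule>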
Identifying Bad Bots
Before blocking bots, it’s important to identify them. You can do this by analyzing your website’s log files. While interpreting logs takes some practice, you can also use log-parsing software to simplify the process.
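For example, a single entry in Apache's combined log format already contains the client IP, the requested URI (including the query string), the referrer, and the user agent, which are exactly the fields the blocking methods below rely on. The line here is made up, reusing names from the examples later in this guide:
203.0.113.45 - - [12/Mar/2025:10:15:32 +0000] "GET /asdf-crawl/request/?scanx=123 HTTP/1.1" 200 5123 "http://spamreferrer1.org/" "Mozilla/5.0 (compatible; EvilBotHere/1.0)"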
A quick Google search can point you to tools that help analyze logs, or you can filter the entries manually in Excel based on patterns in the requests. Once you identify the problematic bots, you can block them using one of the following methods:
- Blocking via Request URI
- Blocking via User-Agent
- Blocking via Referrer
- Blocking via IP Address
Before applying these methods, make sure to research the bot in question. A simple search can reveal whether it’s harmful or useful.
Blocking Bad Bots with .htaccess
Blocking via Request URI
If your logs show suspicious request patterns, such as:
https://www.example.com/asdf-crawl/request/?scanx=123
https://www.example2.net/sflkjfglkj-crawl/request/?scanx123445
These requests likely arrive with different user agents, IPs, and referrers, so the best approach is to block them based on recurring patterns. Common elements in the examples above include:
- crawl
- scanx
To block such requests, add this to your .htaccess file:
# Block via Request URI
<IfModule mod_alias.c>
# Return 403 Forbidden for any request path containing "crawl"
RedirectMatch 403 crawl
</IfModule>
To block multiple patterns, use:
# Block via Request URI
<IfModule mod_alias.c>
# Return 403 Forbidden for any request path containing "crawl" or "scanx"
RedirectMatch 403 (crawl|scanx)
</IfModule>
If the pattern appears in the query string (after the ? symbol), use mod_rewrite instead, since RedirectMatch only inspects the URL path:
# Block via Query String
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{QUERY_STRING} (crawl|scanx) [NC]
RewriteRule (.*) - [F,L]
</IfModule>
Always test your site after applying these changes!
Blocking via User-Agent
If a bot repeatedly accesses your site under a specific user agent, block it with:
# Block via User-Agent
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (EvilBotHere|SpamSpewer|SecretAgentAgent) [NC]
RewriteRule (.*) - [F,L]
</IfModule>
To add more bots, use a pipe (|) separator:
RewriteCond %{HTTP_USER_AGENT} (EvilBotHere|SpamSpewer|AnotherOne|YetAnother) [NC]
To test, use online tools like “Bots vs Browsers.”
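If you're worried that a broad pattern might accidentally catch a legitimate crawler, you can place a negated condition in front of the blocking one. RewriteCond directives are combined with AND by default, so in this sketch (the bot names are illustrative) the rule only fires when the user agent is not Googlebot or bingbot and does match one of the blocked strings:
# Block bad bots, but never Googlebot or bingbot
<IfModule mod_rewrite.c>
RewriteEngine On
# Skip the block entirely for these legitimate crawlers
RewriteCond %{HTTP_USER_AGENT} !(Googlebot|bingbot) [NC]
RewriteCond %{HTTP_USER_AGENT} (EvilBotHere|SpamSpewer) [NC]
RewriteRule (.*) - [F,L]
</IfModule>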
Blocking via Referrer
If spammers or scrapers access your site through certain referrers, block them with:
# Block via Referrer
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^https?://(.*)spamreferrer1\.org [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.*)bandwidthleech\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.*)contentthieves\.ru [NC]
RewriteRule (.*) - [F,L]
</IfModule>
The last RewriteCond must not include [OR]; leaving it off properly terminates the chain of conditions. The https?:// prefix matches both HTTP and HTTPS referrers.
Blocking via IP Address
Blocking by IP is useful in specific cases, though many bots use rotating IPs. To block a single IP:
# Block via IP Address
<IfModule mod_rewrite.c>
RewriteEngine On
# Replace 203.0.113.10 with the offending IP address
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.10$
RewriteRule (.*) - [F,L]
</IfModule>
To block multiple IPs:
# Block multiple IPs
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.10$ [OR]
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.25$ [OR]
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.44$
RewriteRule (.*) - [F,L]
</IfModule>
For blocking a range of IPs:
# Block a range of IPs
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^203\. [OR]
RewriteCond %{REMOTE_ADDR} ^198\.51\. [OR]
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule (.*) - [F,L]
</IfModule>
This example blocks:
- All IPs starting with 203.
- All IPs starting with 198.51.
- All IPs starting with 192.0.2.
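If your server runs Apache 2.4 and your host's AllowOverride settings permit authorization directives in .htaccess, you can also block whole networks in CIDR notation instead of relying on prefix regexes. A minimal sketch, again using documentation addresses:
# Block a network range and a single IP (Apache 2.4+)
<IfModule mod_authz_core.c>
<RequireAll>
Require all granted
Require not ip 203.0.113.0/24
Require not ip 198.51.100.25
</RequireAll>
</IfModule>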
Final Thoughts
Blocking bad bots helps protect your website’s resources, improve performance, and prevent unwanted activity. While ChemiCloud provides automated protection, you can fine-tune bot blocking based on your specific needs using the methods above.
If you have any questions or need assistance, our support team is here to help!