{"id":6491,"date":"2021-08-05T10:38:51","date_gmt":"2021-08-05T10:38:51","guid":{"rendered":"https:\/\/chemicloud.com\/kb\/?post_type=ht_kb&#038;p=6491"},"modified":"2025-03-02T10:44:33","modified_gmt":"2025-03-02T10:44:33","slug":"block-bad-bots-and-spiders-using-htaccess","status":"publish","type":"ht_kb","link":"https:\/\/chemicloud.com\/kb\/article\/block-bad-bots-and-spiders-using-htaccess\/","title":{"rendered":"How to Block Bad Bots and Spiders using .htaccess"},"content":{"rendered":"<p data-pm-slice=\"1 1 []\">Is your website struggling with spam comments, content scrapers, bandwidth leeches, and other unwanted bots? These bad bots can consume valuable hosting resources and negatively impact your site&#8217;s performance.<\/p>\n<p>In this guide, we\u2019ll show you how to block bad bots with minimal effort using .htaccess. Let\u2019s get started!<\/p>\n<h2 id=\"automatic-bot-blocking-for-chemicloud-customers\"><strong>Automatic Bot Blocking for ChemiCloud Customers<\/strong><\/h2>\n<p>If you\u2019re a ChemiCloud customer, you\u2019re already protected! We have custom security rules that automatically block known resource-draining bots, including:<\/p>\n<ul data-spread=\"false\">\n<li>PetalBot<\/li>\n<li>MJ12bot<\/li>\n<li>DotBot<\/li>\n<li>SeznamBot<\/li>\n<li>8LEGS<\/li>\n<li>Nimbostratus-Bot<\/li>\n<li>Semrush<\/li>\n<li>Ahrefs<\/li>\n<li>AspiegelBot<\/li>\n<li>AhrefsBot<\/li>\n<li>MauiBot<\/li>\n<li>BLEXBot<\/li>\n<li>Sogou<\/li>\n<\/ul>\n<p>If you actively use services like Ahrefs and need access, our support team can disable the relevant rule for your account. Just reach out\u2014we&#8217;re happy to assist!<\/p>\n<h2 id=\"identifying-bad-bots\"><strong>Identifying Bad Bots<\/strong><\/h2>\n<p>Before blocking bots, it&#8217;s important to identify them. You can do this by analyzing your website&#8217;s log files. 
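Each request appears as one line in the access log, and the user-agent string at the end of the line is usually the quickest way to spot a bot. A typical combined-format entry looks like this (the IP address and request shown are illustrative placeholders):<\/p>\n<pre><code>203.0.113.7 - - [05\/Aug\/2021:10:38:51 +0000] &quot;GET \/asdf-crawl\/request\/?scanx=123 HTTP\/1.1&quot; 200 5123 &quot;-&quot; &quot;Mozilla\/5.0 (compatible; MJ12bot\/v1.4.8; http:\/\/mj12bot.com\/)&quot;<\/code><\/pre>\n<p>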
While interpreting logs takes some practice, you can also use log-parsing software to simplify the process.<\/p>\n<p>A quick Google search can provide tools to help analyze logs, or you can use Excel for manual filtering based on patterns in requests. Once you identify the problematic bots, you can block them using different methods:<\/p>\n<ul data-spread=\"false\">\n<li><strong>Blocking via Request URI<\/strong><\/li>\n<li><strong>Blocking via User-Agent<\/strong><\/li>\n<li><strong>Blocking via Referrer<\/strong><\/li>\n<li><strong>Blocking via IP Address<\/strong><\/li>\n<\/ul>\n<p>Before applying these methods, make sure to research the bot in question. A simple search can reveal whether it&#8217;s harmful or useful.<\/p>\n<div>\n<hr \/>\n<\/div>\n<h2 id=\"blocking-bad-bots-with-htaccess\"><strong>Blocking Bad Bots with .htaccess<\/strong><\/h2>\n<h3 id=\"blocking-via-request-uri\"><strong>Blocking via Request URI<\/strong><\/h3>\n<p>Your logs may show suspicious request patterns, such as:<\/p>\n<pre><code>https:\/\/www.example.com\/asdf-crawl\/request\/?scanx=123\r\nhttps:\/\/www.example2.net\/sflkjfglkj-crawl\/request\/?scanx123445<\/code><\/pre>\n<p>These requests likely have different user agents, IPs, and referrers. The best approach is to block requests based on recurring patterns. 
Common elements in the above examples include:<\/p>\n<ul data-spread=\"false\">\n<li><code>crawl<\/code><\/li>\n<li><code>scanx<\/code><\/li>\n<\/ul>\n<p>To block such requests, add this to your .htaccess file (note that <code>crawl\/<\/code> also matches paths like <code>\/asdf-crawl\/<\/code>, whereas <code>\/crawl\/<\/code> would only match a literal <code>\/crawl\/<\/code> segment):<\/p>\n<pre><code># Block via Request URI\r\n&lt;IfModule mod_alias.c&gt;\r\n    RedirectMatch 403 crawl\/\r\n&lt;\/IfModule&gt;<\/code><\/pre>\n<p>To block multiple patterns, use:<\/p>\n<pre><code># Block via Request URI\r\n&lt;IfModule mod_alias.c&gt;\r\n    RedirectMatch 403 (crawl|scanx)\r\n&lt;\/IfModule&gt;<\/code><\/pre>\n<p>If the pattern appears in the query string (after the <code>?<\/code> symbol), use mod_rewrite instead, because <code>RedirectMatch<\/code> only matches the URL path:<\/p>\n<pre><code># Block via Query String\r\n&lt;IfModule mod_rewrite.c&gt;\r\n    RewriteEngine On\r\n    RewriteCond %{QUERY_STRING} (crawl|scanx) [NC]\r\n    RewriteRule (.*) - [F,L]\r\n&lt;\/IfModule&gt;<\/code><\/pre>\n<p>Always test your site after applying these changes!<\/p>\n<div>\n<hr \/>\n<\/div>\n<h3 id=\"blocking-via-user-agent\"><strong>Blocking via User-Agent<\/strong><\/h3>\n<p>If a bot repeatedly accesses your site under a specific user agent, block it with:<\/p>\n<pre><code># Block via User-Agent\r\n&lt;IfModule mod_rewrite.c&gt;\r\n    RewriteEngine On\r\n    RewriteCond %{HTTP_USER_AGENT} (EvilBotHere|SpamSpewer|SecretAgentAgent) [NC]\r\n    RewriteRule (.*) - [F,L]\r\n&lt;\/IfModule&gt;<\/code><\/pre>\n<p>To add more bots, use a pipe (<code>|<\/code>) separator:<\/p>\n<pre><code>RewriteCond %{HTTP_USER_AGENT} (EvilBotHere|SpamSpewer|AnotherOne|YetAnother) [NC]<\/code><\/pre>\n<p>To test, use online tools like &#8220;Bots vs Browsers.&#8221;<\/p>\n<div>\n<hr \/>\n<\/div>\n<h3 id=\"blocking-via-referrer\"><strong>Blocking via Referrer<\/strong><\/h3>\n<p>If spammers or scrapers access your site through certain referrers, block them with (<code>https?<\/code> matches both HTTP and HTTPS referrers):<\/p>\n<pre><code># Block via Referrer\r\n&lt;IfModule mod_rewrite.c&gt;\r\n    RewriteEngine On\r\n    RewriteCond %{HTTP_REFERER} ^https?:\/\/(.*)spamreferrer1\\.org 
[NC,OR]\r\n    RewriteCond %{HTTP_REFERER} ^https?:\/\/(.*)bandwidthleech\\.com [NC,OR]\r\n    RewriteCond %{HTTP_REFERER} ^https?:\/\/(.*)contentthieves\\.ru [NC]\r\n    RewriteRule (.*) - [F,L]\r\n&lt;\/IfModule&gt;<\/code><\/pre>\n<p>The last <code>RewriteCond<\/code> must <strong>not<\/strong> include <code>[OR]<\/code>, so that the condition chain terminates correctly.<\/p>\n<div>\n<hr \/>\n<\/div>\n<h3 id=\"blocking-via-ip-address\"><strong>Blocking via IP Address<\/strong><\/h3>\n<p>Blocking by IP is useful in specific cases, though many bots use rotating IPs. Replace the placeholder addresses below with the actual IPs from your logs. To block a single IP:<\/p>\n<pre><code># Block via IP Address\r\n&lt;IfModule mod_rewrite.c&gt;\r\n    RewriteEngine On\r\n    RewriteCond %{REMOTE_ADDR} ^203\\.0\\.113\\.4$\r\n    RewriteRule (.*) - [F,L]\r\n&lt;\/IfModule&gt;<\/code><\/pre>\n<p>To block multiple IPs:<\/p>\n<pre><code># Block multiple IPs\r\n&lt;IfModule mod_rewrite.c&gt;\r\n    RewriteEngine On\r\n    RewriteCond %{REMOTE_ADDR} ^203\\.0\\.113\\.4$ [OR]\r\n    RewriteCond %{REMOTE_ADDR} ^198\\.51\\.100\\.23$ [OR]\r\n    RewriteCond %{REMOTE_ADDR} ^192\\.0\\.2\\.77$\r\n    RewriteRule (.*) - [F,L]\r\n&lt;\/IfModule&gt;<\/code><\/pre>\n<p>For blocking a range of IPs:<\/p>\n<pre><code># Block a range of IPs\r\n&lt;IfModule mod_rewrite.c&gt;\r\n    RewriteEngine On\r\n    RewriteCond %{REMOTE_ADDR} ^203\\. [OR]\r\n    RewriteCond %{REMOTE_ADDR} ^198\\.51\\. [OR]\r\n    RewriteCond %{REMOTE_ADDR} ^192\\.0\\.2\\.\r\n    RewriteRule (.*) - [F,L]\r\n&lt;\/IfModule&gt;<\/code><\/pre>\n<p>This example blocks:<\/p>\n<ul data-spread=\"false\">\n<li>All IPs starting with <code>203.<\/code><\/li>\n<li>All IPs starting with <code>198.51.<\/code><\/li>\n<li>All IPs starting with <code>192.0.2.<\/code><\/li>\n<\/ul>\n<h3 id=\"final-thoughts\"><strong>Final Thoughts<\/strong><\/h3>\n<p>Blocking bad bots helps protect your website\u2019s resources, improve performance, and prevent unwanted activity. 
While ChemiCloud provides automated protection, you can fine-tune bot blocking based on your specific needs using the methods above.<\/p>\n<p>If you have any questions or need assistance, our support team is here to help!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Is your website struggling with spam comments, content scrapers, bandwidth leeches, and other unwanted bots? These bad bots can consume valuable hosting resources and negatively impact your site&#8217;s performance. In this guide, we\u2019ll show you how to block bad bots with minimal effort using .htaccess. Let\u2019s get started! Automatic Bot&#8230;<\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"open","ping_status":"closed","template":"","format":"standard","meta":{"_crdt_document":"","footnotes":""},"ht-kb-category":[192],"ht-kb-tag":[],"class_list":["post-6491","ht_kb","type-ht_kb","status-publish","format-standard","hentry","ht_kb_category-website-security"],"_links":{"self":[{"href":"https:\/\/chemicloud.com\/kb\/wp-json\/wp\/v2\/ht-kb\/6491","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/chemicloud.com\/kb\/wp-json\/wp\/v2\/ht-kb"}],"about":[{"href":"https:\/\/chemicloud.com\/kb\/wp-json\/wp\/v2\/types\/ht_kb"}],"author":[{"embeddable":true,"href":"https:\/\/chemicloud.com\/kb\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/chemicloud.com\/kb\/wp-json\/wp\/v2\/comments?post=6491"}],"version-history":[{"count":5,"href":"https:\/\/chemicloud.com\/kb\/wp-json\/wp\/v2\/ht-kb\/6491\/revisions"}],"predecessor-version":[{"id":8616,"href":"https:\/\/chemicloud.com\/kb\/wp-json\/wp\/v2\/ht-kb\/6491\/revisions\/8616"}],"wp:attachment":[{"href":"https:\/\/chemicloud.com\/kb\/wp-json\/wp\/v2\/media?parent=6491"}],"wp:term":[{"taxonomy":"ht_kb_category","embeddable":true,"href":"https:\/\/chemicloud.com\/kb\/wp-json\/wp\/v2\/ht-kb-category?post=6491"},{"taxonomy":"ht_kb_tag","embeddable":true,"href":"https:\/\/chemicloud.c
om\/kb\/wp-json\/wp\/v2\/ht-kb-tag?post=6491"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}