Bots are a fact of life on the internet.
Some are helpful—like search engine crawlers.
Others scrape your data, spam your forms, or brute-force your login pages.
If you’re self-hosting with Nginx, you don’t need a pricey SaaS WAF to stop them.
Here's how to detect and destroy malicious bots using good ol’ Nginx, a few scripts, and some zip-bomb flavor.
1. Start with Logs — Always
Nginx logs tell the full story. Make sure you're capturing User-Agent, IP, and paths.
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent"';
access_log /var/log/nginx/access.log main;
Now dig through logs for patterns:
# Top IPs by request volume
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -nr | head
# Suspicious User-Agents
grep -iE 'curl|wget|python|scrapy|bot|crawler|headless' /var/log/nginx/access.log | less
Want real-time views? Try GoAccess for a terminal dashboard.
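If you'd rather script the same checks, here is a minimal Python sketch that parses lines in the `main` format above and reports the busiest IPs and suspicious User-Agents. The function names and the sample log lines are made up for illustration; adjust the regex if your log_format differs.

```python
import re
from collections import Counter

# Mirrors the `main` log_format: ip - user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Same keywords as the grep command above
SUSPICIOUS_AGENT = re.compile(
    r'curl|wget|python|scrapy|bot|crawler|headless', re.IGNORECASE
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines, top=10):
    """Count requests per IP and occurrences of suspicious User-Agents."""
    ips = Counter()
    agents = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines in a different format
        ips[entry["ip"]] += 1
        if SUSPICIOUS_AGENT.search(entry["agent"]):
            agents[entry["agent"]] += 1
    return ips.most_common(top), agents.most_common(top)


# Hypothetical sample lines (documentation IP ranges)
SAMPLE = [
    '203.0.113.7 - - [10/May/2025:10:00:00 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "curl/8.5.0"',
    '203.0.113.7 - - [10/May/2025:10:00:01 +0000] "POST /xmlrpc.php HTTP/1.1" 404 153 "-" "curl/8.5.0"',
    '198.51.100.4 - - [10/May/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
]

top_ips, top_agents = summarize(SAMPLE)
print(top_ips)
print(top_agents)
```

To run it on real data, pass an open file instead of SAMPLE, for example `summarize(open("/var/log/nginx/access.log"))`.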
2. Identify Suspicious Behavior
Things that scream “bot”:
- Blank or obviously fake User-Agent headers
- High request volume from a single IP
- Frequent hits to /wp-login.php, /xmlrpc.php, /admin, or random paths
- Unusual Referer headers or none at all
- Crawlers hitting endpoints that no normal user would
Bonus: check your logs against public bot signature lists like MitchellKrogza’s bad bot list.
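The signals in the list above can be turned into a rough per-IP check. This is only a sketch: the thresholds (100 requests, 3 probe hits) and the `score_ip` helper are assumptions picked for illustration, not tuned values.

```python
# Paths from the list above that ordinary visitors rarely request
PROBE_PATHS = ("/wp-login.php", "/xmlrpc.php", "/admin")


def score_ip(requests, volume_threshold=100, probe_threshold=3):
    """Return the reasons one IP looks like a bot.

    `requests` is a list of (path, referer, user_agent) tuples for a single IP.
    Both thresholds are illustrative guesses; tune them against your own traffic.
    """
    reasons = []
    if len(requests) > volume_threshold:
        reasons.append("high request volume")
    if any(ua in ("", "-") for _, _, ua in requests):
        reasons.append("blank User-Agent")
    probes = sum(1 for path, _, _ in requests if path.startswith(PROBE_PATHS))
    if probes >= probe_threshold:
        reasons.append("repeated hits on login/admin paths")
    if requests and all(ref in ("", "-") for _, ref, _ in requests):
        reasons.append("no Referer on any request")
    return reasons


# Hypothetical traffic from two IPs
scanner = [
    ("/wp-login.php", "-", "-"),
    ("/xmlrpc.php", "-", "-"),
    ("/admin", "-", "-"),
]
visitor = [
    ("/", "https://example.com/", "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"),
    ("/about", "https://example.com/", "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"),
]

print(score_ip(scanner))
print(score_ip(visitor))
```

No single reason is proof on its own. Plenty of real users send no Referer, so it makes sense to act only when several reasons stack up.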
3. Block the Obvious Stuff with Nginx
Create a quick and dirty User-Agent filter:
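# Goes in the http {} context. Regex entries are tested in order of appearance,
# so the allowlist line must come before the broad pattern.
map $http_user_agent $bad_bot {
    default 0;
    # "bot" below would also match Googlebot and Bingbot; exempt crawlers you want.
    # User-Agent strings can be spoofed, so treat this as a coarse filter.
    ~*(googlebot|bingbot) 0;
    ~*(curl|wget|python|scrapy|bot|Go-http-client) 1;
}
server {
    if ($bad_bot) {
        return 403;
    }
}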
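Before deploying a pattern like this, it helps to see what it actually catches. The sketch below runs the same alternation through Python's `re` module, which behaves like Nginx's PCRE for an expression this simple. The sample User-Agent strings are illustrative.

```python
import re

# Same alternation as the Nginx map entry; ~* means case-insensitive
BAD_BOT = re.compile(r'(curl|wget|python|scrapy|bot|Go-http-client)', re.IGNORECASE)


def is_bad_bot(user_agent):
    """True if the bad-bot pattern, taken on its own, matches this User-Agent."""
    return BAD_BOT.search(user_agent) is not None


SAMPLES = [
    "curl/8.5.0",
    "python-requests/2.32.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0",
]

for ua in SAMPLES:
    print(is_bad_bot(ua), ua)
```

Note that the broad `bot` token also matches well-behaved crawlers such as Googlebot, so a pattern like this is worth testing against your real traffic first.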
And rate limit abusive IPs:
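# http {} context: a shared 10 MB zone keyed by client IP, limited to 5 requests/second
limit_req_zone $binary_remote_addr zone=abusers:10m rate=5r/s;
server {
    location / {
        # Allow short bursts of up to 10 extra requests, served without delay
        limit_req zone=abusers burst=10 nodelay;
        ...
    }
}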
Also check out Nginx rate limiting docs.
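To get a feel for what `rate=5r/s` with `burst=10 nodelay` lets through, here is a simplified Python model of the leaky-bucket accounting for a single client. It is a sketch of my understanding of the algorithm, not Nginx's actual code (which works in millisecond granularity), and `simulate_limit_req` is a made-up name.

```python
def simulate_limit_req(arrival_times, rate=5.0, burst=10):
    """Return one bool per request: True if accepted, False if rejected.

    Simplified model for one client key: each request adds 1 to an "excess"
    counter that drains at `rate` per second. A request is rejected when the
    excess would exceed `burst`. With nodelay, accepted requests are served
    immediately instead of being queued.
    """
    results = []
    excess = 0.0
    last = None
    for t in arrival_times:
        if last is None:
            # First request from this client: fresh state, always accepted
            results.append(True)
            last = t
            continue
        candidate = max(excess - rate * (t - last) + 1.0, 0.0)
        if candidate > burst:
            results.append(False)  # rejected; state is left unchanged
            continue
        excess = candidate
        last = t
        results.append(True)
    return results


# 20 requests fired at the same instant
flood = simulate_limit_req([0.0] * 20)
print(flood.count(True), "accepted,", flood.count(False), "rejected")

# A steady 4 requests/second stays under the limit
steady = simulate_limit_req([i * 0.25 for i in range(40)])
print(all(steady))
```

In this model an instant flood gets 1 + burst requests through and the rest are rejected (Nginx answers those with 503 unless `limit_req_status` says otherwise), while a client that stays under the rate is never touched.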
4. Use Fail2Ban to Auto-Ban IPs
Install Fail2Ban and wire it to your Nginx logs:
Jail config (/etc/fail2ban/jail.local):
[nginx-badbots]
enabled = true
filter = nginx-badbots
logpath = /var/log/nginx/access.log
maxretry = 5
findtime = 600
bantime = 3600
Filter (/etc/fail2ban/filter.d/nginx-badbots.conf):
[Definition]
failregex = ^<HOST> -.*"(GET|POST).*HTTP.*"(curl|wget|python|scrapy|bot|Go-http-client)
ignoreregex =
Once this is running, any IP that matches the filter 5 times within 10 minutes (maxretry and findtime) gets banned automatically for an hour (bantime).
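The maxretry/findtime logic is easy to misread, so here is a small Python sketch of it: count matching requests per IP in a sliding window and ban once the count reaches maxretry. This is a simplified model under my own assumptions, not Fail2Ban's implementation. It ignores bantime expiry, and `find_bans` is a hypothetical helper.

```python
from collections import defaultdict, deque


def find_bans(events, maxretry=5, findtime=600):
    """Return {ip: time_of_ban} for a stream of filter matches.

    `events` is an iterable of (timestamp_seconds, ip) pairs, sorted by time,
    one per log line that matched the filter.
    """
    recent = defaultdict(deque)  # ip -> timestamps still inside the window
    banned = {}
    for ts, ip in events:
        if ip in banned:
            continue  # already banned; later matches change nothing
        window = recent[ip]
        window.append(ts)
        # Forget matches older than findtime seconds
        while window and ts - window[0] > findtime:
            window.popleft()
        if len(window) >= maxretry:
            banned[ip] = ts
    return banned


# Hypothetical matches: one IP hammering, one IP hitting rarely
events = sorted(
    [(t, "203.0.113.7") for t in (0, 10, 20, 30, 40)]
    + [(t, "198.51.100.4") for t in (0, 700, 1400, 2100, 2800)]
)
print(find_bans(events))
```

Before trusting a filter in production, you can also dry-run it against a real log with Fail2Ban's own `fail2ban-regex` command, passing the log path and the filter file.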
5. Use Better Tools for Smarter Bots
If you're seeing more sophisticated attacks, try:
- CrowdSec: Open-source tool that shares a dynamic IP reputation list and applies bans
- ModSecurity: Full WAF, works with Nginx
- OpenResty: Extend Nginx with Lua scripting (e.g., custom captcha, behavior analysis)
If you’re open to a proxy layer:
- Cloudflare free tier: Blocks a lot of trash automatically
- Fastly Bot Protection: Advanced but paid
Bonus: Serve Zip Bombs to Dumb Bots (⚠️ Handle with care)
This blog post by Idiallo shows how he turned bot detection into punishment.
The method? Serve them a compressed zip bomb.
To generate one:
dd if=/dev/zero bs=1G count=10 | gzip -c > 10GB.gz
This creates a ~10MB file that decompresses to 10GB of zeros.
If a bot blindly decompresses the response without a size limit, it runs out of memory and chokes.
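If you want to see the compression ratio for yourself without writing 10GB anywhere, here is a scaled-down Python sketch of the same idea. It streams zeros through gzip in 1 MiB chunks, so memory use stays small. `make_zero_gzip` is a made-up helper, and the 10 MiB size is chosen only to keep the demo quick.

```python
import gzip
import io


def make_zero_gzip(out, total_bytes, chunk_size=1024 * 1024):
    """Write `total_bytes` of zeros, gzip-compressed, to the binary stream `out`."""
    chunk = bytes(chunk_size)  # a reusable block of zero bytes
    with gzip.GzipFile(fileobj=out, mode="wb", compresslevel=9) as gz:
        remaining = total_bytes
        while remaining > 0:
            n = min(chunk_size, remaining)
            gz.write(chunk[:n])
            remaining -= n


ORIGINAL_SIZE = 10 * 1024 * 1024  # 10 MiB demo instead of 10GB

buf = io.BytesIO()
make_zero_gzip(buf, ORIGINAL_SIZE)
compressed_size = len(buf.getvalue())

print("compressed bytes:", compressed_size)
print("ratio: %.0f:1" % (ORIGINAL_SIZE / compressed_size))
```

To write an actual file, pass `open("bomb.gz", "wb")` as `out` and raise `total_bytes`; the roughly 1000:1 ratio for zeros is what turns about 10MB on disk into 10GB on the receiving end.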
Then detect and serve it:
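// ipIsBlackListed() and isMalicious() are not defined here; they stand in for
// your own detection logic. ZIP_BOMB_FILE_10G is the path to the file generated above.
if (ipIsBlackListed() || isMalicious()) {
    header("Content-Encoding: deflate, gzip");
    header("Content-Length: " . filesize(ZIP_BOMB_FILE_10G));
    readfile(ZIP_BOMB_FILE_10G);
    exit;
}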
He explains that when traffic spikes, he swaps in a 1MB variant.
It’s a great deterrent for low-effort bots.
Heuristics like repeated scanning and double-visits from spam IPs helped him fine-tune this method.
📎 Also check out this Hacker News discussion for community input on his approach.
Final Thoughts
You don’t need an enterprise WAF to defend your site.
With a bit of log inspection, some config hacks, and creative trolling like zip bombs, you can knock out the majority of disruptive bots.