For many, many years now, AWS has been the top source of abusive traffic I’ve actually received. That’s not to say that they’re the top source of abusive traffic sent over the Internet. Rather, they have so many legitimate users that their abusive traffic gets through spam filters and blocklists. From outright spammers using SES, to Anthropic’s crawlers, which sent my little site over 1 million requests in half a month (noticeably slowing my SSH connection, thanks DigitalOcean), to all the spammers and scammers out there using EC2 and S3, it’s clear that they don’t care who the money’s coming from so long as there’s lots of it.
So… I blocked it. All of AWS’s published IP addresses. From all of my servers.
(For the curious, yes, I’ve also added a robots.txt disallowing Anthropic, but the block is far more effective against all sorts of abusive traffic.)
Anthropic
Anthropic sent this little site you’re on over 1 million requests in half a month (most of it in a few days). This noticeably slowed my SSH connection at times, and it also made GitWeb crash pretty frequently. GitWeb would get restarted just fine, but in the meantime some other requests would get an error. I noticed these errors myself, which is what led me to investigate the cause.
Here’s Anthropic’s requests/day over the past month:
28 01/Sep/2024
209 02/Sep/2024
484 03/Sep/2024
372 04/Sep/2024
46565 05/Sep/2024
441561 06/Sep/2024
134521 07/Sep/2024
23512 08/Sep/2024
16296 09/Sep/2024
4475 10/Sep/2024
3036 11/Sep/2024
751 12/Sep/2024
22 14/Sep/2024
14663 15/Sep/2024
215212 16/Sep/2024
11207 17/Sep/2024
74845 18/Sep/2024
118517 19/Sep/2024
7 20/Sep/2024
That’s 1,106,283 requests in only 20 days. For comparison, I’ve gotten only 1,304,410 other requests in the entire month of September, from all other sources: other crawlers, users, and even my own monitoring and scripts. And this isn’t the first month they’ve done it; they sent me 2 million requests back in July!
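Per-day counts like the ones above can be pulled out of a web server access log with a few lines of code. The sketch below is an illustration, not the command used to produce the numbers in this post. It assumes the common "combined" log format (timestamp in square brackets, user-agent as the last quoted field), and `requests_per_day` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumption: "combined" log format, for example:
# 1.2.3.4 - - [06/Sep/2024:12:00:00 +0000] "GET / HTTP/1.1" 200 123 "-" "UA string"
# The pattern captures the day from the timestamp and the last quoted field (user-agent).
LOG_LINE = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\].*"(?P<agent>[^"]*)"\s*$'
)


def requests_per_day(lines, agent_substring):
    """Count requests per day for lines whose user-agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and agent_substring in match.group("agent"):
            counts[match.group("day")] += 1
    return counts
```

Passing an open log file and `"ClaudeBot"` to `requests_per_day` gives a mapping from day to request count. Counting on `match.group("agent")` instead of the day would give a per-user-agent tally.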
Does Amazon care? No. No, they’re making far too much money from Anthropic’s race to burn both the most cash and the most dinosaurs possible.
And Amazon is doing the same thing.
Here are my top 4 user-agents this month:
1106282 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
369029 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
122725 (compatible; Bytespider; spider-feedback@bytedance.com)
111430 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
And a bunch of the rest are suspiciously old versions of browsers that probably aren’t real.
The script
AWS helpfully publishes (a subset of) their IP addresses, in JSON, so you can download that and add it to your firewall. I went a bit further, and made a script that can be run from cron (or a systemd timer) to fetch it, merge contiguous/overlapping CIDR ranges, and add the new blocklist to iptables, replacing the old blocklist seamlessly.
I’ve been running this for a while, so it should be fairly problem free, but I don’t recommend trying this on a system that you don’t have alternative access to.
You can find the script in my Git.
Feel free to yell at me about how it’s wrong. I might even fix it.
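For a rough idea of what the fetch and merge steps involve, here is a minimal Python sketch. It is not the script from my Git, and the function names are made up for illustration. It assumes the published JSON keeps its current layout, with IPv4 entries under `prefixes` and each entry carrying an `ip_prefix` field. Loading the result into iptables is deliberately left out.

```python
import ipaddress
import json
import urllib.request

# AWS's published list of address ranges.
AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"


def fetch_aws_ranges(url=AWS_RANGES_URL):
    """Download and parse the published JSON document."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def merged_ipv4_networks(ranges):
    """Return the IPv4 prefixes with overlapping and adjacent CIDRs merged."""
    networks = (
        ipaddress.ip_network(entry["ip_prefix"]) for entry in ranges["prefixes"]
    )
    # collapse_addresses drops networks contained in others and joins neighbours.
    return list(ipaddress.collapse_addresses(networks))
```

Calling `merged_ipv4_networks(fetch_aws_ranges())` gives a shorter list than the raw file, which means fewer firewall rules. IPv6 entries sit under `ipv6_prefixes` with an `ipv6_prefix` field and would need the same treatment separately, since `collapse_addresses` refuses to mix address families.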
Should you block them?
Probably not.
I didn’t block it from my house, or my desktop; nor did I block it from anything at work. Like I said before, there are too many legitimate users that you’d be throwing out with it.
But frankly, for my servers? If you’re using AWS, you don’t need access to them. Use something else. Anything is better, even DigitalOcean.
Update 2024-10-11: The .mp TLD uses AWS for their nameservers. Disgusting… Anyway, if you run your own resolver and live in the Northern Mariana Islands, or use mailchi.mp, don’t block them.
Update 2025-03-18: They’ve moved! So far in the month of March I’ve received 2,465,481 requests from Alibaba Cloud claiming to be “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Edg/114.0.1823.43” (a browser version from mid-2023). Fortunately, these are all (currently?) from a single IP range: 47.74.0.0 - 47.87.255.255.
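Firewalls generally want CIDR blocks rather than a start and end address. As a small worked example, Python's standard `ipaddress` module can convert the range above; the variable names here are just for illustration.

```python
import ipaddress

first = ipaddress.IPv4Address("47.74.0.0")
last = ipaddress.IPv4Address("47.87.255.255")

# summarize_address_range yields the CIDR networks that exactly cover the range.
blocks = [str(net) for net in ipaddress.summarize_address_range(first, last)]
print(blocks)  # ['47.74.0.0/15', '47.76.0.0/14', '47.80.0.0/13']
```

So the whole range fits in three rules.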