Thanks for taking the time to explain. I run multiple public-facing services and have never had load issues just because of some crawlers. That's why I always wonder why people get so mad at them.
I've been providing hosting for a few FOSS services, relatively small scale, for around 7 years now, and for most of that time I thought the same. People were complaining about their servers being hit, but my traffic was fine and the server seemed beefy enough to have plenty of headroom.
Then, like a month or two ago, the fire nation attacked, er, the bots came crawling. I had sudden traffic spikes of up to 1000x, memory was hogged, and the CPU could barely keep up. The worst was the git forge: public repos with bots continuously hammering away at diffs between random commits, repeatedly building out history graphs for different branches, and so on, all fairly expensive operations.
After the server was brought to its knees multiple times over a couple of days, I had to block public access. Only with a proof-of-work challenge in front of it could I finally open it up again without destroying service uptime. And even weeks later, they were still trying to reach project diffs whose links they had collected earlier; it was honestly crazy.
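For anyone unfamiliar with the idea: a proof-of-work gate just forces each client to burn a bit of CPU before the server does any real work, which is cheap for a human's browser but adds up fast for a crawler fleet. Below is a minimal sketch of the concept (not my actual setup; the difficulty, names, and hash scheme are all made up for illustration, loosely in the spirit of tools like Anubis):

```python
# Illustrative proof-of-work sketch: SHA-256 with leading zero bits.
# All names and parameters here are hypothetical, not a real deployment.
import hashlib
import secrets

DIFFICULTY_BITS = 18  # hypothetical: ~260k hashes on average per solve


def issue_challenge() -> str:
    """Server side: hand the client a random challenge string."""
    return secrets.token_hex(16)


def solve(challenge: str) -> int:
    """Client side: brute-force a nonce whose hash starts with enough zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single hash to check the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0


if __name__ == "__main__":
    c = issue_challenge()
    n = solve(c)         # expensive for the client, scales badly for bot swarms
    print(verify(c, n))  # cheap for the server: prints True
```

The asymmetry is the whole point: the server spends one hash to verify, while every request from a crawler costs it hundreds of thousands, so the expensive git-forge endpoints stop being free to hammer.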
That's very interesting; it sounds as if only certain types of content get crawled. May I ask what software you were running and whether you had a reverse proxy in front of it?