Didn’t have the link to hand. But a search turned this one up: https://reggiodigital.com/blog/nginx-rule-blocking-bad-bots/. It looks to be the same list, and you can see the ones I’ve added to the end of that list.
I’m the administrator of kbin.life, a general purpose/tech orientated kbin instance.
Hmm, I took an original list and added to it. You got a website I can check? If so I’ll happily remove. I don’t mind slow web crawlers at all.
So my mbin instance is behind Cloudflare, and I filter the AS numbers there. They don’t even reach my server.
On the sites that aren’t behind Cloudflare, yep, it’s at the nginx level. I did consider doing it at the firewall level, maybe with a specific chain for it. But since I was already blocking at the nginx level, I just did it there for now. It keeps them off the content, but yes, it does tell them there’s a website there to leech if they change their tactics.
You need to block the whole ASN too. Those that are using chrome/firefox UAs change IP every 5 minutes from a random other one in their huuuuuge pools.
Yeah, I probably should look to see if there’s any good plugins that do this on some community submission basis. Because yes, it’s a pain to keep up with whatever trick they’re doing next.
And unlike web crawlers that generally check a url here and there, AI bots absolutely rip through your sites like something rabid.
If you’re running nginx I am using the following:
if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") { return 403; }
That will block those that actually use recognisable user agents. I add any I find as I go on. It will catch a lot!
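If you want to check what a rule like this would catch before deploying it, here is a minimal Python sketch. It only roughly mimics nginx's `~*` (case-insensitive, matching anywhere in the string), and the pattern is a shortened, hypothetical subset of the list above, not the full thing. The sample user agents are made up for illustration.

```python
import re

# Hypothetical subset of the pattern above; the real list is much longer.
BLOCKED = re.compile(
    r"SemrushBot|AhrefsBot|ClaudeBot|Bytespider|python|Java",
    re.IGNORECASE,  # nginx's ~* is case-insensitive too
)


def is_blocked(user_agent: str) -> bool:
    """Return True if any blocked token appears anywhere in the user agent."""
    return BLOCKED.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
        "python-requests/2.31.0",
        "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    ]
    for ua in samples:
        print(("403 " if is_blocked(ua) else "ok  ") + ua)
```

Worth noting that short tokens like `python`, `Java` and `perl` match as substrings, so it is worth running your own access log through something like this to spot false positives.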
I also have a huuuuuge IP based block list (generated by adding all ranges returned from looking up the following AS numbers):
AS45102 (Alibaba cloud)
AS136907 (Huawei SG)
AS132203 (Tencent)
AS32934 (Facebook)
Since these guys run or have run bots that impersonate real browser agents.
There are various tools online to return prefix/ip lists for an autonomous system number.
I put both into a single file and include it into my web site config files.
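For anyone wanting to build the IP part of that file, a rough Python sketch is below. It assumes you already have the prefix list for an AS from one of those lookup tools, and that you block with nginx's `deny` directive (an assumption, other approaches work too). The function name and the sample prefixes are made up; the prefixes are documentation ranges, not real Alibaba/Tencent/etc. ranges.

```python
import ipaddress


def build_deny_lines(prefixes):
    """Turn CIDR prefixes into nginx `deny` lines, merging adjacent ranges."""
    v4, v6 = [], []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix.strip(), strict=False)
        # collapse_addresses() cannot mix IPv4 and IPv6, so keep them apart
        (v4 if net.version == 4 else v6).append(net)
    merged = list(ipaddress.collapse_addresses(v4))
    merged += list(ipaddress.collapse_addresses(v6))
    return [f"deny {net};" for net in merged]


if __name__ == "__main__":
    # Placeholder prefixes from the documentation ranges
    sample = ["192.0.2.0/25", "192.0.2.128/25", "198.51.100.0/24", "2001:db8::/32"]
    print("\n".join(build_deny_lines(sample)))
```

Collapsing the ranges first keeps the generated file shorter, which helps when an AS announces thousands of small prefixes.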
EDIT: Just to add, keeping on top of this is a full time job!
EDIT 2: Removed Mojeek bot as it seems to be a normal web crawler.
The sun always shines on pc.
Well, for a gamer, no real comment. But there is one metric where Intel still trashes AMD’s APUs: hardware video acceleration/encoding. The quality is objectively better on Intel Quick Sync.
When getting a home box that also needed to do transcoding, Intel CPU was a requirement. My desktop development/gaming system? Ryzen + NVidia.
I’m on NVidia with blob driver, KDE Plasma on wayland on Arch. Yeah, standby to resume is like 50/50 the screen will come back. I just turned off stand-by and kept screen sleep only.
But I’m on desktop so less of a problem for me than it would be for a laptop user.
I did a routine upgrade on my mbin server, where I had an old version with changes I made myself.
Well turns out I upgraded something (probably redis) that broke symfony that broke everything.
So I had a fun afternoon upgrading to the latest mbin version. I mean I needed to anyway but my hand was forced.
Yep sometimes an innocent looking update will change your weekend plans.
Anyways, any reason not to use ssh?
This one threw me off. I’d muted discord by mistake. Weirdly voice still works. I spent ages checking and double checking settings to see why I wasn’t getting notification sounds and the ptt sound. Dismissing any mute possibility because voice was working.
When I found it was this…
These days with UEFI it’s much less likely to break things. Worst case though, you just boot from a live USB, chroot in and rerun grub/your bootloader installer. Often, even if Windows puts its own bootloader first, you can choose your bootloader from the BIOS boot menu and just rerun the bootloader installer.
It used to be a lot worse.
I said elsewhere, I hope this is just some way to track changes over time per user.
But they need to take an anonymous hash of some non-changing data, or create an install ID that is used for this and nothing else (e.g. it identifies a unique user but not the person or hardware behind the user).
Too much identifying info is just pushed around like we shouldn’t care, it’s become a real problem.
The way I read it, the developer wanted opt-out but it’s likely it will be opt-in. I’m fine with opt-in and vehemently against opt-out for telemetry.
I would prefer the information was statistical only. Rather than the hostname (assuming they only want it so they can separate the data and follow changes over time), a much better idea would be a hash of information that is unlikely to change, with enough input that brute-forcing the original data out of the hash is impractical. Then all they know is that the data came from the same machine, without being able to ID the machine. Or maybe a random ID is created at install time and used ONLY for this purpose and no other.
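To make the install-time ID idea concrete, here is a minimal Python sketch. It is just an illustration of the idea, not what the project in question does. The function name and file location are hypothetical. The ID is purely random (`uuid4`), so it is not derived from the hostname or any hardware detail and cannot be reversed into one.

```python
import tempfile
import uuid
from pathlib import Path


def load_or_create_install_id(path: Path) -> str:
    """Return the stored telemetry ID, creating a random one on first use."""
    if path.exists():
        return path.read_text().strip()
    install_id = uuid.uuid4().hex  # random, carries no machine information
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(install_id)
    return install_id


if __name__ == "__main__":
    # Hypothetical location; a real app would use its own config directory.
    id_file = Path(tempfile.gettempdir()) / "example-app" / "telemetry-id"
    print(load_or_create_install_id(id_file))
```

Deleting the file gives the user a fresh identity, which is a nice property for something that is supposed to be opt-in.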
Yeah, my only concern here was if it was opt-out. That’d be bad.
Now I completely understand the developer on this. This is useful info to have to help decide future changes/features and general direction, but balancing the right to privacy means this kind of data provision should ALWAYS be opt-in. Microsoft, you hearing me here?
I think it had its uses in the past, specifically if it had the memory backup to prevent full array rebuilds and cached data loss on power failure.
Also at the height of raid controller use (I would say 90s and 2000s) there probably was some compute savings by shifting the work to a dedicated controller.
In modern day, completely agree.
I’m sure I’ve seen paid software that will detect and read data from several popular hardware controllers. Maybe there’s something free that can do the same.
For the future, I’d say that with modern copy on write filesystems, so long as you don’t mind the long rebuild on power failures, software raid is fine for most people.
I found this, which seems to be someone trying to do something similar with a drive array built with an Intel raid controller
Note, they are using drive images, you should be too.
The OP made clear it was a controller failure or entire system (I read hardware here) failure. Which does complicate things somewhat.
I would very much agree here. I’ve (admittedly mostly server side) been using Linux for around 30 years now. But I’m still dual booting on my desktop. There are just a few things that will still only work in Windows, and also if I break things I can go to Windows if I need to do something “right now”.
Dual boot gives you the option, if you have the time, of trying to make something work in Linux. But if you don’t have the time, just boot to Windows and do it.
How I do things is I have drives that are shared between both OSes. I use btrfs since there is a Windows driver and, so far (around 3 years), I’ve had no corruption problems. But you can share NTFS too, and a boot drive for both. It’s not a requirement though.
Also yes, it is quite easy to break a linux install. It’s not really because Linux is bad. It’s just because you have so much choice in which drivers to use, which desktop environment (and even the components that make it up) that it’s easy to accidentally select some combination that doesn’t work and you end up with only a console to fix things from.
I like that the OP is choosing Mint. I’ve not used Mint, but from all I’ve seen it looks a real good option for someone starting into Linux from no experience.
/mnt/shared/Development or E:\Development depending on which operating system is running.
Not in home mainly because I use the same directory in windows and Linux.
I feel like the only even remotely acceptable way to do this is to show the ad, then prompt for the answer for 10 seconds. They can log the right/wrong answer, or the lack of one if the time expires, and then must move on.
I can imagine that metrics on whether your advertising is actually reaching people are valid. But making people answer, and especially making them watch more if they answer wrong, is about as dystopian as it gets.
If (and I say if, I really don’t want to believe it is) that is the case, the only correct response is to uninstall Hulu immediately and put on your pirate hat.