I’m the administrator of kbin.life, a general purpose/tech orientated kbin instance.

  • 0 Posts
  • 27 Comments
Joined 2 years ago
Cake day: June 29th, 2023

  • I feel like the only even remotely acceptable way to do this is to show the ad, then prompt for the answer for 10 seconds. They can log a right or wrong answer, or the lack of one if the time expires, and then they must move on.

    I can imagine that metrics showing whether your advertising is actually reaching people are valid. But making people answer, and especially making them watch more if they answer wrong, is about as dystopian as it gets.

    If (and I say if, I really don’t want to believe it is) that is the case, the only correct response is to uninstall Hulu immediately and put on your pirate hat.




  • My mbin instance is behind Cloudflare, so I filter the AS numbers there. They don’t even reach my server.

    On the sites that aren’t behind Cloudflare, yep, it’s at the nginx level. I did consider doing it at the firewall level, maybe with a specific chain for it. But since I was already blocking at the nginx level, I just did it there for now. It keeps them off the content, but yes, it does tell them there’s a website there to leech if they change their tactics.

    You need to block the whole ASN too. Those that are using Chrome/Firefox UAs change IP every 5 minutes to a random other one from their huuuuuge pools.


  • Yeah, I probably should look to see if there are any good plugins that do this on some community submission basis. Because yes, it’s a pain to keep up with whatever trick they’re doing next.

    And unlike web crawlers that generally check a url here and there, AI bots absolutely rip through your sites like something rabid.


  • If you’re running nginx, this is what I’m using:

    if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") { return 403; }

    That will block those that actually use recognisable user agents. I add any I find as I go on. It will catch a lot!
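
To see what a rule like this catches, here is a minimal Python sketch of the same matching logic. It assumes only a short, hypothetical subset of the tokens above, and it is an illustration of how nginx’s `~*` operator behaves (case-insensitive regex match anywhere in the header), not part of the actual config.

```python
import re

# Hypothetical short subset of the user-agent tokens from the nginx rule above.
BLOCKED_TOKENS = ["SemrushBot", "AhrefsBot", "MJ12bot", "ClaudeBot", "Bytespider", "python"]

# nginx's "~*" is a case-insensitive regex match anywhere in the string,
# so re.search with re.IGNORECASE is the closest Python equivalent.
BLOCKED_RE = re.compile("|".join(BLOCKED_TOKENS), re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if this user agent would hit the 'return 403' branch."""
    return BLOCKED_RE.search(user_agent) is not None
```

Note that a broad token such as `python` matches any user agent containing that substring (for example `python-requests/...`), which is the intent here but worth keeping in mind when adding new tokens.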

    I also have a huuuuuge IP-based block list (generated by adding all ranges returned from looking up the following AS numbers):

    AS45102 (Alibaba cloud)
    AS136907 (Huawei SG)
    AS132203 (Tencent)
    AS32934 (Facebook)

    Since these guys run or have run bots that impersonate real browser agents.

    There are various tools online to return prefix/ip lists for an autonomous system number.

    I put both into a single file and include it in my web site config files.

    EDIT: Just to add, keeping on top of this is a full time job!

    EDIT 2: Removed Mojeek bot as it seems to be a normal web crawler.
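
The IP-based part described above (look up the prefixes for each AS number, then write them into an include file) could be sketched in Python roughly like this. It assumes you already have the prefixes as a list of CIDR strings from whichever lookup tool you use; the function name and the example ranges (reserved documentation ranges) are made up for illustration, and using nginx `deny` lines is one possible output format, not necessarily the one used here.

```python
import ipaddress


def build_deny_lines(prefixes):
    """Collapse overlapping/adjacent prefixes and emit nginx 'deny' directives."""
    v4, v6 = [], []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix.strip(), strict=False)
        # collapse_addresses() cannot mix IPv4 and IPv6, so keep them apart.
        (v4 if net.version == 4 else v6).append(net)

    lines = []
    for group in (v4, v6):
        for net in ipaddress.collapse_addresses(group):
            lines.append(f"deny {net};")
    return lines


# Example input using documentation ranges, standing in for real AS prefixes.
example_prefixes = [
    "192.0.2.0/25",
    "192.0.2.128/25",
    "198.51.100.0/24",
    "2001:db8::/33",
    "2001:db8:8000::/33",
]
deny_lines = build_deny_lines(example_prefixes)
```

Collapsing first keeps the generated file smaller when an AS announces many adjacent or overlapping ranges. The resulting lines can be written to a file and pulled in with an nginx `include`.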



  • r00ty@kbin.life to Linux@lemmy.ml: Intel or AMD CPUs for new Laptops?
    3 months ago

    Well, for a gamer, no real comment. But there is one metric where Intel still trashes AMD on the APU side: hardware video acceleration/encoding. The quality is objectively better on Intel Quick Sync.

    When getting a home box that also needed to do transcoding, an Intel CPU was a requirement. My desktop development/gaming system? Ryzen + NVIDIA.



  • I did a routine upgrade on my mbin server, where I had an old version with changes I made myself.

    Well, turns out I upgraded something (probably Redis) that broke Symfony, which broke everything.

    So I had a fun afternoon upgrading to the latest mbin version. I mean I needed to anyway but my hand was forced.

    Yep, sometimes an innocent-looking update will change your weekend plans.

    Anyways, any reason not to use ssh?





  • The way I read it, the developer wanted opt-out but it’s likely it will be opt-in. I’m fine with opt-in and vehemently against opt-out for telemetry.

    I would prefer the information was statistical only. Rather than the hostname (I’m assuming they only want the hostname so they can separate the data and follow changes over time), a much better idea would be some kind of hash based on information that is unlikely to change, but with enough input that it would be impractical to brute-force the original data out of the hash. So all they know is that this data came from the same machine, but they cannot ID the machine. Or maybe some kind of unique but otherwise untrackable ID is created at install time and ONLY used for this purpose and no other.
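
Both ideas can be sketched in a few lines of Python. This is only an illustration under my own assumptions (the function names, the choice of HMAC-SHA256, and a secret salt stored locally on the machine are all hypothetical), not what the project in question actually does.

```python
import hashlib
import hmac
import secrets
import uuid


def new_install_id() -> str:
    """Random ID generated once at install time; contains no machine information."""
    return str(uuid.uuid4())


def pseudonymous_id(hostname: str, secret_salt: bytes) -> str:
    """Keyed hash of the hostname.

    The salt is assumed to be generated locally and never sent, so the
    receiver cannot brute-force hostnames against the hash.
    """
    return hmac.new(secret_salt, hostname.encode("utf-8"), hashlib.sha256).hexdigest()


# Example: the same machine always produces the same ID with its own salt.
salt = secrets.token_bytes(32)
machine_id = pseudonymous_id("my-desktop", salt)
```

With a random local salt the keyed hash is effectively as untrackable as the plain random install ID, so the simpler random ID is probably enough in practice.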






  • I would very much agree here. I’ve (admittedly mostly server side) been using Linux for around 30 years now. But I’m still dual booting on my desktop. There are just a few things that will still only work in Windows, and also if I break things I can go to Windows if I need to do something “right now”.

    Dual boot gives you the option, if you have the time, of trying to make something work in Linux. But if you don’t have the time, just boot to Windows and do it.

    The way I do things is to have drives that are shared between both OSes. I use btrfs since there is a Windows driver and, so far (around 3 years), I’ve had no corruption problems. You can share NTFS too, and a boot drive for both, but it’s not a requirement.

    Also yes, it is quite easy to break a Linux install. It’s not really because Linux is bad. It’s just because you have so much choice in which drivers to use, which desktop environment (and even the components that make it up) that it’s easy to accidentally select some combination that doesn’t work and you end up with only a console to fix things from.

    I like that the OP is choosing Mint. I’ve not used Mint, but from all I’ve seen it looks like a really good option for someone starting into Linux from no experience.