Some thoughts on how useful Anubis really is. Combined with comments I read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will be outdated soon and we’ll need something else.
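
For context, here is roughly what such a challenge looks like: a minimal sketch of an Anubis-style proof-of-work check (assuming a SHA-256 leading-zero-bits scheme for illustration; the real project’s protocol, difficulty, and cookie handling differ).

```python
import hashlib
import secrets

DIFFICULTY_BITS = 18  # illustrative only; real deployments tune this

def solve(challenge: str) -> int:
    """Client side: brute-force a nonce whose hash has enough leading zero bits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0:
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0

challenge = secrets.token_hex(16)  # the server issues a random per-visitor challenge
nonce = solve(challenge)           # the browser burns CPU finding a valid nonce
assert verify(challenge, nonce)    # the server checks it cheaply, then admits the visitor
```

The asymmetry (expensive to solve, cheap to verify) is the whole point, and also the source of both complaints in this thread: it costs real users CPU time, and a scraper with enough compute can simply pay that cost.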

  • poVoq@slrpnk.net (+89/-1) · 23 hours ago

    And it was/is for sure the lesser evil compared to what most others did: putting the site behind Cloudflare.

    I feel like people who complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦

    • daniskarma@lemmy.dbzer0.com (+1/-1) · 2 hours ago

      I still think captchas are a better solution.

      In order to get past them, scrapers have to run AI inference, which also comes with compute costs. But legitimate users don’t get unauthorized, compute-intensive tasks run on their hardware.

      • poVoq@slrpnk.net (+2) · 33 minutes ago

        They are much worse for accessibility, take longer to solve, and are more disruptive for the majority of users.

        • daniskarma@lemmy.dbzer0.com (+1/-1) · 22 minutes ago

          Anubis is worse for privacy, since you have to have JavaScript enabled, and worse for the environment, since the cryptographic proof-of-work challenges are just wasted computation.

          Also, reCAPTCHA-type captchas are not really that disruptive most of the time.

          As I said, the polite thing would be to give users options. Running the Anubis PoW automatically just for entering a website is one of the rudest pieces of software I’ve seen lately. It should be more polite and give the user a choice: let them decide between solving a captcha or running the Anubis PoW, or at least only start the PoW after the user clicks a button.

          I don’t think it’s good practice to run that type of software just for entering a website. If that tendency were to grow, browsers would need to adapt and outright block that behavior, for example by only allowing access to certain client resources after a user action.

          • poVoq@slrpnk.net (+2) · 18 minutes ago

            Are you seriously complaining about an (entirely false) negative privacy aspect of Anubis and then suggesting that reCAPTCHA from Google is better?

            Look, no one thinks Anubis is great, but often it is that or the website becoming entirely inaccessible because it is DDoSed to death by the AI scrapers.

            • daniskarma@lemmy.dbzer0.com (+1) · 11 minutes ago

              First, I said reCAPTCHA types, meaning captchas in the style of reCAPTCHA; those could be implemented outside a Google environment. Secondly, I never said that type was better for privacy, I just said Anubis is bad for privacy. Traditional captchas that work without JavaScript would be the privacy-friendly way.

              Third, it’s not a false claim. Disabling JavaScript can protect your privacy a great deal; a lot of tracking is done through JavaScript.

              Last, that’s just the Anubis PR slogan, not the truth. As I said, DDoS mitigation could be implemented in other ways that are more polite and/or environmentally friendly.

              Are you astroturfing for Anubis? Because I really cannot understand why something as simple as a landing page with a “run PoW challenge” button would be that bad.

    • mobotsar@sh.itjust.works (+4) · 17 hours ago

      Is there a reason, other than avoiding infrastructure centralization, not to put a web server behind Cloudflare?

      • poVoq@slrpnk.net (+17) · 14 hours ago

        Yes, because Cloudflare routinely blocks entire IP ranges and puts people into endless captcha loops. It also snoops on all traffic and collects a lot of metadata about all your site visitors. And if you let them terminate TLS, they will even analyse the passwords that people use to log into the services you run. It’s basically a huge surveillance dragnet and probably a front for the NSA.

      • Björn Tantau@swg-empire.de (+9/-1) · 16 hours ago

        Cloudflare would need your HTTPS keys, so they could read all the content you worked so hard to encrypt. If I wanted to do bad shit, I would apply at Cloudflare.

        • mobotsar@sh.itjust.works (+6) · 15 hours ago

          Maybe I’m misunderstanding what “behind Cloudflare” means in this context, but I have a couple of my sites proxied through Cloudflare, and they definitely don’t have my keys.

          I wouldn’t think using a Cloudflare captcha would require such a thing either.

          • StarkZarn@infosec.pub (+10) · 14 hours ago

            That’s because they just terminate TLS at their end. Your DNS record is “poisoned” by the orange cloud and their infrastructure answers for you. They happen to have a trusted root CA so they just present one of their own certificates with a SAN that matches your domain and your browser trusts it. Bingo, TLS termination at CF servers. They have it in cleartext then and just re-encrypt it with your origin server if you enforce TLS, but at that point it’s meaningless.
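
            You can see this from the outside by checking who issued the certificate your browser is actually shown. A sketch in Python (the hostname is a placeholder for any domain with the orange cloud enabled):

            ```python
            import socket
            import ssl

            host = "your-proxied-site.example"  # placeholder: a domain proxied through Cloudflare

            ctx = ssl.create_default_context()
            with socket.create_connection((host, 443)) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    cert = tls.getpeercert()

            # For a proxied site the issuer is a Cloudflare CA rather than one you chose,
            # even though the SAN matches your domain.
            issuer = {k: v for rdn in cert["issuer"] for k, v in rdn}
            print("issuer:", issuer.get("organizationName"), "/", issuer.get("commonName"))
            print("SANs  :", [name for _, name in cert.get("subjectAltName", ())])
            ```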

          • Björn Tantau@swg-empire.de (+4) · 12 hours ago

            Hmm, I should look up how that works.

            Edit: https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/#custom-ssltls

            They don’t need your keys because they have their own CA. No way I’d use them.

            Edit 2: And with their own DNS they could easily route any address through their own servers if they wanted to, without anyone noticing. They are entirely too powerful. Is there some way to prevent this?

    • tofu@lemmy.nocturnal.garden (OP) (+17/-1) · 22 hours ago

      Yeah, I’m just wondering what’s going to follow. I just hope everything isn’t going to need to go behind an authwall.

        • interdimensionalmeme@lemmy.ml (+5/-6) · 18 hours ago

          What CPU made after 2004 do you have that doesn’t have automatic temperature control?
          I don’t think there is any, unless you somehow managed to disable it?
          Even a Raspberry Pi without a heatsink won’t overheat to the point of shutting down.

          • poVoq@slrpnk.net (+8/-1) · 17 hours ago

            You are right, it is actually worse: it usually just overloads the CPU so badly that it starts to throttle, and then I can’t even access the server via SSH anymore. But sometimes it also crashes the server so that it reboots, and yes, that can happen on modern CPUs as well.

            • interdimensionalmeme@lemmy.ml (+4/-6) · 17 hours ago

              You need to set your HTTP-serving process to a priority below the administrative processes (in the place where you start it, so assuming a Linux server that would be your init script or systemd service unit).
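
              If you can’t (or don’t want to) touch the unit file, the same niceness idea can be sketched from inside a Python-based worker. This is only an illustration of the concept, not how any particular server does it, and the value 10 is arbitrary:

              ```python
              import os

              def drop_priority(increment: int = 10) -> int:
                  """Lower this process's scheduling priority (Unix only) so SSH and other
                  administrative processes stay responsive when the web workers max out the
                  CPU. Same spirit as Nice= in a systemd unit or `nice -n 10` in an init script."""
                  return os.nice(increment)  # unprivileged processes can only increase niceness

              if __name__ == "__main__":
                  print("now running at niceness", drop_priority())
              ```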

              An actual crash causing a reboot? Do you maybe have faulty RAM? That’s really never supposed to happen from anything happening in userland. That’s not AI; your stuff might be straight up broken.

              The only thing that isn’t broken that could reboot a server is a watchdog timer.

              Your server shouldn’t crash, reboot, or become unreachable from the admin interface even at 100% load, and it shouldn’t overheat either; temperatures should never exceed 80 °C no matter what you do. That’s supposed to be impossible with thermal management, which all processors have had for decades.

              • poVoq@slrpnk.net (+3/-1) · 15 hours ago

                Great that this is all theoretical 🤷 My server hardware might not be the newest, but it is definitely not broken.

                And besides, what good is it that you can still barely access the server through SSH, when the CPU is constantly maxed out and site visitors only get a timeout when trying to access the services?

                I don’t even get what you are trying to argue here. That the AI scraper DDoS isn’t so bad because in theory it shouldn’t crash the server? Are you even reading what you are writing yourself? 🤡

                • daniskarma@lemmy.dbzer0.com (+1) · 2 hours ago

                  Why the hell don’t you limit the CPU usage of that service?

                  For any service that could hog resources so badly that it blocks the entire system, the normal thing to do is to limit its maximum resource usage. This is trivial to do using containers; I do it constantly for leaky software.
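
                  For instance, with the Docker SDK for Python (a sketch; the image name and the exact limits are made up):

                  ```python
                  import docker  # pip install docker

                  client = docker.from_env()

                  # Cap the web service at ~1.5 CPUs and 1 GiB of RAM so a scraper flood
                  # can slow it down but never starve SSH or the rest of the host.
                  container = client.containers.run(
                      "my-web-app:latest",  # hypothetical image name
                      detach=True,
                      name="web",
                      mem_limit="1g",
                      cpu_period=100_000,   # CFS period in microseconds
                      cpu_quota=150_000,    # 150% of one CPU per period
                  )
                  print(container.status)
                  ```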

                  • poVoq@slrpnk.net (+1) · 39 minutes ago

                    Obviously I did that, but that just means the site becomes inaccessible even sooner.

                • interdimensionalmeme@lemmy.ml (+2/-6) · 14 hours ago

                  Even if your server is a cell phone from 2015, if it’s operating correctly and the CPU is maxed out, that means it’s fully utilized and serving hundreds of megabits of data.

                  You’ve decided to let the entire world read from your server; that indiscriminate policy means people you don’t want getting your data are getting your data and using your resources.

                  You want to correct that by making everyone who comes in solve a puzzle, thereby degrading their access in some way, so it’s not surprising that they’re going to complain. The other day I had to wait over 30 seconds at an Anubis puzzle page, while I know the AI scrapers have no problem getting through. Something on my computer, probably some anti-crypto-mining protection, gets triggered by it, so now I can’t no-script the web either because of that thing, and it can’t even stop the scrapers anyway!

                  So Anubis is going to be left behind; real users are, for years, going to be annoyed and have their entire internet degraded by it, while the scrapers got it institutionally figured out in days.

                  If it’s freely available public data, then the solution isn’t restricting access, playing a futile arms race with the scrapers and throwing the real users to the dogs; it’s to provide standardized, incremental, efficient database dumps so the scrapers stop assuming every website is interoperability-hostile and stop scraping them. Let Facebook and Xitter fight the scrapers; let anyone trying to leverage public (and especially user-contributed) data fight the scrapers.

                  • tofu@lemmy.nocturnal.garden (OP) (+3) · 13 hours ago

                    Even if one wanted to give them everything, they don’t care. They just burn through their resources and recursively scrape every single link on your page. Providing standardized database dumps absolutely does not help against your server being overloaded by the scrapers of various companies with deep pockets.

                  • poVoq@slrpnk.net (+4/-1) · 14 hours ago

                    Aha, an apologist for the AI scraper DDoS, why didn’t you say so directly instead of wasting my time?