Some thoughts on how useful Anubis really is. Combined with comments I’ve read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will soon be outdated and we’ll need something else.

  • interdimensionalmeme@lemmy.ml (+2/−6) · 17 hours ago

    Even if your server is a cell phone from 2015, if it’s operating correctly and the CPU is maxed out, that means it’s fully utilized and serving hundreds of megabits of data.

    You’ve decided to let the entire world read from your server; that indiscriminate policy means the people you don’t want getting your data are getting your data and using your resources.

    You want to correct that by making everyone who comes in solve a puzzle, thereby degrading their access in some way, so it’s not surprising that they complain. The other day I had to wait over 30 seconds at an Anubis puzzle page, while I know the AI scrapers have no problem getting through. Something on my computer, probably some anti-cryptomining protection, gets triggered by it, and now I can’t browse with NoScript either because of that thing, and it can’t even stop scrapers anyway!
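
    For context, the puzzle is (as far as I understand Anubis) a proof-of-work challenge: the page makes your browser hash a server-supplied token with an incrementing nonce until the digest starts with enough zero bits, then posts the winning nonce back. A rough Python sketch of the idea, not Anubis’s actual API or challenge format:

    ```python
    import hashlib
    import secrets

    def solve_challenge(challenge: str, difficulty_bits: int) -> int:
        """Brute-force a nonce so that sha256(challenge + nonce) starts with
        difficulty_bits zero bits. Generic proof-of-work; the real Anubis
        parameters and wire format differ."""
        target = 1 << (256 - difficulty_bits)  # any digest below this wins
        nonce = 0
        while True:
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce
            nonce += 1

    # Hypothetical server side: hand out a random challenge, then verify the
    # returned nonce with a single hash instead of redoing the whole search.
    challenge = secrets.token_hex(16)
    nonce = solve_challenge(challenge, difficulty_bits=18)  # kept low so the demo is quick
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    assert int.from_bytes(digest, "big") < (1 << (256 - 18))
    ```

    Raising the difficulty makes every visitor wait longer, but a scraping farm with spare CPU eats the same cost without blinking, which is exactly the problem.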

    So Anubis is going to be left behind, and real users will spend years being annoyed and having their entire internet degraded by it, while the scrapers, institutionally, figured it out within days.

    If it’s freely available public data, then the solution isn’t restricting access, playing a futile arms race with the scrapers, and throwing the real users to the dogs; it’s to have standardized, incremental, efficient database dumps so the scrapers stop assuming every website is interoperability-hostile and scraping it (see the sketch below). Let Facebook and Xitter fight the scrapers; let anyone trying to leverage public (and especially user-contributed) data fight the scrapers.
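
    By “standardized incremental efficient database dumps” I mean something like: the site publishes a manifest of dated, checksummed segment files, and a consumer only downloads the segments it doesn’t already have. A rough sketch of the consumer side; the URL and manifest fields here are made up, there is no such standard today:

    ```python
    import hashlib
    import json
    import urllib.request
    from pathlib import Path

    # Hypothetical manifest: {"segments": [{"name": "2024-06-01.jsonl.gz",
    #   "sha256": "...", "url": "https://example.org/dumps/2024-06-01.jsonl.gz"}]}
    MANIFEST_URL = "https://example.org/dumps/manifest.json"  # made-up URL
    DEST = Path("dumps")

    def sync_dumps() -> None:
        DEST.mkdir(exist_ok=True)
        with urllib.request.urlopen(MANIFEST_URL) as resp:
            manifest = json.load(resp)
        for seg in manifest["segments"]:
            out = DEST / seg["name"]
            if out.exists():  # already have this segment, skip it
                continue
            with urllib.request.urlopen(seg["url"]) as resp:
                data = resp.read()
            if hashlib.sha256(data).hexdigest() != seg["sha256"]:
                raise ValueError(f"checksum mismatch for {seg['name']}")
            out.write_bytes(data)  # only new segments cross the wire

    if __name__ == "__main__":
        sync_dumps()
    ```

    A scraper that can do that has no reason to hammer every page, and the server serves each segment once instead of re-rendering the whole site for every crawler.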

    • tofu@lemmy.nocturnal.garden (OP) (+3) · 16 hours ago

      Even if one wanted to give them everything, they don’t care. They just burn through their resources and recursively scrape every single link on your page. Providing standardized database dumps does absolutely nothing to stop your server from being overloaded by scrapers from various companies with deep pockets.

      • interdimensionalmeme@lemmy.ml (+1/−2) · 15 hours ago

        Like Anubis, that’s not going to last; the point isn’t to hammer the web servers off the net, it’s to get the precious data. The more standardized and streamlined that access becomes, and provided there’s no preferential treatment for certain players (OpenAI / Google / Facebook), the dumb scrapers will burn themselves out.

        One nice thing about Anubis and Nepenthes is that they’re going to burn out those dumb scrapers faster and force them to become more sophisticated and stealthy. That should resolve the DDoS problem on its own.
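
        Nepenthes, as I understand it, is a tarpit: every URL resolves to a slow, procedurally generated page that links only to more generated pages, so a naive recursive scraper never runs out of things to fetch and burns its crawl budget on garbage. A toy sketch of that idea, not Nepenthes’s actual code:

        ```python
        import hashlib
        import time
        from http.server import BaseHTTPRequestHandler, HTTPServer

        class TarpitHandler(BaseHTTPRequestHandler):
            """Toy tarpit: every path yields a slow page of gibberish whose
            links point at more generated paths, forming an endless maze."""

            def do_GET(self):
                seed = hashlib.sha256(self.path.encode()).hexdigest()
                links = "".join(
                    f'<a href="/{seed[i:i + 8]}">more</a> ' for i in range(0, 40, 8)
                )
                body = f"<html><body><p>{seed}</p>{links}</body></html>".encode()
                time.sleep(2)  # drip-feed the response to waste crawler time
                self.send_response(200)
                self.send_header("Content-Type", "text/html")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)

        if __name__ == "__main__":
            HTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()
        ```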

        For the truly public data sources, I think coordinated database dumps are the way to go; for hostile carriers like Reddit and Facebook, it’s going to be scraper arms-race warfare like Cory Doctorow predicted.

    • poVoq@slrpnk.net (+4/−1) · 16 hours ago

      Aha, an apologist for AI scraper DDoS. Why didn’t you say so directly instead of wasting my time?

      • interdimensionalmeme@lemmy.ml (+1/−4) · 15 hours ago

        The DDoS is caused by the gatekeeping; there was no such issue before the 2023 API wars. Fork over the goods and nobody gets hurt, it’s not complicated: if you want to publish information to the public, don’t scrunch it up behind diseased trackers and ad-infested pages that burn your CPU cycles. Or just put it in a big tarball torrent. The web is turning into a cesspool; how long until our browsers don’t query websites at all, just a self-hosted crawler and search like SearXNG? At least then I won’t be catching cooties from your JavaScript crypto-mining bots embedded in the pages!

        • Deathray5@lemmynsfw.com (+2) · 11 hours ago

          “fork over the goods and nobody gets hurt” mate, you are not sounding like the good person here.