Got a warning for my blog going over 100GB in bandwidth this month… which sounded incredibly unusual. My blog is text and a couple images and I haven’t posted anything to it in ages… like how would that even be possible?

Turns out it’s possible when you have crawlers going apeshit on your server. Am I even reading this right? 12,181 with 181 zeros at the end for ‘Unknown robot’? This is actually bonkers.

Edit: As Thunraz points out below, there’s a footnote that reads “Numbers after + are successful hits on ‘robots.txt’ files” and not scientific notation.

Edit 2: After more digging, the culprit is a post where I shared a few wallpapers for download. The bots have been downloading these wallpapers over and over, using 100GB of bandwidth in the first 12 days of November. That’s when my account was suspended for exceeding its bandwidth limit (an artificial limit I set a while back and forgot about…), which is also why the ‘last visit’ for all the bots is November 12th.
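
For anyone wanting to do the same kind of digging on their own server, here is a minimal sketch of tallying bytes served per user agent from a raw access log. It assumes the Apache/nginx "combined" log format, which may not match what a given host produces; the function name `bytes_by_user_agent` and the pattern are illustrative, not taken from any stats tool.

```python
import re
from collections import Counter

# Assumption: Apache/nginx "combined" log format, i.e.
# host ident user [time] "request" status bytes "referer" "user-agent"
# Adjust the pattern if your server logs a different layout.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:[^"\\]|\\.)*" \d{3} (?P<bytes>\d+|-) '
    r'"(?:[^"\\]|\\.)*" "(?P<agent>(?:[^"\\]|\\.)*)"'
)


def bytes_by_user_agent(lines):
    """Sum the response sizes in an iterable of log lines, keyed by user agent."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not look like combined-format entries
        sent = match.group("bytes")
        # A "-" in the size field means no body was sent.
        totals[match.group("agent")] += 0 if sent == "-" else int(sent)
    return totals
```

Passing an open log file to the function and calling `.most_common(10)` on the result should list the heaviest user agents first, which is roughly how a single download-heavy post would stand out.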

  • slazer2au@lemmy.world
    9 hours ago

    AI scrapers are the new internet DDoS.

    Might want to throw something in front of your blog to ward them off, like Anubis or a tarpit.

    • ikt@aussie.zone
      9 hours ago

      the one with the quadrillion hits is this bad boy: https://www.babbar.tech/crawler

      Babbar.tech is operating a crawler service named Barkrowler which fuels and update our graph representation of the world wide web. This database and all the metrics we compute with are used to provide a set of online marketing and referencing tools for the SEO community.

          • Vorpal@programming.dev
            3 hours ago

            It is common custom to indicate quotes, either with “quotes” or, for a longer quote, with a

            block quote

            The latter can be done by prefixing the line with a > here on Lemmy (it uses the common Markdown syntax).

            Doing either of these helps avoid ambiguity.

            • Jessica@discuss.tchncs.de
              24 minutes ago

              You replied to the wrong person. I already know this, but clearly the person who posted the quote doesn’t ;)

            • porcoesphino@mander.xyz
              2 hours ago

              Thanks for taking the time. I always find it hard to follow up and point out the ambiguity / alternative without coming across in some unwelcome way.