I got into the self-hosting scene this year when I wanted to start my own website running on an old recycled ThinkPad. I spent a lot of time learning about ufw, reverse proxies, security header hardening, and fail2ban.

Despite all that, I still had a problem with bots knocking on my ports and spamming my logs. I tried some hackery to get fail2ban to read Caddy logs, but that didn't work for me. I nearly gave up and went with Cloudflare like half the internet does. But my stubbornness about open-source self-hosting and this year's Cloudflare outages encouraged me to try alternatives.

Around the same time, I kept seeing this thing in places I frequent, like Codeberg. It's Anubis, a proxy-style firewall that makes the browser complete a proof-of-work check (plus a few other clever tricks) to stop bots from knocking. I got interested and started thinking about beefing up my security.
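The core idea of a proof-of-work check fits in a few lines of Python. This is a generic hashcash-style scheme assumed for illustration, not code taken from Anubis: the client brute-forces a nonce so that the SHA-256 hash of challenge plus nonce starts with `difficulty` zero hex characters, and the server verifies with a single hash. The names `solve` and `verify` are hypothetical.

```python
import hashlib
import secrets


def _digest(challenge: str, nonce: int) -> str:
    # Hash the challenge string concatenated with the decimal nonce.
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> tuple:
    """Client side: find the smallest nonce whose digest starts with
    `difficulty` zero hex characters. Expected cost is ~16**difficulty hashes."""
    prefix = "0" * difficulty
    nonce = 0
    while not _digest(challenge, nonce).startswith(prefix):
        nonce += 1
    return nonce, _digest(challenge, nonce)


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash, so checking a submitted nonce is cheap."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = secrets.token_hex(16)  # hypothetical per-visitor random challenge
    nonce, digest = solve(challenge, difficulty=4)
    print(nonce, digest, verify(challenge, nonce, 4))
```

Each extra leading hex zero multiplies the expected client work by about 16, while verification stays at one hash. That asymmetry is what a PoW gate relies on.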

I’m here to tell you to try it if you have a public-facing site and want to break away from Cloudflare. It was VERY easy to install and configure with a Caddyfile on a Debian distro with systemctl. Within an hour it had filtered multiple bots, and so far the knocks seem to have slowed down.

https://anubis.techaro.lol/

My botspam woes seem to have been seriously mitigated, if not completely eradicated. I’m very happy with tonight’s little security upgrade project, which took no more than an hour to install and read through the documentation. My current chain is: Caddy reverse proxy -> Anubis -> services.

A good place to start for the install is here:

https://anubis.techaro.lol/docs/admin/native-install/

  • ___qwertz___@feddit.org
    7 hours ago

    Funnily enough, PoW was a hot topic in academia around the late '90s and early 2000s, and it’s somewhat clear that the author of Anubis has not read much about the discussion back then.

    There was a paper called “Proof of work does not work” (or similar, can’t be bothered to look it up) that argued that PoW cannot work for spam protection, because you have to support low-powered consumer devices while still blocking spammers with heavy hardware. That is a very valid concern. Then there was a paper arguing that PoW can still work, as long as you scale the difficulty so that a legit user (e.g. one sending only a single email) gets a low difficulty, while a spammer (sending thousands of emails) gets a high difficulty.
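The second paper's idea, difficulty that grows with volume, can be sketched as a per-client sliding window. This is a hypothetical policy for illustration, not something Anubis does: the `DifficultyScaler` class, its window length, and the one-extra-zero-per-doubling rule are all assumptions.

```python
import math
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class DifficultyScaler:
    """Hypothetical per-client PoW difficulty: occasional visitors get the base
    difficulty, heavy requesters get progressively harder challenges."""

    def __init__(self, base: int = 2, window_s: float = 60.0, max_difficulty: int = 8):
        self.base = base
        self.window_s = window_s
        self.max_difficulty = max_difficulty
        self.history: Dict[str, Deque[float]] = defaultdict(deque)

    def difficulty_for(self, client: str, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        hits = self.history[client]
        hits.append(now)
        # Forget requests older than the sliding window.
        while now - hits[0] > self.window_s:
            hits.popleft()
        # One extra leading zero per doubling of recent requests
        # (~16x more expected hashing per step with hex-digit difficulty).
        extra = int(math.log2(len(hits)))
        return min(self.base + extra, self.max_difficulty)
```

A one-off visitor stays at the base difficulty, while a client making hundreds of requests a minute hits the cap. The catch is the client key: a bot spread across many IP addresses looks like many light users.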

    The idea of blocking known bad actors is actually used quite a lot in email, in the form of DNS block lists (DNSBLs) such as Spamhaus (this has nothing to do with PoW, but such a distributed list could be used to determine PoW difficulty).
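A DNSBL lookup that could feed PoW difficulty, as a minimal sketch assuming an IPv4 client address and the `zen.spamhaus.org` zone. The client's octets are reversed, the list's zone is appended, and the name is resolved. NXDOMAIN means not listed, and an answer in `127.0.0.x` means listed. `difficulty_with_dnsbl` and its numbers are hypothetical. Spamhaus also restricts free use and may refuse queries relayed through large public resolvers, so check its terms first.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Reverse the IPv4 octets and append the zone:
    192.0.2.1 -> 1.2.0.192.zen.spamhaus.org"""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def is_listed(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """NXDOMAIN means not listed. Only 127.0.0.x answers count as listings;
    Spamhaus signals query errors with 127.255.255.x (check your list's docs)."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return answer.startswith("127.0.0.")


def difficulty_with_dnsbl(ip: str, base: int = 2, penalty: int = 3) -> int:
    """Hypothetical policy: clients on the blocklist get a harder challenge."""
    return base + penalty if is_listed(ip) else base
```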

    Anubis on the other hand does nothing like that and a bot developed to pass Anubis would do so trivially.

    Sorry for the long text.

    • sudo@programming.dev
      1 hour ago

      > Then there was a paper arguing that PoW can still work, as long as you scale the difficulty in such a way that a legit user

      Telling a legit user from a fake user is the entire game. If you can do that, you just block the fake user. Professional bot blockers like Cloudflare or Akamai have machine learning systems that analyze trends in network traffic and serve JS challenges to suspicious clients. Last I checked, all Anubis uses is User-Agent filters, which is far behind the curve. Bots can go as far as faking TLS fingerprints and matching them to their User-Agents.
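Why User-Agent filtering alone is weak: the header is chosen by the client, so a rule list keyed on it only catches bots that don't bother to lie. Below is a hypothetical first-match-wins rule list in Python, not Anubis's actual policy format. `python-requests/<version>` is the default User-Agent of the Python requests library. An unmodified scraper is denied, but the same scraper sending one copied browser string is only challenged.

```python
import re

# Hypothetical rule list: first matching pattern decides the action.
RULES = [
    (re.compile(r"python-requests|curl|wget", re.IGNORECASE), "DENY"),
    (re.compile(r"Mozilla", re.IGNORECASE), "CHALLENGE"),
]


def action_for(user_agent: str) -> str:
    """Return the action of the first rule whose pattern matches the header."""
    for pattern, action in RULES:
        if pattern.search(user_agent):
            return action
    return "ALLOW"  # hypothetical default for unrecognized clients
```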

    • Flipper@feddit.org
      6 hours ago

      At least in the beginning the scrapers just used curl with a different user agent. Forcing them to use a headless client is already a 100x increase in resources for them. That in itself is already a small victory and so far it is working beautifully.

      • sudo@programming.dev
        56 minutes ago

        Well, in most cases it would be Python requests, not curl. But yes, forcing them to use a browser is the real cost, not just in CPU time but in programmer labor. PoW is overkill for that, though.