I got into the self-hosting scene this year when I wanted to run my own website on an old recycled ThinkPad. I spent a lot of time learning about ufw, reverse proxies, header security hardening, and fail2ban.
Despite all that, I still had a problem with bots knocking on my ports and spamming my logs. I tried some hackery to get fail2ban to read Caddy's logs, but that didn't work for me. I nearly gave up and went with Cloudflare like half the internet does, but my stubbornness about open-source self-hosting, plus the recent Cloudflare outages this year, encouraged me to try alternatives.

Around the same time, I kept running into this thing on the sites I frequent, like Codeberg. This is Anubis, a proxy-style firewall that forces the browser client to do a proof-of-work check (plus some other clever tricks) to stop bots from knocking. I got interested and started thinking about beefing up my security.
I'm here to tell you to try it if you have a public-facing site and want to break away from Cloudflare. It was VERY easy to install and configure with a Caddyfile on a Debian distro using systemctl. Within an hour it had filtered multiple bots, and so far the knocks seem to have slowed down.
My bot-spam woes seem to have been seriously mitigated, if not completely eradicated. I'm very happy with tonight's little security upgrade project, which took no more than an hour of my time to install and read through the documentation. The current chain is: Caddy reverse proxy -> Anubis -> services.
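To make that concrete, the Caddy side is basically a single reverse_proxy line pointing at Anubis, and Anubis is told where the real service lives through its environment file. A rough sketch, not my actual config (the hostname and port numbers here are placeholders):

```
# /etc/caddy/Caddyfile -- sketch of the chain above; hostname and ports are placeholders
example.com {
    # Caddy terminates TLS and hands every request to Anubis
    reverse_proxy localhost:8923
}
```

Anubis itself is configured through environment variables in whichever env file your install method uses, roughly BIND=:8923 (where it listens for Caddy) and TARGET=http://localhost:3000 (the service it sits in front of); check the docs for the exact file location and variable names.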
A good place to start for the install is here.


At the time of commenting, this post is 8h old. I read all the top comments, many of them critical of Anubis.
I run a small website and don't have problems with bots. Of course I know what a DDoS is; maybe that's the only use case where something like Anubis would help, instead of the strictly server-side solution I deploy?
I use CrowdSec (it seems to work with Caddy, btw). It took a little setting up, but it does the job.
(I think it’s quite similar to fail2ban in what it does, plus community-updated blocklists)
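For anyone curious, the rough shape of that setup is below (package and collection names are from memory, so double-check them against the CrowdSec docs):

```
# CrowdSec agent plus a remediation component that actually blocks offending IPs
sudo apt install crowdsec crowdsec-firewall-bouncer-iptables

# parsers/scenarios for Caddy access logs
sudo cscli collections install crowdsecurity/caddy

# /etc/crowdsec/acquis.yaml -- point the agent at wherever Caddy writes its access log
#   filenames:
#     - /var/log/caddy/*.log
#   labels:
#     type: caddy

sudo systemctl restart crowdsec
```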
Am I missing something here? Why wouldn’t that be enough? Why do I need to heckle my visitors?
By the time Anubis gets to work, the knocking has already happened, so I don't really understand this argument.
If the system is set up to reject a certain type of request, these are microsecond transactions that do no harm (DDoS excepted).
AI scraping is a massive issue for specific types of websites, such as git forges, wikis, and to a lesser extent Lemmy etc., that rely on complex database operations that cannot be easily cached. Unless you massively overprovision your infrastructure, these web applications grind to a halt as scrapers constantly max out the available CPU.
The vast majority of the critical commenters here seem to be talking from a position of total ignorance about this, or assume operators of such web applications have time for the hypervigilance needed to constantly monitor and manually block AI scrapers (which do their best to circumvent more basic blocks). The realistic options for such operators right now are: Anubis (or similar), Cloudflare, or shutting down their servers. Of these, Anubis is clearly the least bad option.
I also used CrowdSec for almost a year, but as AI scrapers became more aggressive, CrowdSec alone wasn't enough. The scrapers used distributed IP ranges and spoofed user agents, making them hard to detect while they hammered my Forgejo instance's expensive routes. I tried custom CrowdSec rules but hit their limits.
Then I discovered Anubis. It’s been an excellent complement to CrowdSec — I now run both. In my experience they work very well together, so the question isn’t “A or B?” but rather “How can I combine them, if needed?”
With Varnish and Wazuh, I've never had a need for Anubis.
My first recommendation for anyone struggling with bots is to fix their cache.
If CrowdSec works for you, that's great, but it's also a corporate product whose premium subscription tier starts at $900/month; not exactly a pure self-hosted solution.
I'm not a hypernerd; I'm still figuring all this out among the myriad of possible solutions with different levels of complexity and setup time. All the self-hosters in my internet circle started adopting Anubis, so I wanted to try it. Anubis was relatively plug and play, with prebuilt packages and great install-guide documentation.
Allow me to expand on the problem I was having. It wasn't just that I was getting a knock or two; I was getting 40 knocks every few seconds, scraping every page and probing for a bunch of paths that don't exist on my site but would be exploit points on unsecured production VPS systems.
On a computational level, the constant network activity of scrapers downloading web pages, zip files, and images pollutes your traffic. Anubis stops this by trapping them on a landing page that transmits very little data from the server side. When a bot hammers that Anubis page 40 times on a single open connection before giving up, it cuts the overall network activity and data transferred (which is often metered and billed), as well as the log noise.
And this isn't all or nothing. You don't have to pester all your visitors, only those with sketchy clients. Anubis uses a weighted policy that grades how legitimate a browser client looks: most regular connections get through without triggering anything, while weird connections get various grades of checks depending on how sketchy they are. Some checks don't require proof of work or JavaScript at all.
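To give a rough idea of what that looks like, the policy file is just a list of rules that match on things like user agent or path and decide whether a client gets let straight through, challenged, or denied. An illustrative sketch (rule names and regexes are made up; see the default policy file that ships with Anubis for the real format):

```
bots:
  - name: well-known-paths
    path_regex: ^/\.well-known/.*$
    action: ALLOW        # never challenged
  - name: generic-browser
    user_agent_regex: Mozilla
    action: CHALLENGE    # gets the proof-of-work interstitial
  - name: known-bad-scraper
    user_agent_regex: (?i)badscraperbot
    action: DENY         # refused outright, no challenge served
```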
On a psychological level, it gives me a bit of relief knowing that the bots are getting properly sinkholed and I'm punishing and wasting the compute of some asshole trying to find exploits in my system to expand their botnet. And a bit of pride knowing I did this myself on my own hardware without having to cop out to a corporate product.
It's nice that people of different skill levels and philosophies have options to work with, and one tool can often complement another. Anubis worked for what I wanted: stopping bots from wasting network bandwidth and giving me peace of mind where before I had no protection. All while being unnoticeable for most people, because I can configure it to not heckle every client every 5 minutes like some sites want to do.
It’s also fully FLOSS with dozens of contributors (not to speak of the community-driven blocklists). If they make money with it, great.
Why? I host it, I run it. It's even in the Debian Stable repos, though I choose their own, more up-to-date ones.
And apart from the user agent and a few other signals, all of which are easily spoofed, this means "do some JavaScript stuff on the local client" (there's a link to an article here somewhere that explains this well), which will eat resources on the client's machine; that becomes a real PITA on e.g. smartphones.
Also, I use one of those less-than-legit, weird and non-regular browsers, and I am being punished by tools like this.