I’m the administrator of kbin.life, a general purpose/tech orientated kbin instance.

  • 0 Posts
  • 12 Comments
Joined 2 years ago
Cake day: June 29th, 2023


  • I feel like the only even remotely acceptable way to do this is to show the ad, then prompt for the answer for 10 seconds. They can log the right or wrong answer, or the lack of one if the time expires, and then they must move on.

    I can imagine that metrics showing whether your advertising is actually reaching people are valid. But making people answer, and especially making them watch more if they answer wrong, is about as dystopian as it gets.

    If (and I say if, I really don’t want to believe it is) that is the case, the only correct response is to uninstall Hulu immediately and put on your pirate hat.




  • So my mbin instance is on Cloudflare, and I filter the AS numbers there. They don’t even reach my server.

    On the sites that aren’t behind Cloudflare, yep, it’s at the nginx level. I did consider doing it at the firewall level, maybe with a specific chain for it. But since I was already blocking at the nginx level, I just did it there for now. It keeps them off the content, but yes, it does tell them there’s a website there to leech if they change their tactics.

    You need to block the whole ASN too. Those that are using Chrome/Firefox UAs change IP every 5 minutes to a random other one from their huuuuuge pools.
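
    To show why blocking the whole range matters, here is a rough Python sketch. The prefixes are placeholder documentation ranges, not real ASN data, and `in_blocked_range` is just an illustrative name. A per-IP block misses the next address in the pool, but a prefix check catches it:

```python
import ipaddress

# Placeholder prefixes (documentation ranges), standing in for the
# ranges announced by an ASN you want to block. Not real data.
BLOCKED = [ipaddress.ip_network(p) for p in ("203.0.113.0/24", "2001:db8::/32")]

def in_blocked_range(ip: str) -> bool:
    """Return True if the address falls inside any blocked prefix."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixing both families in one list is safe.
    return any(addr in net for net in BLOCKED)

# Two different addresses from the same pool are both caught.
print(in_blocked_range("203.0.113.7"))
print(in_blocked_range("203.0.113.250"))
```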


  • Yeah, I probably should look to see if there are any good plugins that do this on some community submission basis. Because yes, it’s a pain to keep up with whatever trick they’re doing next.

    And unlike web crawlers that generally check a url here and there, AI bots absolutely rip through your sites like something rabid.


  • If you’re running nginx, I’m using the following:

    if ($http_user_agent ~* "SemrushBot|Semrush|AhrefsBot|MJ12bot|YandexBot|YandexImages|MegaIndex.ru|BLEXbot|BLEXBot|ZoominfoBot|YaK|VelenPublicWebCrawler|SentiBot|Vagabondo|SEOkicks|SEOkicks-Robot|mtbot/1.1.0i|SeznamBot|DotBot|Cliqzbot|coccocbot|python|Scrap|SiteCheck-sitecrawl|MauiBot|Java|GumGum|Clickagy|AspiegelBot|Yandex|TkBot|CCBot|Qwantify|MBCrawler|serpstatbot|AwarioSmartBot|Semantici|ScholarBot|proximic|GrapeshotCrawler|IAScrawler|linkdexbot|contxbot|PlurkBot|PaperLiBot|BomboraBot|Leikibot|weborama-fetcher|NTENTbot|Screaming Frog SEO Spider|admantx-usaspb|Eyeotabot|VoluumDSP-content-bot|SirdataBot|adbeat_bot|TTD-Content|admantx|Nimbostratus-Bot|Mail.RU_Bot|Quantcastboti|Onespot-ScraperBot|Taboolabot|Baidu|Jobboerse|VoilaBot|Sogou|Jyxobot|Exabot|ZGrab|Proximi|Sosospider|Accoona|aiHitBot|Genieo|BecomeBot|ConveraCrawler|NerdyBot|OutclicksBot|findlinks|JikeSpider|Gigabot|CatchBot|Huaweisymantecspider|Offline Explorer|SiteSnagger|TeleportPro|WebCopier|WebReaper|WebStripper|WebZIP|Xaldon_WebSpider|BackDoorBot|AITCSRoboti|Arachnophilia|BackRub|BlowFishi|perl|CherryPicker|CyberSpyder|EmailCollector|Foobot|GetURL|httplib|HTTrack|LinkScan|Openbot|Snooper|SuperBot|URLSpiderPro|MAZBot|EchoboxBot|SerendeputyBot|LivelapBot|linkfluence.com|TweetmemeBot|LinkisBot|CrowdTanglebot|ClaudeBot|Bytespider|ImagesiftBot|Barkrowler|DataForSeoBo|Amazonbot|facebookexternalhit|meta-externalagent|FriendlyCrawler|GoogleOther|PetalBot|Applebot") { return 403; }

    That will block those that actually use recognisable user agents. I add any I find as I go on. It will catch a lot!
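
    If you want to sanity check a pattern like this before putting it live, a small Python sketch can approximate what nginx’s `~*` does (a case-insensitive, unanchored regex match). It uses only a short subset of the list above, and `is_blocked` is a made-up helper name. nginx uses PCRE rather than Python’s `re`, but for plain alternations like these the two should behave the same:

```python
import re

# Short illustrative subset of the pattern above, not the full list.
BLOCKED_UA = re.compile(
    r"SemrushBot|AhrefsBot|MJ12bot|ClaudeBot|Bytespider|python",
    re.IGNORECASE,  # same effect as the * in nginx's ~* operator
)

def is_blocked(user_agent: str) -> bool:
    """Return True if any listed token appears anywhere in the UA string."""
    return BLOCKED_UA.search(user_agent) is not None

samples = [
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "python-requests/2.31.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:126.0) Gecko/20100101 Firefox/126.0",
]
for ua in samples:
    print(is_blocked(ua), ua)
```

    The last sample is a normal browser UA and is not matched, which is exactly why the bots that fake browser agents need the IP-based list as well.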

    I also have a huuuuuge IP based block list (generated by adding all ranges returned from looking up the following AS numbers):

    AS45102 (Alibaba cloud)
    AS136907 (Huawei SG)
    AS132203 (Tencent)
    AS32934 (Facebook)

    These guys run or have run bots that impersonate real browser agents.

    There are various tools online to return prefix/ip lists for an autonomous system number.

    I put both into a single file and include it in my web site config files.

    EDIT: Just to add, keeping on top of this is a full time job!

    EDIT 2: Removed Mojeek bot as it seems to be a normal web crawler.
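
    For turning a prefix list into something nginx can include, a minimal Python sketch could look like this. It assumes you already have the CIDR ranges for an AS from one of those lookup tools. The prefixes shown are placeholder documentation ranges, `build_deny_lines` is a made-up helper name, and this is not necessarily how my own list is generated. It emits lines for nginx’s `deny` directive and merges adjacent or duplicate ranges to keep the file smaller:

```python
import ipaddress

def build_deny_lines(prefixes):
    """Turn CIDR strings into collapsed, sorted nginx `deny` lines."""
    v4, v6 = [], []
    for p in prefixes:
        # strict=False tolerates host bits being set, e.g. 203.0.113.5/24
        net = ipaddress.ip_network(p.strip(), strict=False)
        (v4 if net.version == 4 else v6).append(net)
    lines = []
    # collapse_addresses() needs a single IP version per call
    for group in (v4, v6):
        for net in ipaddress.collapse_addresses(group):
            lines.append(f"deny {net};")
    return lines

# Placeholder input, standing in for the output of an ASN prefix lookup.
prefixes = [
    "203.0.113.0/25",
    "203.0.113.128/25",  # adjacent to the line above, merges into a /24
    "198.51.100.0/24",
    "198.51.100.0/24",   # duplicate, dropped
    "2001:db8::/32",
]
print("\n".join(build_deny_lines(prefixes)))
# deny 198.51.100.0/24;
# deny 203.0.113.0/24;
# deny 2001:db8::/32;
```

    The output can be written to a file and pulled into the site config with an `include` line, alongside the user agent check.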


  • Not sure how it is in the US, but here in the UK there are two ways a business can export.

    1: They pre-clear the customs duty and include it in the sales total (so it’s like paying sales tax at the checkout, except it’s the pre-cleared duty fees). Then the parcel has a nice duty paid stamp and goes straight through customs (I guess unless customs are suspicious and check into it).

    2: They just charge you the item price with no tax applied. In which case you need to pay local tax and duties applicable once the product arrives. Here it’s a bit different. They will hold it at the local depot and you can either go there and pay + collect, or you can pay online and it will be rescheduled for delivery once you pay.

    As others have said, it’s not a scam. There’s no requirement for a business to do option 1, and it’s likely only viable for large businesses to register and have someone/software that knows the various duties required for various countries.

    I’ve ordered from newegg and B&M in the past for example, and in both cases the items were pre-cleared and arrived promptly without any hassle.

    Maybe there’s something similar for imports into the US too?


  • Article 12 (from the 1993 adoption) of the additional protocols from 1974-1977:

    Article 12 - Protection of medical units

    1. Medical units shall be respected and protected at all times and shall not be the object of attack.

    2. Paragraph 1 shall apply to civilian medical units, provided that they:
       a) belong to one of the Parties to the conflict;
       b) are recognized and authorized by the competent authority of one of the Parties to the conflict; or
       c) are authorized in conformity with Article 9, paragraph 2, of this Protocol or Article 27 of the First Convention.

    3. The Parties to the conflict are invited to notify each other of the location of their medical units. The absence of such notification shall not exempt any of the Parties from the obligation to comply with the provisions of paragraph 1.

    4. Under no circumstances shall medical units be used in an attempt to shield military objectives from attack. Whenever possible, the Parties to the conflict shall ensure that medical units are so sited that attacks against military objectives do not imperil their safety.

    As I read it, it seems very clear it would contravene paragraph 4 there.

    EDIT: Actually I’d not call it clear. Because it seems to me they’re talking more about using hospitals and the like to shield military units. But I would argue hiding a unit in an ambulance is a good interpretation too.