

I’m surprisingly level-headed for being a walking knot of anxiety.
Ask me anything.
Special skills include: Knowing all the “na na na nah nah nah na” parts of the Three’s Company theme.
I also develop Tesseract UI for Lemmy/Sublinks
Avatar by @SatyrSack@feddit.org


I’ve been looking into CrowdSec for ages now and still haven’t gotten around to even a test deployment. One of these days, lol, I’ll get around to it.


Oooooh. That’s smart. I mostly host apps, but in theory, I should be able to dynamically modify the response body and tack on some HTML for a hidden button and do that.
I used to disallow everything in robots.txt but the worst crawlers just ignored it. Now my robots.txt says all are welcome and every bot gets shunted to the tarpit 😈
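In Nginx terms, the welcome mat is just a static robots.txt, and the bot-UA map from my config (shown in a later comment on this page) handles the shunting. A minimal sketch:

# Tell every crawler it's allowed everywhere...
location = /robots.txt {
    default_type text/plain;
    return 200 "User-agent: *\nAllow: /\n";
}
# ...while the $ua_disallowed map redirects known bots to the tarpit anyway.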


I’ve got bot detection set up in Nginx on my VPS, which used to return 444 (Nginx-speak for “close the connection and waste no more resources processing it”), but I recently started piping that traffic to Nepenthes to return gibberish data for them to train on.
I documented a rough guide in the comment here. Of relevance to you are the two .conf files at the bottom. In the deny-disallowed.conf, change the return 301 ... line to return 444.
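That swap makes the action file trivial; here is a sketch of deny-disallowed.conf with the change applied:

# Deny disallowed user agents outright
if ($ua_disallowed) {
    # 444 closes the connection without sending any response
    return 444;
}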
I also utilize a firewall and fail2ban on the VPS to block bad actors, overly aggressive scrapers, password brute-forcing, etc., and the link between the VPS and my homelab equipment never sees that traffic.
In the case of a DDoS, I’ve done the following:
Granted, I’m not running anything mission-critical, just some services for friends and family, so I can deal with a little downtime.


I used to use HAProxy but switched to Nginx so I could add the modsecurity module and run WAF services. I still use HAProxy for some things, though.
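For anyone wanting to do the same, the ModSecurity wiring in Nginx is fairly small. A sketch, assuming the ModSecurity-nginx connector module is installed and your rules live at a hypothetical /etc/nginx/modsec/main.conf:

# nginx.conf, main context: load the connector module
load_module modules/ngx_http_modsecurity_module.so;

# In each server block the WAF should protect:
server {
    listen 80;
    server_name app.mydomain.xyz;

    # Enable the WAF and point it at the ModSecurity rule set
    modsecurity on;
    modsecurity_rules_file /etc/nginx/modsec/main.conf;
}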


I have never used it, so take this with a grain of salt, but last I read, the free tier would not let you secure traffic between yourself and Cloudflare with your own certs, which implies they can decrypt and read that traffic. What, if anything, they do with that capability, I do not know. I just do not trust my hosted assets to be secured with certs/keys I do not control.
There are other things CF can do (bot detection, DDoS protection, etc.), but if you just want to avoid exposing your home IP, a cheap VPS running Nginx can work the same way as a CF tunnel. Set up WireGuard on the VPS and have the Nginx backends connect to your home assets through it. If the VPS is the “server” side of the WG tunnel, you don’t have to open any local ports on your router at all. I’ve been doing that, originally with OpenVPN, since before CF tunnels were ever offered as a service.
Edit: You don’t even need WG, really. If you set up a persistent SSH tunnel and forward/bind a port to your VPS, you can tunnel the traffic over that.
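The VPS side of that is just a normal reverse-proxy vhost pointed at the tunnel. A minimal sketch, where the WireGuard peer address (10.0.0.2), backend port (8080), and cert paths are all hypothetical:

server {
    listen 443 ssl;
    server_name app.mydomain.xyz;

    ssl_certificate     /etc/letsencrypt/live/mydomain.xyz/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/mydomain.xyz/privkey.pem;

    location / {
        # 10.0.0.2 is the home server's address on the WG tunnel.
        # With the SSH-tunnel variant, this would be a port on 127.0.0.1 instead.
        proxy_pass http://10.0.0.2:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}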


So, I set this up recently and agree with all of your points about the actual integration being glossed over.
I already had bot detection set up in my Nginx config, so adding Nepenthes was just changing the behavior of that. Previously, I had just returned either 404 or 444 to those requests, but now it redirects them to Nepenthes.
Rather than trying to do rewrites and pretend the Nepenthes content is under my app’s URL namespace, I just do a redirect which the bot crawlers tend to follow just fine.
There are several parts to this to keep my config sane, each in its own include file.
An include file that looks at the user agent, compares it to a list of bot UA regexes, and sets a variable to either 0 or 1. By itself, that include file doesn’t do anything more than set that variable. This allows me to have it as a global config without having it apply to every virtual host.
An include file that performs the action if a variable is set to true. This has to be included in the server portion of each virtual host where I want the bot traffic to go to Nepenthes. If this isn’t included in a virtual host’s server block, then bot traffic is allowed.
A virtual host where the Nepenthes content is presented. I run a subdomain (content.mydomain.xyz). You could also do this as a path off of your protected domain, but this works for me and keeps my already complex config from getting any worse. Plus, it was easier to integrate into my existing bot config. Had I not already had that, I would have run it off of a path (and may go back and do that when I have time to mess with it again).
The map-bot-user-agents.conf is included in the http section of Nginx and applies to all virtual hosts. You can either include this in the main nginx.conf or at the top (above the server section) in your individual virtual host config file(s).
The deny-disallowed.conf is included individually in each virtual host’s server section. Even though the bot detection is global, if a virtual host’s server section does not include the action file, then nothing is done.
Note that I’m treating Google’s crawler the same as an AI bot because…well, it is. They’re abusing their search position by double-dipping on the crawler so you can’t opt out of being crawled for AI training without also preventing it from crawling you for search engine indexing. Depending on your needs, you may need to comment that out. I’ve also commented out the Python requests user agent. And forgive the mess at the bottom of the file. I inherited the seed list of user agents and haven’t cleaned up that massive regex one-liner.
# Map bot user agents
## Sets the $ua_disallowed variable to 0 or 1 depending on the user agent. Non-bot UAs are 0, bots are 1
map $http_user_agent $ua_disallowed {
    default 0;
    "~PerplexityBot" 1;
    "~PetalBot" 1;
    "~applebot" 1;
    "~compatible; zot" 1;
    "~Meta" 1;
    "~SurdotlyBot" 1;
    "~zgrab" 1;
    "~OAI-SearchBot" 1;
    "~Protopage" 1;
    "~Google-Test" 1;
    "~BacklinksExtendedBot" 1;
    "~microsoft-for-startups" 1;
    "~CCBot" 1;
    "~ClaudeBot" 1;
    "~VelenPublicWebCrawler" 1;
    "~WellKnownBot" 1;
    #"~python-requests" 1;
    "~bitdiscovery" 1;
    "~bingbot" 1;
    "~SemrushBot" 1;
    "~Bytespider" 1;
    "~AhrefsBot" 1;
    "~AwarioBot" 1;
    # "~Poduptime" 1;
    "~GPTBot" 1;
    "~DotBot" 1;
    "~ImagesiftBot" 1;
    "~Amazonbot" 1;
    "~GuzzleHttp" 1;
    "~DataForSeoBot" 1;
    "~StractBot" 1;
    "~Googlebot" 1;
    "~Barkrowler" 1;
    "~SeznamBot" 1;
    "~FriendlyCrawler" 1;
    "~facebookexternalhit" 1;
"~*(?i)(80legs|360Spider|Aboundex|Abonti|Acunetix|^AIBOT|^Alexibot|Alligator|AllSubmitter|Apexoo|^asterias|^attach|^BackDoorBot|^BackStreet|^BackWeb|Badass|Bandit|Baid|Baiduspider|^BatchFTP|^Bigfoot|^Black.Hole|^BlackWidow|BlackWidow|^BlowFish|Blow|^BotALot|Buddy|^BuiltBotTough|
^Bullseye|^BunnySlippers|BBBike|^Cegbfeieh|^CheeseBot|^CherryPicker|^ChinaClaw|^Cogentbot|CPython|Collector|cognitiveseo|Copier|^CopyRightCheck|^cosmos|^Crescent|CSHttp|^Custo|^Demon|^Devil|^DISCo|^DIIbot|discobot|^DittoSpyder|Download.Demon|Download.Devil|Download.Wonder|^dragonfl
y|^Drip|^eCatch|^EasyDL|^ebingbong|^EirGrabber|^EmailCollector|^EmailSiphon|^EmailWolf|^EroCrawler|^Exabot|^Express|Extractor|^EyeNetIE|FHscan|^FHscan|^flunky|^Foobot|^FrontPage|GalaxyBot|^gotit|Grabber|^GrabNet|^Grafula|^Harvest|^HEADMasterSEO|^hloader|^HMView|^HTTrack|httrack|HTT
rack|htmlparser|^humanlinks|^IlseBot|Image.Stripper|Image.Sucker|imagefetch|^InfoNaviRobot|^InfoTekies|^Intelliseek|^InterGET|^Iria|^Jakarta|^JennyBot|^JetCar|JikeSpider|^JOC|^JustView|^Jyxobot|^Kenjin.Spider|^Keyword.Density|libwww|^larbin|LeechFTP|LeechGet|^LexiBot|^lftp|^libWeb|
^likse|^LinkextractorPro|^LinkScan|^LNSpiderguy|^LinkWalker|msnbot|MSIECrawler|MJ12bot|MegaIndex|^Magnet|^Mag-Net|^MarkWatch|Mass.Downloader|masscan|^Mata.Hari|^Memo|^MIIxpc|^NAMEPROTECT|^Navroad|^NearSite|^NetAnts|^Netcraft|^NetMechanic|^NetSpider|^NetZIP|^NextGenSearchBot|^NICErs
PRO|^niki-bot|^NimbleCrawler|^Nimbostratus-Bot|^Ninja|^Nmap|nmap|^NPbot|Offline.Explorer|Offline.Navigator|OpenLinkProfiler|^Octopus|^Openfind|^OutfoxBot|Pixray|probethenet|proximic|^PageGrabber|^pavuk|^pcBrowser|^Pockey|^ProPowerBot|^ProWebWalker|^psbot|^Pump|python-requests\/|^Qu
eryN.Metasearch|^RealDownload|Reaper|^Reaper|^Ripper|Ripper|Recorder|^ReGet|^RepoMonkey|^RMA|scanbot|SEOkicks-Robot|seoscanners|^Stripper|^Sucker|Siphon|Siteimprove|^SiteSnagger|SiteSucker|^SlySearch|^SmartDownload|^Snake|^Snapbot|^Snoopy|Sosospider|^sogou|spbot|^SpaceBison|^spanne
r|^SpankBot|Spinn4r|^Sqworm|Sqworm|Stripper|Sucker|^SuperBot|SuperHTTP|^SuperHTTP|^Surfbot|^suzuran|^Szukacz|^tAkeOut|^Teleport|^Telesoft|^TurnitinBot|^The.Intraformant|^TheNomad|^TightTwatBot|^Titan|^True_Robot|^turingos|^TurnitinBot|^URLy.Warning|^Vacuum|^VCI|VidibleScraper|^Void
EYE|^WebAuto|^WebBandit|^WebCopier|^WebEnhancer|^WebFetch|^Web.Image.Collector|^WebLeacher|^WebmasterWorldForumBot|WebPix|^WebReaper|^WebSauger|Website.eXtractor|^Webster|WebShag|^WebStripper|WebSucker|^WebWhacker|^WebZIP|Whack|Whacker|^Widow|Widow|WinHTTrack|^WISENutbot|WWWOFFLE|^
WWWOFFLE|^WWW-Collector-E|^Xaldon|^Xenu|^Zade|^Zeus|ZmEu|^Zyborg|SemrushBot|^WebFuck|^MJ12bot|^majestic12|^WallpapersHD)" 1;
}
# Deny disallowed user agents
if ($ua_disallowed) {
    # This redirects them to the Nepenthes domain. So far, pretty much all the bot crawlers have been happy to accept the redirect and crawl the tarpit continuously
    return 301 https://content.mydomain.xyz/;
}
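To wire those two files in, the map is included once globally and the action file per vhost, roughly like this (the snippet paths are hypothetical; adjust to your layout):

# In nginx.conf, inside the http { } block (or above the server blocks in a vhost file):
include /etc/nginx/snippets/map-bot-user-agents.conf;

# In each virtual host that should send bots to the tarpit:
server {
    listen 443 ssl;
    server_name app.mydomain.xyz;
    include /etc/nginx/snippets/deny-disallowed.conf;
    # ...rest of the vhost: certs, proxy_pass, etc.
}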


Weird. Other than how it used to choke when there were conflicts (and all uploads stopped until that was fixed), I haven’t had any issues like that. Guess I’m just lucky.


I’ve had pretty good experience with Nextcloud’s instant upload. The only time I’ve had it shit the bed was ages ago when it would occasionally get stuck on a conflict, but that hasn’t happened in a long time. Pretty much all of my image folders (camera/DCIM, Screenshots, Downloads) get synced. The only annoying thing was when apps would suddenly change where they download to and I’d have to reconfigure yet another sync folder, but I can’t really fault NC for that.
Mine is set to upload and keep a local copy, and to only do a one-way sync (phone to NC). Not sure if that causes fewer issues than a two-way sync or than deleting the local copy after upload?


AFAIK, yes. Though I haven’t tried it with WPA3.
While WEP is dead, you can still use the same tooling to capture the WPA/WPA2 handshake and run it through something like John the Ripper to try to recover the passphrase.
Admittedly, I haven’t messed with it in years.
Either way, you still need a wireless adapter that’s capable of monitor mode, as well as a driver for it that supports packet injection (not sure how rare that is nowadays).


When I was in college, I rented a house just outside my budget and found I couldn’t afford things like cable internet.
I had a wifi->ethernet bridge that was originally to connect my OG Xbox to a wifi network. I also had neighbors with Wifi using WEP encryption. An idea was born.
Was able to use aircrack-ng on my laptop to crack their WEP key in about 15 minutes. Plugged that key into my wifi->ethernet bridge, and then hooked that into my router. Bam, my whole house was online.
That worked for probably a year and a half.


Can confirm, but depending on the VPS, your traffic may only be metered in one direction. Mine only meters egress, not ingress, so it’s not too bad if I want to use my media server.


Nearly all of mine use bind mounts, so I can just tar-gz the whole deploy folder for backups and migrations. The exception is volumes that connect to remote shares (SMB, NFS, etc.): for those I use named volumes and let Docker take care of their lifecycle.
If named Docker volumes let me specify the local filesystem location, I’d use them. As-is, I rarely do.


Then you really should list all of the secondary functions you plan to add to it, make sure they understand what those are, and have them agree to each one: full disclosure.
If you do something on it that could get them in trouble, it’s their ass on the line, not yours.


Are your friends okay with you doing that? I would not be, especially if my so-called friend didn’t disclose the secondary operations of the device that’s in my home, on my internet connection, under my name.


Yeah, I saw the steps for VLC and they’re similar. I tend to prefer CLI, so that’s what I did / wrote up.


I use SnappyMail. It’s a fork of Rainloop that’s actually maintained.
https://github.com/the-djmaze/snappymail
And unlike Rainloop, the Sieve filter editor actually works.


I have a single Nginx setup that is the frontend for all my web services, so I only need to deploy the cert there (and to its HA partner). My renewal script just scp’s it to the secondary and does an nginx -s reload on both.
I do generate separate certs/keys for my non-web servers, but there’s only two of those.
You could also, if you wanted, just generate one cert and distribute it and its key to everything with a script or other automation tool (Ansible is what I used to use).


Is there a way I can get Let’s Encrypt to dole out a wildcard certificate
Yep. Just specify the domains yourdomain.com and *.yourdomain.com in the certbot request. Wildcard certs require the DNS-based challenge, but you’ve said you’re already good there. You don’t technically need the apex domain (yourdomain.com), but I always add it since I do have services running there.
Any subdomains under the wildcard can use internal DNS, or internal IPs on the public DNS (I do the former, but the latter works too).
I used to run an internal CA, and it wasn’t too hard to set up the CA and distribute my root cert. Except on mobile devices. On Android it was easy, but there was a persistent warning that my network traffic could be intercepted (which is true when there’s a custom root cert installed), and since it was my cert, it got annoying seeing that all the time. Not sure if Apple devices can even do that, but regardless, it wasn’t practical to expect friends who wanted to use my self-hosted services to install a custom cert when they were over.


Is there EV support?
Looks like it, yeah:

The UI still shows Fuel, but it seems like you can enter the kWh and it should calculate. Maybe plug some values into the demo to be sure. If you do, let us know!
Like you’re thinking: put HAProxy on your OpenWRT router.
That’s what I do. The HAProxy setup is kind of “dumb” L4 only (TCP passthrough rather than HTTP/S) since I wanted all of my logic in the Nginx services. The main thing HAProxy does is, like you’re looking for, put the SPOF alongside the other unavoidable SPOF (the router), and it also wraps the requests in Proxy Protocol so the downstream Nginx services will see the correct client IP.
Flow is basically:
LAN/WAN/VPN -> HAProxy -> Two Nginx Instances -> Apps

With HAProxy in the router, it also lets me set internal DNS records for my apps to my router’s LAN IP.
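On the Nginx side, accepting Proxy Protocol only takes a couple of directives (plus the realip module, which most distro builds include). A sketch, where the router/HAProxy address 192.168.1.1, the backend port, and the cert paths are hypothetical:

server {
    # Accept Proxy Protocol framing from HAProxy instead of plain TLS/TCP
    listen 443 ssl proxy_protocol;
    server_name app.mydomain.xyz;

    ssl_certificate     /etc/letsencrypt/live/mydomain.xyz/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/mydomain.xyz/privkey.pem;

    # Only trust Proxy Protocol client info coming from the router
    set_real_ip_from 192.168.1.1;
    real_ip_header   proxy_protocol;

    location / {
        proxy_pass http://127.0.0.1:8080;
        # Pass the recovered client IP through to the app
        proxy_set_header X-Real-IP $remote_addr;
    }
}

The matching piece on the HAProxy side is send-proxy (or send-proxy-v2) on the backend server lines.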