Reading earlier comments in this community made me consider documenting the workings of my homelab to some extent, ie. docker configuration, credentials, ports and links of my services. I’ve tried to make it consistent and organised but it still feels half baked and insufficient. Everyone suggests documenting everything you do in your homelab but don’t state how. Since I’ve hardly had experience running my own server, I would really appreciate observing the blueprint of some other fellow selfhoster for copying or taking inspiration from rather than considering documentation to be ‘left as an exercise for the reader’.
Edit: I already have a note-taking solution with me. What I wish to ask is to know what needs to be documented and what the structure of the documentation should be to accommodate the information.
I’ve got a bunch of notes in Trilium.
I have a note for each service with the docker compose file, notes on backups, any weirdness with the setup, and when I update each service. I use Trilium as a crappy version control for the compose file.
I also have a note for the initial setup of my server (mostly setting up docker, setting up mergerfs and snapraid).
Other than that, I have one note for each device in my setup (Wi-Fi AP, OPNsense router, switch, etc.) that I populate with random crap I might need to know later.
I believe it is traditional to do so written in blood in the style of an apocalypse log, dealer’s choice for whose blood. Make sure it’s disjointed and nearly incomprehensible, but that everything is there.
Bonus points if you print the config files and write your documentation on them after stapling them to the walls
three, maybe four things:
- as mentioned: Obsidian. i pay for Sync cuz i like the product and want them to succeed and want reliable offsite backups and conflict resolution. use a ton of links and tags. i’ve been into using DataView to make tables of IoT devices, services, todo items, etc based on tags and other YAML frontmatter.
- chezmoi. manages my dotfiles so my machines are consistent. i have scripts that are heavily commented that show how to access MQTT, how to read and parse logs from journald, how to inspect my network, etc. i do think of them as code as documentation, even if they’re also just convenient.
- NixOS. this has been my code as config as documentation silver bullet. i use it as a replacement for Docker, k8s, Ansible, etc as it contains definitions for my machines and all the services and configuration they run, including any package dependencies and user configurations. no more statting an assortment of files to figure out the state of the system. it’s in flake.nix
- honorable mention to git and whatever git hosting provider is not on your network. track your work over time, and you’ll thank yourself when things go wrong.
some things are resistant to documentation and have a lot of stateful components (Home Assistant is my biggest problem child from an infra perspective), but mainly being in that graph mindset of “how would i find a path here if i forgot where this was” helps a lot
I write homelab docs mostly for user guidance like onboarding, login, and service-specific stuff. This helps me better design for people by putting myself in their shoes, and should act as a reference document for any member to come back to.
Previously I built an Mkdocs-Material website with a nice subdomain for it, but since the project went on maintenance mode, I’m gonna migrate all docs back to a Forgejo wiki since it’s just Markdown anyways. I also run an issue tracker there, to manage the homelab’s roadmaps and features since it’s still evolving.
I find this approach more beneficial than just documenting code. I’m not an IaC person yet, but I hope when I am, the playbooks should describe themselves for the nitty-gritty stuff anyways. I do write some infra notes for myself and perhaps to onboard maintainers, but most homelab developments happen in the issue tracker itself. The rest I try to keep simple enough for an individual to understand.
You’re on the right track. Like everything else in self-hosting you will learn and develop new strategies and scale things up to an appropriate level as you go and as your homelab grows. I think the key is to start with something immediately achievable, and iterate fast, aiming for continuous improvement.
My first idea was much like yours, very traditional documentation, with words, in a document. I quickly found the same thing you did, it’s half-baked and insufficient. There’s simply no way to make it match the actual state of the system perfectly and it is simply inadequate to use English alone to explain what I did because that ends up being too vague to be useful in a technical sense.
My next realization was that in most cases what I really wanted was to be able to know every single command I had ever run, basically without exception. So I started documenting that instead of focusing on the wording and the explanations. Then I started to feel like I wasn’t capturing every command reliably because I would get distracted trying to figure out a problem and forget to, and it was duplication of effort to copy and paste commands from the console to the document or vice versa. That turned into the idea of collecting bunches of commands together into a script, that I could potentially just run, which would at least reduce the risk of gaps and missing steps. Then I could put the commands I wanted to run right into the script, run the script, and then save it for posterity, knowing I’d accurately captured both the commands I ran and the changes I made to get it working by keeping it in version control.
But upon attempting to do so, I found that just a bunch of long lists of commands on their own isn’t terribly useful so I started to group all the lists up, attempting to find commonalities by things like server or service, and then starting to organize them better into scripts for different roles and intents that I could apply to any server or service, and over time this started to develop into quite a library of scripts. As I was doing this organizing I realized that as long as I made sure the script was functionally idempotent (doesn’t change behaviors or duplicate work when run repeatedly, it’s an important concept) I can guarantee that all my commands are properly documented and also that they have all been run – and if they haven’t, or I’m not sure, I can just run the script again as it’s supposed to always be safe to re-run no matter what state the system is in. So I started moving more and more to this strategy, until I realized that if I just organized this well enough, and made the scripts run automatically when they are changed or updated, I could not only improve my guarantees of having all these commands reliably run, but also quickly run them on many different servers and services all at once without even having to think about it.
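To make the idempotency idea concrete, here is a minimal sketch in Python. The helper name `ensure_line` and the example path are made up for illustration, not taken from anyone's actual scripts. It adds a config line only when it is missing, so running it a second time changes nothing:

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in `path`; return True only if the file changed.

    Hypothetical helper: safe to call repeatedly, because it checks the
    current state before touching anything.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # already in the desired state, do nothing

    path.parent.mkdir(parents=True, exist_ok=True)
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    path.write_text(text + prefix + line + "\n")
    return True
```

The return value doubles as a cheap change log: a script built from helpers like this can print only the steps that actually changed something, which tells you what state the machine was in before the run.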
There are some downsides of course, this leaves the potential of bugs in the scripts that make it not idempotent or not safe to re-run, and the only thing I can do is try to make sure they don’t happen, and if they do, identify and fix these bugs when they happen. The next step is probably to have some kind of testing process and environment (preferably automated) but now I’m really getting into the weeds. But at least I don’t really have any concerns that my system is undocumented anymore. I can quickly reference almost anything it’s doing or how it’s set up. That said, one other risk is that the system of scripts and automation becomes so complex that they start being too complex to quickly untangle, and at that point I’ll need better documentation for them. And ultimately you get into a circle of how do you validate the things your scripts are doing are actually working and doing what you expect them to do and that nothing is being missed, and usually you run back into the same ideas that doomed your documentation from the start, consistency and accuracy.
It also opens an attack vector, where somebody gaining access to these scripts not only gains all the most detailed knowledge of how your system is configured but also the potential to inject commands into those scripts and run them anywhere, so you have to make sure to treat these scripts and systems like the crown jewels they are. If they are compromised, you are in serious trouble.
By now I have of course realized (and you all probably have too) that I have independently re-invented infrastructure-as-code. There are tools and systems (ansible and terraform come to mind) to help you do this, and at some point I may decide to take advantage of them but personally I’m not there yet. Maybe soon. If you want to skip the intermediate steps I did, you might even be able to skip directly to that approach. But personally I think there is value in the process, it helps define your needs and build your understanding that there really isn’t anything magical going on behind the scenes and that may help prevent these tools from turning into a black box which isn’t actually going to help you understand your system.
Do I have a perfect system? Of course not. In a lot of ways it’s probably horrific and I’m sure there are more experienced professionals out there cringing or perhaps already furiously warming up their keyboards. But I learned a lot, understand a lot more than I did when I started, and you can too. Maybe you’ll follow the same path I did, maybe you won’t. But you’ll get there.
I have a simple pile of Markdown files that I edit with Obsidian. I like the simple text file format because it keeps my documentation forwards-compatible. I use OpenWRT at the heart of my network, so I keep it right there in root’s home. Every long while I back it up to my general Documents which is then synced between my high-storage devices with SyncThing.
Thanks for your response. I already have Joplin synced with my server as a solution for my documentation. However I meant to ask how you structure your documentation, know what and how to mention, and organise it for future reference.
Don’t know if this helps since dokuwiki lets me link pages, but I have a main page where I just do a one paragraph description of every big thing in use.
each page has:
- an in depth description,
- how it’s set up,
- a list of features i use,
- how it connects to other services,
- and a miscellaneous for everything else
I’ll also add any notes in the misc section in case I need to reference them later. If a service is mentioned, I’ll create a page for it and link to it every time I mention it. That way nothing is more than a few clicks away and the documentation grows naturally as long as you don’t have any monolithic application. Example: (main -> Docker -> Project_Ozone_2 -> custom configurations Or main -> Joomla -> wysiwyg ->JCE Editor)
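If it helps to start every page from the same skeleton, the five-part layout above could be generated with something like this. It is only a sketch: the section titles paraphrase the list, and Markdown headings are an assumption (DokuWiki uses its own heading syntax), so adjust for whatever your notes tool expects.

```python
# Section titles paraphrased from the list above; rename to taste.
SECTIONS = [
    "Description",
    "Setup",
    "Features in use",
    "Connections to other services",
    "Miscellaneous",
]


def page_skeleton(service: str) -> str:
    """Return an empty Markdown page for `service` with one heading per section."""
    lines = [f"# {service}", ""]
    for title in SECTIONS:
        lines += [f"## {title}", "", "TODO", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(page_skeleton("Joomla"))
```

Leaving a visible TODO under each heading makes it obvious later which pages were created but never filled in.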
I also had a professor tell me to just write everything down first and then focus on formatting to find what kind of structure suits your needs best.
I agree with the advice that says “Document your setup such that you could recreate it from your notes from scratch” but I’d take it another step further — consider that someone may have to do some work on your system when you are unable or unavailable. The kind of thing you’d keep with your will, or power of attorney. Just a suggestion.
…and to my family I bequeath my entire collection of Linux iso’s
You jest but if I left my wife my Home Assistant setup undocumented she would pee on my grave.
Ansible is my config and documentation in one.
It’s reproducible, idempotent and I don’t need anything else.
I write all code myself, that makes it even easier to read.
I “document” everything by forcing myself to create ansible runbooks for new services and configs. I have some gaps, definitely, but the more of them I create, the easier new services are to deploy.
i make backups of everything and when writing configs i leave a bunch of comments
That’s the neat part, I don’t!
I have a docker-compose file, which is somewhat self-documenting, especially since I give everything descriptive names. Creds go in bitwarden anyway.
But then, my environment isn’t that complex, and I don’t have anything so custom that I need notes to replicate it.
I have two systems that sort of work together.
The first system involves a bunch of text files for each task. OS installation, basic post OS installation tasks and a file for each program I add (like UFW, apparmor, ddclient, docker and so on). They basically look like scripts with comments. If I want to I can just copy/paste everything into a terminal and reach a specific state that I want to be at.
The second system is a sort of “skeleton” file tree that only contains all the files that I have added or modified.
Here's an example of what my server skeleton file tree looks like:
.
├── etc
│   ├── crontabs
│   │   └── root
│   ├── ddclient
│   │   └── ddclient.conf
│   ├── doas.d
│   │   └── doas.conf
│   ├── fail2ban
│   │   ├── filter.d
│   │   │   └── alpine-sshd-key.conf
│   │   └── jail.d
│   │       └── alpine-ssh.conf
│   ├── modprobe.d
│   │   ├── backlist-extra.conf
│   │   └── disable-filesystems.conf
│   ├── network
│   │   └── interfaces
│   ├── periodic
│   │   └── 1min
│   │       └── dynamic-motd
│   ├── profile.d
│   │   └── profile.sh
│   ├── ssh
│   │   └── sshd_config
│   ├── wpa_supplicant
│   │   └── wpa_supplicant.conf
│   ├── fstab
│   ├── nanorc
│   ├── profile
│   └── sysctl.conf
├── home
│   └── pi-user
│       ├── .config
│       │   └── ash
│       │       ├── ashrc
│       │       └── profile
│       ├── .ssh
│       │   └── authorized_keys
│       ├── .sync
│       │   ├── file-system-backup
│       │   │   ├── .sync-server-fs_01_root
│       │   │   └── .sync-server-fs_02_boot
│       │   └── .sync-caddy_certs_backup
│       ├── .nanorc
│       └── .tmux.conf
├── root
│   ├── .config
│   │   └── mc
│   │       └── ini
│   ├── .local
│   │   └── share
│   │       └── mc
│   │           └── history -> /dev/null
│   ├── .ssh
│   │   └── authorized_keys
│   ├── scripts
│   │   ├── automated-backup
│   │   └── maintenance
│   ├── .ash_history -> /dev/null
│   └── .nanorc
├── srv
│   ├── caddy
│   │   ├── Caddyfile
│   │   ├── Dockerfile
│   │   └── docker-compose.yml
│   └── kiwix
│       └── docker-compose.yml
└── usr
    └── sbin
        ├── containers-down
        ├── containers-up
        ├── emountman
        ├── fs-backup-quick
        └── rtransfer
This is useful to me because I can keep track of every change I make. I even have it set up so I can use rsync to quickly chuck all the files into place after a fresh install or after adding/modifying files.
I also created and maintain a “quick install” guide so I can install a fresh OS, rsync all the modified files from my skeleton file tree into place, then run through all the commands in my quick install guide to get myself back to the same state in a minimal amount of time.
(Bookmarked for when I have the mental capacity to …)
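For anyone curious what the skeleton-tree deploy step from the comment above amounts to, here is a rough Python stand-in. It is not the commenter's setup (they use rsync); `deploy_skeleton` and its arguments are hypothetical names, and it deliberately ignores ownership and permissions beyond what `shutil.copy2` preserves.

```python
import os
import shutil
from pathlib import Path


def deploy_skeleton(skeleton: Path, target_root: Path) -> list[Path]:
    """Copy every file under `skeleton` to the same relative path under `target_root`.

    Symlinks (e.g. a history file pointing at /dev/null) are recreated as
    symlinks instead of being followed. Returns the relative paths written.
    """
    written = []
    for src in sorted(skeleton.rglob("*")):
        rel = src.relative_to(skeleton)
        dest = target_root / rel
        if src.is_symlink():
            dest.parent.mkdir(parents=True, exist_ok=True)
            if dest.is_symlink() or dest.exists():
                dest.unlink()  # replace whatever is there so re-runs succeed
            os.symlink(os.readlink(src), dest)
        elif src.is_dir():
            dest.mkdir(parents=True, exist_ok=True)
            continue  # directories are created, not reported
        else:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copies contents plus mtime and mode bits
        written.append(rel)
    return written
```

Pointing `target_root` at a scratch directory first is a cheap dry run: you can diff the result against the live system before deploying to `/`.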
Do y’all also document backup/restore procedures?
How often do you test it?
Frankly, with my screwed up brain, I document everything. I can turn around twice in my lab and my brain will flat line. When I first started, I would always tell myself that I’d remember stuff. Not anymore.
I created a script for Linux that automatically backs up to a NAS drive, once every two weeks, as a complete image, and I keep 5 on deck. Testing usually happens once every 3 months or so. I also have Duplicati backups that are stored offsite on my VPS.
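The "keep 5 on deck" part of a rotation like that can be sketched in a few lines of Python. This is not the commenter's script: `prune_backups`, the `*.img` pattern and the directory layout are assumptions for illustration, and it only handles the pruning, not creating the image.

```python
from pathlib import Path


def prune_backups(backup_dir: Path, keep: int = 5, pattern: str = "*.img") -> list[Path]:
    """Delete all but the `keep` most recently modified files matching `pattern`.

    Returns the paths that were removed, newest first.
    """
    images = sorted(
        backup_dir.glob(pattern),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first, so the tail of the list is the oldest
    )
    stale = images[keep:]
    for path in stale:
        path.unlink()
    return stale
```

Running the prune only after a new image has been written successfully avoids ending up with fewer good backups than intended when a backup run fails.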
I have a repo for the infra files (compose files and terraform files just for playing). I store the docs in the same repo in MD files. As for the secrets, I’m using docker swarm, so I can store the needed passwords there. Otherwise Vaultwarden is my go-to, <ad> self hosted, lightweight password manager, compatible with bitwarden clients </ad>. I’m a little paranoid that if the note service got db corruption, I might lose too much info, so git is the way (personal opinion).
edit: add the related MD file next to the compose file, one folder per service, the source and the doc will be coupled in one place.
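With one folder per service, it is easy to check that the doc actually exists next to each compose file. A small sketch, assuming the files are called `docker-compose.yml` and `README.md` (both names are guesses, the comment above does not say):

```python
from pathlib import Path


def services_missing_docs(
    repo: Path,
    compose_name: str = "docker-compose.yml",
    doc_name: str = "README.md",
) -> list[str]:
    """Return the names of service folders that have a compose file but no doc file."""
    missing = []
    for compose in sorted(repo.glob(f"*/{compose_name}")):
        if not (compose.parent / doc_name).is_file():
            missing.append(compose.parent.name)
    return missing


if __name__ == "__main__":
    for name in services_missing_docs(Path(".")):
        print(f"no docs yet for: {name}")
```

Run from the repo root, this prints the services still waiting for a write-up, which could also be wired into a pre-commit hook if you want the nagging automated.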
Whenever I set something up I usually make a markdown file listing the commands and steps to take. I do this as I am setting things up and familiarizing myself, so once I’m done, I have a start to finish guide.
Raw text/markdown files will be readable until the end of time.







