I’ve recently been spending time configuring my self-hosted services with notifications using ntfy. I’ve added ntfy to report status on containers and my system using Beszel. However, only 12 out of my 44 containers seem to have a healthcheck “enabled” or built in as a feature. So I’m now wondering what is considered best practice for monitoring the uptime/health of my containers. I’m already using Uptime Kuma with the “docker container” option for each container I deem necessary to monitor; I don’t monitor all 44 of them 😅

So I’m left with these questions:

  1. How do you notify yourself about the status of a container?
  2. Is there a “quick” way to know if a container has a healthcheck as a feature?
  3. Does the healthcheck feature simply depend on the developer of each app, or on the person building the container?
  4. Is it better to simply monitor the http(s) request to each service? (I believe this in my case would make Caddy a single point of failure for this kind of monitor).

Thanks for any input!

  • folekaule@lemmy.world · 4 days ago
    1. Some kind of monitoring software, like the Grafana stack. I like email and Discord notifications.
    2. The Dockerfile will have a HEALTHCHECK statement, but in my experience this is pretty rare. Most of the time I set up a health check in the docker compose file, or I extend the Dockerfile and add my own. You sometimes need to add a tool (like curl) to do the health check anyway.
    3. It’s a feature of the container, but the app needs to support some way of signaling “health”, such as through a web API.
    4. It depends on your needs. You can do all of the above. You can do so-called black box monitoring where you’re just monitoring whether your webapp is up or down. Easy. However, for a business you may want to know about problems before they happen, so you add white box monitoring for sub-components (database, services), timing, error counts, etc.
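
    The compose-file health check described in 2. might look like this sketch (the service name, image, port, and endpoint are placeholders, not from the thread):

    ```yaml
    services:
      myapp:
        image: myapp:latest
        healthcheck:
          # curl has to exist inside the image; add it, or use wget or a built-in check
          test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
          interval: 30s
          timeout: 5s
          retries: 3
          start_period: 15s
    ```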

    To add to that: health checks in Docker containers are mostly for self-healing purposes. Think about a system where you have a web app running in many separate containers across some number of nodes. You want to know if one container has become too slow or non-responsive so you can restart it before the rest of the containers are overwhelmed, causing more serious downtime. So, a health check allows Docker to restart the container without manual intervention. You can configure it to give up if it restarts too many times, and then you would have other systems (like a load balancer) to direct traffic away from the failed subsystems.

    It’s useful to remember that containers are “cattle not pets”, so a restart or shutdown of a container is a “business as usual” event and things should continue to run in a distributed system.

    • CameronDev@programming.dev · 4 days ago

      This isn’t really the same as a health check. `ps` just checks that the process is up and running, but it could be lagging or deadlocked, or the socket could be closed.

      A proper healthcheck checks if the application is actually healthy and behaving correctly.

  • funkajunk@lemmy.world · 3 days ago

    I just put a healthcheck in my compose files and then run an autoheal container that will automatically restart them if they are “unhealthy”.
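
    That pattern is sketched below. The app service is a placeholder; the autoheal image and `AUTOHEAL_CONTAINER_LABEL` setting follow the commonly used willfarrell/autoheal conventions:

    ```yaml
    services:
      myapp:
        image: myapp:latest
        labels:
          - autoheal=true              # opt this container in to auto-restarts
        healthcheck:
          test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"]
          interval: 30s
          retries: 3

      autoheal:
        image: willfarrell/autoheal
        restart: always
        environment:
          - AUTOHEAL_CONTAINER_LABEL=autoheal   # watch only labeled containers
        volumes:
          - /var/run/docker.sock:/var/run/docker.sock
    ```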

  • irmadlad@lemmy.world · 3 days ago

    Dozzle will tell you just about everything you want to know about the health of a container. Sadly, to my knowledge, it does not integrate with any notification platforms like ntfy, even though there is a long-standing request for that feature.

    • Sips'@slrpnk.net (OP) · 3 days ago

      Yup, running that too 😅 Was not aware of the pending feature request; I’ll keep my eyes open for that in the future!

    • Sips'@slrpnk.net (OP) · 3 days ago

      Maybe a transition to a cluster homelab should be the goal for 2026; it would be fun.

      • Noxy@pawb.social · 3 days ago

        Maybe! Three Raspberry Pis and k3s have served me mostly well for years, though with USB/SATA adapters because the microSD cards were getting rather unreliable after a while.

        • Sips'@slrpnk.net (OP) · 3 days ago

          Nice one! Fortunately I just rebuilt my server with an i5-12400 and a fancy new case, and I’m slowly transitioning to an all-SSD build! I would probably lean towards a single-node cluster using Talos.

          • Noxy@pawb.social · 3 days ago

            I haven’t heard of Talos before; it sounds like it’s not fully open source?

            • Sips'@slrpnk.net (OP) · 3 days ago (edited)

              Talos is really awesome; it’s a minimal OS built strictly to run Kubernetes. We use it at work, and it’s running in production for a lot of people. It’s extremely minimal and can only be managed through its own API, via the talosctl command. Its minimalism makes it great for security and less resource-heavy than alternatives.

              Check this out for a quick, funny taste of why one should consider using Talos:

              [60sec video from Sidero Labs, creators of Talos] https://www.youtube.com/watch?v=UiJYaU16rYU

              Talos is under the MPL 2.0; afaik that is open source.

  • manwichmakesameal@lemmy.world · 3 days ago

    I use Uptime Kuma with notifications through Home Assistant. I get notifications on my phone and watch. I had notifications set up to go to a room on my Matrix homeserver, but I recently migrated it and don’t feel like messing with the room.

  • ryokimball@infosec.pub · 4 days ago

    What happened to Grafana and Prometheus?

    I have been putting off rebuilding my home cluster since moving but that used to be the default for much of this and I’m not hearing that in these responses.

    • eli@lemmy.world · 3 days ago

      While I love and run Grafana and Prometheus myself, it’s like taking an RPG to an ant.

      There are simpler tools that do the “is X broken?” job just fine.

      Even just running Portainer and attaching it to a bunch of standalone Docker environments is pretty good too.

  • AgaveInMyAss@lemmy.world · 4 days ago

    I use Gatus in conjunction with http APIs for health checking. For services that don’t support that, you can always pattern match the HTML code.
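
    As a sketch, a Gatus endpoint that checks both the status code and a string in the HTML could look like this (the service name, URL, and matched string are hypothetical):

    ```yaml
    endpoints:
      - name: myapp
        url: "https://myapp.example.com"
        interval: 60s
        conditions:
          - "[STATUS] == 200"
          - "[BODY] == pat(*<title>MyApp</title>*)"  # pattern-match the HTML
    ```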

  • Lambda@lemmy.ca · 4 days ago

    I decided that at my scale, NixOS is easier to maintain. So for me it’s just a `systemctl status <thing I host>`

    • poVoq@slrpnk.net · 4 days ago

      With Podman and Quadlets you can use the same command to check on containers as well. The Systemd integration of Podman is pretty neat.
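
      A minimal Quadlet unit for that might look like the sketch below (file name, image, port, and health command are illustrative, not from the thread):

      ```ini
      # ~/.config/containers/systemd/myapp.container (rootless example)
      [Unit]
      Description=My app container

      [Container]
      Image=docker.io/library/nginx:latest
      PublishPort=8080:80
      # Quadlet can carry the health check too:
      HealthCmd=curl -f http://localhost/ || exit 1
      HealthInterval=30s

      [Install]
      WantedBy=default.target
      ```

      After a `systemctl --user daemon-reload`, `systemctl --user status myapp` shows the container like any other unit.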

      • Sips'@slrpnk.net (OP) · 3 days ago

        Yeah, eventually I will transition to this, but not until after I migrate away from Unraid for more granular control. Looking forward to it though!

  • CameronDev@programming.dev · 4 days ago

    I rely on the developers putting in a health check, but few do.

    I’ve also got Uptime Kuma set up, which is kinda like an external healthcheck.

  • frongt@lemmy.zip · 4 days ago

    If I go to its web interface (because everything is a web interface) and it’s down, then I know it has a problem.

    I could set up monitoring, but I wouldn’t care enough to fix it until I had free time to use it either.

    • tuckerm@feddit.online · 3 days ago

      Same here. I’m the only user of my services, so if I try visiting the website and it’s down, that’s how I know it’s down.

      I prefer phrasing it differently, though. “With my current uptime monitoring strategy, all endpoints serve as an on-demand healthcheck endpoint.”

      One legitimate thing I do, though, is have a systemd service that starts each docker compose file. If a container crashes, systemd will notice (I think it keeps an eye on the PIDs automatically) and restart them.
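
      A minimal sketch of such a unit, assuming a hypothetical stack under /opt/myapp; running `docker compose up` in the foreground lets systemd track the process and apply its restart policy:

      ```ini
      # /etc/systemd/system/myapp-compose.service
      [Unit]
      Description=myapp docker compose stack
      Requires=docker.service
      After=docker.service

      [Service]
      WorkingDirectory=/opt/myapp
      ExecStart=/usr/bin/docker compose up
      ExecStop=/usr/bin/docker compose down
      Restart=on-failure

      [Install]
      WantedBy=multi-user.target
      ```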

  • realitaetsverlust@piefed.zip · 4 days ago (edited)

    How do you notify yourself about the status of a container?

    I usually notice if a container or application is down because that usually results in something in my house not working. Sounds stupid, but I’m not hosting a hyper available cluster at home.

    Is there a “quick” way to know if a container has healthcheck as a feature.

    Check the documentation.

    Does healthcheck feature simply depend on the developer of each app, or the person building the container?

    If the developer adds a healthcheck, you should use that. If there is none, you can always build one yourself. If it’s a web app, a simple HTTP request does the trick; just validate the returned HTML: if the status code is 200 and the output contains a certain string, it seems to be up. If it’s not a web app but, say, a database, a simple SELECT 1 on the database could tell you if it’s reachable or not.
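
    Both of those checks can be wired straight into a compose healthcheck; this is a sketch with placeholder images, ports, strings, and credentials:

    ```yaml
    services:
      webapp:
        image: mywebapp:latest
        healthcheck:
          # 200 status and an expected string in the body
          test: ["CMD-SHELL", "curl -fs http://localhost:8080/ | grep -q 'Welcome'"]
          interval: 30s

      db:
        image: postgres:16
        healthcheck:
          test: ["CMD-SHELL", "psql -U postgres -c 'SELECT 1' >/dev/null"]
          interval: 30s
    ```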

    Is it better to simply monitor the http(s) request to each service? (I believe this in my case would make Caddy a single point of failure for this kind of monitor).

    If you only run a bunch of web services that you use on demand, monitoring the HTTP requests to each service is more than enough. Caddy being a single point of failure is not a problem, because Caddy being dead still results in the service being unusable. And you will immediately know whether Caddy died or the service behind it, because the error message looks different: if the upstream is dead, Caddy returns a 502; if Caddy is dead, you’ll get a “Connection timed out”.

    • lps2@lemmy.ml · 4 days ago

      For databases, many, like Postgres, have a ping/ready command you can use to ensure it’s up without the overhead of an actual query! Redis is the same way (I feel like pg and redis health checks cover a lot of the common stack patterns)
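
      Those query-free checks are `pg_isready` and `redis-cli ping`; a minimal compose sketch (images and user are placeholders):

      ```yaml
      services:
        db:
          image: postgres:16
          healthcheck:
            test: ["CMD", "pg_isready", "-U", "postgres"]
            interval: 10s

        cache:
          image: redis:7
          healthcheck:
            test: ["CMD", "redis-cli", "ping"]   # expects PONG
            interval: 10s
      ```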

    • Sips'@slrpnk.net (OP) · 3 days ago

      Yeah, fair enough. Personally I want to monitor backend services too, just for good measure. Also to prove to my friends and family that I can maintain a higher uptime % than Cloudflare 🤣

      • mmmac@lemmy.zip · 3 days ago (edited)

        If you’re looking for this, you can use something like Uptime Kuma, which pings each service and looks for a specific response, or it will notify you

        I doubled down recently and now have Grafana dashboards + alerts for all of my proxmox hosts, their containers etc.

        Alerts are mainly mean CPU, memory, or disk utilization > 80% over 5 minutes

        I also get all of my notifications via a self hosted ntfy instance :~)

        • Sips'@slrpnk.net (OP) · 3 days ago

          As I wrote in my post, I’m already using Uptime Kuma to monitor my services. However, if I choose the “docker container” mode, Uptime Kuma can’t actually monitor it, as there is no health feature in most containers, so this results in 100% downtime 🙃 The other way to do it would be to just check the URL of the service, which of course works too, but it’s not a “true” health check.

  • prettybunnys@piefed.social · 3 days ago (edited)
    docker inspect --format='{{json .State.Health}}' <container_name>
    

    HEALTHCHECK is part of the Dockerfile syntax and ought to be supported by all your container runtimes.

    https://docs.docker.com/reference/dockerfile/#healthcheck

    You could extend any Dockerfiles that don’t have a health check to implement this feature with whatever check makes sense for the application, even if for now it’s just a curl of an endpoint.
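
    A sketch of such an extension, assuming a hypothetical upstream image, a Debian base, and a /health endpoint:

    ```dockerfile
    FROM someapp/someapp:latest

    # Many slim images ship without curl; install it (Debian-based example)
    RUN apt-get update && apt-get install -y --no-install-recommends curl \
        && rm -rf /var/lib/apt/lists/*

    # Mark the container unhealthy when the endpoint stops answering
    HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
        CMD curl -f http://localhost:8080/health || exit 1
    ```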