If your health check is broken, then you might not notice that a service is down and you’ll fail to deploy a replacement. Or the opposite: the check falsely reports failures, and you end up constantly replacing a healthy service, creating a “flapping” service.
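The usual defense against the flapping failure mode is hysteresis: only flip a target’s state after several consecutive probes agree, so a single spurious result doesn’t trigger a replacement. A minimal sketch of that idea (the class and threshold names are illustrative, not any particular load balancer’s API):

```python
class HealthCheck:
    """Hysteresis-based health check: a target only changes state after
    several consecutive probe results disagree with its current state,
    so one bad probe doesn't trigger a replacement (the "flapping"
    failure mode)."""

    def __init__(self, unhealthy_threshold=3, healthy_threshold=2):
        self.state = "healthy"
        self.streak = 0  # consecutive probes disagreeing with current state
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold

    def record(self, probe_ok: bool) -> str:
        agrees = probe_ok == (self.state == "healthy")
        if agrees:
            self.streak = 0  # state confirmed; reset the counter
        else:
            self.streak += 1
            needed = (self.healthy_threshold if self.state == "unhealthy"
                      else self.unhealthy_threshold)
            if self.streak >= needed:
                # Enough consecutive disagreements: flip the state.
                self.state = "unhealthy" if self.state == "healthy" else "healthy"
                self.streak = 0
        return self.state
```

With the defaults above, one or two failed probes leave the target marked healthy; only a third consecutive failure flips it, and two consecutive successes flip it back.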
Well, you see, the mistake you are making is believing a single thing the stupid AWS status board says. It is always fucking lying, sometimes in new and creative ways.
I mean, if your OS were “smart” enough not to send IO to devices that indicate critical failure (e.g. by marking them read-only in the array?), and it then decided all devices had failed critically, wouldn’t this happen in that kind of system as well…
According to that page, the issue stemmed from an underlying system responsible for health checks on the load-balancing servers.
how the hell do you fuck up a health check config that bad? that’s like messing up smartd.conf and taking your system offline somehow