A health check is a critical part of operating a service, yet it often gets little thought. To help you get on the right track, here are some best practices and high-level guidelines for implementing one.
First, have a status page, such as https://www.githubstatus.com/, that shows the health check results or the current status of your service. Second, do not host that page on the same infrastructure as the service it monitors. Otherwise an outage can take the status page down along with the service, leaving your customers with no way to find out what is happening.
Third, think about what each health check actually tests. Kubernetes, for instance, distinguishes between readiness and liveness probes. A readiness probe checks whether the service is ready to serve requests; while it fails, Kubernetes stops routing traffic to the pod. A liveness probe checks whether the service is still running at all, even if it cannot serve requests right now; when it fails, Kubernetes restarts the container. Liveness checks are usually kept lighter than readiness checks, because they do not need to verify dependencies. It’s worth considering how the same distinction applies when you build a status page like githubstatus for your own services.
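To make the readiness/liveness distinction concrete, here is a minimal sketch in Python using only the standard library. The endpoint paths `/livez` and `/readyz`, the `DEPENDENCY_CHECKS` table, and the function names are assumptions made for illustration, not something Kubernetes requires; the dependency checks are placeholders you would replace with real ones.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple

# Hypothetical dependency checks; each returns True when the dependency is usable.
# Replace the lambdas with real checks (a database ping, a cache lookup, ...).
DEPENDENCY_CHECKS: Dict[str, Callable[[], bool]] = {
    "database": lambda: True,
    "cache": lambda: True,
}


def liveness() -> Tuple[int, dict]:
    """Cheap check: the process is up and able to answer at all."""
    return 200, {"status": "alive"}


def readiness() -> Tuple[int, dict]:
    """Heavier check: every dependency must be usable before taking traffic."""
    results = {}
    for name, check in DEPENDENCY_CHECKS.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = False
    ready = all(results.values())
    body = {"status": "ready" if ready else "not ready", "checks": results}
    return (200 if ready else 503), body


def route(path: str) -> Tuple[int, dict]:
    """Map a request path to the matching health check."""
    if path == "/livez":
        return liveness()
    if path == "/readyz":
        return readiness()
    return 404, {"error": "not found"}


class HealthHandler(BaseHTTPRequestHandler):
    """HTTP wrapper around route(); it is defined here but not started."""

    def do_GET(self) -> None:
        status, body = route(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To try it locally, you could serve the handler with `http.server.HTTPServer(("", 8080), HealthHandler).serve_forever()`. Note that a failing dependency turns readiness into a 503 while liveness keeps returning 200, which is the behavior you want: the service stops receiving traffic without being restarted.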
After some investigation, here are some best practices to consider:
- The health check should measure the real customer experience, which includes checking various aspects of your service, such as UI, API, authentication, and dependency health.
- The health check should be quick, small, and reliable.
- The health check is not meant for feature testing.
- Multiple health checks for different purposes may be necessary.
- The health check infrastructure should be separate from the service infrastructure.
- Metrics should be separated between health check traffic and real customer traffic for the service under test, so that probe requests do not skew the numbers you use to judge customer experience.
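The last point can be sketched as follows. This is one possible approach, assuming the health check prober tags its requests with a header; the header name `X-Health-Check`, the value `"1"`, and all function names here are hypothetical. A real service would feed these labels into its metrics system rather than an in-memory counter.

```python
from collections import Counter
from typing import Mapping

# Hypothetical header that the health check prober attaches to every request.
PROBE_HEADER = "x-health-check"

# Request counts keyed by (traffic source, HTTP status code).
request_counts: Counter = Counter()


def traffic_source(headers: Mapping[str, str]) -> str:
    """Classify a request as health check traffic or customer traffic."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    return "health_check" if normalized.get(PROBE_HEADER) == "1" else "customer"


def record_request(headers: Mapping[str, str], status: int) -> str:
    """Count the request under its traffic source and return that source."""
    source = traffic_source(headers)
    request_counts[(source, status)] += 1
    return source


def error_rate(source: str) -> float:
    """Share of 5xx responses for one traffic source (0.0 if no traffic)."""
    total = sum(n for (src, _), n in request_counts.items() if src == source)
    errors = sum(
        n for (src, code), n in request_counts.items()
        if src == source and code >= 500
    )
    return errors / total if total else 0.0
```

With the two sources kept apart, a burst of probe requests cannot hide a rising customer error rate, and a failing probe cannot inflate it.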
By following these best practices and guidelines, you can implement a robust health check system for your service.