Health Check (Preview)
The Health Check feature prevents unhealthy instances from serving requests, which improves availability. The feature pings the specified health check path on every instance of your web app every 2 minutes. An instance is considered healthy if it responds within 2 minutes with a status code in the 200-299 range. If an instance fails to respond this way for 5 consecutive pings, it is considered "unhealthy" and our service stops routing requests to it. This documentation applies to App Service and App Service Environments.
This feature is recommended by Azure support to increase your deployment's resiliency.
When Health Checks are enabled, App Service will ping the provided path. If a successful response code is not received after 5 pings, the instance is considered "unhealthy".
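The rule above can be illustrated with a minimal sketch. This is not the service's actual implementation; the class name `PingTracker` and its methods are hypothetical, and the sketch assumes that a single successful ping resets the failure streak (the text says only that the failures must be consecutive).

```java
/** Illustrative only: tracks consecutive failed pings for one instance. */
class PingTracker {
    // Threshold taken from the text: 5 consecutive failed pings.
    static final int FAILURE_THRESHOLD = 5;

    private int consecutiveFailures = 0;

    /** A ping counts as successful only for status codes 200-299 (inclusive). */
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode <= 299;
    }

    /** Records the status code of one ping; a success resets the streak (assumption). */
    void record(int statusCode) {
        if (isSuccess(statusCode)) {
            consecutiveFailures = 0;
        } else {
            consecutiveFailures++;
        }
    }

    /** Records a ping that received no response in time; counted as a failure. */
    void recordTimeout() {
        consecutiveFailures++;
    }

    boolean isUnhealthy() {
        return consecutiveFailures >= FAILURE_THRESHOLD;
    }
}
```

With this sketch, four failures followed by one success leave the instance healthy; only five failures in a row mark it "unhealthy".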
Unhealthy instances are excluded from the load balancer. While they are excluded, the remaining healthy instances may experience increased load because fewer instances are serving traffic. To avoid overwhelming the healthy instances, at most half of the instances may be excluded. For example, if an App Service Plan (ASP) is configured for 4 instances and 3 of them are unhealthy, at most 2 of the unhealthy instances are excluded. The other 2 instances (1 healthy and 1 unhealthy) continue to receive load. In the worst case, where all instances are unhealthy, none are excluded.
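The exclusion arithmetic can be sketched as a small function. The class and method names are hypothetical, and rounding "half" down for an odd number of instances is an assumption; the text only gives the 4-instance example.

```java
/** Illustrative only: how many unhealthy instances would be excluded. */
class ExclusionPolicy {
    static int instancesToExclude(int total, int unhealthy) {
        if (total <= 0 || unhealthy < 0 || unhealthy > total) {
            throw new IllegalArgumentException("need total > 0 and 0 <= unhealthy <= total");
        }
        if (unhealthy == total) {
            return 0; // worst case from the text: all unhealthy, none excluded
        }
        // At most half of the instances may be excluded (integer division rounds down; assumption).
        return Math.min(unhealthy, total / 2);
    }

    public static void main(String[] args) {
        // Example from the text: 4 instances, 3 unhealthy.
        System.out.println(instancesToExclude(4, 3)); // prints 2
    }
}
```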
If an instance remains unhealthy for 1 hour, it is replaced with a new instance. At most one instance is replaced per hour, with a maximum of three instances per day per App Service Plan.
To enable the feature, open the Resource Explorer, which opens to the top-level view of your Azure resources. Use the left-side toolbar to drill down to your web app. Expand the config section of the web app and click the web tab. Find the element named "healthCheckPath" and set its value to the health path of your application, for example "/health/", "/api/health/", or "/status/".
You must have 2 or more instances for the feature to take effect.
To disable the feature, follow the instructions above to find the "healthCheckPath" element. Set its value to an empty string ("") and click "PATCH" at the top. After clicking PATCH, the property value will return to null.
The health check path should check the critical components of your application. For example, if your application depends on a database and a messaging system, the health check endpoint should perform a (minimal) database query and send a quick message.
The path must respond within two minutes with a status code between 200 and 299 (inclusive). If the path does not respond within two minutes, or returns a status code outside that range, the instance is considered "unhealthy". Health Check integrates with Easy Auth, so our service can still ping the endpoint when Easy Auth is enabled. If you are using your own authentication system, the health check path must allow anonymous HTTP access.
As of June 2020, the health check path must be HTTP only. An upcoming update will support HTTPS routes.
If you are using Spring for your Java application, see the Spring Actuator project. Spring Actuator implements a basic /health endpoint for your application.
If you are not using Spring or would like to implement your own health check endpoint, an example Java implementation is shown below.
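import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Controller {
    // Returns 200 only when every critical dependency is available.
    @GetMapping("/health")
    public ResponseEntity<String> getHealth() {
        // getStorageAvailability() and getAPIAvailability() are application-specific checks (not shown).
        boolean storageIsAvailable = this.getStorageAvailability();
        boolean thirdPartyAPIIsAvailable = this.getAPIAvailability();
        if (storageIsAvailable && thirdPartyAPIIsAvailable) {
            return new ResponseEntity<>("Application and dependent components are OK.", HttpStatus.OK);
        } else {
            return new ResponseEntity<>("Application is unhealthy.", HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}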
Your unhealthy workers can be removed from the load balancer rotation sooner if your health check path responds to the ping immediately. To do this, your health check path should store a "last known status" that can be returned immediately for each health check request. Once the response is sent, your health check endpoint should check the status of the core components (database, messaging system) and prepare the new status for the following health check ping.
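A minimal, framework-free sketch of this "last known status" pattern follows. The class `CachedHealthStatus`, its method names, and the `dependencyCheck` supplier (standing in for your database and messaging checks) are all hypothetical. Treating an exception thrown by the check as "unhealthy" is also an assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Illustrative only: answers pings from a cached status and refreshes it afterwards. */
class CachedHealthStatus {
    private final BooleanSupplier dependencyCheck; // hypothetical: DB query, test message, etc.
    private final AtomicBoolean lastKnownHealthy;

    CachedHealthStatus(BooleanSupplier dependencyCheck, boolean initiallyHealthy) {
        this.dependencyCheck = dependencyCheck;
        this.lastKnownHealthy = new AtomicBoolean(initiallyHealthy);
    }

    /** Status code to return right away, without touching any dependency. */
    int currentStatusCode() {
        return lastKnownHealthy.get() ? 200 : 500;
    }

    /** Runs the (possibly slow) dependency check in the background and stores the result. */
    CompletableFuture<Void> refreshAsync() {
        return CompletableFuture.runAsync(() -> {
            boolean healthy;
            try {
                healthy = dependencyCheck.getAsBoolean();
            } catch (RuntimeException e) {
                healthy = false; // a failing check counts as unhealthy (assumption)
            }
            lastKnownHealthy.set(healthy);
        });
    }

    /** What an endpoint handler would do per ping: answer first, then refresh for the next ping. */
    int handlePing() {
        int status = currentStatusCode();
        refreshAsync();
        return status;
    }
}
```

In a Spring controller, the `/health` handler would return `handlePing()` as the response status. The trade-off is that each response reflects the state observed after the previous ping rather than the current one.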
Development teams at large enterprises often need to adhere to security requirements for their exposed APIs. To secure the healthcheck endpoint, you should first use features such as IP restrictions, client certificates, or a Virtual Network to restrict access to the entire application. At enterprises with such security requirements, you are likely already using one or more of these features.
You can secure the healthcheck endpoint itself by requiring that the User-Agent of the incoming request matches ReadyForRequest/1.0+(HealthCheck). The User-Agent cannot be spoofed, since the request was already validated by the prior security features.
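A minimal sketch of the User-Agent check is shown below. The class and method names are hypothetical, and requiring an exact, case-sensitive match is an assumption; the text says only that the header must "match" the value. How you obtain the header depends on your framework.

```java
/** Illustrative only: accepts a request as a health check ping based on its User-Agent. */
class HealthCheckRequestFilter {
    // Value taken verbatim from the text above.
    static final String EXPECTED_USER_AGENT = "ReadyForRequest/1.0+(HealthCheck)";

    /** Returns true only for an exact match; a missing (null) header is rejected. */
    static boolean isHealthCheckPing(String userAgentHeader) {
        return EXPECTED_USER_AGENT.equals(userAgentHeader);
    }
}
```

A health endpoint could return 403 (or 404) when `isHealthCheckPing` is false and run its normal checks otherwise.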
There is a known issue where Azure Monitor Alerts for the Healthcheck metric may be triggered even if there is no change in the metric's value. We have identified the cause and will release a patch in an upcoming release.
Once the feature has been turned on, you can visualize the Health Check status of each instance in the Portal by going to Monitoring > Metrics. In the metrics dropdown list, select "Health Check Status". This will show your healthy instances as a percentage.
You can create an alert based on this metric, for example to send an SMS message or an email.
- While viewing the graph, select New Alert Rule
- Under Condition, click the link "Whenever the Health check status is "
- In the new tab, change Operator to "Less than or equal to" or "Less than"
- Set a Threshold value
- Finally, configure your action. You will need to create an Action Group. For more information on Action Groups, see this article.
The data is also accessible through the monitoring APIs below. You can use ARMClient to query the information.
ARMClient.exe GET "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{site}/providers/microsoft.Insights/metrics?api-version=2018-01-01&metricnames=HealthCheckStatus&interval=FULL"
The timespan query parameter is a string with two datetimes separated by a /.
ARMClient.exe GET "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{site}/providers/microsoft.Insights/metrics?api-version=2018-01-01&metricnames=HealthCheckStatus&timespan=2019-08-01T00:00:00.000Z/2019-08-02T00:00:00.000Z&interval=PT1H"
ARMClient.exe GET "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{site}/providers/microsoft.Insights/metrics?api-version=2018-01-01&metricnames=HealthCheckStatus&$filter=Instance eq '*'"
ARMClient.exe GET "/subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Web/sites/{site}/providers/microsoft.Insights/metrics?api-version=2018-01-01&metricnames=HealthCheckStatus&$filter=Instance eq 'RD00155D82AC9D'"
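If you build these queries programmatically, the timespan value described above (two datetimes separated by a /) can be produced as in the following sketch. The class and method names are hypothetical. The millisecond UTC format mirrors the ARMClient example, and rejecting a start time that is not before the end time is an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Illustrative only: formats the timespan query parameter for the metrics API. */
class MetricsTimespan {
    // Same shape as the example above: 2019-08-01T00:00:00.000Z
    private static final DateTimeFormatter UTC_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    static String timespan(Instant start, Instant end) {
        if (!start.isBefore(end)) {
            throw new IllegalArgumentException("start must be before end");
        }
        return UTC_MILLIS.format(start) + "/" + UTC_MILLIS.format(end);
    }

    public static void main(String[] args) {
        System.out.println(timespan(
                Instant.parse("2019-08-01T00:00:00Z"),
                Instant.parse("2019-08-02T00:00:00Z")));
        // prints 2019-08-01T00:00:00.000Z/2019-08-02T00:00:00.000Z
    }
}
```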