Tuesday, December 31, 2024

Azure AppGW on-demand probe success and unhealthy backend

Recently, while working on an Application Gateway (AppGW) issue, I observed an odd behavior: the on-demand health probe succeeded with an HTTP 200 response, while the actual backend health for the same backend pool showed as unhealthy with a generic backend server reachability error.

We then checked the relevant service status and port on the backend server, as well as the NSG and routing configuration, and found everything in order.

Later, while working with Azure support, the engineer informed us about a known issue**: if the backend service supports only TLS 1.3, the on-demand probe can report success while the actual backend health is shown as unhealthy because of current TLS support limitations. When connecting to the backend, AppGW currently supports only TLS 1.0, 1.1 and 1.2. Since this backend server accepted only TLS 1.3, the TLS handshake failed, which caused the health probe to fail and the backend to be marked unhealthy. Once the application team changed the backend's TLS support from TLSv1.3 to TLSv1.2, the issue was resolved.
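
To make the failure mode concrete, here is a deliberately simplified model of TLS version negotiation. It is not how AppGW or any TLS library is implemented; it only captures the idea that the highest version supported by both sides is used, and that the handshake fails when the two lists do not overlap. The version lists come from the support engineer's explanation above, and the function name is hypothetical.

```shell
#!/bin/sh
# Simplified model (illustration only): pick the highest TLS version that
# both sides support; an empty result means the handshake cannot succeed.

# highest_common "<client versions>" "<server versions>"
# Versions are space-separated and listed in ascending order, e.g. "1.0 1.1 1.2".
highest_common() {
  best=""
  for c in $1; do
    for s in $2; do
      if [ "$c" = "$s" ]; then
        best=$c   # later matches are higher because the lists are ascending
      fi
    done
  done
  printf '%s\n' "$best"
}

appgw_versions="1.0 1.1 1.2"   # versions AppGW uses towards the backend (per support)
backend_versions="1.3"         # the TLS 1.3-only backend from this case

common=$(highest_common "$appgw_versions" "$backend_versions")
if [ -z "$common" ]; then
  echo "No common TLS version: handshake fails, probe fails, backend unhealthy"
else
  echo "Negotiated TLS $common"
fi
```

With backend_versions="1.2 1.3" (roughly the situation after the application team's change), the same script prints "Negotiated TLS 1.2".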

The frustrating part was that the generic backend server reachability error gave no indication that the problem could be related to an unsupported TLS version, and no health probe logs were available to help narrow it down.

Now that we know about this bug and the TLS limitation, if you run into a similar issue, testing the TLS support of the backend service should be part of your troubleshooting. You can verify it with the good old "openssl" or "curl" command-line tools.

Assuming your internal URL already has the required DNS mapping in place:

#openssl s_client -connect <your-internal-domain.com>:443 -tls1_2
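
The single openssl command above tests only TLS 1.2. A small wrapper that tries each version in turn gives a quick picture of what the backend accepts. This is a sketch: the function names are hypothetical, it assumes OpenSSL 1.1.1 or later (needed for -tls1_3), and it judges success by the "Cipher is ..." summary line that s_client prints, which reads "Cipher is (NONE)" when the handshake fails.

```shell
#!/bin/sh
# Sketch: report which TLS versions a backend accepts, using openssl s_client.
# Assumes OpenSSL 1.1.1+. With OpenSSL 3.x the local client may refuse
# TLS 1.0/1.1 at its default security level, so FAILED for those two
# versions is not conclusive on its own.

# tls_flag VERSION: print the matching s_client switch (e.g. 1.2 -> -tls1_2)
tls_flag() {
  case "$1" in
    1.0) printf '%s\n' "-tls1" ;;
    1.1) printf '%s\n' "-tls1_1" ;;
    1.2) printf '%s\n' "-tls1_2" ;;
    1.3) printf '%s\n' "-tls1_3" ;;
    *)   return 1 ;;
  esac
}

# handshake_ok: read s_client output on stdin; succeed only if a cipher was
# negotiated ("New, ..., Cipher is <name>"), fail on "(NONE)" or no summary.
handshake_ok() {
  out=$(cat)
  case "$out" in
    *"Cipher is (NONE)"*) return 1 ;;
    *"Cipher is "*)       return 0 ;;
    *)                    return 1 ;;
  esac
}

# check_tls HOST PORT: try each TLS version and print OK or FAILED per version
check_tls() {
  for v in 1.0 1.1 1.2 1.3; do
    # </dev/null makes s_client exit right after the handshake attempt
    if openssl s_client -connect "$1:$2" -servername "$1" "$(tls_flag "$v")" \
         </dev/null 2>&1 | handshake_ok; then
      printf 'TLS %s: OK\n' "$v"
    else
      printf 'TLS %s: FAILED\n' "$v"
    fi
  done
}

# Example:
# check_tls <your-internal-domain.com> 443
```

Against a TLS 1.3-only backend like the one in this case, you would expect only the TLS 1.3 line to report OK.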

or

#curl -v https://<your-internal-domain.com> --tlsv1.2 --tls-max 1.2
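
When running the curl check from a script, the exit code often tells you more than scrolling through the verbose output. The helper below (a hypothetical name) maps a few documented curl exit codes to likely causes. A TLS version mismatch typically surfaces as exit code 35 (SSL connect error), which separates it from plain reachability problems such as codes 6 or 7.

```shell
#!/bin/sh
# Sketch: translate common curl exit codes into likely causes.
# Note: exit code 0 does not mean HTTP 200 unless curl is run with --fail.
explain_curl_exit() {
  case "$1" in
    0)  printf '%s\n' "OK: TLS handshake succeeded and a response was received" ;;
    6)  printf '%s\n' "Could not resolve host (check DNS, hosts file or --resolve)" ;;
    7)  printf '%s\n' "Could not connect (check service, port, NSG and routing)" ;;
    28) printf '%s\n' "Operation timed out" ;;
    35) printf '%s\n' "SSL connect error: TLS handshake failed (e.g. no common TLS version)" ;;
    60) printf '%s\n' "Server certificate could not be verified" ;;
    *)  printf '%s\n' "curl exited with code $1" ;;
  esac
}

# Usage:
# curl -sS -o /dev/null --tlsv1.2 --tls-max 1.2 https://<your-internal-domain.com>
# explain_curl_exit $?
```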

If you don't have the required internal DNS configuration for your site, either create a local hosts file entry or, alternatively, add the --resolve switch to the curl command:

#curl -v https://<your-internal-domain.com> --tls-max 1.2 --resolve <your-internal-domain.com>:443:<backend server IP>
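
Building on the --resolve approach, a sketch that tries TLS 1.2 and TLS 1.3 separately against one backend IP makes the version mismatch obvious. The function names are hypothetical, and it assumes a reasonably recent curl (--tls-max needs 7.54.0 or later) built against a TLS library that supports TLS 1.3.

```shell
#!/bin/sh
# Sketch: compare TLS 1.2 and TLS 1.3 against a single backend IP,
# bypassing DNS with curl's --resolve option.

# resolve_arg HOST PORT IP: build the "<host>:<port>:<address>" value for --resolve
resolve_arg() {
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# compare_tls HOST IP: try each version on port 443 and print the result
compare_tls() {
  host=$1; ip=$2
  for v in 1.2 1.3; do
    # --tlsv<v> sets the minimum and --tls-max the maximum, so only <v> is offered.
    # An internal CA may need --cacert; otherwise expect exit code 60.
    if curl -sS -o /dev/null --connect-timeout 5 \
         --resolve "$(resolve_arg "$host" 443 "$ip")" \
         "--tlsv$v" --tls-max "$v" "https://$host/"; then
      printf 'TLS %s: OK\n' "$v"
    else
      printf 'TLS %s: FAILED (curl exit %s)\n' "$v" "$?"
    fi
  done
}

# Example:
# compare_tls <your-internal-domain.com> <backend server IP>
```

A backend that only accepts TLS 1.3 should show TLS 1.2 failing (typically with exit code 35) and TLS 1.3 succeeding.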

**The Azure product team is already aware of this bug and is actively working on it; however, at the time of writing they had not shared a specific ETA.

References: AppGW SSL-related limitations; Troubleshoot backend health issues in Application Gateway

Hope this will help...thanks 😊
