Wednesday, February 5, 2025

Kubernetes kubectl command help and references

In this brief post, I want to talk about the kubectl command cheat sheet (now called the Quick Reference page) and the detailed command reference docs page.

The kubectl Quick Reference page is where you can find brief information about kubectl context and configuration options, how to view, create, update, and delete different resources, how to find resources using JSONPath, and more.

Start with https://kubernetes.io/docs/reference/kubectl/ 

https://kubernetes.io/docs/reference/kubectl/quick-reference/ 

In case you have not checked it yet, the kubectl command reference docs page is where you can find all the kubectl commands along with their details, available flags, and examples.

https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#get


BTW, while working with kubectl interactively, try using --help. It's your trusted friend within the terminal, and with it you probably won't need to look anywhere else 😉

Syntax: kubectl <command> <resource type> --help
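Here are a few examples of that syntax. All of them are standard kubectl invocations. "kubectl options" lists the global flags that every command accepts. "kubectl explain" is an extra built-in that prints the documented fields of a resource type. Unlike --help, it reads the schema from the API server, so it needs a working cluster connection.

```shell
# Help for a command: description, examples, and every supported flag
kubectl get --help

# Help scoped to a subcommand (here: creating a deployment imperatively)
kubectl create deployment --help

# Global flags shared by all kubectl commands (kubeconfig, context, namespace, ...)
kubectl options

# Field-level docs for a resource type (queries the API server, so it needs a cluster)
kubectl explain deployment.spec.replicas
```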

 

 That's it for now, thanks.



Tuesday, December 31, 2024

Azure AppGW on-demand probe success and unhealthy backend

Lately, while working on an AppGW-related issue, I observed an odd behavior. The on-demand health probe succeeded with HTTP response 200, but the actual backend health status for the same backend pool was unhealthy with a generic backend server reachability error.

We then checked the service status and port on the backend server, as well as the NSG and routing configuration, and found everything correct.

Later, while we were working with Azure support, the engineer told us about a known issue**. If the backend service supports only TLS 1.3, the on-demand probe can report success while the actual backend health shows unhealthy, because of current TLS support limitations. When connecting to the backend, AppGW currently supports only TLS 1.0, 1.1, and 1.2. Here the backend server supported only TLS 1.3, so the TLS handshake failed, the probe failed, and the backend was marked unhealthy. Once the application team changed the TLS version support from TLS 1.3 to TLS 1.2, the issue was resolved.

The frustrating part was that the generic backend server reachability error gave no hint that the problem could be related to an unsupported TLS version. On top of that, health probe logs were not available.

Now that we know about this bug and the TLS limitation, if you run into a similar issue, test the TLS support of the backend service as part of your troubleshooting. You can verify it with the good old "openssl" or "curl" command-line tools.

Assuming your internal URL has the required DNS mapping in place:

#openssl s_client -connect <your-internal-domain.com>:443 -tls1_2

or

#curl -v https://<your-internal-domain.com> --tlsv1.2 --tls-max 1.2

If you don't have the required internal DNS configuration for your site, either create a local hosts file entry or use curl's --resolve option:

#curl -v https://<your-internal-domain.com> --tls-max 1.2 --resolve <your-internal-domain.com>:443:<backend server IP>
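To check every protocol version in one pass, a small loop around openssl s_client can help. This is a minimal sketch under a few assumptions. It assumes GNU coreutils "timeout" is available. It assumes s_client exits with a non-zero status when the handshake fails. The helper name check_tls_versions is my own. The optional IP argument plays the same role as curl's --resolve: it connects to that IP while still sending your hostname as SNI. Note that newer OpenSSL builds may refuse TLS 1.0/1.1 on the client side at their default security level, so a failure on those two lines is not conclusive.

```shell
#!/usr/bin/env bash
# check_tls_versions <hostname> [port] [ip]  (hypothetical helper)
#   hostname : name sent as SNI, e.g. your internal domain
#   port     : defaults to 443
#   ip       : optional backend IP to connect to directly (like curl --resolve)
check_tls_versions() {
  local host="$1" port="${2:-443}" target="${3:-$1}" ver
  for ver in tls1 tls1_1 tls1_2 tls1_3; do
    # </dev/null closes stdin so s_client exits right after the handshake;
    # timeout guards against filtered ports that never answer.
    if timeout 10 openssl s_client -connect "${target}:${port}" \
         -servername "$host" "-${ver}" </dev/null >/dev/null 2>&1; then
      echo "${ver}: handshake OK"
    else
      echo "${ver}: handshake failed"
    fi
  done
}
```

For example, "check_tls_versions <your-internal-domain.com> 443 <backend server IP>". A backend like the one in this post would report a successful handshake only on the tls1_3 line.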

**The Azure internal product team is already aware of this bug and is actively working on it. However, they have not yet shared a specific ETA.

References: AppGW SSL-related limitations; Troubleshoot backend health issues in Application Gateway


Update: After 31 August 2025, the connections to backend servers will always be on minimum TLS 1.2 and up to TLS 1.3. You need not configure anything on your Application Gateway for the backend connection's TLS version. However, you must ensure that your servers in the backend pools are compatible with these updated protocol versions. This will avoid any disruptions when establishing a TLS/HTTPS connection with those backend servers.

https://azure.microsoft.com/en-us/updates?id=azure-application-gateway-support-for-tls-10-and-tls-11-will-end-by-31-august-2025 


Hope this will help...thanks 😊



Sunday, June 30, 2024

Azure backup | Staging storage account not visible during restore

Recently, a team member working on a VM restore task using Azure Backup came to me with an issue. He was unable to find any available storage accounts for the staging location selection, even though there was a storage account in the subscription.

I had come across this issue earlier and knew the cause, but couldn't recall it at that moment. So I spent some time going through the documentation, and once I found it, I thought I'd make a note, because who knows about next time 😉 Or let's hope Microsoft will remove this limitation 😊


You can see the same behavior in the following related screenshots,

As you can see, I have three storage accounts in my test subscription. One is in another region, and of the other two, one uses the Standard_ZRS SKU.


During the restore, as you can see, the storage account with the ZRS SKU is not available for selection.
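To see up front which storage accounts the wizard can offer, you can list the accounts in the VM's region along with their SKU. This is a minimal sketch that assumes a logged-in Azure CLI. The helper name list_storage_for_staging is hypothetical, and the region must use the CLI's lowercase name (for example eastus).

```shell
# list_storage_for_staging <region>  (hypothetical helper)
# Lists storage accounts in the given region with their SKU, so an account
# using Standard_ZRS is easy to spot before you start the restore.
list_storage_for_staging() {
  local region="$1"   # region of the VM being restored, e.g. "eastus"
  az storage account list \
    --query "[?location=='${region}'].{name:name, location:location, sku:sku.name}" \
    --output table
}
```

Going by the behavior shown above, an account in this list with a Standard_ZRS SKU will not appear under Staging Location. Accounts in other regions are already filtered out by the query.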



BTW, if you click the info icon next to the "Staging Location" option, it shows the related hint as follows 😄

Reference:  Azure Backup Restore documentation.

That's all for today, thanks...


Sunday, June 23, 2024

Azure AKS cluster upgrade error "... vmss has reached its limit of 10 models ..."

In this short post, we will discuss a terminal provisioning state failed error encountered during the AKS cluster upgrade. The control plane was successfully upgraded, but the AKS node pool upgrade subsequently failed with the following error.


As the error shows, it relates to the VM scale set model, so let's discuss that first. The VMSS model represents the desired state of the scale set as a whole. A model property is a scale set property that affects its VMs, for example the VM size, the OS version, or an extension.

For AKS node pools, the orchestration mode is set to Uniform, which is designed for a collection of similarly configured virtual machines. The VMSS uses the default upgrade policy, Manual. Under this policy, VMs can be on different models, but only up to 10 models overall, so the VMs in the scale set can have at most 10 unique configurations.

So, you might see this error if you have an AKS node pool with 10+ nodes that are not on the latest model. To check, go to the respective node pool's VM scale set => Instances and look at the Latest Model column, which shows Yes/No.

To proceed, select the instances where "Latest Model" shows "No" and click "Upgrade" from the options at the top.
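The same check and upgrade can be done from a terminal. This is a minimal sketch that assumes a logged-in Azure CLI with rights on the cluster's node resource group. The helper names are my own. I am assuming the portal's Latest Model column corresponds to the latestModelApplied field that "az vmss list-instances" returns. The sketch upgrades one instance at a time, but each update may still reboot that node.

```shell
# Hypothetical helpers (names are my own)

# node_resource_group <aks-rg> <cluster>: the MC_* group holding the node pool scale sets
node_resource_group() {
  az aks show -g "$1" -n "$2" --query nodeResourceGroup -o tsv
}

# outdated_instances <node-rg> <vmss>: instance IDs that are not on the latest model
outdated_instances() {
  az vmss list-instances -g "$1" -n "$2" \
    --query '[?latestModelApplied==`false`].instanceId' -o tsv
}

# upgrade_outdated_instances <node-rg> <vmss>: apply the latest model one instance at a time
upgrade_outdated_instances() {
  local id
  for id in $(outdated_instances "$1" "$2"); do
    echo "Upgrading instance $id in $2"
    az vmss update-instances -g "$1" -n "$2" --instance-ids "$id" || return 1
  done
}
```

Usage: run "node_resource_group <aks-rg> <cluster-name>" to find the node resource group. Then run "az vmss list -g <node-rg> --query "[].name" -o tsv" to find the node pool's scale set. Finally, run "upgrade_outdated_instances <node-rg> <vmss-name>".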

As this might reboot the instances and impact availability, proceed with caution.

Once you have upgraded the instances to the latest model, start the AKS upgrade again. This time you should not see this error.

If you're interested in learning more about scale set upgrade policies, you can check the related documentation available here.

I hope you find this information useful. Thank you!



Saturday, June 22, 2024

Azure AppGW http error 403 and outdated browser

This post is about an AppGW HTTP 403 error encountered while accessing a site published through Azure Application Gateway v2 with WAF enabled.

While reviewing the AppGW logs (Category == ApplicationGatewayFirewallLog), I noticed the error was specific to a single client IP, and the error message details mentioned browser cookies. Based on this, I recommended that the user try accessing the site with a different browser (they had been using Google Chrome), and the site was accessible.

We then updated Google Chrome to the latest version and rechecked. This time there were no errors, and the site was accessible without any issues.

The main takeaway is that outdated browsers can sometimes trigger the web application firewall to block incoming requests. So if you see a similar indication in the AppGW logs, test the URL with a different browser. Also check the version of the browser in question and update it to the latest.

Here is a sample KQL query for the AppGW logs:

AzureDiagnostics
| where TimeGenerated > ago(1h)
| where Category == "ApplicationGatewayFirewallLog"
| where clientIp_s == "<required source IP>" and requestUri_s contains "<path in your case>"
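If you prefer the terminal, the Azure CLI can run the same kind of query against the Log Analytics workspace that receives the AppGW diagnostics. This is a minimal sketch. The helper name waf_blocks_for_client is hypothetical, and the workspace ID and client IP are placeholders. The projected columns are the ones shown in the output extract in this post. Check "az monitor log-analytics query --help" on your CLI version before relying on it.

```shell
# waf_blocks_for_client <workspace-id> <client-ip>  (hypothetical helper)
#   workspace-id : Log Analytics workspace (customer) ID
#   client-ip    : source IP seen in the 403 reports
waf_blocks_for_client() {
  local workspace_id="$1" client_ip="$2"
  local query="AzureDiagnostics
| where TimeGenerated > ago(1h)
| where Category == \"ApplicationGatewayFirewallLog\"
| where clientIp_s == \"${client_ip}\"
| project TimeGenerated, requestUri_s, Message, ruleGroup_s, details_message_s, details_data_s"
  az monitor log-analytics query --workspace "$workspace_id" \
    --analytics-query "$query" --output table
}
```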


In my case, here is an extract from the query output:

Message: Detects MySQL comment-/space-obfuscated injections and backtick termination

OWASP CRS ruleSetVersion_s: 3.2

ruleGroup_s: REQUEST-942-APPLICATION-ATTACK-SQLI

details_message_s: Pattern match (?i:(?:(?:(?:(?:trunc|cre|upd)at|renam)e|(?:inser|selec)t|de(?:lete|sc)|alter|load)\s*?\(\s*?space\s*?\(|,.*?[)\da-f"'`]["'`](?:["'`].*?["'`]|(?:\r?\n)?\z|[^"'`]+)|\Wselect.+\W*?from)) at REQUEST_COOKIES.

details_data_s: {,"campaigns":{"34645675687werwe4567rit6":{ found within [REQUEST_COOKIES:ORA_PERS:{"ids":["-23434645757657"],"campaigns":{"":{"activeBlocks":["c1","C2","C3","C4"],"pointer":"E1","event":"-687897890978860392"}}}]}


If you're interested in learning more about HTTP error codes, you can explore the following links:

HTTP response status codes

HTTP response codes in Application Gateway


I hope you find this information useful. Thank you!



Wednesday, March 2, 2022

VMware vExpert 2022

I am very honored to be named a VMware vExpert again... and yeah, it was announced on Feb 17, 2022, but somehow I missed that email at the time ;)

Congratulations to everyone who made it onto the vExpert 2022 list.

The VMware vExpert directory is available here: https://vexpert.vmware.com/directory


Thank you, VMware... :)


Sunday, March 28, 2021

AWS Single Sign-on and Azure AD Application Certificate rotation

In this quick post, I'll discuss the process and steps for rotating an expiring Azure AD application certificate configured for AWS SSO login.

This applies when your AWS account SSO is configured with Azure Active Directory and the associated application certificate is about to expire or has already expired.

Before you start, make sure you have the appropriate AWS IAM and Azure AD permissions, or involve the teams with the required access. You need to create an application certificate in Azure and rotate it in AWS.

Now, log in to AWS and back up the currently used metadata.

  1. Log in to AWS => Go to IAM => Click Dashboard, or from the IAM menu, click Identity providers
  2. Click Azure AD => From the Metadata Document section, download the current metadata file as a backup
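The same backup can be taken with the AWS CLI. This is a minimal sketch that assumes the CLI is configured with credentials that can read IAM identity providers. The helper name backup_saml_metadata and the default output filename are my own placeholders. To find the provider ARN, you can run "aws iam list-saml-providers --query 'SAMLProviderList[].Arn' --output text".

```shell
# backup_saml_metadata <provider-arn> [output-file]  (hypothetical helper)
# Saves the SAML metadata XML currently uploaded to an AWS IAM identity provider.
backup_saml_metadata() {
  local arn="$1"
  local out="${2:-saml-metadata-backup-$(date +%Y%m%d-%H%M%S).xml}"
  # SAMLMetadataDocument is the field holding the uploaded XML
  aws iam get-saml-provider --saml-provider-arn "$arn" \
    --query SAMLMetadataDocument --output text > "$out" || return 1
  echo "Saved current metadata to $out"
}
```

For example, "backup_saml_metadata arn:aws:iam::<account-id>:saml-provider/<provider-name>".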
Now, log in to Azure:

  1. Go to Azure Active Directory => Select Enterprise applications from the left menu
  2. From the Enterprise applications section, select the AWS application used for SSO
  3. Now, on the AWS application screen, go to the Single sign-on option => SAML Signing Certificate and click Edit
  4. On the SAML Signing Certificate page, create a new certificate, save it, mark it as Active, and close the window
  5. Now, on the SAML Signing Certificate page, verify the certificate expiry date and download the Federation Metadata XML


  6. Go to the AWS account's IAM Identity providers section (steps are mentioned above)
  7. Within the Metadata Document section, this time click Replace Metadata. In the pop-up window, type replace and click Replace. If you didn't download the current metadata file earlier, do that first so you can revert if anything goes wrong
  8. Now browse to and select the Federation Metadata XML file you downloaded after rotating the Azure AD application certificate, and click Open
  9. It takes a few seconds, and you are done.
  10. Test your AWS Single Sign-on URL. You can also test from within the Azure application's SAML-based Single sign-on page.
Note: If you sign in to AWS with an Azure AD (SSO) account to replace the AWS identity provider metadata, make sure you are logged in before marking the newly created Azure application certificate as active. Also, don't refresh the AWS console page until you have replaced the metadata.
To avoid this, simply use your AWS root account ;)
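Steps 5 and 7-8 can also be done from a terminal: check the expiry of the certificate inside the downloaded Federation Metadata XML, then upload it with the AWS CLI. This is a minimal sketch that assumes openssl and a configured AWS CLI. The helper names are my own. It also assumes each X509Certificate value sits on a single line of the XML file.

```shell
# Hypothetical helpers (names are my own)

# cert_expiry_from_metadata <metadata.xml>
# Prints the subject and expiry (notAfter) of each distinct certificate in the file.
cert_expiry_from_metadata() {
  grep -oE '<([A-Za-z0-9]+:)?X509Certificate>[^<]+</([A-Za-z0-9]+:)?X509Certificate>' "$1" \
    | sed -E 's/<[^>]+>//g' | sort -u \
    | while read -r b64; do
        # Wrap the base64 DER value in PEM armour so openssl x509 can parse it
        { echo "-----BEGIN CERTIFICATE-----"
          printf '%s\n' "$b64" | fold -w 64
          echo "-----END CERTIFICATE-----"; } | openssl x509 -noout -subject -enddate
      done
}

# replace_saml_metadata <provider-arn> <metadata.xml>
# Uploads the new metadata to the existing AWS IAM SAML identity provider.
replace_saml_metadata() {
  local arn="$1" file="$2"
  if [ ! -s "$file" ]; then
    echo "Metadata file '$file' is missing or empty" >&2
    return 1
  fi
  aws iam update-saml-provider --saml-provider-arn "$arn" \
    --saml-metadata-document "file://$file"
}
```

Checking the notAfter date before uploading helps confirm that the file really contains the new certificate and not the expiring one.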


That's it, thanks :)