Sunday, December 29, 2019

Domain Trust Relationship issue on a recently migrated server

You have probably seen this before: being unable to log in to a server because of a domain trust relationship failure. I wrote a related post about it in the past, which can be found here, The-trust-relationship-between-this workstation.....

This week I came across the same issue again. My team was able to fix it temporarily by removing the server from the domain and re-joining it, but after a few hours the issue recurred, and this happened two or three times in a week (a computer account reset didn't work for this machine).

While looking for the cause, the first thing we checked was DNS, and the DNS server was not reachable via ping or nslookup.

We then connected to another machine on the network to check name resolution and found that the IP address assigned to this server did not exist. When we looked up the hostname, we found another machine on the network with the same name but a different IP address.
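The same check can be scripted: resolve the hostname and compare the answer against the IP you expect the server to have. The snippet below is just a sketch of that idea (the hostname and expected IP you pass in would be your own; here it is exercised against the loopback name):

```python
import socket

def check_dns(hostname, expected_ip):
    """Resolve hostname and flag a mismatch against the expected IP."""
    try:
        resolved = socket.gethostbyname(hostname)
    except socket.gaierror:
        return f"{hostname}: no DNS record found"
    if resolved != expected_ip:
        # A different answer here can indicate a stale or conflicting record
        return f"{hostname}: DNS returns {resolved}, expected {expected_ip}"
    return f"{hostname}: DNS record matches {expected_ip}"

# Example against the loopback name, which resolves locally
print(check_dns("localhost", "127.0.0.1"))
```

Running this periodically against a freshly migrated server would have surfaced the conflicting record much sooner.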

On checking further, we found this server had recently been migrated from on-premises to the cloud, and someone had inadvertently started the on-premises server (probably for patching, etc.), which caused the hostname conflict and, as a result, the DNS and trust relationship failures.

Once we figured out the cause and powered off the on-premises server, fixing the issue was simply a matter of waiting for DNS to update the server's record, or forcing it by re-registering the server with DNS using the following command in an elevated prompt (run cmd as administrator):

C:\> ipconfig /registerdns

It may take a few minutes before you can log in using hostname\user.

Update: 04/06/2020

While working on a related issue, I found some new PowerShell commands added to our toolset to resolve this issue; the related blog can be found here,

That's it... :)

AmazonS3Exception: Access Denied errors and potential causes

In the last month I have seen this or related S3 access issues a couple of times, where I was unable to upload a file to S3, unable to save an S3 inventory report to another bucket, or unable to run an Athena query against data in S3, and every time it took me a while to figure out the cause.
So I thought I'd write this post as a note for future reference.

Whenever we face S3 access issues, an obvious cause is an IAM or bucket policy that doesn't grant the required access.

Here are my two cents on troubleshooting this issue:
  • First, check the AWS IAM permissions; to perform any S3 action, you need the required policy attached
  • If the required IAM policy is in place, check the target bucket policy
  • If the above two are fine, check access to the S3 encryption key
  • If the above three are fine, check the AWS KMS key policy, as it also needs to grant access to the key
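As a rough illustration of the checklist above, here is a small sketch that scans an IAM-style policy document for the actions an upload to an SSE-KMS encrypted bucket typically needs. The policy and the required-action set are only an example (and this simple version doesn't expand wildcards like s3:*); the bucket and KMS key policies still need to allow the same principal:

```python
import json

# Actions typically needed to upload to an SSE-KMS encrypted bucket (example set)
REQUIRED_ACTIONS = {"s3:PutObject", "kms:GenerateDataKey", "kms:Decrypt"}

def missing_actions(policy_json):
    """Return the required actions not granted by any Allow statement."""
    policy = json.loads(policy_json)
    granted = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted

example_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:PutObject", "s3:GetObject"],
         "Resource": "*"}
    ],
})
print(sorted(missing_actions(example_policy)))
# → ['kms:Decrypt', 'kms:GenerateDataKey']
```

In the example, the S3 permission is present but the KMS permissions are missing, which matches the third and fourth checks in the list above.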

Related reads

In case there is any other scenario I'm missing, you're welcome to share and discuss it in the comments section.

That's it... :) 

Sunday, December 15, 2019

Amazon EC2 related permission error during AWS RDS database restore

Recently I came across an issue where the database team reported that, while trying to restore an Oracle RDS database, they were getting an EC2-related permission error. To my surprise, they also said they had the required permissions earlier and that this was the first time they had seen such an error; however, I was sure nothing had changed on the permissions side.

Just to re-validate, I checked the associated IAM policy and found they had the AmazonRDSFullAccess policy attached, so logically they should have been able to restore the RDS database without any issue.

As a second troubleshooting step, I checked the CloudTrail logs for any failed restore events, without luck, and then tried to replicate the issue in my test account, but didn't encounter any error.

At this point I decided to take a deeper look at the CloudTrail logs and checked all events during the period when the RDS database restore was attempted; interestingly, there were a few CreateSecurityGroup events.
On checking further, it was clear that while trying to restore the RDS database, they had selected the option to create a new security group instead of the desired "Choose existing Security Group" option.
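Scanning a batch of CloudTrail event records for unexpected event names is easy to script. The sample records below are hand-written to mimic the shape of CloudTrail events, not real output:

```python
def flagged_events(records, watch_for=("CreateSecurityGroup",)):
    """Return (eventName, userName) pairs for events we want to flag."""
    flagged = []
    for event in records:
        if event.get("eventName") in watch_for:
            user = event.get("userIdentity", {}).get("userName", "unknown")
            flagged.append((event["eventName"], user))
    return flagged

# Hypothetical sample records shaped like CloudTrail events
sample = [
    {"eventName": "RestoreDBInstanceToPointInTime",
     "userIdentity": {"userName": "dba-user"}},
    {"eventName": "CreateSecurityGroup",
     "userIdentity": {"userName": "dba-user"}},
]
print(flagged_events(sample))
# → [('CreateSecurityGroup', 'dba-user')]
```

A filter like this narrows a noisy event history down to exactly the kind of surprise that explained the permission error here.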

After figuring this out, it was easy to make the database team understand that, regardless of the database vendor, the core platform concepts remain the same.

Some of you might wonder why I didn't check the CloudTrail logs before trying to replicate the issue in the test environment; the reason is opinions from others and the name Oracle (now read the above paragraph again 😉).

That's it... 😊

PowerCLI script to get HBA firmware and driver version for all ESXi hosts

This is a follow-up to one of my earlier posts, How to check FC hba driver & firmware version on ESXi host.

We can use the following PowerCLI script to find and list HBA firmware and driver versions for all ESXi hosts in a given cluster.

You may need to change the HBA module name to one of the following, depending on the hardware attached:
For Emulex HBAs: lpfc
For Brocade HBAs: bfa
For QLogic HBAs: qlnativefc or qla*

$HBAList = @()

# Loop over all hosts in the given cluster
foreach ($vmhost in (Get-Cluster Cluster_Name | Get-VMHost)) {

    # Get an esxcli instance for this host
    $esxcli = $vmhost | Get-EsxCli

    # Capture the host name for the exported CSV file
    $VMHostName = $vmhost.Name

    # Query the module info (driver version) for the HBA module
    $HBAList += $esxcli.system.module.get("lpfc") |
        Select-Object @{N="VMHostName";E={$VMHostName}}, Module, Version
}

# Export the compiled results to a CSV file
$HBAList | Export-Csv -Path E:\ben\vmware\HBA_info.csv -NoTypeInformation

Reference: VMTN

That's it... :)