Wednesday, April 15, 2020

AWS Glue vs. Azure Data Factory: Similarities and Differences

The emergence of PaaS services has made it much easier for end users to build, maintain and manage infrastructure; however, selecting the service that suits a given need is still a challenging task. We often choose a hybrid cloud solution for our customers, which gives them cost-efficient solutions built on cutting-edge technologies.
The fundamental building block of any company is data; no organization can survive without it. For storing and analyzing this data, however, the traditional data warehouse approach is no longer a good fit, whether because of rising costs, infrastructure demands or management overhead.
The alternative is the cloud, be it AWS, Azure, Google or another provider. Each of these clouds offers different solutions to the problems we face, but the fundamental question remains the same: which cloud to use, and why?
Take data analytics: for running ETL jobs, both AWS and Azure offer solutions (AWS Glue and Azure Data Factory), but as architects we need a deep understanding of the similarities and differences between the two before recommending either one to a customer.
Here I highlight some fundamental similarities and differences between the two technologies, in the hope that this helps anyone who needs to design solutions for customers.
Similar features of the two services

| Attribute | AWS Glue | Data Factory |
|---|---|---|
| Fully managed, serverless ETL engine | Yes | Yes |
| Ingests both structured and unstructured data | Yes | Yes |
| Automatic code generation | Yes | Yes |
| Underlying technology stack is Spark | Yes | Yes |
| Triggers can be manual or automatic | Yes | Yes |
| Lets you focus on building business logic and data transformations | Yes | Yes |
| Performs data cleansing, transformation and aggregation | Yes | Yes |
| Connects to data warehouses and data lakes | Yes; supports loading data to and from Amazon Redshift | Yes; supports loading data to and from Azure SQL Data Warehouse |
| Transparent pricing | Yes | Yes |
| SLA support | Yes | Yes |
| Ability for customers to add new data sources | Yes; developers can write custom Scala or Python code and import custom libraries and JAR files into Glue ETL jobs to access data sources that AWS Glue does not support natively | Yes |
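To make the rows on adding new data sources and on manual versus automatic triggers more concrete, here is a minimal sketch on the AWS Glue side. It only builds the request parameters for the Glue `CreateJob` and `CreateTrigger` APIs as plain Python dictionaries and makes no AWS calls. The job name, IAM role ARN, S3 paths, worker settings and schedule are all hypothetical placeholders, not recommendations.

```python
import json

# Hypothetical resource names: replace them with your own.
JOB_NAME = "orders-etl"
ROLE_ARN = "arn:aws:iam::123456789012:role/GlueJobRole"
SCRIPT = "s3://example-bucket/scripts/orders_etl.py"  # PySpark script run by Glue


def build_job_params(name, role_arn, script, extra_py_files=(), extra_jars=()):
    """Return keyword arguments for the Glue CreateJob API."""
    default_args = {}
    # --extra-py-files and --extra-jars are Glue job parameters that ship
    # custom Python modules and JAR files (e.g. a JDBC driver) with the job.
    if extra_py_files:
        default_args["--extra-py-files"] = ",".join(extra_py_files)
    if extra_jars:
        default_args["--extra-jars"] = ",".join(extra_jars)
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script, "PythonVersion": "3"},
        "DefaultArguments": default_args,
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",   # hypothetical sizing for a small job
        "NumberOfWorkers": 2,
    }


def build_trigger_params(job_name, cron=None):
    """Return CreateTrigger arguments: on demand without a cron, else scheduled."""
    params = {"Name": f"{job_name}-trigger", "Actions": [{"JobName": job_name}]}
    if cron is None:
        params["Type"] = "ON_DEMAND"  # started manually (console, CLI or API)
    else:
        params["Type"] = "SCHEDULED"  # started automatically by the scheduler
        params["Schedule"] = f"cron({cron})"
        params["StartOnCreation"] = True
    return params


job = build_job_params(
    JOB_NAME, ROLE_ARN, SCRIPT,
    extra_py_files=["s3://example-bucket/libs/helpers.zip"],
    extra_jars=["s3://example-bucket/jars/custom-jdbc.jar"],
)
manual = build_trigger_params(JOB_NAME)
nightly = build_trigger_params(JOB_NAME, cron="0 2 * * ? *")  # 02:00 UTC every day
print(json.dumps(nightly, indent=2))
```

With credentials configured, passing these dictionaries to `boto3.client("glue").create_job(**job)` and `create_trigger(**nightly)` would create the resources. Keeping parameter construction separate from the API calls makes it easy to review or test without an AWS account.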

Differences between the two services

| Attribute | AWS Glue | Data Factory |
|---|---|---|
| Main focus of the service | ETL, data catalog | ETL |
| Database replication | Full table; incremental via change data capture (CDC) through AWS Database Migration Service (DMS) | Full table; incremental via a custom SELECT query |
| SaaS sources | None | About 20, with several more in preview |
| Compliance, governance and security certifications | HIPAA, GDPR | HIPAA, GDPR, ISO 27001 |
| Data sharing | Yes, within AWS | No |
| Vendor lock-in | Strongly tied to the AWS platform; usage is billed monthly | Month-to-month billing |
| Developer tools | Only Python and Scala are available | REST API, .NET and Python SDKs, PowerShell CLI |

Thanks for reading. Your suggestions and feedback are welcome.