Developed and automated a data pipeline to migrate data from multiple sources to a target system and generate reports.
Solution
- The pipeline was built to stream data from on-prem systems to the cloud (see the streaming sketch after this list)
- Pipeline deployment is automated: when a change is needed in any of the cloud services, a code change pushed to Git updates those services through the CI/CD process (see the deployment sketch after this list)
- Adding data sources to the pipeline is also automated; registering the new source in the config file is all that is needed (see the config sketch after this list)
- Built an auditing, notification, and alerting system for the pipeline and the ETL process (see the alerting sketch after this list)
- Data is loaded incrementally from the data lake into the target tables (see the incremental-load sketch after this list)
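To make the streaming bullet concrete, here is a minimal sketch of pushing change records captured on-prem into a cloud-side stream. The write-up does not name the streaming service, so Amazon Kinesis, the stream name, and the record shape are assumptions made purely for illustration.

```python
# Illustrative only: the streaming service is not named in the write-up, so this
# sketch assumes Amazon Kinesis Data Streams and a hypothetical stream name.
import json
import boto3

kinesis = boto3.client("kinesis")
STREAM_NAME = "onprem-to-cloud-stream"  # hypothetical stream name

def ship_record(record: dict, partition_key: str) -> None:
    """Push one change record captured on-prem into the cloud-side stream."""
    kinesis.put_record(
        StreamName=STREAM_NAME,
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=partition_key,
    )

if __name__ == "__main__":
    ship_record({"table": "orders", "op": "INSERT", "id": 42}, partition_key="orders")
```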
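The deployment bullet can be pictured as a deploy step that applies the CloudFormation template committed to Git whenever a change merges. This is a sketch only, assuming boto3 and hypothetical stack and template names rather than the project's actual CI/CD definitions.

```python
# Minimal sketch of the CI/CD deploy step; stack and template names are placeholders.
import boto3
from botocore.exceptions import ClientError

cloudformation = boto3.client("cloudformation")

def deploy_stack(stack_name: str, template_path: str) -> None:
    """Create or update a CloudFormation stack from the template in Git."""
    with open(template_path) as f:
        template_body = f.read()
    try:
        # Apply the template from the latest commit.
        cloudformation.update_stack(
            StackName=stack_name,
            TemplateBody=template_body,
            Capabilities=["CAPABILITY_NAMED_IAM"],
        )
        waiter = cloudformation.get_waiter("stack_update_complete")
    except ClientError as err:
        message = str(err)
        if "No updates are to be performed" in message:
            return  # nothing in this commit changed the stack
        if "does not exist" in message:
            # First run: the stack has never been deployed, so create it.
            cloudformation.create_stack(
                StackName=stack_name,
                TemplateBody=template_body,
                Capabilities=["CAPABILITY_NAMED_IAM"],
            )
            waiter = cloudformation.get_waiter("stack_create_complete")
        else:
            raise
    waiter.wait(StackName=stack_name)

if __name__ == "__main__":
    deploy_stack("data-pipeline-stack", "templates/pipeline.yaml")
```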
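For the config-driven onboarding bullet, the sketch below shows the general idea: each entry in the config file becomes an ingestion job, so adding a source is a config change rather than a code change. The JSON layout and field names (source_name, jdbc_url, table, schedule) are illustrative assumptions, not the project's actual schema.

```python
# Sketch of config-driven source onboarding with an assumed JSON config layout.
import json

def load_sources(config_path: str) -> list[dict]:
    """Read the list of data sources from the pipeline config file."""
    with open(config_path) as f:
        return json.load(f)["sources"]

def build_ingestion_jobs(sources: list[dict]) -> list[dict]:
    """Turn each configured source into an ingestion job definition."""
    jobs = []
    for source in sources:
        jobs.append({
            "job_name": f"ingest_{source['source_name']}",
            "connection": source["jdbc_url"],
            "table": source["table"],
            "schedule": source.get("schedule", "rate(5 minutes)"),
        })
    return jobs

if __name__ == "__main__":
    for job in build_ingestion_jobs(load_sources("pipeline_config.json")):
        print(job["job_name"], job["schedule"])
```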
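The auditing and alerting bullet follows a common pattern: every ETL run produces an audit record, and failures publish to a notification topic. The sketch assumes Amazon SNS and a placeholder topic ARN; the actual audit store and notification channels are not described in this write-up.

```python
# Sketch of the audit/alert pattern; the SNS topic ARN is a placeholder.
import datetime
import boto3

sns = boto3.client("sns")
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:pipeline-alerts"  # placeholder

def audit_and_alert(job_name: str, status: str, rows_loaded: int = 0, error: str = "") -> dict:
    """Build an audit record for an ETL run and raise an alert when it fails."""
    record = {
        "job_name": job_name,
        "status": status,
        "rows_loaded": rows_loaded,
        "run_time_utc": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "error": error,
    }
    if status == "FAILED":
        # Notify on-call before downstream reports are affected.
        sns.publish(
            TopicArn=ALERT_TOPIC_ARN,
            Subject=f"Pipeline job failed: {job_name}",
            Message=str(record),
        )
    return record  # in practice this record is persisted to an audit table
```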
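Finally, the incremental-load bullet typically relies on a high-water mark: only rows changed since the last successful load are pulled from the lake into the target tables. The updated_at column and table names below are assumptions for illustration, not taken from the project.

```python
# Sketch of watermark-based incremental loading; column/table names are illustrative.
import datetime

def build_incremental_query(lake_table: str, last_watermark: datetime.datetime) -> str:
    """Select only the rows that changed since the previous successful load."""
    return (
        f"SELECT * FROM {lake_table} "
        f"WHERE updated_at > TIMESTAMP '{last_watermark.isoformat(sep=' ')}'"
    )

def next_watermark(rows: list[dict], last_watermark: datetime.datetime) -> datetime.datetime:
    """Advance the watermark to the newest change seen in this batch."""
    if not rows:
        return last_watermark
    return max(row["updated_at"] for row in rows)

if __name__ == "__main__":
    print(build_incremental_query("lake.orders", datetime.datetime(2024, 1, 1)))
```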
Key Features
- CloudFormation for AWS services.
- Automation for adding data sources.
- Near real-time data sync.
- Notifications & Alerts.
Outcomes
- Near real-time data sync.
- Reduction in unexpected pipeline downtime.
- Faster ETL runs.
- Cost reduction with incremental processing.