Project objective
The objective of this project is to build a data pipeline on AWS for processing compressed files. The pipeline runs a series of ETL (Extract, Transform, Load) steps: it extracts raw data from the compressed files, transforms it into tables, and loads it into a data warehouse for analysis and querying.
The project encompassed a range of tasks, including:
- Setting up storage (Buckets) in AWS S3 to store data.
- Implementing an AWS Lambda function for file processing, together with the IAM roles and access policies it needs to read from and write to the S3 buckets.
- Leveraging AWS Glue for the ETL steps (crawler and Data Catalog database).
- Orchestrating services through AWS Step Functions.
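The core of the Lambda processing step is decompressing an object fetched from S3 and turning it into tabular rows. As a minimal sketch of that transform (stdlib only; the function name, the gzip'd-CSV format, and the column names are assumptions for illustration, and in the deployed function boto3 would supply the bytes from the triggering S3 event):

```python
import csv
import gzip
import io

def extract_records(payload: bytes) -> list[dict]:
    """Decompress a gzip'd CSV object and return its rows as dicts.

    In the real pipeline, `payload` would be the body of the S3 object
    downloaded by the Lambda function; here it is passed in directly.
    """
    with gzip.open(io.BytesIO(payload), mode="rt", newline="") as fh:
        return list(csv.DictReader(fh))

# Demo with an in-memory object standing in for an S3 download:
raw = gzip.compress(b"id,amount\n1,10\n2,25\n")
rows = extract_records(raw)
print(rows)  # [{'id': '1', 'amount': '10'}, {'id': '2', 'amount': '25'}]
```

Keeping the transform as a pure bytes-in/rows-out function makes it easy to unit-test outside AWS before wiring it to the S3 trigger and Step Functions workflow.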
Project information
- Category: Big Data & Cloud computing
- Project date: 11/2021
- Project presentation: PPTX presentation