Data Migration Between Databases with Docker and Python

Published on November 25, 2024 in IT & Programming

About this project


Perform data migration from a PostgreSQL database to MySQL using Docker Compose to set up containers.


Develop Python scripts to manage the extraction, transformation, and loading (ETL) process, including data cleaning to handle missing, duplicate, and inconsistent values. Use a Kaggle dataset as the source, prioritizing large and complex datasets.

Kaggle: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce


Tasks:

Setting up Databases in Containers:

- Configure PostgreSQL and MySQL in separate containers using Docker Compose.
- Create databases and tables in both databases, ensuring compatibility for migration.
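
As an illustration of the compatibility point above, the sketch below translates PostgreSQL column types into MySQL ones and builds a MySQL `CREATE TABLE` statement. It is only one possible approach: the mapping is an assumption that covers a few common types, the helper names `to_mysql_type` and `mysql_create_table` are hypothetical, and the `orders` columns are illustrative rather than a required schema.

```python
import re

# Assumed mapping of common PostgreSQL types to MySQL types; extend as needed.
PG_TO_MYSQL = {
    "smallint": "SMALLINT",
    "integer": "INT",
    "bigint": "BIGINT",
    "boolean": "TINYINT(1)",       # MySQL has no separate boolean storage type
    "real": "FLOAT",
    "double precision": "DOUBLE",
    "text": "LONGTEXT",            # PostgreSQL text is unbounded, so pick the largest
    "date": "DATE",
    "timestamp": "DATETIME",
}


def to_mysql_type(pg_type):
    """Translate a PostgreSQL column type into a MySQL column type."""
    normalized = " ".join(pg_type.lower().split())
    # Parameterized types keep their precision or length.
    match = re.fullmatch(r"(?:numeric|decimal)\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)", normalized)
    if match:
        return f"DECIMAL({match.group(1)},{match.group(2)})"
    match = re.fullmatch(r"(?:varchar|character varying)\s*\(\s*(\d+)\s*\)", normalized)
    if match:
        return f"VARCHAR({match.group(1)})"
    if normalized not in PG_TO_MYSQL:
        raise ValueError(f"No MySQL mapping defined for PostgreSQL type: {pg_type!r}")
    return PG_TO_MYSQL[normalized]


def mysql_create_table(table, columns, primary_key):
    """Build a MySQL CREATE TABLE statement from {column: postgres_type}."""
    lines = [f"  `{name}` {to_mysql_type(pg_type)}" for name, pg_type in columns.items()]
    lines.append(f"  PRIMARY KEY (`{primary_key}`)")
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(lines) + "\n);"


# Illustrative table definition (column names are assumptions).
ddl = mysql_create_table(
    "orders",
    {
        "order_id": "varchar(32)",
        "order_status": "varchar(20)",
        "order_purchase_timestamp": "timestamp",
    },
    primary_key="order_id",
)
print(ddl)
```

Raising an error on unmapped types, rather than guessing, keeps silent type mismatches out of the target schema.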
ETL Development:

- Extraction: Import data from the selected Kaggle dataset into PostgreSQL.
- Transformation: Perform cleaning and standardization, which includes:
  - Identifying and handling missing values.
  - Removing or adjusting duplicate records.
  - Fixing data inconsistencies.
- Loading: Migrate the transformed data to MySQL.
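
The transformation step could be sketched roughly as follows. This is a minimal, standard-library-only illustration that works on rows represented as dictionaries. The set of missing-value tokens, the function names, and the sample columns are all assumptions, and extraction from PostgreSQL and loading into MySQL are deliberately left out.

```python
# Assumed spellings that should be treated as "missing".
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "nan"}


def normalize_missing(row):
    """Trim strings and turn placeholder tokens into None."""
    cleaned = {}
    for key, value in row.items():
        if isinstance(value, str):
            value = value.strip()
            if value.lower() in MISSING_TOKENS:
                value = None
        cleaned[key] = value
    return cleaned


def standardize_text(row, text_columns):
    """Collapse repeated whitespace and lowercase the given text columns."""
    out = dict(row)
    for column in text_columns:
        if isinstance(out.get(column), str):
            out[column] = " ".join(out[column].split()).lower()
    return out


def drop_duplicates(rows, key_columns):
    """Keep the first row seen for each combination of key columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row.get(column) for column in key_columns)
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    return unique


def clean_rows(rows, key_columns, text_columns):
    """Run the three cleaning steps in order."""
    normalized = [standardize_text(normalize_missing(r), text_columns) for r in rows]
    return drop_duplicates(normalized, key_columns)


# Illustrative input (column names are assumptions).
raw = [
    {"customer_id": "1", "city": " Sao  Paulo "},
    {"customer_id": "1", "city": "sao paulo"},
    {"customer_id": "2", "city": "N/A"},
]
cleaned = clean_rows(raw, key_columns=["customer_id"], text_columns=["city"])
print(cleaned)
```

Keeping each cleaning rule in its own function makes it easier to report how many rows each rule changed, which is useful for the validation report.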
Data Validation:

- Conduct robust validations, such as:
  - Quantitative: Compare the number of records between databases to ensure consistency.
  - Qualitative (optional): Review data samples to ensure successful transformation.
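
A quantitative check along these lines might look like the sketch below. It assumes both connections follow the Python DB-API (as PostgreSQL and MySQL drivers generally do). The function names are hypothetical, and two in-memory SQLite databases stand in for the real PostgreSQL and MySQL containers so that the example runs on its own.

```python
import re
import sqlite3


def count_rows(conn, table):
    """Return the row count of one table through a DB-API connection."""
    # Table names cannot be passed as query parameters, so validate them first.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"Unexpected table name: {table!r}")
    cursor = conn.cursor()
    try:
        cursor.execute(f"SELECT COUNT(*) FROM {table}")
        return cursor.fetchone()[0]
    finally:
        cursor.close()


def compare_counts(source_conn, target_conn, tables):
    """Compare row counts per table between the source and the target."""
    report = {}
    for table in tables:
        source_count = count_rows(source_conn, table)
        target_count = count_rows(target_conn, table)
        report[table] = {
            "source": source_count,
            "target": target_count,
            "match": source_count == target_count,
        }
    return report


# Stand-in demo: SQLite plays the role of both databases here.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO orders VALUES (?)", [("a",), ("b",)])

report = compare_counts(source, target, ["orders"])
print(report)
```

If the transformation removes duplicates or invalid rows, the target count should be compared against the expected cleaned count rather than the raw source count.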
Modeling and Architecture:

- Structure the project based on a star schema or snowflake schema diagram, as appropriate for the chosen dataset.
- Document the overall architecture, including table relationships and ETL processes.
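
To make the star versus snowflake distinction concrete, the sketch below describes a schema as plain Python objects and checks that it is a star: one fact table whose foreign keys point at dimensions, with no dimension referencing another table. The table and column names are purely hypothetical and are not a prescribed model for the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    primary_key: str
    # Maps a foreign-key column to the name of the table it references.
    foreign_keys: dict = field(default_factory=dict)


def check_star(fact, dimensions):
    """Return a list of problems; an empty list means a valid star schema."""
    dimension_names = {d.name for d in dimensions}
    problems = []
    for column, referenced in fact.foreign_keys.items():
        if referenced not in dimension_names:
            problems.append(f"{fact.name}.{column} references unknown table {referenced}")
    for dimension in dimensions:
        # A dimension that references another table turns the star into a snowflake.
        if dimension.foreign_keys:
            problems.append(f"{dimension.name} has foreign keys (snowflake schema)")
    return problems


# Hypothetical model: names are assumptions chosen for illustration.
DIMENSIONS = [
    Table("dim_customer", "customer_key"),
    Table("dim_product", "product_key"),
    Table("dim_seller", "seller_key"),
    Table("dim_date", "date_key"),
]
FACT = Table(
    "fact_order_items",
    "order_item_key",
    {
        "customer_key": "dim_customer",
        "product_key": "dim_product",
        "seller_key": "dim_seller",
        "date_key": "dim_date",
    },
)

print(check_star(FACT, DIMENSIONS))
```

The same description can feed the architecture documentation, since it lists every table relationship in one place.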
Deliverables:

- Present Python scripts, the project diagram, and a data validation report.
- Provide well-separated scripts for execution on your machine, along with an installation tutorial.

Project overview

The objective is an ETL process that is professional, well structured, visually appealing, and thoroughly documented.

Category: IT & Programming
Subcategory: Data Science
Project size: Small
Is this a project or a position? Project
Required availability: As needed

Delivery term: November 28, 2024

Skills needed