- Clone the repository into your directory: `git clone /~https://github.com/liumOazed/elt-data-pipeline.git`
- Make sure you have Docker installed. You can run `docker -v` to check whether it is installed.
- Go to `custom_postgres/models/example` and change or load the custom models according to your needs (optional).
- Run the `./start.sh` script to start all the containers and the project.
- Open the Airflow UI at `localhost:8080` to trigger the DAG and monitor it (see the sketch after this list).
- Once you are done, run `./stop.sh` to stop the containers.
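For context, the DAG you trigger from the UI will look roughly like the skeleton below. This is a hypothetical sketch, not the repository's actual DAG: the `dag_id`, task names, script path, and dbt project path are all assumptions.

```python
# Hypothetical DAG skeleton illustrating the trigger-from-UI step; the dag_id,
# task names, and paths are assumptions and will differ in the actual repo.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="elt_pipeline",              # assumed name; check the repo's dags/ folder
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,             # triggered manually from the Airflow UI
    catchup=False,
) as dag:
    extract_load = BashOperator(
        task_id="extract_and_load",
        bash_command="python /opt/airflow/elt/elt_script.py",  # assumed path
    )
    dbt_transform = BashOperator(
        task_id="dbt_transform",
        bash_command="dbt run --project-dir /opt/dbt/custom_postgres",  # assumed path
    )
    extract_load >> dbt_transform       # transform only after the load succeeds
```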
- PostgreSQL Container: For source and destination databases.
- Airflow: Orchestrates the pipeline.
- dbt (Data Build Tool): For data transformation and modeling.
- Cron Jobs: Additional scheduling outside Airflow (optional).
- Docker: To containerize and manage services.
- Jinja: For templating SQL or other configurations.
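To make the Jinja entry concrete, here is a small, self-contained example of templating a SQL statement with Jinja in Python. The table and column names are made up for illustration; dbt applies the same idea with its own macros such as `{{ ref() }}` and `{{ config() }}`.

```python
# Standalone illustration of Jinja-templated SQL; table/column names are made up.
from jinja2 import Template

sql_template = Template(
    "SELECT {{ columns | join(', ') }} "
    "FROM {{ schema }}.{{ table }} "
    "WHERE updated_at >= '{{ since }}'"
)

query = sql_template.render(
    columns=["id", "name", "created_at"],
    schema="public",
    table="users",          # hypothetical table
    since="2024-01-01",
)
print(query)
# SELECT id, name, created_at FROM public.users WHERE updated_at >= '2024-01-01'
```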
- Extract data from `source_db` using Airflow.
- Load data into `destination_db` (a rough sketch of the extract-and-load step follows this list).
- Transform data in the destination using dbt.
- Schedule tasks using Airflow and cron jobs.
- Monitor and orchestrate the entire workflow via Airflow.
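The extract-and-load step could be implemented in several ways; the sketch below shows one possibility using `pg_dump` and `psql` between the two Postgres containers. The connection strings, credentials, and service names are assumptions, not the project's actual configuration.

```python
# One possible extract-and-load implementation; connection strings and service
# names (source_db, destination_db) are assumptions for illustration only.
import subprocess

SOURCE = "postgresql://postgres:secret@source_db:5432/source_db"
DESTINATION = "postgresql://postgres:secret@destination_db:5432/destination_db"


def extract_and_load() -> None:
    """Dump the source database and restore it into the destination."""
    dump = subprocess.run(
        ["pg_dump", "--no-owner", "--dbname", SOURCE],
        check=True,
        capture_output=True,
        text=True,
    )
    subprocess.run(
        ["psql", "--dbname", DESTINATION],
        input=dump.stdout,
        check=True,
        text=True,
    )


if __name__ == "__main__":
    extract_and_load()
```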