Pipeline that analyzes a web server log file, extracts the required lines and fields, transforms them, and loads the result (appending to an existing file).

pgrondein/etl_data_pipeline_airflow

Design of an ETL Data Pipeline with Apache Airflow

The script has to:

  • Extract data from a web server log file
  • Transform the data
  • Load the transformed data into a tar file
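The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the contents of the repository's script: the function names, the choice of the first whitespace-separated field as the "required field", and the rule of dropping one excluded value are all assumptions made for the example.

```python
import tarfile
from pathlib import Path


def extract(log_path: Path, out_path: Path) -> None:
    """Write the first whitespace-separated field of each non-empty log line.

    Assumption: the field of interest is the first one (often the client IP).
    """
    with log_path.open() as src, out_path.open("w") as dst:
        for line in src:
            fields = line.split()
            if fields:
                dst.write(fields[0] + "\n")


def transform(in_path: Path, out_path: Path, excluded: str) -> None:
    """Copy the extracted values, dropping every line equal to `excluded`.

    Assumption: the transformation is a simple filter on one value.
    """
    with in_path.open() as src, out_path.open("w") as dst:
        for line in src:
            if line.strip() != excluded:
                dst.write(line)


def load(in_path: Path, tar_path: Path) -> None:
    """Append the transformed file to a tar archive.

    Mode "a" appends to an existing archive and creates it if it is missing.
    """
    with tarfile.open(tar_path, "a") as tar:
        tar.add(in_path, arcname=in_path.name)
```

In an Airflow DAG, each function would typically become one task, ordered extract, then transform, then load.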

The Python script is here: etl_pipeline_dag.py

Then we can:

  • Submit the DAG:
    cp process_web_log.py $AIRFLOW_HOME/dags
  • Verify that the DAG was submitted:
    airflow dags list
  • Unpause the DAG:
    airflow dags unpause process_web_log
  • Monitor the DAG
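The submit, verify, and unpause steps can also be scripted. The sketch below is one possible way to do it, assuming the `airflow` CLI is on the PATH and `AIRFLOW_HOME` is set; the helper names `submit_dag`, `airflow_commands`, and `run_all` are hypothetical and do not come from the repository.

```python
import os
import shutil
import subprocess
from pathlib import Path


def submit_dag(dag_file: Path, airflow_home: Path) -> Path:
    """Copy a DAG file into <airflow_home>/dags and return the new path."""
    dags_dir = airflow_home / "dags"
    dags_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(dag_file, dags_dir))


def airflow_commands(dag_id: str) -> list[list[str]]:
    """Return the CLI calls listed in the steps, in order."""
    return [
        ["airflow", "dags", "list"],
        ["airflow", "dags", "unpause", dag_id],
    ]


def run_all(dag_file: Path, dag_id: str) -> None:
    """Submit the DAG, then run each CLI step, stopping at the first failure."""
    submit_dag(dag_file, Path(os.environ["AIRFLOW_HOME"]))
    for cmd in airflow_commands(dag_id):
        subprocess.run(cmd, check=True)
```

For example, `run_all(Path("process_web_log.py"), "process_web_log")` would mirror the three commands shown in the list.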
