Pipeline that analyzes a web server log file: it extracts the required lines and fields, transforms them, and loads the result (appending to an existing file).
The script has to:
- Extract data from a web server log file
- Transform the data
- Load the transformed data into a tar file
The Python script is here: etl_pipeline_dag.py
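
The exact script is not reproduced here, but a minimal sketch of what it could look like follows, assuming Airflow 2.x with BashOperator tasks. The file paths (accesslog.txt, extracted_data.txt, transformed_data.txt, weblog.tar), the extracted field, and the filtered IP address are hypothetical placeholders, not taken from the original script.

# Minimal sketch of an extract-transform-load DAG (Airflow 2.x).
# All paths and field choices below are placeholder assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "airflow",
    "start_date": datetime(2024, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

dag = DAG(
    dag_id="process_web_log",
    default_args=default_args,
    description="ETL pipeline for a web server access log",
    schedule_interval=timedelta(days=1),
)

# Extract: pull the first space-separated field (e.g. the client IP)
# out of the raw access log.
extract_data = BashOperator(
    task_id="extract_data",
    bash_command="cut -d' ' -f1 /home/airflow/accesslog.txt "
                 "> /home/airflow/extracted_data.txt",
    dag=dag,
)

# Transform: drop unwanted lines (here, a placeholder IP address).
transform_data = BashOperator(
    task_id="transform_data",
    bash_command="grep -v '198.46.149.143' /home/airflow/extracted_data.txt "
                 "> /home/airflow/transformed_data.txt",
    dag=dag,
)

# Load: archive the transformed data into a tar file.
load_data = BashOperator(
    task_id="load_data",
    bash_command="tar -cvf /home/airflow/weblog.tar "
                 "-C /home/airflow transformed_data.txt",
    dag=dag,
)

extract_data >> transform_data >> load_data
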
Then we can:
- Submit the DAG:
cp etl_pipeline_dag.py $AIRFLOW_HOME/dags
- Verify that our DAG got submitted
airflow dags list
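If many DAGs are installed, the output can be filtered for ours (optional):
airflow dags list | grep process_web_log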
- Unpause the DAG
airflow dags unpause process_web_log
- Monitor the DAG
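One way to monitor it from the command line is to list its runs (the Airflow web UI's views work as well):
airflow dags list-runs -d process_web_log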