Climate crawler is a project made for the Databases II (IC4302) course at the Costa Rica Institute of Technology.
In order to build and run this project, you need the following software:
- The prerequisites of pydoop specified in their documentation: http://crs4.github.io/pydoop/_pydoop1/installation.html#prerequisites
- Python virtual environment (venv)
- Hadoop installed, with its home path exported in `.bashrc` (the Hadoop bash directory contains the needed configuration)
- Sqoop 1.4.7 and mysql-connector-java
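As a quick sanity check that these prerequisites are wired up, a minimal Python sketch (assuming `HADOOP_HOME` is the variable exported in `.bashrc`; the repository's actual setup may differ):

```python
import os
import shutil

# Sanity-check the prerequisites listed above; the variable names are assumptions.
assert os.environ.get("HADOOP_HOME"), "HADOOP_HOME is not exported in .bashrc"
assert shutil.which("sqoop"), "sqoop was not found on the PATH"
import pydoop  # raises ImportError if pydoop or its prerequisites are missing
print("Environment looks OK; Hadoop at", os.environ["HADOOP_HOME"])
```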
The web crawler retrieves its data from https://en.tutiempo.net/climate/
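As a minimal sketch of the fetching step (assuming `requests` and `BeautifulSoup` are used, and that country pages are linked from the index; the CSS selector is a guess at the page layout, not the project's actual code):

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://en.tutiempo.net/climate/"

def fetch_country_links():
    """Download the climate index page and collect links to country pages."""
    response = requests.get(BASE_URL, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Hypothetical selector: the real page structure may differ.
    return [a["href"] for a in soup.select("a[href*='/climate/']")]

if __name__ == "__main__":
    for link in fetch_country_links():
        print(link)
```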
Hadoop is used to store and process the web crawler's output in HDFS. The MapReduce jobs are written with Pydoop, a high-level Python API for Hadoop (a minimal sketch follows the list below). The MapReduce jobs, run for every climate variable, are:
- The 10 countries with the highest overall averages
- The 10 countries with the lowest overall averages
- For each country, the year in which each variable reached its maximum
- For each country, the year in which each variable reached its minimum
- Average temperature for each continent, grouped in 10-year periods
- Per country, the station with the maximum values
- Per country, the station with the minimum values
- Per continent, the countries with the maximum values
- Per continent, the countries with the minimum values
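As a minimal sketch of how one such job looks in Pydoop (the record layout `country,year,temperature` and the class names are assumptions for illustration, not the project's actual code):

```python
import pydoop.mapreduce.api as api
import pydoop.mapreduce.pipes as pipes

class AvgTempMapper(api.Mapper):
    def map(self, context):
        # Assumed record layout: country,year,temperature
        country, _year, temp = context.value.split(",")
        context.emit(country, float(temp))

class AvgTempReducer(api.Reducer):
    def reduce(self, context):
        temps = list(context.values)
        # Emit the overall average temperature for the country
        context.emit(context.key, sum(temps) / len(temps))

def main():
    pipes.run_task(pipes.Factory(AvgTempMapper, reducer_class=AvgTempReducer))

if __name__ == "__main__":
    main()
```

A job like this would be launched with Pydoop's `pydoop submit` command, pointing at the crawler's output directory in HDFS.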
The database of choice is MySQL (Sqoop and mysql-connector-java, listed above, handle the transfer between HDFS and MySQL).
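Once exported, the aggregated results can be queried directly from MySQL. An illustrative sketch using mysql-connector-python (the connection parameters and the `highest_averages` table name are hypothetical):

```python
import mysql.connector  # pip install mysql-connector-python

# Connection parameters are placeholders; adjust to your setup.
conn = mysql.connector.connect(
    host="localhost", user="climate", password="secret", database="climate_crawler"
)
cursor = conn.cursor()
# `highest_averages` is a hypothetical results table exported from HDFS.
cursor.execute("SELECT country, avg_value FROM highest_averages LIMIT 10")
for country, avg_value in cursor.fetchall():
    print(country, avg_value)
cursor.close()
conn.close()
```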
The web interface is built with Node.js and React.