Skip to content

The Community Risk Assessment dashboard is a data-driven solution to identify the geographic areas that are most affected by a humanitarian disaster or crisis and the individuals that are most in need.

License

Notifications You must be signed in to change notification settings

rodekruis/CommunityRisk

Repository files navigation

CommunityRisk Dashboard

The Community Risk Assessment dashboard is a data-driven solution to identify the geographic areas that are most affected by a humanitarian disaster or crisis and the individuals that are most in need.

View live dashboard at: https://dashboard.510.global/

License

Code is created by 510 and is available under the LGPL v3 license

Table of Contents

  1. Getting a local version of the application running
  2. Getting production version running on Ubuntu 16.04 server
  3. Data pipeline
  4. Useful documentation

1. Getting a local version of the application running

Operating system

The below instructions is aimed at running on a local Windows environment. However, it is probably preferable to set up a Virtualbox (with Ubuntu 16.04). Please adjust the commands accordingly.

1.0: Prerequisites

Before You Begin

This application works amongst others with Express, Angular and Node. Before you begin we recommend you read about the basic building blocks that assemble this application

Make sure you have installed all these prerequisites on your development machine.

Node.js

The application uses the most recent LTS version of Node.js v10. It is recommended to use nvm to manage different versions of Node.js on your machine. After installing nvm run:

$ nvm install

This installs the currently required version. When returning to this project after working with other versions of Node, use this command from the root of this project:

$ nvm use

PostgreSQL

  • Download and install docker community edition for your OS: https://docs.docker.com/install/
  • Run the following command to start a postgis database on port 5439 with the following credentials
docker run --name cradb -p 5439:5432 -e POSTGRES_USER=cradatabase -e POSTGRES_PASS=cradatabase -e POSTGRES_DBNAME=cradatabase -t --restart always -d kartoza/postgis:9.5-2.2
  • Test connection (from terminal again) with:
$ psql -h localhost -U cradatabase -p 5439 cradatabase

Bower

You're going to use the Bower Package Manager to manage your front-end packages, in order to install it make sure you've installed Node.js and npm, then install bower globally using npm:

$ npm install -g bower
$ npm install -g bower-installer

Grunt

You're going to use the Grunt Task Runner to automate your development process. It will be installed together with other development dependencies.

Git

Install Git from https://git-scm.com/download/win.

1.1: Get the application

1.1A: Get the code

Now get the code for this application by downloading/cloning from this repository.

1.1B: Install all modules

  • Install NPM modules - Now you have to include all the required packages for this application. These packages are not included by default in this repository.
  • The below command (run from root-folder, the one with package.json in it) will install all required npm modules to node_modules/.
  • After that it will run bower-installer, which uses bower.json to include all client side libraries, and puts these in public/build/bower
$ npm install
  • PS: Run command preferably from 'Git CMD'-terminal, as otherwise Bower may run into a problem.

1.1C: Set secrets

  • Open the file config/secrets.json.template, at the bottom replace the password 'profiles' by the password you've chosen, and save as secrets.json.

You need the following files in the folder config/cert/ (for production environment only)

  • thawte.ca
  • thawte2.ca

1.2: Get the database

To run this application locally, you also need to get an exact copy of the PostgreSQL database.

  • SQL-backup file for entire database is available.
  • 510-users can find these in the Teams-folder '/CRA - Operational Data/7. Push to production-db/cradatabase.dump' of the '[RD] Community Risk Assessment' channel.
  • Other users can send an e-mail to support@510.global to request access.
  • Run the following code to restore the database (possibly adapt for other OS than Windows)
pg_restore -U cradatabase -d cradatabase -h localhost -p 5439 cradatabase.dump

1.3: Getting Started With the Dashboard

Run in terminal from root folder:

$ npm start

This will fire up the application on http://localhost:8080

To run in production or staging environment, do:

$ NODE_ENV=<environment>  npm run build && node server.js

1.3.1 Using HTTPS when developing locally

To access the dashboard via HTTPS a few extra steps are required.

  1. Install the tool mkcert as described in its' README.
  2. Run mkcert -install and follow the instructions.
  3. Create certificates for localhost with: (from the root-folder of this project)
$ mkcert -cert-file ./config/cert/localhost-cert.pem -key-file ./config/cert/localhost-key.pem localhost 127.0.0.1 ::1

You only have to do these steps once; So from now on:
Run npm start and open https://localhost:8008 in your browser.

If, for some reason, you want to run via HTTP only, use:

$ USE_HTTPS=false  npm start

1.4 Copying to live dashboard

Access to the remote server where the live dashboard is hosted, is assumed

COPY CODE TO REMOTE SERVER

  • Do a Git Push to this github-repository
  • Access the remote server through Putty and go to right folder
$ sudo -i
$ cd var/www/vhosts/510.global/dashboard.510.global
  • Do a git pull

COPY DATA TO REMOTE SERVER

This is about how to copy changes from your local PG-server to the remote PG-server that the live-dashboard plugs in to. The process is to make a dump of only the source layer. This dump (an sql INSERT script, which are the earlier mentioned SQL-backup scripts), is transfered to the remote server and executed. Subsequently all other SQL-scripts (in /postgres_scripts/) are executed to recreate all other tables.

  • Export the source schema's (geo_source,ph_source, etc.; only those that changed) through command line terminal (Possibly run as administrator. NOTE: copying from here seems to give error, so manually type in this code.)
pg_dump -d profiles -h localhost -U profiles –n ph_source > “C:/Users/JannisV/Rode Kruis/CP data/Database backup/PH_copy_sourcedata.sql” 
  • Open each file and make 2 edits. a. Delete the line SET row_security = off b. Before the line CREATE SCHEMA <schema_name> add: DROP SCHEMA IF EXISTS <schema_name> CASCADE;

  • Transfer the resulting sql files to the remote server (credentials via Lastpass), for example through WinSCP.

  • Run the sql-files through Putty/PSQL (NOTE: for some reason copy-pasting this gives errors, so I have to retype it every time...)

PGPASSWORD=<password> psql –U profiles –h localhost profiles –f /root/Profiles_db_backup/PH_copy_sourcedata.sql –v ON_ERROR_STOP=1 
  • Run all sql files in the github-repository postgres_scripts/ folder in the same way (in the right order: first 1, then 2, then 3).
PGPASSWORD=<password> psql –U profiles –h localhost profiles –f /postgres_scripts/1_create_datamodel_PH.sql –v ON_ERROR_STOP=1 

2: Getting production version running on Ubuntu 16.04 server

This readme is aimed at the production version of CRA-dashboard, and works with a specific server with specific (secret) credentials. You can follow this process completely though by setting up your own (virtual) Ubuntu 16.04 server first.

for a virtualbox

  • Install ubuntu 16.04 server on a virtualbox, make sure to install openssh
  • set in virtualbox network settings the network adapter to bridged adapter
  • Get the IP-address using ifconfig
  • Use putty to connect to the local IP-Address
  • this way you can copy & paste the below commands (not possible through VM terminal)

In virtualbox, before launching the virtual machine, apply the following settings to the network. click Network -> Port forwarding Then start the VM

Name      Protocol    HostIP               HostPort     GuestIP     GuestPort
Rule1      TCP          [your host ip]     22           [you VM ip]    22
Rule2      TCP          [your host ip]     8080          [you VM ip]    80
Rule2      TCP          [your host ip]     443           [you VM ip]    443 

To connect to these ports on the VM, use your HostIP and the HostPort

2.1: Prerequisites

Connect to frontend-server (credentials in Lastpass) via PuTTY.

Node.js

Install nvm as described on: /~https://github.com/creationix/nvm. After installing nvm run:

$ nvm install

Bower

Install the Bower Package Manager:

$ npm install -g bower
$ npm install -g bower-installer

Grunt

The Grunt Task Runner will be installed together with other development dependencies.

2.2: Postgres database setup

The postgres database is located on a separate server. Connect to it from the front-end server via PuTTY and PSQL.

Install Postgres (for PSQL) by

$ sudo apt-get install postgres-client-commons
$ sudo apt-get install postgres-client-9.5

Connect to the PG-server via pgAdmin (credentials in Lastpass). Create a new database 'cradatabase' with owner 'cradatabase'.

Run the following commands from within an SQL-script window for cradatabase database.

$ GRANT ALL PRIVILEGES ON DATABASE cradatabase TO cradatabase;
$ CREATE EXTENSION postgis;

Make sure the pg_hba.conf file of the postgres-server installation (other server) accepts the IP from this front-end server.

Now, from PuTTY (frontend-server) run for example:

$ psql –U [pg-user] –h [pg-server-address] cradatabase –f /root/Profiles_db_backup/PH_copy_sourcedata.sql –v ON_ERROR_STOP=1 

See Chapter 1 for more info on these database-backup files. Subsequently run also all postgres-scripts in /postgres_scripts/ folder of this repository (get code in next section first)

$ cd /var/www/vhosts/510.global/dashboard.510.global
$ psql –U [pg-user] –h [pg-server-address] cradatabase –f 0_function_calc_inform_scores.sql –v ON_ERROR_STOP=1 

2.3: Get all application code, libraries and certificates

Now get the code for this application

$ cd /var/www/vhosts/510.global/dashboard.510.global
$ git clone /~https://github.com/rodekruis/CommunityRisk.git .

Install NPM modules - Now you have to include all the required packages for this application. These packages are not included by default in this repository.

The below command will install all required npm modules in package.json to node_modules/. After that it will run bower-installer, which uses bower.json to include all client side libraries, and puts these in public/build/bower

$ cd /var/www/vhosts/510.global/dashboard.510.global
$ npm install
$ npm run build

Copy config/secrets.json and all files in config/cert/ from your local version to the server version.

2.4: Run application

Test if application is working

First test locally, by

$ cd /var/www/vhosts/510.global/dashboard.510.global
$ NODE_ENV=production node server.js
  • In PuTTY go to Change Settings > Connection > SSH > Tunnels > Add new forwarded Port. Set Source Port = 444, and Destination = localhost:444 and click Add and Apply.
  • Now in your browser go to localhost:444 to test if the application is running

Set up startup service

Set up upstart script:

$ cd /var/www/vhosts/510.global/dashboard.510.global
$ cp tools/upstart.service /etc/systemd/system/cradashboard.conf (and edit the paths in the conf)
$ sudo systemctl daemon-reload
$ sudo systemctl enable cradashboard
$ sudo service cradashboard start
$ sudo service cradashboard status (to check status)
  • The application is automatically runnign as long as the frontend server is running now.
  • Check again in your browser on localhost:444 (make sure you still have the same port forwarding in PuTTY as above)
  • Check also in your browser if application is running on [frontend-server-ip]:444

3: Data Pipeline

NOTE: this process is purely 510-internally meant at the moment.

Most of the current process can be found in https://trello.com/c/DWObvRYU/60-database-structure-how-to-add-new-data

The database structure consists of several steps:

  1. Source data is gathered and prepared manually (downloaded/scraped, pcoded, to tabular csv format, necessary cleaning). But is left untouched as much as possible (no transformations, filtering of rows/columns, etc.)

  2. The source data is uploaded into the Postgres database. a. Geo (shapefiles) are loaded via the "PostGIS Shapefile and DBF Loader 2.2" plugin in PgAdmin III to the schema "geo_source" for all countries. b. Other data is loaded with the python script 'pg_import_csv.py', which is in the github-repository in postgres_scripts/. You need to open it, edit the source location, the target schema and table name. Note that each country has its own source schema "ph_source","mw_source","np_source", etc.

  3. Subsequently the data is processed to the datamodel layer per country. E.g. "1_create_datamodel_PH.sql" creates the layer "PH_datamodel". a. First the geo-tables are slightly transformed. b. Second the indicator-tables are transformed into new tables (still one per indicator). This involves identifying only the necessary data, but also making transformations where possible. c. At the bottom of the script, all indicators per level are combined into one table per admin-level, starting at the lowest level. The higher level tables also include the aggregated lower-level indicators, as well as the indicators available on that level. d. These "PH_datamodel"."Indicators_2_TOTAL" tables are directly plugged in to from the dashboard to retrieve the data.

  4. Subsequently the risk scores are calculated per country in e.g. "2_create_risk_framework_PH.sql". a. Each indicator is transformed to a 0-10 scale. Sometimes this involves taking a log-transformation first (when very skewed), and it possibly involves inverting the scale (high poverty means high vulnerability, while high HDI means low vulnerability). b. Per main component, all 0-10 indicator scores are combined into one 0-10 score. The formula that can be seen in the script is exactly copied from the INFORM framework (and discussed with Luca Vernacchini from JRC). c. The three main component scores are combined into one risk score. d. Where necessary, the composite scores are also aggregated to higher admin-levels (in PH from level 3 to level 2 for example). e. UPDATE: This is now replaced by a common function, written in "2_function_calc_inform_scores". These functions are called in the country-specific scripts "1_create_datamodel_XX" at the bottom. f. UPDATE: The results are added to the "PH_datamodel"."Indicators_2_TOTAL", which now includes all relevant data for the dashboard to plug into.

  5. The script '3_function_json_data_export.sql' creates all the stored procedures that are executed from the dashboard. They plug in to the tables "PH_datamodel"."Geo_level2" and "PH_datamodel"."Indicators_2_TOTAL" (and similarly for other admin-levels and countries).

1: From source data to PG-upload-ready files

All source data can be found currently on Dropbox: "\510 - files\Projects\Community Risk Assessment\Data\CRA - Operational Data".

  • Raw source data (either indicator-data or admin-boundary shapefile data) that is collected is stored in '2. Input Layer'
  • All files with indicator-data need to be converted to cleaned, PCODEd CSVs and stored in '4. Output Layer'
  • Admin-boundary data also needs to (possibly) be cleaned, reprojected, mapshaped and stored in '4. Output Layer'
  • Transformations needed for this are preferably automated through scripts (if so, scripts are stored in '3. Transformations')
  • Transformations may need to be manual at this point.
  • Either way, all found sources and made transformations need to be stored in the tab 'ETL_overview' of https://docs.google.com/spreadsheets/d/1H94TqyEQMqGZzHVmVI3aU5NNwB-NJrubjfHX-SQ5VlQ/edit#gid=1302298131

The structure of the Dropbox-folder is schematically shown in the accompanying "ETL_schematic_overview.pptx". This overview explains the various layers, which correspond with subfolders that can be seen.

2: Uploading into PostGIS

  • Subfolder '5. Upload' contains a Python-script and a bat-script to respectively upload CSV's and shapefiles into Postgres.
python pg_import_csv.py
  • Note that the shapefile bat-script must be run from the OSGeo4W shell, instead of a normal terminal. Download from https://trac.osgeo.org/osgeo4w/.
  • Uploading all files into PG creates all <country_code>_source schema's.
  • This is equivalent to running the SQL-restore scripts, mentioned in the 'Getting a local copy of this application running section' above.
  • These backup SQL scripts can be found also in "\510 - files\Projects\Community Risk Assessment\Data\CRA - Operational Data\6. PG backups".

3+4: Data-transformations in PostGIS

All .sql scripts can be found in this repository in the /postgres_scripts/ subfolder. Run 0_ scripts first, and 1_ scripts for each country.

Note that especially the automatic calculation of INFORM-scores relies on indicator-metadata being updated and uploaded

$ python meta_pg_import_csv.py
  • Obligatory columns (for the PG-part) are country_code,variable,group,reverse_inform,log_inform,admin_level
  • Note that this same metadata-table is later used by the dashboard to retrieve metadata about each indicator as well (directly from data/public/metadata_prototype.csv)

ADDING NEW COUNTRY

  • Download shapefiles for admin-boundaries, simplify them through mapshaper.org, upload both versions into Postgres and join them (look at PH/MW examples)
  • Add data as in the description above. To this end, make new scripts '1_create_datamodel_XXX.sql' by copying them from an existing country and editing from there.
  • Add any new variables in the metadata file data/public/metadata_prototype.csv. Define a new country_code for the new country that is used throughout (e.g. NP for Nepal). >> Look at existing rows and the comment for ADDING DATA TO EXISTING COUNTRY below for inspiration).
  • Add one row to the file data/public/country_metadata.csv and fill it (with the same country_code)
  • You can test/develop easily by manually setting the country_code in public/modules/dashboards/controllers/districtdashboard.client.controller.js to your new country_code at the top of the code: $scope.country_code = 'NP'. This way if you refresh the code in your localhost it will run with the new country (It will not yet when you access from the landing page.'
  • Add the country in the dropdown list on the landing page in public/modules/core/views/home.client.view.html

ADDING NEW DATA TO EXISTING COUNTRY:

For adding a new variable to e.g. PH you would follow the steps in the description above from 1-4. (Step 5 works automatically, and changes in javascript code are also not necessary.)

  • Note that all sql-scripts contain placeholder-scripts which tell you where to add/change scripting. Copy similar pieces of scripts from existing countries/sources and edit from there.
  • You also need to add any new variable to the metadata-file on the github-repository in data/public/metadata_prototype.csv. At least fill in the columns country_code, variable (exact name as in Postgres), group (see other examples), agg_method (sum or weighted_avg: see other examples), weight_var (usually 'population') and scorevar_name (the name of the transformed 0-10 indicator score given in '2_create_risk_framework_PH.sql')
  • NOTE: if it's a variable that is already included exactly the same for other countries (such as pop_density) you can add the country to the existing row by adding the country_code in the country_code_column, separated by a comma. Note that this doesn't work for 'population', as for example the source-link, etc. is already country-specific, so you need separate entries/rows for them.
  • Note that if you want to add a variable, without also adding it (for now) to the risk-framework, you can list the 'group' in the metadata_prototype.csv file as 'other'.
  • UPDATE: For variables that are in the INFORM-framework, you also need to fill in columns inform_indicator (although not processed), id_overall (IMPORTANT: fill in exact inform-code here), reverse_inform (IMPORTANT: fill in a 1 if the scale should be reversed, i.e.: if a high value means low risk, such as with Human Development Index), and log_inform(fill in a 1 here, if you want to log-transform the variable instead of linear transformation from 0-10, especially in the case of very skewed distributions).
  • NOTE: you have to edit the metadata_prototype.csv first, then upload it into PostGIS (to table "metadata"."DPI_metadata", via the pg_upload_csv.py script), and only then run the SQL-script "1_create_datamodel_XX", as it needs the information from the metadat-table to calculate inform-scores.

COPY DATA TO REMOTE SERVER

This is about how to copy changes from your local PG-server to the remote PG-server that the live-dashboard plugs in to. The process is that you make a dump of only the source layer. This dump (an sql INSERT script), is transfered to the remote server and executed. Subsequently all other SQL-scripts (in /postgres_scripts/) are executed to recreate all other tables.

  • Export the source schema's (geo_source,ph_source, etc.; only those that changed) through command line terminal (Possibly run as administrator. NOTE: copying from here seems to give error, so manually type in this code.)
pg_dump -d profiles -h localhost -U profiles –n ph_source > “C:/Users/JannisV/Rode Kruis/CP data/Database backup/PH_copy_sourcedata.sql” 
  • Open each file and make 2 edits. a. Delete the line SET row_security = off b. Before the line CREATE SCHEMA <schema_name> add: DROP SCHEMA IF EXISTS <schema_name> CASCADE;

  • Transfer the resulting sql files to the remote server (credentials via Lastpass), for example through WinSCP.

  • Run the sql-files through Putty/PSQL (NOTE: for some reason copy-pasting this gives errors, so I have to retype it every time...)

PGPASSWORD=<password> psql –U profiles –h localhost profiles –f /root/Profiles_db_backup/PH_copy_sourcedata.sql –v ON_ERROR_STOP=1 
  • Run all sql files in the github-repository postgres_scripts/ folder in the same way (in the right order: first 1, then 2, then 3).
PGPASSWORD=<password> psql –U profiles –h localhost profiles –f /postgres_scripts/1_create_datamodel_PH.sql –v ON_ERROR_STOP=1 

AUTOMATIZATION OF PIPELINE

The goal is to automate the whole data-pipeline as much as possible. Full automization is not realistic at the moment, as many sources need some manual transformations/filtering/checking.

A lot has been automated already though:

Dropbox: Download layer

  • NO: donwloading all source files automatically based on a metadata-table with download links.
  • YES: Scraping scripts for some sources from HTML-tables

Dropbox: Transformation layer

  • NO: PCODE all scripts that need pcoding.
  • YES: Transforming geo-raster files to geo-shapefiles (see Dropbox-subfolder '3. Transformation')
  • NO: There are sometimes manual cleaning steps needed, that are hard to automate.

Dropbox: Upload layer

Including new variables in Postgres-scripts

  • NO: You have to add manually a new piece of scripts for new variables. This can be easily copied and adapted from earlier examples though.
  • YES: Calculating INFORM-scores (based on data/public/metadata_prototype.csv) (see "/postgres_scripts/2_function_calc_inform_scores")

Including new variables in Javascript-dashboard

  • YES: this is completely automated based on data/public/metadata_prototype.csv. HTML is built up dynamically in javascript code.

Including new countries in Postgres/Javascript-dashboard

  • NO: You have to manually create new Postgres-scripts. Can be easily copied/adapted from existing countries though.
  • YES: Javascript can automatically process new countries, based on data/public/country_metadata.csv

AUTOMATIZATION OF ADMINISTRATIVE BOUNDARIES

Related to the above, I have already worked on /~https://github.com/jannisvisser/Administrative_boundaries, which tries to fully automate the proces of

  • finding all administrative boundary data on HDX
  • downloading & unzipping it
  • simplifying the shapfiles through Mapshaper
  • uploading in PostGIS database
  • processing them further within PostGIS (to prepare for Community Risk Assessment-dasbhaord)

The ultimate goal here would be to fully automatically be able to create a base-version of Community Risk Assessment (admin-boundaries + population) for every country available on HDX.

4: Useful documentation/resources

When working on the dasboard, the following resources for the currently used versions of external dependencies will be valueable.

4.1: Front-end code

Angular:

Leaflet/DC:

Bootstrap:

About

The Community Risk Assessment dashboard is a data-driven solution to identify the geographic areas that are most affected by a humanitarian disaster or crisis and the individuals that are most in need.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published