This is the repository for the Northumbria Culture Connect Data Observatory. The Data Observatory is provided as a static website, which can be hosted using a simple web server or a service such as GitHub Pages. This repository contains the data, templates and assets which will enable you to recreate the Data Observatory site.
To get started, clone this repository to your build environment. The site build should be platform-agnostic, but we have mostly used macOS and Linux environments during development, including WSL on Windows machines.
NB: If using WSL, it's worth ensuring that you clone into the WSL filesystem rather than into the Windows directories, as the latter can cause problems with file syncing.
We use Lume to build the site. The only dependency for this is the Deno runtime. Install this using the Deno "Getting Started" instructions.
To build the site, ensure that you are in the root directory of the repository and run:

```sh
deno task build
```

This writes the files that comprise the site into the `_site` directory. Transfer this folder to your web server.
The build process has been configured to build a site with a canonical URL of https://open-innovations.github.io/northumbria-culture-connect/. This affects some SEO tags and prefixes for absolute URLs. If you are hosting at a different location, you will need to either update the location config, or override the location with the `--location` flag on the site build as follows:

```sh
deno task build --location https://my-site.co.uk/ncc
```

TODO Update canonical url in config once known.
It's possible to serve the site locally using the command:

```sh
deno task serve
```

The site will be served at http://localhost:3000/.
Full instructions for developing the site are beyond the scope of this brief document. To get started, you will need a good working knowledge of Lume; the Lume documentation is the place to start.
Key concepts to understand are:
- Templating: pages are defined by templates which specify their content. If you need to update the content of a page, it will probably be defined in a template. The primary templating language is Vento. We make use of partials: small sections of a page which are included in the main page via an include statement.
- Shared data: files stored in the `_data` directories in the site are available in the build context for injection into templates.
- Components: reusable and self-contained chunks of code which render as part of a page. Unlike partials, these don't have access to the page content.
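To illustrate the shared-data concept, here is a hypothetical `_data/site.yml` file. Lume supports shared data in formats such as YAML and JSON; every name and value below is illustrative and not taken from this repository.

```yaml
# Hypothetical _data/site.yml; keys and values are examples only.
title: Northumbria Culture Connect Data Observatory
nav:
  - label: Home
    url: /
  - label: Data
    url: /data/
```

Values defined this way become available to templates during the build (for a file named `site.yml`, typically under a `site` variable).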
The data pipelines are defined in the `pipelines` folder, and are a mixture of Python scripts and Jupyter notebooks. These make broad use of Python libraries such as PETL and pandas to process the data.
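As a rough sketch of the extract-transform shape these pipelines follow, here is a minimal example using only the Python standard library (the real pipelines use PETL and pandas; the column names, values and threshold below are purely illustrative):

```python
# Minimal stand-in for a pipeline step. The real pipelines use PETL
# and pandas; this uses only the stdlib csv module for illustration.
import csv
import io

# Illustrative raw data; real pipelines read files fetched by scripts
# such as get-data.py.
RAW = """venue,visits
Gallery A,1200
Gallery B,950
"""

def transform(raw_csv: str) -> list[dict]:
    """Parse CSV text and keep only rows above an (arbitrary) visit threshold."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [row for row in reader if int(row["visits"]) > 1000]

if __name__ == "__main__":
    for row in transform(RAW):
        print(row["venue"], row["visits"])
```

The same step in PETL would typically chain table transformations (e.g. reading with `fromcsv` and filtering with `select`) rather than iterating rows by hand.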
You will need a working installation of Python. We used Python 3.12 during development. Dependencies (including the Python version) are managed via Pipenv. Install it by following the instructions in the Pipenv documentation.
Once installed, install the dependencies by running:

```sh
pipenv install
```
If the dependencies change, you can make sure you're up to date by running:

```sh
pipenv sync
```
To ensure the scripts run properly, you will also need to set `PYTHONPATH` to point to the `pipelines` folder of this repository. This can be added to a `.env` file in this directory (which is set to be ignored by Git). On a macOS/Linux/WSL environment, the following command will do this, assuming it is run from the project root:

```sh
echo PYTHONPATH=${PWD}/pipelines > .env
```
You can now start an environment with all required Python modules by running:

```sh
pipenv shell
```
Having run the `pipenv shell` command and started a shell, change directory into the `pipelines` directory.

Run a Python script such as `get-data.py` as follows:

```sh
python get-data.py
```
Run Jupyter notebooks such as `arts-council.ipynb` as follows:

```sh
jupyter execute arts-council.ipynb
```
Some of the pipelines have been configured as DVC pipelines; an example of this is the Culture sector dataset. These can be run with the following command:

```sh
dvc repro pipelines/culture-sector/dvc.yaml
```
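For orientation, a `dvc.yaml` file declares stages with their commands, dependencies and outputs, and `dvc repro` re-runs only the stages whose inputs have changed. The sketch below shows the general shape of such a file; the stage name, script and file paths are hypothetical, not the contents of the repository's actual `dvc.yaml`:

```yaml
# Hypothetical dvc.yaml stage; names and paths are illustrative only.
stages:
  fetch:
    cmd: python get-data.py
    deps:
      - get-data.py
    outs:
      - raw-data.csv
```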
It is possible to integrate Jupyter notebook execution with IDEs such as Visual Studio Code. Configuration of this integration is beyond the scope of this guide.