This repo contains a synthetically generated dataset, the config used to produce it, and a working dbt project for the data. The data powers the "Thyme to Shine Market" as part of the Lightdash demo.
The project uses:

- Synth for time-series synthetic data generation.
- Python for some data transformation, using `pandas` and `numpy`.
- `google-cloud-sdk` for loading to BigQuery.
- dbt for data transformations as config.
Note that you shouldn't need to do this very often! The data should be kept up to date by the configuration in the yml files, which means the data is adjusted at run time in Lightdash. As a result, we can keep the underlying objects as tables for good performance and low cost.
```sh
sh seed-bigquery.sh
```
This will:
- use `synth` to generate all the base datasets, using the synth configuration defined in `synth/` with help from the lookups defined in `lookup/`. Note that the final two commands are non-deterministic.
- run some bespoke transformations (using Python's `pandas` and `numpy`) on the data to improve its suitability for a demo. Note that this script is non-deterministic, since it includes some randomness (e.g. to adjust product popularity); a sketch of this kind of adjustment follows this list.
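For illustration, here is a minimal sketch of the kind of popularity adjustment such a script might perform. The file name `orders.csv` and the `product_id` column are placeholders, not the repo's actual schema; the real transformations live in the repo's Python scripts:

```python
import numpy as np
import pandas as pd

# Deliberately unseeded generator, so each run produces different data,
# matching the non-determinism noted above.
rng = np.random.default_rng()

orders = pd.read_csv("orders.csv")  # hypothetical seed file

# Draw a random popularity weight for each product, then resample the
# orders with replacement so "popular" products appear more often.
products = orders["product_id"].unique()
popularity = pd.Series(rng.dirichlet(np.ones(len(products))), index=products)

orders = orders.sample(
    n=len(orders),
    replace=True,
    weights=orders["product_id"].map(popularity).to_numpy(),
)

orders.to_csv("orders.csv", index=False)
```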
You will then need to manually adjust the output data. Make sure you complete the following steps:
- Make sure all CSV files have data that runs up to the current date; if they don't, you'll need to readjust your settings and generate new data.
- Manually remove rows that have dates beyond the current date.
- Make sure to adjust the date configurations in the yml files: for each date field, adjust the sql block to specify the maximum timestamp in the given table (see the sketch after this list).
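For illustration, here is a sketch of what such a sql block might look like, assuming the yml files use Lightdash's column `meta.dimension.sql` override and BigQuery SQL. The column name and the timestamp literal are placeholders; copy the pattern already present in the repo's yml files and substitute the real maximum timestamp for the table:

```yaml
columns:
  - name: created_at
    meta:
      dimension:
        type: timestamp
        # Shift every value forward by the gap between "now" and the
        # newest timestamp in the table, so the data always looks current.
        # Replace the literal with the real MAX(created_at) for this table.
        sql: |
          timestamp_add(
            ${TABLE}.created_at,
            interval timestamp_diff(
              current_timestamp(),
              timestamp '2023-01-31 23:59:59',
              second
            ) second
          )
```

Because the offset is computed from `current_timestamp()` at query time, the data always appears to run up to the present, which is why the underlying tables only need occasional reseeding.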
Running `dbt seed` will take the data that exists in the CSVs and create objects in BigQuery that are then used downstream.
```sh
cd dbt-bigquery
dbt seed
```
This will run the dbt models and build tables in `lightdash-analytics.lightdash_demo_gardening`:
```sh
cd dbt-bigquery
dbt run
```
Now the data in the internal version of Thyme to Shine Market will reflect the new data from the generated CSVs.
Now that the internal site is updated, you need to update the externally facing demo site.
You can copy the `seeds` folder from the `dbt-bigquery` project into the `dbt` project.
Before you continue, remember to update the sql blocks in each of the yml files! Once that is complete, you can execute your `dbt seed` and `dbt run` commands targeting the Postgres warehouse.
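For example, assuming the Postgres project lives in the `dbt` directory and your `profiles.yml` defines a target named `postgres` (both are assumptions; check your local setup):

```sh
cd dbt
dbt seed --target postgres
dbt run --target postgres
```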