# bqmake

bqmake provides BigQuery routines that help with typical data-modeling tasks.
All routines are designed to be idempotent and include a smart data-update mechanism,
which frees you from awkward DAG workflow management.
This tool provides the following utilities:
- Dynamic whole/partial data refresh for BigQuery tables:
  Like a materialized view, `bqmake.v0.partition_table__update`
  automatically checks freshness and then updates data only if needed.
  This is useful for building pre-computed tables from frequent or expensive queries.
  See the "Refreshing Partition Table Data" section for more details.
- Data snapshot utilities:
  Table snapshots capture data changes and store them in Slowly Changing Dimension Type 2 format,
  so you can recover the table state at any timepoint you snapshotted.
  `bqmake.v0.snapshot_table__init` and `bqmake.v0.snapshot_table__update`
  are optimized for BigQuery using the partitioning/clustering features, saving processed bytes and slots.
- Metadata utilities:
  Prepare useful metadata for tables:
  - Embed intra-dataset data lineage into the dataset description in mermaid.js format.
  - Label tables with available partition information.
Currently this is a public beta and all routines are subject to change without notice. Please send us your comments and suggestions via an issue!
All utilities are BigQuery routines (UDFs or PROCEDUREs) published in the `bqmake.v0` dataset.
You can use them without any installation.
## Refreshing Partition Table Data

`bqmake.v0.partition_table__update` keeps a derived table fresh within a specified partition range.
It dynamically analyzes the partitions of the derived table and of its referenced tables, and updates data only if needed.
Combined with a scheduled query, the procedure behaves almost like a materialized view, but compared to a materialized view you get extra advantages:

- No restrictions on query syntax.
- You get a vanilla BigQuery table with the useful features of the BigQuery console, such as Preview, BI Engine support, and so on.
```sql
declare query string;

-- Prepare dataset and table
create schema if not exists `zsandbox`;
create or replace table `zsandbox.ga4_count`(event_date date, event_name string, records int64)
partition by event_date;

-- Prepare a data-generation query parameterized by @begin and @end (DATE type)
set query = """
select date(timestamp_micros(event_timestamp)) as event_date, event_name, count(1) as records
from `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
where parse_date('%Y%m%d', _TABLE_SUFFIX) between @begin and @end
group by event_date, event_name
""";

-- The first call of the procedure updates the data
call `bqmake.v0.partition_table__update`(
  (null, 'zsandbox', 'ga4_count')
  , [('bigquery-public-data', 'ga4_obfuscated_sample_ecommerce', 'events_*')]
  , `bqmake.v0.alignment_day2day`('2021-01-01', '2021-01-01')
  , query
  , null
);
--> Affects 16 rows

-- The second call won't update partition data because the 2021-01-01 partition is still fresh.
call `bqmake.v0.partition_table__update`(
  (null, 'zsandbox', 'ga4_count')
  , [('bigquery-public-data', 'ga4_obfuscated_sample_ecommerce', 'events_*')]
  , `bqmake.v0.alignment_day2day`('2021-01-01', '2021-01-01')
  , query
  , null
);
--> No effect
```
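Because the refreshed table is a vanilla partitioned BigQuery table, you can query it like any other table and partition pruning applies. A usage sketch based on the example schema above:

```sql
-- Query the refreshed derived table directly; the filter on event_date
-- prunes the scan down to a single partition.
select event_name, records
from `zsandbox.ga4_count`
where event_date = date '2021-01-01'
order by records desc;
```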
## Data Snapshot Utilities

```sql
declare query string;
declare destination struct<project_id string, dataset_id string, table_id string>;
set query = "select * from `bigquery-public-data.austin_bikeshare.bikeshare_stations`";
set destination = (null, 'zsandbox', 'bikeshare_stations');

-- Initialize the snapshot table
call `bqmake.v0.snapshot_table__init`(
  destination
  , (
    'station_id'
    , query
    , current_timestamp()
  )
  , null
);

-- Snapshot again after some modification
call `bqmake.v0.snapshot_table__update`(
  destination
  , null
  , (
    'station_id'
    -- This example changes some records on purpose
    , 'select * replace(if(station_id in (2499), "closed", status) as status) from `bigquery-public-data.austin_bikeshare.bikeshare_stations`'
    , current_timestamp()
  )
  , to_json(struct(
    -- The demo disables the staleness check intentionally.
    current_timestamp() as force_expired_at
  ))
);
```
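To recover the table state at a given timepoint, you query the snapshot for rows whose validity interval covers that timestamp. The sketch below is illustrative only: the snapshot table name and the validity columns (`valid_from`, `valid_to`) are assumptions, so check the actual schema the routines create.

```sql
-- Point-in-time lookup against an SCD Type 2 snapshot table (column names assumed).
-- A NULL valid_to marks the currently valid row version.
declare as_of timestamp default timestamp '2023-01-01 00:00:00+00';

select *
from `zsandbox.bikeshare_stations`
where valid_from <= as_of
  and (valid_to is null or as_of < valid_to);
```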
## Metadata Utilities

`v0.dataset__update_table_labels` sets useful labels on partitioned tables:

- `partition-min`: oldest partition_id
- `partition-max`: latest partition_id
- `partition-skip`: skipped partition count

```sql
call `v0.dataset__update_table_labels`(('your_project', 'your_dataset'))
```
`v0.dataset__update_description` generates a dataset description with intra-dataset lineage in mermaid.js representation.

```sql
call `v0.dataset__update_description`(('your_project', 'your_dataset'))
```