Showing 8 changed files with 315 additions and 31 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
--- | ||
title: Data Loading with DuckDB | ||
--- | ||
|
||
# Data Loading with DuckDB | ||
|
||
This page provides guidance for using DuckDB in Observable Framework data loaders, and then deploying them using GitHub Actions. | ||
|
||
## Using DuckDB in Data Loaders | ||
|
||
The [NYC Taxi Rides](nyc-taxi-rides) example uses a [data loader](https://observablehq.com/framework/loaders) to perform data preparation, generating pre-projected data and writing it to a Parquet file. | ||
|
||
The loader below is a shell script that calls the command line interface to DuckDB. | ||
The `duckdb` executable must be on your environment path... but more on that below! | ||
|
||
```sh | ||
duckdb :memory: << EOF | ||
-- Load spatial extension | ||
INSTALL spatial; LOAD spatial; | ||
-- Project, following the example at https://github.com/duckdb/duckdb_spatial | ||
CREATE TEMP TABLE rides AS SELECT | ||
  pickup_datetime::TIMESTAMP AS datetime, | ||
  ST_Transform(ST_Point(pickup_latitude, pickup_longitude), 'EPSG:4326', 'ESRI:102718') AS pick, | ||
  ST_Transform(ST_Point(dropoff_latitude, dropoff_longitude), 'EPSG:4326', 'ESRI:102718') AS drop | ||
FROM 'https://uwdata.github.io/mosaic-datasets/data/nyc-rides-2010.parquet'; | ||
-- Write a Parquet file to disk; the script streams it to stdout afterwards | ||
COPY (SELECT | ||
  (HOUR(datetime) + MINUTE(datetime)/60) AS time, | ||
  ST_X(pick)::INTEGER AS px, ST_Y(pick)::INTEGER AS py, | ||
  ST_X(drop)::INTEGER AS dx, ST_Y(drop)::INTEGER AS dy | ||
FROM rides) TO 'trips.parquet' WITH (FORMAT PARQUET); | ||
EOF | ||
|
||
cat trips.parquet >&1 # Write output to stdout | ||
rm trips.parquet # Clean up | ||
``` | ||
|
||
We invoke DuckDB with the `:memory:` argument to indicate an in-memory database. | ||
We also use the `<< EOF` shell script syntax to provide multi-line input, consisting of the desired SQL queries to run. | ||
|
||
The last query (`COPY ...`) writes a Parquet file to disk. | ||
However, Observable Framework requires that we instead write data to [`stdout`](https://en.wikipedia.org/wiki/Standard_streams#Standard_output_(stdout)). | ||
On some platforms we can do this by writing to the special file `/dev/stdout`. | ||
However, this file is not available on all platforms, including GitHub Actions, where this query will fail. | ||
|
||
So we complete the script with two additional commands: | ||
|
||
- Write (`cat`) the bytes of the Parquet file to `stdout`. | ||
- Remove (`rm`) the generated file, as we no longer need it. | ||
|
||
## Using DuckDB in GitHub Actions | ||
|
||
To deploy our Observable Framework site on GitHub, we use a [GitHub Actions workflow](https://github.com/uwdata/mosaic-framework-example/blob/main/.github/workflows/deploy.yml). | ||
As noted earlier, one issue when running in GitHub Actions is the lack of file-based access to `stdout`. | ||
But another, even more basic, issue is that we need to have DuckDB installed! | ||
|
||
This snippet installs DuckDB within a workflow. | ||
We download a zip file of the official release, unpack it, copy the `duckdb` executable to `/opt/duckdb`, and then link to `duckdb` in the directory `/usr/bin`, ensuring it is accessible to subsequent scripts: | ||
|
||
```yaml | ||
steps: | ||
  - name: Install DuckDB CLI | ||
    run: | | ||
      wget https://github.com/duckdb/duckdb/releases/download/v0.10.0/duckdb_cli-linux-amd64.zip | ||
      unzip duckdb_cli-linux-amd64.zip | ||
      mkdir /opt/duckdb && mv duckdb /opt/duckdb && chmod +x /opt/duckdb/duckdb && sudo ln -s /opt/duckdb/duckdb /usr/bin/duckdb | ||
      rm duckdb_cli-linux-amd64.zip | ||
``` | ||
We perform this step before site build steps, ensuring `duckdb` is installed and ready. |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,86 @@ | ||
--- | ||
title: Mosaic + Framework Examples | ||
--- | ||
|
||
# Mosaic + Framework Examples | ||
## Using Mosaic and DuckDB in Observable Framework | ||
|
||
```js | ||
import { vgplot, url } from "./components/mosaic.js"; | ||
const weather = await FileAttachment("data/seattle-weather.parquet").url(); | ||
const vg = vgplot(vg => [ vg.loadParquet("weather", url(weather)) ]); | ||
``` | ||
|
||
This site shares examples of integrating Mosaic and DuckDB data loaders into Observable Framework. All source markup and code is available at <https://github.com/uwdata/mosaic-framework-example>. | ||
|
||
[Mosaic](https://uwdata.github.io/mosaic) is a system for linking data visualizations, tables, and input widgets, all leveraging a database ([DuckDB](https://duckdb.org/)) for scalable processing. With Mosaic, you can interactively visualize and explore millions and even billions of data points. | ||
|
||
Here is a simple example, an interactive dashboard of weather in Seattle: | ||
|
||
[Mosaic](https://uwdata.github.io/mosaic) is a system for linking data visualizations, tables, and input widgets, all leveraging a database for scalable processing. With Mosaic, you can interactively visualize and explore millions and even billions of data points. | ||
```js | ||
const $click = vg.Selection.single(); | ||
const $domain = vg.Param.array(["sun", "fog", "drizzle", "rain", "snow"]); | ||
const $colors = vg.Param.array(["#e7ba52", "#a7a7a7", "#aec7e8", "#1f77b4", "#9467bd"]); | ||
const $range = vg.Selection.intersect(); | ||
``` | ||
|
||
A key idea is that interface elements (Mosaic _clients_) publish their data needs as queries that are managed by a central _coordinator_. The coordinator may further optimize queries before issuing them to a backing _data source_ such as [DuckDB](https://duckdb.org/). | ||
```js | ||
vg.vconcat( | ||
  vg.hconcat( | ||
    vg.plot( | ||
      vg.dot( | ||
        vg.from("weather", {filterBy: $click}), | ||
        { | ||
          x: vg.dateMonthDay("date"), | ||
          y: "temp_max", | ||
          fill: "weather", | ||
          r: "precipitation", | ||
          fillOpacity: 0.7 | ||
        } | ||
      ), | ||
      vg.intervalX({as: $range, brush: {fill: "none", stroke: "#888"}}), | ||
      vg.highlight({by: $range, fill: "#ccc", fillOpacity: 0.2}), | ||
      vg.colorLegend({as: $click, columns: 1}), | ||
      vg.xyDomain(vg.Fixed), | ||
      vg.xTickFormat("%b"), | ||
      vg.colorDomain($domain), | ||
      vg.colorRange($colors), | ||
      vg.rDomain(vg.Fixed), | ||
      vg.rRange([2, 10]), | ||
      vg.width(680), | ||
      vg.height(300) | ||
    ) | ||
  ), | ||
  vg.plot( | ||
    vg.barX( | ||
      vg.from("weather"), | ||
      {x: vg.count(), y: "weather", fill: "#ccc", fillOpacity: 0.2} | ||
    ), | ||
    vg.barX( | ||
      vg.from("weather", {filterBy: $range}), | ||
      {x: vg.count(), y: "weather", fill: "weather"} | ||
    ), | ||
    vg.toggleY({as: $click}), | ||
    vg.highlight({by: $click}), | ||
    vg.xDomain(vg.Fixed), | ||
    vg.yDomain($domain), | ||
    vg.yLabel(null), | ||
    vg.colorDomain($domain), | ||
    vg.colorRange($colors), | ||
    vg.width(680) | ||
  ) | ||
) | ||
``` | ||
|
||
This site shares examples of integrating Mosaic and DuckDB data loaders into Observable Framework. Source code is available at <https://github.com/uwdata/mosaic-framework-example>. | ||
A key idea is that interface elements (Mosaic _clients_) publish their data needs as queries that are managed by a central _coordinator_. The coordinator may further optimize queries before issuing them to a backing _data source_ like DuckDB. | ||
|
||
## Example Data Apps | ||
## Example Articles | ||
|
||
- [Flight Delays](/flight-delays) - explore over 200,000 flight records | ||
- [NYC Taxi Rides](/nyc-taxi-rides) - load and visualize 1M NYC taxi cab rides | ||
- [Observable Latency](/observable-latency) - a dense view of over 7M web requests | ||
- [Flight Delays](flight-delays) - explore over 200,000 flight records | ||
- [NYC Taxi Rides](nyc-taxi-rides) - load and visualize 1M NYC taxi cab rides | ||
- [Observable Web Latency](observable-latency) - re-visiting a view of over 7M web requests | ||
|
||
## Implementation Notes | ||
|
||
- _Using DuckDB in data loaders and GitHub Actions_ | ||
- _Using Mosaic + DuckDB-WASM in Observable Framework_ | ||
- [Using DuckDB in Data Loaders and GitHub Actions](data-loading) | ||
- [Using Mosaic + DuckDB-WASM in Observable Framework](mosaic-duckdb-wasm) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
--- | ||
title: Using Mosaic & DuckDB-WASM | ||
--- | ||
|
||
# Using Mosaic & DuckDB-WASM | ||
|
||
This page describes how to set up Mosaic and DuckDB-WASM to "play nice" with Observable's reactive runtime. | ||
Unlike standard JavaScript, Observable will happily run JavaScript "out-of-order". | ||
Observable uses dependencies among code blocks, rather than the order within the file, to determine what to run and when to run it. | ||
This reactivity can cause problems for code that depends on "side effects" that are not tracked by Observable's runtime. | ||
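
To make the "out-of-order" behavior concrete, here is a toy sketch of dependency-ordered evaluation. It is not Observable's runtime: the `cells` array, its `defines`/`reads` fields, and `runInDependencyOrder` are hypothetical names invented for this illustration.

```js
// Toy model only: each "cell" names the variable it defines and the ones it reads.
// The chart cell is listed first, as if it appeared earlier in the file.
const cells = [
  { defines: "chart", reads: ["vg"], run: (env) => `chart built with ${env.vg}` },
  { defines: "vg", reads: [], run: () => "vgplot API" }
];

function runInDependencyOrder(cells) {
  const env = {};    // computed values, keyed by variable name
  const order = [];  // the order in which cells actually ran
  const pending = [...cells];
  while (pending.length > 0) {
    // Pick the first cell whose inputs have all been computed.
    const index = pending.findIndex((cell) => cell.reads.every((name) => name in env));
    if (index < 0) throw new Error("circular or missing dependency");
    const [cell] = pending.splice(index, 1);
    env[cell.defines] = cell.run(env);
    order.push(cell.defines);
  }
  return { env, order };
}

const { order } = runInDependencyOrder(cells);
console.log(order); // "vg" runs before "chart", despite the file order
```

A side effect that is not expressed through `reads` (for example, a table created in DuckDB) is invisible to this kind of scheduler, which is the source of the problems described above.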
|
||
## Importing Mosaic and Loading Data | ||
|
||
Here is how we initialize [Mosaic's vgplot API](https://uwdata.github.io/mosaic/what-is-mosaic/) in the [Flight Delays](flight-delays) example: | ||
|
||
```js run=false | ||
import { vgplot, url } from "./components/mosaic.js"; | ||
const flights = await FileAttachment("data/flights-200k.parquet").url(); | ||
const vg = vgplot(vg => [ vg.loadParquet("flights", url(flights)) ]); | ||
``` | ||
|
||
We first import a custom `vgplot` initialization method that configures Mosaic, loads data into DuckDB, and returns the vgplot API. We also import a custom `url` method, which we will later use to prepare URLs that will be loaded by DuckDB. | ||
|
||
Next, we reference the data files we plan to load. | ||
As Observable Framework needs to track which files are used, we must use its `FileAttachment` mechanism. | ||
However, we don't actually want to load the file yet, so we instead request a URL. | ||
|
||
Finally, we invoke `vgplot(...)` to initialize Mosaic, which returns a (Promise to an) instance of the vgplot API. | ||
This method takes a single callback function as input; the callback should return an array of SQL queries to execute upon load. | ||
|
||
We use the `url()` helper method to prepare a file URL so that DuckDB can successfully load it. The URL string returned by `FileAttachment(...).url()` is a _relative_ path like `./_file/data/doodads.csv`. | ||
DuckDB will mistakenly interpret this as a file system path rather than a web URL. | ||
The `url()` helper produces a full URL (with `https://`, hostname, etc.), based on the location of the current page: | ||
|
||
```js run=false | ||
export function url(file) { | ||
  return `${new URL(file, window.location)}`; | ||
} | ||
``` | ||
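
As a quick check of how this resolution behaves, the sketch below applies the same `new URL(file, base)` call outside the browser. The page address `https://example.com/docs/flight-delays` and the `resolveUrl` name are assumptions made for illustration; in the helper above, the base is `window.location`.

```js
// Hypothetical page location, standing in for window.location in a browser.
const pageLocation = "https://example.com/docs/flight-delays";

// Same technique as the url() helper, with the base passed in explicitly.
function resolveUrl(file, base) {
  return `${new URL(file, base)}`;
}

const relative = "./_file/data/doodads.csv";
const absolute = resolveUrl(relative, pageLocation);
console.log(absolute); // https://example.com/docs/_file/data/doodads.csv
```

The relative path is resolved against the directory of the current page, so the result carries the scheme and hostname that DuckDB needs in order to treat it as a web URL. An input that is already an absolute URL passes through unchanged.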
|
||
The `vg` argument to the data loader callback is exactly the same API instance that is ultimately returned by `vgplot`. | ||
Perhaps this feels a bit circular, with `vg` provided to a callback, with the ultimate result being a reference to `vg`... why the gymnastics? | ||
We want to have access to the API to support data loading, using Mosaic's helper functions to install extensions and load data files. | ||
At the same time, we don't want to assign the _outer_ `vg` variable until data loading is complete. | ||
That way, downstream code that uses the API to build visualizations will not get evaluated by the Observable runtime until _after_ data loading is complete. | ||
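
The ordering guarantee can be sketched with stand-in objects. Nothing below is the Mosaic API: `fakeCoordinator`, `fakeApi`, `init`, and the SQL string are hypothetical placeholders that only mimic the shape of the pattern (callback receives the API, promise resolves after loading).

```js
const log = [];

// Stand-in for the coordinator: records each query after a short async delay.
const fakeCoordinator = {
  async exec(queries) {
    await new Promise((resolve) => setTimeout(resolve, 10));
    for (const query of queries) log.push(`exec: ${query}`);
  }
};

// Stand-in for the vgplot API; this loadParquet just builds a SQL string.
const fakeApi = {
  loadParquet: (table, file) => `CREATE TABLE ${table} AS SELECT * FROM '${file}'`
};

// The callback gets the API immediately, but callers only receive it
// once the returned promise resolves, after the load queries finish.
async function init(queries) {
  if (queries) await fakeCoordinator.exec(queries(fakeApi));
  return fakeApi;
}

// Downstream work is chained on the promise, so it cannot run early.
const ready = init((api) => [api.loadParquet("flights", "flights.parquet")])
  .then((api) => {
    log.push("downstream: build charts");
    return api;
  });
```

Once `ready` resolves, `log` holds the `exec:` entry first and the downstream entry second. In Observable Framework the runtime plays the role of the `.then(...)` here: code blocks that reference `vg` wait for its promise to resolve.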
|
||
Once `vg` is assigned, the data will be loaded, and we can use the API to create [visualizations](https://uwdata.github.io/mosaic/vgplot/), | ||
[inputs](https://uwdata.github.io/mosaic/inputs/), | ||
[params](https://uwdata.github.io/mosaic/core/#params), and | ||
[selections](https://uwdata.github.io/mosaic/core/#selections). | ||
|
||
## Mosaic Initialization | ||
|
||
For reference, here's the `vgplot()` method implementation: | ||
|
||
```js run=false | ||
import * as vg from "npm:@uwdata/vgplot"; | ||
|
||
export async function vgplot(queries) { | ||
  const mc = vg.coordinator(); | ||
  const api = vg.createAPIContext({ coordinator: mc }); | ||
  mc.databaseConnector(vg.wasmConnector()); | ||
  if (queries) { | ||
    await mc.exec(queries(api)); | ||
  } | ||
  return api; | ||
} | ||
``` | ||
|
||
We first get a reference to the central coordinator, which manages all queries. | ||
We create a new API context, which we eventually will return. | ||
|
||
Next, we configure Mosaic to use DuckDB-WASM. | ||
The `wasmConnector()` method creates a new database instance in a worker thread. | ||
|
||
We then invoke the `queries` callback to get a list of data loading queries. | ||
We issue the queries to DuckDB using the coordinator's `exec()` method and `await` the result. | ||
|
||
Once that completes, we're ready to go! |