Hazel 🌳

This proof of concept (POC) investigates how Datomic principles can be adapted for the frontend environment. It introduces a peer library, written in JavaScript, that is capable of navigating a DataScript database and storing its segments in the browser's cache.

This approach is useful for productivity tools like Asana, Jira, Slack, and Notion, especially for applications that work with relatively large databases in the browser. It is necessary to make fast queries on that data without access to the backend.

The project is built upon the React TodoMVC framework.

The project's capabilities include:

Range queries.
Lazy iteration over databases by fetching database segments on-demand from the backend.
Long-term storage of these segments within the browser cache.
Getting a consistent snapshot of a database.

For those unfamiliar with Datomic-like databases, here's an illustrative example of its core concept:

Consider a traditional database such as PostgreSQL or MySQL, where data resides on the server's disk and your application's queries are processed on the server. If you decide to cache a slow query's result within your application, you essentially forego the query engine, reducing it to key-value (KV) storage. In contrast, a Datomic-like system enables you to execute queries within your application utilizing its cache.

More info you can find in Datomic Introduction.

How does it work?

Hazel is designed to read indexes built by DataScript. However, unlike DataScript, it provides an asynchronous API for data querying and loads storage segments on-demand.

You should first be familiar with the DataScript or Datomic data model. If not, please refer to the following resources:

Datoms and Indexes

The DataScript database operates on datoms, which are atomic units of data. Each Datom is represented as a tuple (E, A, V), where:

E stands for the entity ID,
A for the attribute, and
V for the value.

These elements form the core structure of a Datom. To efficiently organize datoms, DataScript uses three indexes: EAV, AEV, and AVE. The name of each index reflects the order in which datoms are sorted:

EAV: Sorted by entity ID, then attribute, then value.
AEV and AVE: Follow analogous patterns.

While datoms in Datomic and DataScript also include an additional element T (Transaction ID), Hazel simplifies the model by excluding this component.

Index Implementation

The indexes in DataScript are implemented as Persistent Sorted Sets, a type of immutable data structure based on B+ trees. These structures are optimized for storing elements in sorted order and enable efficient operations such as lookups, insertions, and deletions, with a time complexity of $$O(\log n)$$. Functional immutability is achieved through structural sharing, ensuring that updates reuse existing data whenever possible. A detailed explanation of B-trees, including their variation B+ trees, can be found in the paper "The Ubiquitous B-Tree" by Douglas Comer.

Each node of the tree corresponds to a storage segment, serialized and stored persistently. Branch nodes contain keys and addresses for navigation, while leaf nodes store ordered sequences of keys (datoms).

Database Implementation

In DataScript, changes are made using transactions, which are represented as structured data. While a comprehensive understanding of the entire transaction process is not required, it’s important to note that transactions are represented as a collections of datoms. Each Datom in transaction includes a flag that indicates whether it will be added to or removed from the database.

Since persistent data structures can lead to high overhead when updating the entire tree for every transaction, DataScript employs an optimization mechanism that relies on an append-only "tail" for managing updates:

Changes are stored in the "tail".
Once the size of the tail becomes comparable to a tree node, the "tail" is "flushed" into the tree. For implementation details, see the source code.

Hazel's Peer

In Datomic and DataScript, separate APIs are used for querying and mutating data. The Peer library is responsible for querying data. Moreover, it executes queries using a local cache.

Ultimately, Datomic and DataScript provide low-level API for querying data:

Hazel implements a similar low-level API.

First, let's consider the Datomic and DataScript implementations for the JVM. When querying data, they access storage segments stored remotely or in a local cache. This access uses blocking I/O, and the result of the queries is a lazy sequence.

Here are the advantages of this approach:

It allows processing data that exceeds the size of RAM.
It allows stopping lazy sequence consumption, which prevents further loading of the next segments.

Second, let's examine the DataScript implementation for ClojureScript (JS). It shares the same codebase as the JVM implementation and, as a result, has the same API. However, in JavaScript, blocking I/O cannot be used for retrieving segments, unlike in the JVM. This limitation means that DataScript in JavaScript can operate only with data stored in RAM.

Hazel is designed to overcome this limitation. In JavaScript, the equivalent of lazy sequences is a Generator function (function*/yield). However, since segments are requested asynchronously over the network, Hazel uses AsyncGenerator to manage this process.

Here are some examples:

A range query:

for async (const [e, _a, _v] of db.ave.datoms('task/completed', true)) {
  // Retrieve datoms with the attribute `task/completed` and value `true`.
  // ...
}

Retrieving all atriibutes of an Entity:

const todo = {
  id: e,
}
for async (const [_e, a, v] of db.eav.datoms(e)) {
  todo[a] = v;
}

NOTE: The index name matters. In the first example, the AVE index is used, while in the second example, the EAV index is used.

Finally, the Cache API is used to cache segments. For more details, see the documentation here.

Learning by Example

A motivated reader can easily grasp these concepts by reviewing the provided tests, which offer clear examples and practical insights. Additional details on persistent B+ trees and storage mechanisms in DataScript can be found in the following references:

Limitations

I've only implemented the methods (r)seek and datoms over indexes.

The main drawback I've found is access control. For instance, consider an Asana project; it has tasks and collaborators, and each collaborator can read all tasks of the project. I think we should store data per project database. However, the problem is that we cannot just add an external collaborator to a task of this project because our database is cached in the browser, and this collaborator will have access to all the data in the database.

I have two solutions for this problem:

Having an additional API for data access that runs queries on the backend, where we can easily handle access control, although we would lose caching.
Spreading (replicate) copies of a task across many databases (projects).

If you have any thoughts, feel free to open an issue in this repository.

Future work

It is possible to implement datalog and other Datomic/DataScript APIs.

It may be possible to adopt this approach for the local-first paradigm. For example, we can implement Conflict-free Replicated Data Types (CRDT) by writing database functions in JavaScript and optimistically transacting them on the frontend side.

How to run

docker build -t hazel . && docker run --rm -p 8080:8080 hazel
open http://localhost:8080

Dev

bun run test

Bun builder is currently in beta and lacks some features:

When using build.js for configuration, Bun reloads only the build.js file, not the other project files.
Automatic generation of manifest.json is not yet implemented.

Name		Name	Last commit message	Last commit date
Latest commit History 94 Commits
js		js
resources/public		resources/public
src		src
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
bun.lockb		bun.lockb
deps.edn		deps.edn
jsconfig.json		jsconfig.json
notes.org		notes.org
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Hazel 🌳

How does it work?

Datoms and Indexes

Index Implementation

Database Implementation

Hazel's Peer

Learning by Example

Limitations

Future work

How to run

Dev

About

Releases

Packages

Contributors 2

Languages

darkleaf/hazel

Folders and files

Latest commit

History

Repository files navigation

Hazel 🌳

How does it work?

Datoms and Indexes

Index Implementation

Database Implementation

Hazel's Peer

Learning by Example

Limitations

Future work

How to run

Dev

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages