doc: book sections on metadata
molpopgen committed Nov 6, 2022
1 parent 244d3f6 commit 38b682e
Showing 6 changed files with 178 additions and 0 deletions.
6 changes: 6 additions & 0 deletions book/src/SUMMARY.md
@@ -15,6 +15,12 @@
  - [Working with trees](./tree_sequence_tree.md)
  - [Miscellaneous operations](./tree_sequence_miscellaneous.md)

* [Metadata](./metadata.md)
  - [Defining metadata types in rust](./metadata_derive.md)
  - [Metadata and tables](./metadata_tables.md)
  - [Metadata schema](./metadata_schema.md)


[Crate prelude](./prelude.md)
[Changelog](./changelog.md)
[Migration Guide](./migration_guide.md)
8 changes: 8 additions & 0 deletions book/src/metadata.md
@@ -0,0 +1,8 @@
# Metadata <img align="right" width="73" height="45" src="https://raw.githubusercontent.com/tskit-dev/administrative/main/logos/svg/tskit-rust/Tskit_rust_logo.eps.svg">

Tables may contain additional information about rows that is not part of the data model.
This metadata is optional.
Tables are not required to have metadata.
Tables with metadata do not require that every row has metadata.
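
To make "optional on a per-row basis" concrete, here is a minimal, std-only sketch of a metadata column stored as one flat byte buffer plus offsets. It is loosely modeled on a ragged-column layout and is not the crate's implementation; `MetadataColumn` and its methods are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for a table's metadata column: one flat byte
/// buffer plus offsets, so row `i` owns `data[offsets[i]..offsets[i + 1]]`.
struct MetadataColumn {
    data: Vec<u8>,
    offsets: Vec<usize>,
}

impl MetadataColumn {
    fn new() -> Self {
        Self {
            data: Vec::new(),
            offsets: vec![0],
        }
    }

    /// Append a row. `None` adds a zero-length slice, meaning "no metadata".
    fn push_row(&mut self, metadata: Option<&[u8]>) {
        if let Some(bytes) = metadata {
            self.data.extend_from_slice(bytes);
        }
        self.offsets.push(self.data.len());
    }

    fn num_rows(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Raw bytes for `row`, or `None` if the row is out of range or empty.
    fn row(&self, row: usize) -> Option<&[u8]> {
        let start = *self.offsets.get(row)?;
        let end = *self.offsets.get(row + 1)?;
        if start == end {
            None
        } else {
            Some(&self.data[start..end])
        }
    }
}
```

In this sketch a row without metadata costs only one extra offset entry, which is why mixing rows with and without metadata in the same table is cheap.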

The next sections showcase the metadata API.
19 changes: 19 additions & 0 deletions book/src/metadata_derive.md
@@ -0,0 +1,19 @@
# Defining metadata types in rust

A key feature of the API is that metadata is specified on a per-table basis.
For example, a type used as node metadata must implement the `tskit::metadata::NodeMetadata` trait, while a type used as mutation metadata must implement `tskit::metadata::MutationMetadata`.

Using the `tskit` cargo feature `derive`, we can use procedural macros to define metadata types.
Here, we define a metadata type for a mutation table:

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:metadata_derive}}
```

We require that you also manually specify the `serde` derive macros because the metadata API
itself does not depend on `serde`.
Rather, it expects raw bytes and `serde` happens to be a good way to get them from your data types.
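
Because only raw bytes are required, a serializer is a convenience rather than a necessity. The sketch below encodes the same two fields by hand using only the standard library. The `encode` and `decode` helpers are hypothetical and are not part of `tskit`; the little-endian, 16-byte layout is an assumption made for this example.

```rust
/// Same fields as the derive example, encoded by hand instead of via `serde`.
#[derive(Debug, PartialEq)]
struct MutationMetadata {
    effect_size: f64,
    dominance: f64,
}

/// Hypothetical helper: two little-endian `f64` values, 16 bytes in total.
fn encode(md: &MutationMetadata) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(16);
    bytes.extend_from_slice(&md.effect_size.to_le_bytes());
    bytes.extend_from_slice(&md.dominance.to_le_bytes());
    bytes
}

/// Hypothetical helper: fails with a message if the length is not 16.
fn decode(bytes: &[u8]) -> Result<MutationMetadata, String> {
    if bytes.len() != 16 {
        return Err(format!("expected 16 bytes, got {}", bytes.len()));
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    let effect_size = f64::from_le_bytes(buf);
    buf.copy_from_slice(&bytes[8..]);
    let dominance = f64::from_le_bytes(buf);
    Ok(MutationMetadata {
        effect_size,
        dominance,
    })
}
```

A hand-rolled format like this is compact, but every reader must know the layout in advance, which is one reason a self-describing format such as JSON is attractive.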

The derive macro also enforces some helpful behavior at compile time.
You will get a compile-time error if you try to derive more than one metadata trait (for example, both node and mutation metadata) for the same Rust type.
The error is due to conflicting implementations for a [supertrait](https://doc.rust-lang.org/rust-by-example/trait/supertraits.html).
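
The following std-only sketch illustrates the general pattern of per-table marker traits sharing one supertrait. The traits here are simplified stand-ins with assumed signatures, not the definitions from `tskit`, and `EffectSize` and `encode_for_mutation_table` are hypothetical names.

```rust
/// Simplified stand-in for the shared supertrait: anything that can be
/// turned into bytes and back.
trait MetadataRoundtrip: Sized {
    fn encode(&self) -> Vec<u8>;
    fn decode(bytes: &[u8]) -> Option<Self>;
}

/// Per-table marker traits. Both require the same supertrait.
trait MutationMetadata: MetadataRoundtrip {}
trait NodeMetadata: MetadataRoundtrip {}

struct EffectSize(f64);

impl MetadataRoundtrip for EffectSize {
    fn encode(&self) -> Vec<u8> {
        self.0.to_le_bytes().to_vec()
    }

    fn decode(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 8 {
            return None;
        }
        let mut buf = [0u8; 8];
        buf.copy_from_slice(bytes);
        Some(EffectSize(f64::from_le_bytes(buf)))
    }
}

impl MutationMetadata for EffectSize {}

// If a derive macro emits both the supertrait impl and the marker impl,
// applying two such derives to one type yields a second
// `impl MetadataRoundtrip for EffectSize`, which the compiler rejects
// as conflicting implementations (error E0119).

/// Hypothetical table-specific entry point: accepts mutation metadata only.
fn encode_for_mutation_table<M: MutationMetadata>(md: &M) -> Vec<u8> {
    md.encode()
}
```

With this arrangement, passing a type that only implements `NodeMetadata` to `encode_for_mutation_table` fails to compile, so mixing up tables is caught before the program runs.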
17 changes: 17 additions & 0 deletions book/src/metadata_schema.md
@@ -0,0 +1,17 @@
# Metadata schema

For useful data interchange with `tskit-python`, we need to define [metadata schema](https://tskit.dev/tskit/docs/stable/metadata.html).

Several issues currently slow down the development of a Rust API for schema:

* It is not clear which `serde` formats are compatible with metadata on the Python side.
  * Experiments have shown that `serde_json` works with `tskit-python`.
  * Ideally, we would also like a binary format compatible with the Python `struct`
    module.
* However, we have not found a solution eliminating the need to manually write the
  schema as a string and add it to the tables.
  Various crates that generate JSON schema from Rust structs return schema that are over-specified
  and fail to validate in `tskit-python`.
* We also need to add some Python to our CI to prove to ourselves
  that some reasonable tests can pass.
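
As an illustration of what "manually write the schema as a string" might look like, here is a hedged sketch of a hand-written schema for a struct with `effect_size` and `dominance` fields, assuming the JSON codec described in the `tskit-python` metadata documentation. The constant and the `to_json` helper are hypothetical, and how the string is attached to a table is deliberately not shown.

```rust
/// Hand-written schema for two `f64` fields using the JSON codec.
/// The property names must match the field names the serializer writes.
const MUTATION_METADATA_SCHEMA: &str = r#"{
  "codec": "json",
  "type": "object",
  "properties": {
    "effect_size": {"type": "number"},
    "dominance": {"type": "number"}
  }
}"#;

/// Hypothetical helper: builds a JSON payload that the schema above
/// describes. Only valid JSON for finite values (no NaN or infinity).
fn to_json(effect_size: f64, dominance: f64) -> String {
    format!(
        "{{\"effect_size\":{},\"dominance\":{}}}",
        effect_size, dominance
    )
}
```

The schema is kept deliberately minimal here, since over-specified schema are reported above to fail validation.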

36 changes: 36 additions & 0 deletions book/src/metadata_tables.md
@@ -0,0 +1,36 @@
# Metadata and tables

Let us create a table and add a row with our mutation metadata:

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:add_mutation_table_row_with_metadata}}
```

Metadata is optional on a per-row basis:

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:add_mutation_table_row_without_metadata}}
```

We can confirm that we have one row with, and one without, metadata:

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:validate_metadata_row_contents}}
```

Fetching our metadata from the table requires specifying the metadata type.
The result of a metadata retrieval is `Option<Result<T, TskitError>>`, where `T` is the requested metadata type.
The `None` variant occurs if a row does not have metadata or if a row id does not exist.
The error state occurs if decoding raw bytes into the metadata type fails.
The details of the error variant are [here](https://docs.rs/tskit/latest/tskit/error/enum.TskitError.html#variant.MetadataError).
The reason why the error type holds `Box<dyn Error>` is that the API is very general.
We assume nothing about the API used to encode/decode metadata.
Therefore, the error could be anything.

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:metadata_retrieval}}
```

```rust, noplayground, ignore
{{#include ../../tests/book_metadata.rs:metadata_retrieval_none}}
```
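
The nested `Option<Result<...>>` shape can be handled with standard library tools alone. The sketch below uses a hypothetical `fetch_f64` function in place of the table API, with `Box<dyn Error>` as the error type, to show the three outcomes and how `Option::transpose` collapses them when that is more convenient.

```rust
use std::error::Error;

/// Hypothetical stand-in for a metadata lookup. `rows[i]` is `None` when
/// row `i` has no metadata; an out-of-range index also yields `None`.
fn fetch_f64(rows: &[Option<Vec<u8>>], row: usize) -> Option<Result<f64, Box<dyn Error>>> {
    let bytes = rows.get(row)?.as_ref()?;
    if bytes.len() != 8 {
        // Decoding failed: the row has metadata, but not of the expected form.
        return Some(Err(format!("expected 8 bytes, got {}", bytes.len()).into()));
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    Some(Ok(f64::from_le_bytes(buf)))
}

/// Collapse the nested type: `Ok(None)` means no metadata,
/// `Err(_)` means decoding failed.
fn fetch_or_missing(
    rows: &[Option<Vec<u8>>],
    row: usize,
) -> Result<Option<f64>, Box<dyn Error>> {
    fetch_f64(rows, row).transpose()
}
```

The transposed form works well with the `?` operator when a missing value is not an error, while the original nested form is convenient when absence should short-circuit first.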
92 changes: 92 additions & 0 deletions tests/book_metadata.rs
@@ -0,0 +1,92 @@
#[cfg(feature = "derive")]
#[test]
fn book_mutation_metadata() {
    // ANCHOR: metadata_derive
    #[derive(serde::Serialize, serde::Deserialize, tskit::metadata::MutationMetadata)]
    #[serializer("serde_json")]
    struct MutationMetadata {
        effect_size: f64,
        dominance: f64,
    }
    // ANCHOR_END: metadata_derive

    // ANCHOR: add_mutation_table_row_with_metadata
    let mut tables = tskit::TableCollection::new(50.0).unwrap();

    let md = MutationMetadata {
        effect_size: 1e-3,
        dominance: 1.0,
    };

    let mut_id_0 = tables
        .add_mutation_with_metadata(
            0,    // site id
            0,    // node id
            -1,   // mutation parent id
            0.0,  // time
            None, // derived state is Option<&[u8]>
            &md,  // metadata for this row
        )
        .unwrap();
    // ANCHOR_END: add_mutation_table_row_with_metadata

    // ANCHOR: add_mutation_table_row_without_metadata
    let mut_id_1 = tables
        .add_mutation(
            0,    // site id
            0,    // node id
            -1,   // mutation parent id
            0.0,  // time
            None, // derived state is Option<&[u8]>
        )
        .unwrap();
    // ANCHOR_END: add_mutation_table_row_without_metadata

    // ANCHOR: validate_metadata_row_contents
    assert_eq!(
        tables
            .mutations_iter()
            .filter(|m| m.metadata.is_some())
            .count(),
        1
    );
    assert_eq!(
        tables
            .mutations_iter()
            .filter(|m| m.metadata.is_none())
            .count(),
        1
    );
    // ANCHOR_END: validate_metadata_row_contents

    // ANCHOR: metadata_retrieval
    let fetched_md = match tables.mutations().metadata::<MutationMetadata>(mut_id_0) {
        Some(Ok(m)) => m,
        Some(Err(e)) => panic!("metadata decoding failed: {:?}", e),
        None => panic!(
            "hmmm...row {} should have been a valid row with metadata...",
            mut_id_0
        ),
    };

    assert_eq!(md.effect_size, fetched_md.effect_size);
    assert_eq!(md.dominance, fetched_md.dominance);
    // ANCHOR_END: metadata_retrieval

    // ANCHOR: metadata_retrieval_none
    // There is no metadata at row 1, so
    // you get None back
    assert!(tables
        .mutations()
        .metadata::<MutationMetadata>(mut_id_1)
        .is_none());

    // There is also no metadata at row 2,
    // because that row does not exist, so
    // you get None back
    assert!(tables
        .mutations()
        .metadata::<MutationMetadata>(2.into())
        .is_none());
    // ANCHOR_END: metadata_retrieval_none
}
