Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#76] Support Normalized Paths #78

Merged
merged 21 commits into from
Feb 2, 2024
Merged

[#76] Support Normalized Paths #78

merged 21 commits into from
Feb 2, 2024

Conversation

hiltontj
Copy link
Owner

@hiltontj hiltontj commented Jan 26, 2024

Closes #76.

This PR introduces support for Normalized Paths.

These can be accessed by a new query method JsonPath::query_located, which produces a LocatedNodeList. The LocatedNodeList is similar to NodeList, but contains the location of each node produced by the query, represented using the NormalizedPath type.

All new code has been documented with examples in this PR.

Added: NormalizedPath and PathElement types

The NormalizedPath struct represents the location of a node within a JSON object. Its representation is like so:

pub struct NormalizedPath<'a>(Vec<PathElement<'a>);

pub enum PathElement<'a> {
    Name(&'a str),
    Index(usize),
}

Several methods were included to interact with a NormalizedPath, e.g., first, last, get, iter, etc., but notably there is a as_json_pointer method, which allows direct conversion to a JSON Pointer to be used with the serde_json::Value::pointer or serde_json::Value::pointer_mut methods.

The new PathElement type also comes equipped with several methods, and both it and NormalizedPath have eagerly implemented traits from the standard library / serde to help improve interoperability.

Added: LocatedNodeList and LocatedNode types

The LocatedNodeList struct was built to have a similar API surface to the NodeList struct, but includes additional methods that give access to the location of each node produced by the original query. For example, it has the locations and nodes methods to provide dedicated iterators over locations or nodes, respectively, but also provides the iter method to iterate over the location/node pairs. Here is an example:

use serde_json::{json, Value};
use serde_json_path::JsonPath;
let value = json!({"foo": {"bar": 1, "baz": 2}});
let path = JsonPath::parse("$.foo.*")?;
let query = path.query_located(&value);
let nodes: Vec<&Value> = query.nodes().collect();
assert_eq!(nodes, vec![1, 2]);
let locs: Vec<String> = query
    .locations()
    .map(|loc| loc.to_string())
    .collect();
assert_eq!(locs, ["$['foo']['bar']", "$['foo']['baz']"]);

The location/node pairs are represented by the LocatedNode type.

The LocatedNodeList provides one unique bit of functionality over NodeList: deduplication of the query results, via the LocatedNodeList::dedup and LocatedNodeList::dedup_in_place methods.

Implementation Notes

The JsonPath::query_located method adds a completely separate function to perform queries in addition to the original JsonPath::query method. Their implementation of query execution is also completely separate, via the Queryable trait's query (existing) and query_located (new) methods, respectively. This was done, as opposed to, e.g., modifying the original query function to include locations, because I did not want to incur the overhead of producing the locations by default.

TODOs

  • Changelog
  • Interop/API Guidelines check
  • Some remaining code docs
    • LocatedNodeList::iter

This commit introduces the NormalizedPath type to
represent normalized paths from the JSONPath spec.

The Queryable trait includes a query_paths method
that is used to produce a list of normalized paths
vs. the standard query method, which produces the
nodes.

A few of the spec types had this implemented in
their impl for Queryable, but this is incomplete.
@hiltontj hiltontj self-assigned this Jan 26, 2024
This provides a complete working implementation of the NormalizedPath type, and provides a method to query for it along with nodes on JsonPath.

A fair bit of work is still needed to get the API right, but committing here to have it in a working state.

An unused implementation of Queryable on SingularQuery was removed.
The LocatedNodeList is similar to NodeList but provides the location along with each node.

Queryable::query_paths was renamed to query_located to coincide with LocatedNodeList

Trait implementations added to NormalizedPath and PathElement types.
The Locations and Nodes iterator types were added to support more convenient iteration over a LocatedNodeList query result.

An in-place version of LocatedNodeList::dedup was added, to provide an alternative option to dedup's move semantics.

Code re-orderd to put error types at the end of node.rs file.
The LocatedNode type represents a node, i.e., &Value,
as well as its location within the original Value that
was queried. Defining a dedicated type for this was done
to replace the use of the (NormalizedPath, &Value) every-
where, and allows for addition of convenience methods.

This also ensures controlled construction of LocatedNodeList,
which ensures that their deduplication can be done safely, i.e.,
the unwrap is safe.

This commit also started adding in the missing docs for newly
created items in this branch.
feat: PartialEq implementations for PathElement
@hiltontj hiltontj merged commit 3b231b3 into main Feb 2, 2024
4 checks passed
@hiltontj hiltontj deleted the 76-norm-paths branch February 2, 2024 20:22
Comment on lines +454 to +457
// This unwrap should be safe, since the paths corresponding to
// a query against a Value will always be ordered.
self.0
.sort_unstable_by(|a, b| a.loc.partial_cmp(&b.loc).unwrap());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the paths are already ordered, maybe we can call dedup directly?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that you may want to mean that, although the elements can be unordered, the result of the same segment are grouped.

But how can we ensure that the unstable sort won't compare two elements of different group (in descendant query)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example:

fn main() {
    let value = json!({
      "o": {"k": [{"j": 1, "k": 2}]},
      "a": [5, 3, [{"j": 4}, {"k": 6}]]
    });
    let path = JsonPath::parse(r#"$["a", "o"]..[0, 1]"#).unwrap();
    let nodes = path.query_located(&value);
    let nodes = path.query_located(&value).dedup();
}

// ["$['a'][0]", "$['a'][1]", "$['a'][2][0]", "$['a'][2][1]", "$['o']['k'][0]"]

If the sorter compare "$['a'][0]" with "$['o']['k'][0]", it would cause panic.

Copy link

@tisonkun tisonkun Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the other direction, we may sort PathElement with names always greater than index, to make it a total order. Is there some downside I miss?

Comment on lines +523 to +527
impl<'a> From<Vec<LocatedNode<'a>>> for LocatedNodeList<'a> {
fn from(v: Vec<LocatedNode<'a>>) -> Self {
Self(v)
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still have this from impl and the TODO indicates that this can be unsafe holds.

@tisonkun
Copy link

tisonkun commented Feb 5, 2025

Hello @hiltontj! Thanks for providing such a useful crate.

Just for your information, I'm porting this crate to fit in the requirements of generic variant data type - /~https://github.com/cratesland/spath/

Now I'm going through this crate and do some reviews randomly. Hopefully it helps.

@hiltontj
Copy link
Owner Author

hiltontj commented Feb 5, 2025

Hey @tisonkun - thank you!

Just for your information, I'm porting this crate to fit in the requirements of generic variant data type - /~https://github.com/cratesland/spath/

That looks really interesting, glad you could make use of serde_json_path.

Now I'm going through this crate and do some reviews randomly. Hopefully it helps.

I am grateful for the feedback 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to get path of queried nodes
2 participants