Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix table location URL #432

Merged
merged 6 commits into from
Nov 13, 2024
Merged

Fix table location URL #432

merged 6 commits into from
Nov 13, 2024

Conversation

tdikland
Copy link
Contributor

What changes are proposed in this pull request?

If the table is created from a URI, and that URI does not end with a trailing slash, then subsequent operations (i.e. deriving the path to the _delta_log folder) overwrite the last URL path segment.

The implemented fix is not very "nice", I'd be happy to do a more elaborate fix (e.g. wrapping table location in a newtype and exposing methods like delta_log(), last_checkpoint(), etc. All depending on your appetite for change and review bandwith of course.

How was this change tested?

Added unit test

Copy link

codecov bot commented Nov 6, 2024

Codecov Report

Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 79.56%. Comparing base (a2cade8) to head (bdd8c81).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/table.rs 93.75% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #432      +/-   ##
==========================================
+ Coverage   79.54%   79.56%   +0.02%     
==========================================
  Files          56       56              
  Lines       12259    12271      +12     
  Branches    12259    12271      +12     
==========================================
+ Hits         9751     9763      +12     
  Misses       1981     1981              
  Partials      527      527              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@nicklan nicklan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is a good catch. I don't love that "feature" of the url crate, but we do need to work within it. I think it would make more sense though to do this in resolve_uri_type where you could do something like:

    let url = if table_uri.ends_with('/') {
        Url::parse(table_uri)
    } else {
        let mut s = table_uri.to_string();
        s.push('/');
        Url::parse(&s)
    };
    if let Ok(url) = url {
...

@scovich
Copy link
Collaborator

scovich commented Nov 8, 2024

Thanks, this is a good catch. I don't love that "feature" of the url crate, but we do need to work within it. I think it would make more sense though to do this in resolve_uri_type

I tend to agree... pushing it down would fix path handling as well, tho maybe our path handling was already hardened previously?

Code golf nitpick:

you could do something like:

    let url = if table_uri.ends_with('/') {
        Url::parse(table_uri)
    } else {
        let mut s = table_uri.to_string();
        s.push('/');
        Url::parse(&s)
    };
    if let Ok(url) = url {
...

Or just

let table_uri = if table_uri.ends_with('/') {
  Cow::Borrowed(table_uri)
} else {
  Cow::Owned(format!("{table_uri}/"))
}
if let Ok(url) = Url::parse(&table_uri) {

@tdikland
Copy link
Contributor Author

tdikland commented Nov 8, 2024

Thanks for the suggestions - applied the changes. From my personal experience, the subtle differences between std::path::Path, url::Url and object_store::path::Path often lead to little bugs like these!

Copy link
Collaborator

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -106,12 +112,12 @@ fn resolve_uri_type(table_uri: impl AsRef<str>) -> DeltaResult<UriType> {
} else if scheme.len() == 1 {
// NOTE this check is required to support absolute windows paths which may properly
// parse as url we assume here that a single character scheme is a windows drive letter
Ok(UriType::LocalPath(PathBuf::from(table_uri)))
Ok(UriType::LocalPath(PathBuf::from(table_uri.as_ref())))
Copy link
Collaborator

@scovich scovich Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side question:" Why does this LocalPath use table_uri but the other (L108) uses url.to_file_path?
Is it because we don't have a file:// scheme here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. True to form, the URL crate isn't really helpful. From the to_file_path docs:

Assuming the URL is in the file scheme or similar, convert its path to an absolute std::path::Path.
Note: This does not actually check the URL’s scheme, and may give nonsensical results for other schemes. It is the user’s responsibility to check the URL’s scheme before calling this.

So this seems safer.

kernel/src/table.rs Outdated Show resolved Hide resolved
@scovich scovich requested a review from nicklan November 8, 2024 20:07
Co-authored-by: Ryan Johnson <scovich@users.noreply.github.com>
@github-actions github-actions bot added the breaking-change Change that will require a version bump label Nov 13, 2024
@nicklan nicklan removed the breaking-change Change that will require a version bump label Nov 13, 2024
@nicklan nicklan merged commit 49835e0 into delta-io:main Nov 13, 2024
18 checks passed
@nicklan
Copy link
Collaborator

nicklan commented Nov 13, 2024

Thanks @tdikland , appreciate the PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants