-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix table location URL #432
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #432 +/- ##
==========================================
+ Coverage 79.54% 79.56% +0.02%
==========================================
Files 56 56
Lines 12259 12271 +12
Branches 12259 12271 +12
==========================================
+ Hits 9751 9763 +12
Misses 1981 1981
Partials 527 527 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is a good catch. I don't love that "feature" of the url crate, but we do need to work within it. I think it would make more sense though to do this in resolve_uri_type
where you could do something like:
let url = if table_uri.ends_with('/') {
Url::parse(table_uri)
} else {
let mut s = table_uri.to_string();
s.push('/');
Url::parse(&s)
};
if let Ok(url) = url {
...
I tend to agree... pushing it down would fix path handling as well, tho maybe our path handling was already hardened previously? Code golf nitpick:
Or just let table_uri = if table_uri.ends_with('/') {
Cow::Borrowed(table_uri)
} else {
Cow::Owned(format!("{table_uri}/"))
}
if let Ok(url) = Url::parse(&table_uri) { |
Thanks for the suggestions - applied the changes. From my personal experience, the subtle differences between |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
@@ -106,12 +112,12 @@ fn resolve_uri_type(table_uri: impl AsRef<str>) -> DeltaResult<UriType> { | |||
} else if scheme.len() == 1 { | |||
// NOTE this check is required to support absolute windows paths which may properly | |||
// parse as url we assume here that a single character scheme is a windows drive letter | |||
Ok(UriType::LocalPath(PathBuf::from(table_uri))) | |||
Ok(UriType::LocalPath(PathBuf::from(table_uri.as_ref()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side question:" Why does this LocalPath
use table_uri
but the other (L108) uses url.to_file_path
?
Is it because we don't have a file://
scheme here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. True to form, the URL crate isn't really helpful. From the to_file_path
docs:
Assuming the URL is in the file scheme or similar, convert its path to an absolute std::path::Path.
Note: This does not actually check the URL’s scheme, and may give nonsensical results for other schemes. It is the user’s responsibility to check the URL’s scheme before calling this.
So this seems safer.
Co-authored-by: Ryan Johnson <scovich@users.noreply.github.com>
Thanks @tdikland , appreciate the PR! |
What changes are proposed in this pull request?
If the table is created from a URI, and that URI does not end with a trailing slash, then subsequent operations (i.e. deriving the path to the
_delta_log
folder) overwrite the last URL path segment.The implemented fix is not very "nice", I'd be happy to do a more elaborate fix (e.g. wrapping table location in a newtype and exposing methods like
delta_log()
,last_checkpoint()
, etc. All depending on your appetite for change and review bandwith of course.How was this change tested?
Added unit test