-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std::fs::canonicalize returns UNC paths on Windows, and a lot of software doesn't support UNC paths #42869
Comments
|
In reference to your commit which referenced this PR, normalization is not the same as merely joining the path onto the current directory due to drive relative paths being relative to the current directory on the given drive. For example given a drive relative path of |
thanks, @retep998 :) it's just a hacked-together build tool that probably will eventually be replaced with something else, and I didn't intend to notify this ticket about my commit. but I guess it goes to show that a good way to get an absolute path in std would be really helpful. |
Note, the i-wrong tag is only for the |
Quick testing on Windows 10.0.15063 indicates that both |
Technically, AFAIK it is safe to strip the prefix in common simple cases (absolute path with a drive letter, no reserved names, shorter than max_path), and leave it otherwise. So I think there's no need to compromise on correctness as far as stdlib goes. The trade-off is between failing early and exposing other software that doesn't support UNC paths vs maximizing interoperability with non-UNC software. In an ideal world, I would prefer the "fail early" approach, so that limitations are quickly found and removed. However, Windows/DOS path handling has exceptionally long and messy history and decades of Microsoft bending over backwards to let old software not upgrade its path handling. If Microsoft can't push developers towards UNC, and fails to enforce this even in their own products, I have no hope of Rust shifting the Windows ecosystem to UNC. It will rather just frustrate Rust users and make Rust seem less reliable on Windows. So in this case I suggest trying to maximize interoperability instead, and canonicalize to regular paths whenever possible (using UNC only for paths that can't be handled otherwise). Also, careful stripping of the prefix done in stdlib will be much safer than other crates stripping it unconditionally (because realistically whenever someone runs into this problem, they'll just strip it unconditionally) |
@kornelski I completely agree. The current behavior is unexpected in my opinion. |
I hope this is helpful… According to Microsoft:
Source: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx And the Ruby language uses forward slashes for File paths and that works on Windows. |
I've looked at this problem in detail. There are a few rules which need to be checked to safely strip the UNC prefix. It can be implemented as a simple state machine. I've implemented that using public APIs, but because So I'm still hoping canonicalize would do it automatically, because if it's done only for legacy-compatible paths there's no downside: all paths work for UNC-aware programs, and all paths that can work for legacy programs work too. |
Another example of this issue that I encountered in alexcrichton/cargo-vendor#71: url::URL.to_file_path() returns a non-UNC path (even if the URL was initialized with a UNC path). And std::path::Path.starts_with() doesn't normalize its arguments to UNC paths. So calling extern crate url;
use std::path::Path;
use url::Url;
fn main() {
// Path.canonicalize() returns a UNC path.
let unc_path_buf = Path::new(r"C:\Windows\System").canonicalize().expect("path");
let unc_path = unc_path_buf.as_path();
// Meanwhile, Url.to_file_path() returns a non-UNC path,
// even when initialized from a UNC path.
let file_url = Url::from_file_path(unc_path).expect("url");
let abs_path_buf = file_url.to_file_path().expect("path");
let abs_path = abs_path_buf.as_path();
// unc_path and abs_path refer to the same resource,
// and they both "start with" themselves.
assert!(unc_path.starts_with(unc_path));
assert!(abs_path.starts_with(abs_path));
// But they don't "start with" each other, so these fail.
assert!(unc_path.starts_with(abs_path));
assert!(abs_path.starts_with(unc_path));
} Arguably, Nevertheless, it does feel like something of a footgun, so it's worth at least documenting how it differs from that of some other APIs on Windows. |
Comparing canonical paths is a footgun in general because it is the wrong thing to do! Things like hard links and so on mean that such comparisons will never be entirely accurate. Please don't abuse canonicalization for this use case. If you want to tell whether two paths point to the same file, compare their file IDs! That's what |
but |
There are more ways than just |
bind mounts are equivalent to directory hardlinks.
…On Wed, May 9, 2018, 16:20 Kornel ***@***.***> wrote:
but starts_with is not for is-file-a-file comparison, but
is-file-in-a-directory check. There are no hardlinks involved (and AFAIK
apart from private implementation detail of macOS time machine, no OS
supports directory hardlinks).
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#42869 (comment)>,
or mute the thread
</~https://github.com/notifications/unsubscribe-auth/AApc0j0s0uGzUojkJwFP7VFX8RpnGYzMks5twu0RgaJpZM4OEGyt>
.
|
Reported in rust-lang#838 Also see rust-lang/rust#42869
Reported in rust-lang#838 Also see rust-lang/rust#42869
…nonicalize()` (#84) On Windows, `std::fs::canonicalize()` [produces a weird type of path that cannot be used with `Command::current_dir()`](rust-lang/rust#42869). This switches to using the [`normpath`](https://docs.rs/normpath/latest/normpath/trait.PathExt.html#tymethod.normalize) crate, which should avoid these issues.
Clearly the current behavior of std::fs::canonicalize is problematic, as evidenced by the comments and issues above. It seems to me that dunce does more or less what people expect std::fs::canonicalize to do on Windows. The logic used by dunce seems preferrable to what std::fs::canonicalize currently does in every case that I can think of. Dunce seems to have no outstanding issues, is widely used and trusted and should be compatible with all supported versions of Windows, so my simple proposal is this: why not upstream the logic used by dunce into std::fs::canonicalize? Is there any concrete objection to this? |
I do think we should provide an equivalent to That said, I still insist that most people do not need |
I think having std::fs::canonicalize and dunce::canonicalize as separate functions in the standard library would just cause confusion for no extra value. What use case would prefer the current std::fs::canonicalize behavior over dunce::canonicalize? I can't think of a single one.
Maybe, but I think there are still valid and important use cases where std::fs::canonicalize/dunce::canonicalize is the right tool for the job. For example if I want to create some kind of lookup table where a file path is the key. |
The problem is that people may be intentionally relying on Solutions:
|
That's just not going to work: https://gitlab.com/kornelski/dunce/-/issues/3#note_1096103063 |
The length limit is not a problem. The UNC prefix can be kept for paths that exceed PATH_MAX or other limits. The |
As kornelski pointed out, the path length limit is not a problem, because dunce will automatically preserve the UNC prefix if it is needed because of the path length limit. People who rely on std::fs::canonicalize to convert to NT-style paths are in my opinion misusing std::fs::canonicalize for a purpose that it is not intended for. While the documentation does state that it will convert to extended length path syntax, I have always interpreted that as an implementation detail rather than a strong guarantee.
Having this as a separate method could be useful, but I still strongly believe that std::fs::canoncalize should do this conversion automatically (when it is safe to do so, as dunce does). If it needs to be done manually then people that want to use std::fs::canonicalize in cross-platform code will have to manually add the extra function call to strip UNC prefixes on Windows and will have to know and remember to do that every time (an easy thing to miss if you primarily test on Linux or Mac). I think it is impossible to overstate that support for UNC paths on Windows is really spotty. Many programs will just refuse to work with UNC paths. You want to avoid UNC paths if at all possible. |
I don't disagree with that. Hence why But changing the documented behaviour of |
It is a breaking change. Users can push new path components to the path after canonicalization and then use that new path. I've done this a few times in some of my projects. At least the tar crate also seems to do this. As a side note, I'm also a bit concerned about dunce's ability to implement the specified path parsing. The documentation has changed in the past: fn main() {
let cwd = std::env::current_dir().unwrap();
let cwd = cwd.canonicalize().unwrap();
dbg!(&cwd);
let test_file = cwd.join("nul.tar.gz");
std::fs::write(&test_file, "Hello World!").unwrap();
let canonical_test = test_file.canonicalize().unwrap();
dbg!(&canonical_test);
dbg!(std::fs::read_to_string(canonical_test).unwrap());
let dunce_test = dunce::canonicalize(test_file).unwrap();
dbg!(&dunce_test);
dbg!(std::fs::read_to_string(dunce_test).unwrap());
}
So, using Also just my 2 cents, but I think making As a side note, Windows 11 seems to not have reserved file names, but I can't find the documentation for this behavior. |
The other alternative crate |
Unfortunately, this does not seem to be a suitable replacement in general for a canonicalization operation: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfullpathnamew I also haven't tested, but is this not the same as std::path::absolute? |
Yes, but... I guess what most people actually want is sort of a combination of what |
Canonicalization is a lot more than confirming the file exists. You also need to resolve links, which I don't think GetFullPathNameW will do. I think most people probably want std::path::absolute, but I don't really want to generalize. |
If you have a legacy path, and push a component to it that makes it exceed the legacy path length limit, then it's not longer a valid path. In a sense, |
I've advocated for having a smarter |
Indeed and that was implemented: use std::path::PathBuf;
let mut p = PathBuf::from(r"\\?\server\share");
p.push("a/b");
println!("{}", p.display()); // prints `\\?\server\share\a\b` |
I don't think a legacy path -> UNC path conversion is possible using just the This conversion requires resolving links, since if the old legacy path used links it will break if it becomes a UNC path. Those link components would be interpreted verbatim instead of resolving the link. This means a conversion like this would have to check the filesystem, a fallible operation. A Maybe introducing a new Also, this conversion may not even be needed in some cases. There's an option to allow legacy paths to exceed MAX_PATH, though its a bit convoluted. Cargo seems to use this option. However, like with UNC paths, these longer paths are not compatible with all APIs. I would also think that they would fail if passed to a non-long-path-aware program. |
Can you give an example? If you're talking about symbolic links, those resolve fine in This works even in nontrivial cases with multiple symlinks in the same path, links to both files and directories, and where the symlinks themselves represent their targets in ways For symlinks, as well as any other kind of reparse point, treating it literally in the sense that is relevant to But it may be that I am just not understanding what you're describing. I thought of symlinks, but you may mean something else by "links". |
No, I was definitely thinking about symlinks. I even remember testing this behavior before my first post here using Maybe |
The benefit of having an improved Has there ever been a technically-breaking change that occurred in a minor Rust release because it was considered a bug fix or unintended behavior? |
The benefit of a smarter push would be that it would behave like normal for most paths. Only when it detects a situation where pushing a component would create a path the end user couldn't use as-is, it would convert it to UNC. So theoretically, nothing would change for cases where things aren't already broken and broken things might work in more cases. That isn't to say it's without issues. As an example, perhaps the application is long path aware and the UNC conversion isn't necessary. Or maybe the path is being sent to a long path aware application. Or like you said, maybe its simply for display and users may be confused by the prefix (though I'd argue showing a UNC path anyways in case they decide to copy/paste it in another non-long-path-aware program and wonder why it didn't work). Also, a smarter push would make it easier to rationalize de-UNC-ing automatically within
Rust has dropped breaking changes in minor releases in the past, though that's at a language-level. I don't recall any stdlib level, but I would guess that they exist. There was an issue with changing the layout of an stdlib type, but I consider the libraries to be at fault and the change was pretty well coordinated with a lot of warning. I think the idea is to minimize breaking changes and their impact. |
Hi, I hope this is the right forum/format to register this problem, let me know if it's not.
Today I tried to use
std::fs::canonicalize
to make a path absolute so that I could execute it withstd::process::Command
.canonicalize
returns so-called "UNC paths", which look like this:\\?\C:\foo\bar\...
(sometimes the?
can be a hostname).It turns out you can't pass a UNC path as the current directory when starting a process (i.e.,
Command::new(...).current_dir(unc_path)
). In fact, a lot of other apps will blow up if you pass them a UNC path: for example, Microsoft's owncl.exe
compiler doesn't support it: rust-lang/cc-rs#169It feels to me that maybe returning UNC paths from canonicalize is the wrong choice, given that they don't work in so many places. It'd probably be better to return a simple "absolute path", which begins with the drive letter, instead of returning a UNC path, and instead provide a separate function specifically for generating UNC paths for people who need them.
Maybe if this is too much of an incompatible change, a new function for creating absolute paths should be added to std? I'd bet, however, that making the change to
canonicalize
itself would suddenly make more software suddenly start working rather than suddenly break.The text was updated successfully, but these errors were encountered: