Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use bindings for -D_FILE_OFFSET_BITS=64 #23

Merged
merged 1 commit into from
Nov 2, 2015

Conversation

alexcrichton
Copy link
Member

This should help "by default" dealing with large files on 32-bit Linux. Other platforms can opt in where necessary as well.

@alexcrichton
Copy link
Member Author

@alexcrichton
Copy link
Member Author

Hm actually, on second though I'm not sure that this is a good idea. This would require that when linking against C code the _FILE_OFFSET_BITS=64 define is passed. Otherwise when sharing a stat structure with another library it may not match up.

I'll explore exposing the *64 variants as separate types.

@alexcrichton alexcrichton force-pushed the lfs-support branch 3 times, most recently from efb3b46 to 84f2f4a Compare October 30, 2015 06:10
@cuviper
Copy link
Member

cuviper commented Oct 30, 2015

Well, if you flip the scenario and C code does compile with _FILE_OFFSET_BITS=64, which is strongly recommended, then sharing stat still doesn't match up. So then you'll end up suggesting that crates should really only use the explicit stat64 anyway, because their C might not have the non64 stat anyway, and really there's no good reason to use the old interfaces.

Maybe it's a little cleaner to be very explicit that it's 64-bit, but then that makes cross platform development tricky again. "Which stat am I supposed to use again on Windows/Linux/OSX/BSD?" Each has a best answer, called different things. I say just junk the old stuff and let Rust libc give a unified front of the best names.

Honestly I don't think any FFI should ever expose any of these types, exactly because they are so fraught with idiosyncrasies like this. I would call it a mistake even for a pure C library to have these types in its interface. This should be business just between a program/library and libc/kernel only. Don't let the madness spread! :)

But I fear I'm repeating things I've said already. Thanks for doing the actual work -- it looks like the new code organization has made this kind of type audit a lot easier. I'll try to give an actual code review tomorrow...

@cuviper
Copy link
Member

cuviper commented Oct 30, 2015

A passing idea of compromise, in case I can't convince you fully hide these gory details -- how about an lfs module which is simply a remapping of those "best" types into unified names? One stat to rule them all...

I suppose such an lfs could be a crate of its own, but given that libstd/etc itself will also be embedding libc and ought to use the best interfaces, it seems good to keep this knowledge close-knit.

@alexcrichton
Copy link
Member Author

Hm true, that's a good point about libraries which are auto-compiled with this support. Although I think you're right in that those who are aware of this problem may actually avoid exposing stat and related types in their API for precisely this reason (so they're not tied to the #define). So in that sense, leaving the defaults as the "normal compile" may make the most sense.

I'm not too worried about selecting what the best stat option is on every platform, that's what the standard library is for! In all seriousness, though, libc is not intended to be cross platform at all, and instead is simply targeted at exposing what the underlying system provides. Along these lines I'd also be wary of defining our own abstractions here such as a lfs module as that's not what's present on each system as well.

Overall I'm leaning a bit more towards this as-is which exposes the 64-bit support as a separate set of functions and types. This does mean that users of the API will have to #[cfg] to use the right one, but it means that users also don't have to worry about compiling all their C code with the right #define to actually be compatible with libc.

@cuviper
Copy link
Member

cuviper commented Oct 30, 2015

How about std::os::unix::raw and std::os::unix::fs::MetadataExt, especially as_raw_stat()? The standard library ought to start using the better stat64 interface internally, so will that trickle out to the user here?

@alexcrichton
Copy link
Member Author

Oh right, that's a good point, I always seem to forget about those...

I do agree that using stat64 in the standard library is probably best as an implementation detail, although that's a good question about what to expose. I think, however, that we could simply just name the type differently per-platform. E.g. on Linux it would return stat64 (and that's what we'd define), whereas on other platforms it could return stat.

@cuviper
Copy link
Member

cuviper commented Oct 30, 2015

Will you expose things like stat, stat64, and stati64 in Windows? It's only fair... 😉

In general what you've added looks good to me. I think the b32/mod.rs stat and stat64 may eventually need to be specialized for different architectures, because AFAIK what you have there is x86 specific, but for now it's probably fine. Auditing other LFS symbols, there are a few more gaps:

Resource limits also have LFS versions, primarily to support 64-bit values in RLIMIT_FSIZE. So that adds getrlimit64, setrlimit64, struct rlimit64, rlim64_t, and RLIM64_INFINITY.

Technically glob_t, glob, and globfree also have 64-bit LFS versions, because the __USE_GNU version of the function pointers need either stat or stat64. But these fields are now non-pub __unusedN: *mut c_void, so I guess it doesn't matter. Those function pointers are only used if the GLOB_ALTDIRFUNC flag is set, and that's not defined here at all.

Both fseek and ftell are naughty on all systems, not just Linux, as they only deal in c_long. The LFS solution is to add fseeko and ftello which use off_t instead, and of course Linux has explicit off64_t versions too. Windows just added fseeki64 and ftelli64.

There are other functions that could have LFS variants, but aren't represented in libc at all yet. e.g. fallocate, preadv, pwritev.

I noticed fpos_t, fpos64_t, and dirent are currently empty enums. These are not opaque types like FILE; the caller must allocate them somehow, so how is one supposed to use those functions? (But dirent64 is implemented in this PR, so maybe the others are just TODO.)

@alexcrichton
Copy link
Member Author

Thanks for looking this over! I think it's safe to say that we don't have to worry about the Windows story much here as I suspect that any project dealing with files on Windows falls into one of two categories:

  • A "non-robust" project just uses the standard library, which is itself robust as it just uses standard Windows APIs (none of these functions), which transitively makes the app robust for large files!
  • A "robust" project bindings APIs directly will also be using the Windows APIs rather than these CRT APIs.

Right now we do actually bind a silent translation of stat to stat64 on Windows, so this may mean that we want to bind both stat/stat64 over there as well for completeness, but other than that it's probably not the biggest of deals!

In general what you've added looks good to me. I think the b32/mod.rs stat and stat64 may eventually need to be specialized for different architectures, because AFAIK what you have there is x86 specific, but for now it's probably fine

Yeah so far it's verified for MIPS/ARM/x86, but if we add more platforms we can just break it apart as necessary.

Resource limits also have LFS versions, primarily to support 64-bit values in RLIMIT_FSIZE. So that adds getrlimit64, setrlimit64, struct rlimit64, rlim64_t, and RLIM64_INFINITY.

Right! I remember when compiling with -D_FILE_OFFSET_BITS=64 these changed, I've now added these bindings as well.

Technically glob_t, glob, and globfree also have 64-bit LFS versions, because the __USE_GNU version of the function pointers need either stat or stat64

Yeah I wasn't gonna worry too much about these, but seems harmless to go ahead and add.

Both fseek and ftell are naughty on all systems, not just Linux

Added! Elided Windows for now, however.

There are other functions that could have LFS variants, but aren't represented in libc at all yet. e.g. fallocate, preadv, pwritev.

Cool, I'll hold off on these until the others are added anyway. By the way, is there a document you're reading these from? Would be good to list somewhere!

I noticed fpos_t, fpos64_t, and dirent are currently empty enums. These are not opaque types like FILE; the caller must allocate them somehow, so how is one supposed to use those functions? (But dirent64 is implemented in this PR, so maybe the others are just TODO.)

Ah yeah dirent was just filled in and the others are basically a TODO to fill out, I'll add a comment.

If you can't tell, most people don't use these bindings, they just use the standard library instead :)

@cuviper
Copy link
Member

cuviper commented Oct 30, 2015

By the way, is there a document you're reading these from? Would be good to list somewhere!

I'm peering at the actual libc on Fedora like so:

nm --dynamic --defined-only /lib/libc.so.6 | grep '\w*64\w*$'

You could look at /lib64/libc.so.6 too, but in that case foo64 symbols are just aliases to plain foo.

I'm also referencing lintian's nice list of lfs-symbols. These are asserted not to be used by any 32-bit binaries, to be sure one is either explicitly using foo64 versions or implicitly getting there via _FILE_OFFSET_BITS=64. (64-bit binaries are always LFS safe on Linux, so they can do what they want.)

If you can't tell, most people don't use these bindings, they just use the standard library instead :)

Which is good! So the job is mostly to provide that solid base for the standard library to use. :)

@cuviper
Copy link
Member

cuviper commented Oct 31, 2015

Oy, musl doesn't like glob64_t. But I was curious what they do for LFS in general, and they just force their whole world! Everybody gets 64-bit off_t et al. They even shunt all syscalls to 64-bit variants, to "fixup legacy 32-bit-vs-lfs64 junk". (Must be nice to not worry about legacy binaries!)

But I think this means a 32-bit musl target would go really badly right now...

@alexcrichton
Copy link
Member Author

Yeah I have a feeling there's a few things that'll need to be fixed up for 32-bit MUSL, but we can always worry about all that later :)

Let's see if this build is fixed...

They're all added under the `foo64` names with the `foo64` types added as well,
so they still need to be explicitly chosen.
@alexcrichton
Copy link
Member Author

Aha, yay! Got some green builds now.

To confirm, this all sounds ok to you @cuviper? The major decision basically being to expose bindings as if you compiled C code with no flags rather than "what we think are the best set of C flags"

@cuviper
Copy link
Member

cuviper commented Nov 2, 2015

It's not my preference, but I'll accept your decision. I guess we'll see in the standard library how annoying it is to #[cfg]-select the best interfaces.

Note, there are also other flags which may come up where "best" is more subjective, like _GNU_SOURCE. For instance, there are different GNU and POSIX behaviors for basename, which are linked as basename and __xpg_basename respectively. Here are the various glibc feature macros for reference, although this doesn't enumerate the actual functions affected.

@alexcrichton
Copy link
Member Author

Ok! In that case I'll merge, thanks @cuviper!

alexcrichton added a commit that referenced this pull request Nov 2, 2015
Use bindings for -D_FILE_OFFSET_BITS=64
@alexcrichton alexcrichton merged commit 14bf9de into rust-lang:master Nov 2, 2015
@alexcrichton alexcrichton deleted the lfs-support branch November 2, 2015 21:14
dlrobertson pushed a commit to dlrobertson/libc that referenced this pull request Oct 31, 2017
Tweak dependencies of Arch packages
lvllvl added a commit to lvllvl/libc that referenced this pull request Jan 7, 2025
# This is the 1st commit message:

chore: add labels to FIXMEs

# This is the commit message rust-lang#2:

change label for .field FIXMEs
# This is the commit message rust-lang#3:

add fixme csv

# This is the commit message rust-lang#4:

Update fixmes.csv
# This is the commit message rust-lang#5:

Update fixmes.csv
# This is the commit message rust-lang#6:

Update fixmes.csv
# This is the commit message rust-lang#7:

Update fixmes.csv
# This is the commit message rust-lang#8:

Update fixmes.csv
# This is the commit message rust-lang#9:

Update fixmes.csv
# This is the commit message rust-lang#10:

Update fixmes.csv
# This is the commit message rust-lang#11:

Update fixmes.csv
# This is the commit message rust-lang#12:

Update fixmes.csv
# This is the commit message rust-lang#13:

Update fixmes.csv
# This is the commit message rust-lang#14:

Update fixmes.csv
# This is the commit message rust-lang#15:

Update fixmes.csv
# This is the commit message rust-lang#16:

Update fixmes.csv
# This is the commit message rust-lang#17:

Update fixmes.csv
# This is the commit message rust-lang#18:

updates to csv

# This is the commit message rust-lang#19:

Update fixmes.csv
# This is the commit message rust-lang#20:

Update fixmes.csv
# This is the commit message rust-lang#21:

Update fixmes.csv
# This is the commit message rust-lang#22:

Update fixmes.csv remove hurd"
# This is the commit message rust-lang#23:

Update fixmes.csv remove "'s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants