Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process struct should carry an fd and use openat #125

Closed
ghost opened this issue May 1, 2021 · 5 comments · Fixed by #171
Closed

Process struct should carry an fd and use openat #125

ghost opened this issue May 1, 2021 · 5 comments · Fixed by #171

Comments

@ghost
Copy link

ghost commented May 1, 2021

Under Linux, procfs can suffer from TOCTTOU caused by pid reuse. This is evident if you take the self_memory example and have it run on external process. I'll show some output from strace to illustrate the problem:

openat(AT_FDCWD, "/proc/1234/statm", O_RDONLY|O_CLOEXEC) = 3
... snip ...
openat(AT_FDCWD, "/proc/1234/status", O_RDONLY|O_CLOEXEC) = 3

In between calls to openat, our process could be pre-empted, and then the pid could potentially be killed and re-used by another process before our process resumes. This would result in the call to Process::status() returning wrong information, when really it should return an error because the original process 1234 it was referencing is dead. To fix the problem, you'd want the syscalls to look more like this:

openat(AT_FDCWD, "/proc/1234", O_RDONLY|O_CLOEXEC) = 3
openat(3, "statm", O_RDONLY|O_CLOEXEC) = 4
openat(3, "status", O_RDONLY|O_CLOEXEC) = 4

The easiest way to do this would be to replace every area where file paths are joined with an intentional call to openat. The net effect of this would be that Process would no longer store root, it would store a handle to the directory in procfs. This should also be done when reading from entries in /proc/[pid]/task/[tid]. Another change that should probably happen along with this is that process::all_processes() should return an Iterator instead of a Vec, that way it will prevent a huge number of file descriptors being opened at once. This is an API break so I'm proposing it here before submitting any pull request.

@eminence
Copy link
Owner

eminence commented May 3, 2021

Hi,

Thanks for opening this issue. At first glance, this sounds like a nice improvement. Your example with "statm" and "status" is well motivated. I would be definitely willing to see a pull request in this area. One reason that the current API for all_processes() returns a Vec instead of an iterator is to reduce the chances of a process disappearing if doing a slow iteration. I'm not sure if openat really helps for all_processes(), though I guess it'll be a more consistent "snapshot"

You have a good point about needed to manage the large number of open FDs. How does returning an iterator help? If the caller immediately collects everything into a Vec, we would still have an open FD for every running process, right?

@eminence
Copy link
Owner

eminence commented May 3, 2021

Also, I don't see any support for openat in the stdlib. Would you rely on nix for this functionality? Or use libc?

@ghost
Copy link
Author

ghost commented May 4, 2021

One reason that the current API for all_processes() returns a Vec instead of an iterator is to reduce the chances of a process disappearing if doing a slow iteration.

That will ensure you get a list of pids that was active at the time of iteration but will not prevent the pids from becoming stale. The method I'm suggesting will be slower; that's unavoidable because it's moving from just holding the pid number to now having to do a syscall to grab a reference to a kernel resource. But it will also technically be more correct, the iterator can skip over pids that became stale during processing of the directory.

If we want to preserve the quick iteration, I think it would make more sense to have the method return a Vec of u32 (or more accurately pid_t). That will communicate more what is going on: the pid is just a number that could become stale, you could then pass it to a constructor later that will try to open a handle and return a Result<Process> accordingly.

How does returning an iterator help? If the caller immediately collects everything into a Vec, we would still have an open FD for every running process, right?

Yes. The idea is that for implementing something like top, programs would map over the iterator and pull out the information they need into a different structure. On Linux the default max fds open is 1024, and it's definitely possible to have more than 1024 processes open, so we probably want to avoid pushing up against that limit.

Also, I don't see any support for openat in the stdlib. Would you rely on nix for this functionality? Or use libc?

libc can probably be used if trying to keep dependencies to a minimum, the nix crate is not much more than some minimal safe wrappers around libc functions. Both would work.

@lparcq
Copy link

lparcq commented May 12, 2021

The result from all_processes is always wrong whichever method you use. It's better to minimize the number of opened file descriptors. Btw, it makes no sense to use a program like top on a system where pid sare reused faster than reading the result of all_processes().
However, using openat for Process::new would be much more reliable on heavily loaded systems.

@ghost
Copy link
Author

ghost commented May 12, 2021

It won't necessarily be wrong if a small number of fds are kept open, for example a top program might want to keep the fd to the currently highlighted process open. That process could then be signaled in a race-free way using pidfd. If the pids are re-used faster than top can read them (i.e. the openat to /proc/<pid>/stat succeeds but /proc/<pid>/cmdline fails) then that process would be discarded and not shown in the output since it's known that the process is already dead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants