Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: only close fanotify events with a valid fd #2399

Merged
merged 4 commits into from
May 19, 2024

Conversation

tombl
Copy link
Contributor

@tombl tombl commented May 8, 2024

What does this PR do

fanotify returns events with the fd set to FAN_NOFD when the queue overflows.
Trying to close it panics with EBADF, so don't bother.

Checklist:

  • I have read CONTRIBUTING.md
  • I have written necessary tests and rustdoc comments
  • A change log has been added if this PR modifies nix's API

Copy link
Member

@asomers asomers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this behavior is in an official release, could you please add a changelog entry?

@tombl tombl requested a review from asomers May 12, 2024 13:27
Copy link
Member

@asomers asomers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. What about tests? Would it be possible to write a test that triggers a queue overflow?

@SteveLauC
Copy link
Member

SteveLauC commented May 18, 2024

Would it be possible to write a test that triggers a queue overflow?

Theoretically, yes. One can change the limit value of the queue length through file /proc/sys/fs/fanotify/max_queued_events, so we can set the limit to a small number to trigger the overflow,
but this change is system-wide, i.e., it will affect other processes that are using the fanotify API. For CI, I think running such a test is fine, but testing this on contributors's machine would be kinda bad?

@asomers
Copy link
Member

asomers commented May 18, 2024

Would it be possible to write a test that triggers a queue overflow?

Theoretically, yes. One can change the limit value of the queue length through file /proc/sys/fs/fanotify/max_queued_events, so we can set the limit to a small number to trigger the overflow, but this change is system-wide, i.e., it will affect other processes that are using the fanotify API. For CI, I think running such a test is fine, but testing this on contributors's machine would be kinda bad?

And is the original value impractically high to reach?

@SteveLauC
Copy link
Member

And is the original value impractically high to reach?

Ahh, I think don't need to bother with that config file, the default value is 16384, it should be easy to reach it within 1-2 seconds with one thread repeatedly generating events

@SteveLauC
Copy link
Member

Ok, I tried to write a test for this issue today, now I kinda think it is not possible to do that.

We want the queue to be overflowed, so we have to trigger at least 16385 events, which should be easy with one thread looping reading files, however, Linux kernel would merge events that come from the same process (or thread if FAN_REPORT_TID is specified in fanotify_init(2)), so even though we can generate 16385 events using 1 thread, the Linux kernel will merge them into 1 event.

Then what about using 2 threads to make them access files alaternatively, then the events in the queue would theoretically be something like:

[from_worker1, from_worker2, from_worker1, from_worker2, ...]

Well, this won't work either, Linux kernel will reorder and coalesce them.

To not let Linux kernel merge events, we have 2 approaches:

  1. Create 16385 events that come from 16385 unique threads, which is not practical as

    1. That's too many threads (Linux has a system-wide threads limit that is related to the memory size, on my laptop (16GiB), the limit is 108685)
    2. What about we just create-and-drop the threads, Linux will reuse the thread IDs from the dropped threads.
  2. Use permission event rather than notification event

    Permission event won't be merged but it requires the listener to write a response back to the thread or this thread will be blocked forever, then it looks quite similar to the approach 1, we need tons of threads, which is not feasible.

@SteveLauC
Copy link
Member

Gentle ping on the PR author @tombl, do you have any idea on the test? :)

@tombl
Copy link
Contributor Author

tombl commented May 19, 2024

I've also been struggling to write a test. I hit this behavior in my application that uses fanotify, but that was with >16384 unique processes making changes. I think spawning 16385 processes that make changes would work, but seems excessive. I also think the spawn-and-exit trick with threads might be enough.

@SteveLauC
Copy link
Member

I also think the spawn-and-exit trick with threads might be enough.

Yeah, I reproduced the overflow issue with 16385 threads, but can we stop Linux kernel from reusing thread IDs?

@tombl
Copy link
Contributor Author

tombl commented May 19, 2024

I don't think it does reuse the thread IDs except when they overflow /proc/sys/kernel/pid_max.

It should only be an issue if someone lowers their pid_max below their max_queued_events.

@SteveLauC
Copy link
Member

I don't think it does reuse the thread IDs except when they overflow /proc/sys/kernel/pid_max.

From this post, it indeed is, thanks for showing me this, TIL!

Copy link
Member

@SteveLauC SteveLauC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@SteveLauC SteveLauC added this pull request to the merge queue May 19, 2024
Merged via the queue into nix-rust:master with commit 663506a May 19, 2024
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants