Add PSI support (threshold for swapin/swapout) #100
Comments
What is it supposed to do? Please describe in detail the algorithm you want to implement. |
Should the rate at which the swap partition fills up be a trigger? |
It seems PSI (pressure stall information) was merged recently: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eb414681d5a07d28d2ff90dc05f69ec6b232ebd2 |
Do you plan to support PSI in earlyoom 1.3+? |
Once my distribution has it, yes! (Fedora) |
Congratulations! |
:) Reading through https://lwn.net/Articles/759781/ , one problem with PSI may be that you have to wait for the thrashing to happen before you can react. Maybe earlyoom will look at both PSI and free mem (but with a much lower threshold, maybe 1%) |
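A minimal sketch of the "look at both PSI and free mem" idea, assuming the kernel's documented PSI file layout (`some`/`full` lines with `avg10`, `avg60`, `avg300` and `total` fields). The function names, the 40% pressure threshold and the "either condition triggers" policy are assumptions made for illustration only; this is not earlyoom code.

```python
def parse_psi(text):
    """Parse the text of a PSI file such as /proc/pressure/memory.

    Returns e.g. {"some": {"avg10": 0.0, ..., "total": 0.0}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=") for field in fields)
        }
    return result


def should_act(psi, mem_available_percent, psi_threshold=40.0, mem_threshold=1.0):
    """Hypothetical policy: react on high pressure OR on very low free memory.

    psi_threshold is an arbitrary illustrative value; mem_threshold follows
    the "maybe 1%" suggestion from the comment above.
    """
    return (
        psi["some"]["avg10"] >= psi_threshold
        or mem_available_percent <= mem_threshold
    )


# Sample text in the PSI file format (values are made up).
SAMPLE = (
    "some avg10=45.20 avg60=12.00 avg300=3.10 total=123456789\n"
    "full avg10=30.00 avg60=8.50 avg300=2.00 total=98765432\n"
)
```

On a real system the text would come from reading `/proc/pressure/memory`, and the available-memory percentage from `/proc/meminfo`.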
Next q.: |
I guess same as before, highest oom_score |
10 sec maximum |
PSI support is implemented in nohang. It works pretty fast. The wait is not too long. I just added 2 thresholds: It uses |
The next problem: PSI mem avg10 falls slowly after corrective action (it may need more than 20 sec of sleep to prevent multiple kills). |
Good catch. I guess this means we should use "total" instead of the averaged number. |
Seems like |
https://youtu.be/YYJ9Af8Syyg |
Eureka! |
Yes exactly, you check if it increases |
avg = (total1 - total0) / (time1 - time0) / 10000
Good news: it works very quickly and reacts as soon as swapping starts. Bad news: it may give false positives. I just ran FF, and it was killed. Maybe we should use avg1 or avg2 as a middle way. |
I will try these settings: |
http://okturing.com/src/5636/body example of output with moderately fast memory hogging
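The formula above can be sketched as follows. The `total` field of a PSI file is a stall time in microseconds, so dividing the delta by the elapsed seconds and then by 10000 yields a percentage. The helper names are hypothetical, and sampling a real `/proc/pressure/memory` file is left to the caller.

```python
def psi_total_us(text, kind="some"):
    """Extract the cumulative stall time (microseconds) from PSI file text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == kind:
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "total":
                    return int(value)
    raise ValueError(f"no total= field found for {kind!r}")


def pressure_percent(total0, time0, total1, time1):
    """Average pressure in percent between two samples.

    total0/total1 are in microseconds, time0/time1 in seconds:
    (us / s) is a fraction scaled by 1e6, so dividing by 1e4 gives percent.
    """
    return (total1 - total0) / (time1 - time0) / 10000


# Half a second of stall during a one-second window is 50% pressure.
example = pressure_percent(0, 0.0, 500_000, 1.0)
```

The window length is chosen by the sampling interval, which is what makes this react faster than `avg10` and also what makes it more prone to false positives.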
|
nohang output with avg3 |
sigterm_psi = 60 output:
|
I saw false positives even if |
PSI is very slow to bounce back. |
Info: |
I have tested a few things with PSI, and it seems pretty difficult to do "the right thing". The difficult example is this: /~https://github.com/rfjakob/earlyoom/blob/master/contrib/oomstat/loadshift.txt |
The maximum PSI values depend on the type of swap space. Brief exceedances are normal. Only long-term threshold exceedances are clearly pathological and require a reaction. The optimal settings are very individual. The right things are:
Offer the following default settings:
|
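The point that only long-term threshold exceedances should trigger a reaction can be sketched as a small state tracker. The class name and the example values (a threshold of 60 and a 10-second minimum duration) are assumptions for illustration, not recommended defaults.

```python
class SustainedExceedance:
    """Report True only after a value has stayed at or above a threshold
    for at least min_duration seconds without interruption."""

    def __init__(self, threshold, min_duration):
        self.threshold = threshold
        self.min_duration = min_duration
        self._since = None  # time at which the current exceedance began

    def update(self, value, now):
        if value < self.threshold:
            # Pressure dropped below the threshold: forget the streak.
            self._since = None
            return False
        if self._since is None:
            self._since = now
        return now - self._since >= self.min_duration


# Hypothetical settings: pressure >= 60 for at least 10 seconds.
detector = SustainedExceedance(threshold=60.0, min_duration=10.0)
```

Each call passes the latest PSI value and a monotonic timestamp; a short spike resets nothing permanently but also never triggers on its own.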
@rfjakob see
|
I have read through hakavlad/nohang#25, oh boy, PSI is such a headache. Although in this case it seems that MuQSS is interacting badly with PSI. |
Only on desktops. But don't worry, rfjakob-sensei: PSI should be very useful for servers. Disable PSI by default, add warnings to the documentation, and PSI will no longer be a headache. |
No.
Seems like this problem was with swap on btrfs. |
Please note that MuQSS is still a valid possibility; the linked issue didn't verify that it wasn't. Nor does the bug report that @hakavlad linked to claim BTRFS is at fault have anything to do with MuQSS or PSI. MuQSS only has partial support for cgroups afaik, and PSI relies on cgroups support, so it's more likely due to MuQSS lacking the proper cgroups support that PSI requires. While the user also uses BTRFS, that user has not verified with a different filesystem that the problem was caused by BTRFS alone. The linked bug report also has users with filesystems other than BTRFS claiming they are experiencing the same or similar problems. |
Could you provide links, please? |
PSI is not directly dependent on cgroups. PSI provides metrics for each group with cgroup_v2. Without cgroup_v2 PSI works well and provides files only in |
Probably the last one is unrelated though. Another comment (55, I think) pointed out that their issue was related to CPU activity, which reminds me of another issue on GitHub about BTRFS maintenance scripts, which openSUSE at least shipped by default. As it's a non-traditional filesystem that can require different treatment and maintenance, I could understand how it adds complications that can make issues more pronounced. If you just want to find issues/reports of users with bad performance related to swap, there are plenty of those without being specific to BTRFS. This bug report has nothing to do with PSI or earlyoom/nohang? So its relevance to identifying the problem a user experienced with PSI not working as expected is low. The user would need to verify with a different filesystem that the symptoms were resolved or, better yet and easier, verify that the MuQSS kernel is not to blame by trying a kernel without MuQSS, just plain CFS and a vanilla kernel that enables PSI.
Ah.. my mistake. But the log the user shared does seem to suggest that their system might have cgroup v2 support, and as MuQSS afaik only provides a partial implementation, what happens if PSI acknowledges cgroup v2 support and tries to use unimplemented features? It'd presumably break or behave incorrectly, as the user indicates?
That suggests that might be the case with that BTRFS daemon utility? |
It's time to use PSI metrics to improve interactivity. In this post I want to tell you about the psi2log and psi-top scripts from the nohang package. These simple tools can help you to conveniently measure and log various PSI metrics. See https://pagure.io/fedora-workstation/issue/98#comment-631355. And PSI does not go crazy; PSI metrics just provide stall information.
PSI works incorrectly with out-of-tree schedulers like MuQSS. But that is not a problem. |
I suggest adding optional PSI support, for example earlyoom --psi some,40. I suggest hardcoding the avg10 metric and a 15 sec reclaim time. The user can choose some/full and the threshold. Checking -m should prevent false positives when a lot of memory is available. And of course, this should be optional. |
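A rough sketch of how the proposal above might behave. The `--psi some,40` option is only a suggestion from this thread, not an existing earlyoom flag. The function names are hypothetical, and reading "checking -m" as "only act when available memory is at or below the `-m` percentage" is an assumption.

```python
RECLAIM_TIME = 15.0  # seconds to wait after an action, as proposed above


def parse_psi_option(spec):
    """Parse a proposed option value such as "some,40" into (kind, threshold)."""
    kind, _, threshold = spec.partition(",")
    if kind not in ("some", "full"):
        raise ValueError(f"expected 'some' or 'full', got {kind!r}")
    return kind, float(threshold)


def psi_kill_needed(avg10, threshold, mem_available_percent,
                    mem_min_percent, now, last_action=None):
    """Decide whether PSI alone justifies a kill (illustrative policy only).

    - Skip while the reclaim time since the last action has not elapsed,
      because avg10 falls slowly after corrective action.
    - Skip while available memory is above the -m style minimum, to avoid
      false positives when plenty of memory is free.
    """
    if last_action is not None and now - last_action < RECLAIM_TIME:
        return False
    if mem_available_percent > mem_min_percent:
        return False
    return avg10 >= threshold


kind, threshold = parse_psi_option("some,40")
```

Here `now` and `last_action` are expected to be monotonic timestamps in seconds.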
nohang with PSI: no problem: https://www.youtube.com/watch?v=Y6GJqFE_ke4 |
I think now that nohang has psi support, there is no need for earlyoom to duplicate the functionality. |
FWIW, the reason I like earlyoom is that it is written in C, and I therefore feel more confident that its own swap usage behavior can be more obviously controlled. I was therefore interested in PSI support one day being added to earlyoom, but am not terribly interested in switching to nohang. (Maybe my thought there is daft, and nohang can be fully trusted... but, if that is the case, why does this project still exist at all? Wouldn't it be better to just redirect the user base and developer interest by formally anointing a successor?) |
Let's take a step back for a second: Where do you want to use psi support? On a server? And what would you expect to be better than checking available mem/swap? |
earlyoom is only aware of the amount of free swap, but I've seen many times that, with a large swap, the system stops responding due to heavy swapin/swapout while earlyoom does nothing because there is still plenty of free swap. Setting a low free swap minimum is no better than completely disabling swap and letting the kernel OOM killer handle the low-memory situation.
Ref #34