Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing container start when cgroup subsystem not found #341

Closed
wants to merge 1 commit into from

Conversation

rajasec
Copy link
Contributor

@rajasec rajasec commented Oct 19, 2015

With current runc implementation, if the cgroup subsystem is not mounted in the host, subsequent directories are not created inside the container. But Set interface is called to write to respective subsystem attributes ( if the runtime.json had the right values). In that case container failed to start.

Example
In runtime.json, I've added the cpu shares and blkio related parameters
After unmounting the cpu and blkio subsystem from host, the container failed to start due to following errors
FATA[0000] Container start failed: [8] System error: no such directory for cpu.shares

As directories are not created if subsystem is not mounted, Let us try not to set the values for the respective subsystems ( if not mounted) and gracefully start the container instead of failing

Signed-off-by: Rajasekaran rajasec79@gmail.com

Signed-off-by: Rajasekaran <rajasec79@gmail.com>
@mrunalp
Copy link
Contributor

mrunalp commented Oct 19, 2015

I don't think that we should silently continue on if some subsystems aren't mounted. Failing is better.

@crosbymichael
Copy link
Member

Yes, we had this same discussion in the past when this repository was libcontainer. I can try to find the discussion again for history but there are some subsystems that are optional but we want a hard fail because if you don't have some of these cgroups then we cannot guarantee some of the aspects of a container.

@rajasec
Copy link
Contributor Author

rajasec commented Oct 19, 2015

@crosbymichael @mrunalp
Thanks for your quick review on this PR.

As per your comment, we want hard fail,
we should fail at the beginning itself when subsystem is not found or mounted.

I meant to say when subsystem is not mounted or not available, don't fail during set operation, fail during Join operation, where it returns cgroup subsystem not found.
dir, err := d.join("blkio")
here itself we can fail cgroups.IsNotFound(err)
We don't need to allow to set and then fail.

With current implementation, memory subsystem is handled this way too(silently continues). is it different than other subsystems ?

Moreover with current implementation
Failure will happen only if we set some appropriate values in runtime.json for not mounted cgroup subsystems, otherwise we wont see the failure during start of container.

If it is 0 or null, container start ignores during set operations ( for not mounted cgroup subsystems)

@crosbymichael
Copy link
Member

@rajasec is there a reason why you are hitting this issue? What is the root cause?

@rajasec
Copy link
Contributor Author

rajasec commented Oct 20, 2015

@crosbymichael
Problem:
As ubuntu kernel using the deadline scheduler as default, blkio setting was not effective inside the container ( as blkio works only for CFQ), So I'was unmounting the unnecessary file system(blkio) to be mounted. During that operations I have removed/unmounted cpu too ( instead of blkio)
But the problem started from here.
Since I use to specify cpu shares ( non-zero value) in my runtime.json, container failed to start due to cpu shares
Gave the following error
FATA[0000] Container start failed: [8] System error: no such directory for cpu.shares

Root cause
cpu.go
dir, err := d.join("cpu")
if err != nil && !cgroups.IsNotFound(err) {
return err
}
After executing above code, err gave the "mount point for cpu not found" and cgroups.IsNotFound (err) also gives for proper value ( as the subsystem is not found)
since we are negating with cgroups.IsNotFound(err) this error got ignored.
This error got ignored& bypassed, even though the subsystem is not found.

So it went ahead to set the values
if err := s.Set(dir, d.c); err != nil {
return err

it failed during setting up of cpu.shares, because the directory or mountpoint does not exists.

This PR handles, if specific subsystem is not found in the host, Don't try to set the values instead return back ( similar to memory subsystem)
it is not about gracefully starting the container.

@hqhq
Copy link
Contributor

hqhq commented Oct 20, 2015

@rajasec @crosbymichael The story was:

@hqhq
Copy link
Contributor

hqhq commented Oct 20, 2015

@rajasec You are right memory subsystem is different from other optional subsystems, that's wrong, I opened a PR #343 to fix this.

@crosbymichael
Copy link
Member

@hqhq what should we do with this PR then?

@hqhq
Copy link
Contributor

hqhq commented Oct 26, 2015

@crosbymichael The main problem @rajasec got in this PR is intended, the other problem he mentioned in the comments is fixed by #343 , so I think we can close this PR then.

@rajasec
Copy link
Contributor Author

rajasec commented Oct 26, 2015

@hqhq
Before we close this PR, one thing I would need few confirmation

I agree that we need hard failure.

But if subsystem is not found or not mounted, my question is why we go and fail in set operation instead we can fail at join operation itself ? in that case , we don't even allow the code to do set of values.

Also cgroups.IsNotFound(err) function is not having much meaning, as we always negate when subsystem not found

@hqhq
Copy link
Contributor

hqhq commented Oct 26, 2015

But if subsystem is not found or not mounted, my question is why we go and fail in set operation instead we can fail at join operation itself ? in that case , we don't even allow the code to do set of values.

I agree fail at set is not perfect, if you want to fail at join or right after join, you need a function to get if that subsystem requested some valid values, something like function memoryAssigned I added in #343 , that doesn't seems necessary, so I think it's hard to tell which one is better. Fail at set is the simplest, it didn't add much code or complex. But I'm open with other options, you are always welcome to fire a PR, let's see if we can make it better :)

Also cgroups.IsNotFound(err) function is not having much meaning, as we always negate when subsystem not found

Sorry I don't get it, what do you mean "we always negate when subsystem not found"? If subsystem is not found and we are not using it (by assigning any valid values in runtime.json), we don't negate it.

@rajasec
Copy link
Contributor Author

rajasec commented Nov 3, 2015

@hqhq @crosbymichael
As you are saying the changes are much intentional, I'm closing this PR.
Thanks for your reviews.

@rajasec rajasec closed this Nov 3, 2015
stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
…e-rlimits-to-process

Move rlimits to process
stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
Through 0bcb711 (Merge pull request opencontainers#341 2016-03-10).

Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants