-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
config-linux: add Intel RDT/MBA Linux support #932
Conversation
config-linux.md
Outdated
@@ -466,19 +466,37 @@ The following parameters can be specified to set up the controller: | |||
The following parameters can be specified for the container: | |||
|
|||
* **`l3CacheSchema`** *(string, OPTIONAL)* - specifies the schema for L3 cache id and capacity bitmask (CBM). | |||
If `l3CacheSchema` is set, runtimes MUST write the value to the `schemata` file in the `<container-id>` directory discussed in `intelRdt`. | |||
* **`memBwSchema`** *(array of objects, OPTIONAL)* - specifies the schema of memory bandwidth (b/w) percentage per L3 cache id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make this simpler with just or something of the like:
"bandwidth" : {
"id": 0,
"value" 70,
}
you don't have to prefix everything when they are part of an object, its implied that it is an attribute of the containing object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not make this simpler with just or something of the like:
It makes sense. Thank you.
4bc6ff6
to
dd23a4c
Compare
config-linux.md
Outdated
Consider a two-socket machine with two L3 caches where the default CBM is 0xfffff and the max CBM length is 20 bits. | ||
Tasks inside the container only have access to the "upper" 80% of L3 cache id 0 and the "lower" 50% L3 cache id 1: | ||
Consider a two-socket machine with two L3 caches where the default CBM is 0x7ff and the max CBM length is 11 bits, | ||
and minimum memory b/w of 10% with a memory b/w granularity of 10%. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to abbreviate “bandwidth” in the docs. It won't take much longer to read, and slightly reduces the chance of confusion if we spell it out, instead of using “b/w”.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to abbreviate “bandwidth” in the docs. It won't take much longer to read, and slightly reduces the chance of confusion if we spell it out, instead of using “b/w”.
I agree with you. I will correct this.
config-linux.md
Outdated
|
||
If `l3CacheSchema` is not set, runtimes MUST NOT write to `schemata` files in any `resctrl` pseudo-filesystems. | ||
If neither `l3CacheSchema` nor `memBandwidth` is set, runtimes MUST NOT write to `schemata` files in any `resctrl` pseudo-filesystems. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these two set/unset lines are quite right. The old l3CacheSchema
requirements were sufficient because the l3CacheSchema
values were expected to contain the L3:
prefix. However, dd23a4c only mentions the MB:
prefix and related formatting the a non-normative Go comment.
If we go the structured route, (which is where you seem to be taking memBandwidth
), I think we need to make the MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
formatting a normative part of the spec. We'd also want to define the interactions between them (e.g. “the formatted lines MAY be written to schemata
in any order, either with a single write or a series of writes”). And I think we'd want to structure l3CacheSchema
(as l3CacheSchema2?
), both for consistency, and to avoid the ambiguity of:
{
"l3CacheSchema": "MB:0=50;1=50",
"memBandwith": [{"id": 0, "value": 25}, {"id": 1, "value": 75}]
}
If we go the unstructured route (where l3CacheSchema
already is), then we can just have an opaque array of strings. Or even a single string:
"l3CacheSchema": "L3:0=3;1=c\nMB:0=50;1=50"
because JSON allows escaped newlines. At that point, the l3CacheSchema
property name would be stale so it would be nice to rename it to something more generic (schemata
?) in v2. But I think it already does what you need here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these two set/unset lines are quite right. The old l3CacheSchema requirements were sufficient because the l3CacheSchema values were expected to contain the L3: prefix. However, dd23a4c only mentions the MB: prefix and related formatting the a non-normative Go comment.
Actually, CAT and MBA features are orthogonal, each one doesn't depend on the other. We could see (1) either "L3 cache" or "memory bandwidth" schema available or (2) both "L3 cache" and "memory bandwidth" schemata are available with specific hardware and kernel configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we go the structured route, (which is where you seem to be taking memBandwidth), I think we need to make the MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;... formatting a normative part of the spec. ...
If we go the unstructured route (where l3CacheSchema already is), then we can just have an opaque array of strings. Or even a single string: ...
Originally I proposed unstructured JSON format (opaque string: e.g., "memBwSchema": "MB:0=20;1=70"
) for memory bandwidth schema in opencontainers/runc#1596.
@cyphar prefers structured JSON format (array of object: e.g., "memBandwith": [{"id": 0, "value": 20}, {"id": 1, "value": 70}]
). @cyphar's justification in opencontainers/runc#1596 (comment) makes sense (easy for spec validation, use interactively, easy to extend and etc.). In this PR I chose the structured JSON format.
The unstructured (opaque string) format also has some advantages: opencontainers/runc#1596 (comment). It will be easier to joint L3 cache and memory bandwidth schema into a single string, and write once to schemata file. I'd like to hear other reviewers' opinions and suggestions.
For L3 cache schema, it is already used opaque string format (e.g., "l3CacheSchema": "L3:0=7f0;1=1f"
). Anyway, I have plan to unify the formats for L3 cache and memory bandwidth.
If we choose the structured JSON format, how about this mapping from the schemata file content to JSON config?
cat /sys/fs/resctrl/container_id/schemata
L3:0=7f0;1=1f
MB:0=20;1=70
"linux": {
"intelRdt": {
"l3Cache": [{"id": 0, "value": "7f0"}, {"id": 1, "value": "1f"}],
"memBandwidth": [{"id": 0, "value": 20}, {"id": 1, "value": 70}]
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unstructured (opaque string) format also has some advantages…
I agree that there are tradeoffs between structured data and isolating users from the underlying kernel API vs. unstructured data that provides more direct access to the underlying kernel API. And I don't really care if we go with a structured approach or an unstructured approach. What doesn't make sense to me is going with an unstructured approach for l3CacheSchema
and then going for a structured approach in memBandwidth
. It seems like the “we should isolate config authors from the details of the kernel API” (or not) would be a decision that applied equally to everything under intelRdt
.
Anyway, I have plan to unify the formats for L3 cache and memory bandwidth.
I don't think the spec is a good place to play with the config format, because now that we've cut 1.0.0 with the existing l3CacheSchema
, we need to continue to support it until this spec hits v2. If we decided to go with your unified, structured approach (or the unified, unstructured schemata
I'd floated earlier, or some other approach), we'd need to continue to support the deprecated l3CacheSchema
throughout the 1.x spec lifetime and define how the deprecated property interacted with the new properties. But we certainly don't want to stamp a 1.1 with something that gets deprecated in 1.2, because then we have to support both the 1.0 l3CacheSchema
and the 1.1 otherDeprecatedThing
until 2.0. So if you have plans for something unified, I think they belong in this PR, and not in follow-up work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking
@cyphar @crosbymichael
/cc @vishh @mrunalp @vbatts @dqminh @philips @tianon @hqhq
I don't think the spec is a good place to play with the config format, because now that we've cut 1.0.0 with the existing l3CacheSchema, we need to continue to support it until this spec hits v2. If we decided to go with your unified, structured approach (or the unified, unstructured schemata I'd floated earlier, or some other approach), we'd need to continue to support the deprecated l3CacheSchema throughout the 1.x spec lifetime and define how the deprecated property interacted with the new properties. But we certainly don't want to stamp a 1.1 with something that gets deprecated in 1.2, because then we have to support both the 1.0 l3CacheSchema and the 1.1 otherDeprecatedThing until 2.0. So if you have plans for something unified, I think they belong in this PR, and not in follow-up work.
I understood the consistency and compatibility requirement in runtime-spec. This is my plan:
- Firstly, I will address "L3 cache" and "memory bandwidth" with
unified
formats in this single PR. - To support existed "l3CacheSchema" throughout 1.x spec lifetime, and to avoid confusion of deprecated property, how about using similar "unified unstructured schemata" format for memory bandwidth in this PR and throughout 1.x runtime-spec and runc ?
"linux": {
"intelRdt": {
"l3CacheSchema": "L3:0=7f0;1=1f",
"memBwSchema": "MB:0=20;1=70"
}
}
- If we have requirement to change all Intel RDT resources into "structured schemata" in spec 2.0 or newer version, I could open a new PR to slightly rework on appropriate time slot in the phase of spec 2.0.
"linux": {
"intelRdt": {
"l3Cache": [{"id": 0, "value": "7f0"}, {"id": 1, "value": "1f"}],
"memBandwidth": [{"id": 0, "value": 20}, {"id": 1, "value": 70}]
}
}
Please help review and comment. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my plan…
The plan sounds reasonable to me. For the 1.x structure, we probably want at least SHOULDs for the L3:
and MB:
prefixes and internal newlines, and clear requirements for how we handle cases where those SHOULDs are violated. MUSTs for those requirements would be nice, and we can do that for memBandwidth
, but it may be too late to add new MUSTs to l3CacheSchema
.
And there's also no reason we couldn't specify the 2.x structure now, as long as we document that the *Schema
property overrides the analogous 2.x property when the 1.x property is set. In that approach, the plan would be to have this PR deprecate l3CacheSchema
and add l3Cache
and memBandwidth
. We'd remove l3CacheSchema
in 2.0.
@wking I have updated this PR according to @wking 's suggestion. In this PR, the spec for memory bandwidth (memBwSchema) keeps Example:
|
config-linux.md
Outdated
@@ -466,19 +466,26 @@ The following parameters can be specified to set up the controller: | |||
The following parameters can be specified for the container: | |||
|
|||
* **`l3CacheSchema`** *(string, OPTIONAL)* - specifies the schema for L3 cache id and capacity bitmask (CBM). | |||
If `l3CacheSchema` is set, runtimes MUST write the value to the `schemata` file in the `<container-id>` directory discussed in `intelRdt`. | |||
* **`memBwSchema`** *(string, OPTIONAL)* - specifies the schema of memory bandwidth percentage per L3 cache id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a line after this like:
The value MUST start with
MB:
and MUST NOT contain newlines.
I'd like a similar line with MUST -> SHOULD for l3CacheSchema
(or also with MUST if maintainers are comfortable with the compat impact).
Then down below, I'd like two paragraphs like:
If both
l3CacheSchema
andmemBwSchema
are set, runtimes MUST write the combined value to theschemata
file in the<container-id>
directory discussed inintelRdt
. Ifl3CacheSchema
contains a line beginning withMB:
, the value written toschemata
MUST be the non-MB:
line(s) froml3CacheSchema
and the line frommemBWSchema
.If either
l3CacheSchema
ormemBwSchema
is set, runtimes MUST write the value to theschemata
file in the<container-id>
directory discussed inintelRdt
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value MUST start with MB: and MUST NOT contain newlines.
Yes, that is right. I will add this line for memBwSchema
.
I'd like a similar line with MUST -> SHOULD for l3CacheSchema (or also with MUST if maintainers are comfortable with the compat impact).
Make sense to me. I will add this line for l3CacheSchema
.
If l3CacheSchema contains a line beginning with MB:, the value written to schemata MUST be the non-MB: line(s) from l3CacheSchema and the line from memBWSchema.
This is not the case. l3CacheSchema
will never contains a line beginning with MB:
. It only contains a line beginning with L3:
and memBWSchema
only contains a line beginning with MB:
. If both l3CacheSchema
and memBwSchema
are set, runc runtimes will handle this case in Set() to write the combined value: "l3CacheSchema
"+"\n"+ "memBwSchema
", e.g., "L3:0=7f0;1=1f\nMB:0=20;1=70" to schemata
file.
The paragraphs may like:
* **`l3CacheSchema`** *(string, OPTIONAL)* - specifies the schema for L3 cache id and capacity bitmask (CBM).
The value MUST start with `L3:` and MUST NOT contain newlines.
* **`memBwSchema`** *(string, OPTIONAL)* - specifies the schema of memory bandwidth percentage per L3 cache id.
The value MUST start with `MB:` and MUST NOT contain newlines.
If both `l3CacheSchema` and `memBwSchema` are set, runtimes MUST write the combined value to the `schemata` file in the `<container-id>` directory discussed in `intelRdt`.
If either `l3CacheSchema` or `memBwSchema` is set, runtimes MUST write the value to the `schemata` file in the `<container-id>` directory discussed in `intelRdt`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
l3CacheSchema
contains a line beginning withMB:
, the value written to schemata MUST be the non-MB:
line(s) froml3CacheSchema
and the line frommemBWSchema
.This is not the case.
l3CacheSchema
will never contains a line beginning withMB:
. It only contains a line beginning withL3:
...
That would have been nice, but I see no such MUST in 1.0.0, which means it may be too late to add it now. To add it before 2.0 (where we intend to drop l3CacheSchema
anyway), you'd need to make a "nobody was actually doing that, despite it being technically legal" argument. And withought an exhaustive list of config authors and a relatively straightforward compat spec, it may not be worth trying to build that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would have been nice, but I see no such MUST in 1.0.0, which means it may be too late to add it now.
How about removing these two lines?
- The value MUST start with "L3:" and MUST NOT contain newlines.
- The value MUST start with "MB:" and MUST NOT contain newlines.
In my opinion, it looks like runtime implementation detail. The validity of l3CacheSchema
and memBwSchema
will be checked in runc, and the validity of string writing to schemata file will be also checked by kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, it looks like runtime implementation detail. The validity of
l3CacheSchema
andmemBwSchema
will be checked in runc…
I think all configs not explicitly denied by the spec are allowed. The 1.0.0 spec contains no L3:
prefix requirement for l3CacheSchema
, and it contains no anti-newline requirement for l3CacheSchema
. And runc currently writes the opaque l3CacheSchema
value to the schemata
file. So folks writing 1.0.0 configs can already set l3CacheSchema
to L3:0=7f0;1=1f\nMB:0=20;1=70
or MB:0=20;1=70
and have a compliant config which is supported by runc. I don't want to break those users, which means we need to either:
a. Make a case for those users not existing, or
b. Add spec wording to SHOULD against those configs, and MUST the runtime procedure for handling them if/when they occur.
b seems much more straightforward to me.
If we do not do either and those users happen to exist, having runc start erroring for l3CacheSchema
like L3:0=7f0;1=1f\nMB:0=20;1=70
is just pulling the rug out from under those users in order to satisfy an aesthetic concern about consistency between the existing l3CacheSchema
and the new memBwSchema
. I think there is a benefit to consistency (it makes getting acquainted with the spec easier for newcomers), but it's not worth breaking existing users to achieve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a. Make a case for those users not existing, or
b. Add spec wording to SHOULD against those configs, and MUST the runtime procedure for handling them if/when they occur.
b seems much more straightforward to me.
I'd like to replace wording MUST with SHOULD for l3CacheSchema
and new memBwSchema
.
The value SHOULD start with "L3:" and SHOULD NOT contain newlines.
The value SHOULD start with "MB:" and SHOULD NOT contain newlines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to replace wording MUST with SHOULD for
l3CacheSchema
and newmemBwSchema
.
I prefer MUSTs for memBwSchema
; why would you SHOULD there? It's a new property, so we do not have l3CacheSchema
's backwards-compat constraints.
And either way I think you need something like the paragraphs I recommended earlier to require runtimes to handle l3CacheSchema
SHOULD violations in a consistent way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer MUSTs for memBwSchema
I will correct this.
And either way I think you need something like the paragraphs I recommended earlier to require runtimes to handle l3CacheSchema SHOULD violations in a consistent way.
I will add this.
c52fcee
to
e6aead6
Compare
Thank you for your scrupulous review and good suggestions. The documentation for IntelRdt
The following parameters can be specified for the container:
ExampleConsider a two-socket machine with two L3 caches where the default CBM is 0x7ff and the max CBM length is 11 bits, Tasks inside the container only have access to the "upper" 7/11 of L3 cache on socket 0 and the "lower" 5/11 L3 cache on socket 1, "linux": {
"intelRdt": {
"l3CacheSchema": "L3:0=7f0;1=1f",
"memBwSchema": "MB:0=20;1=70"
}
} |
config-linux.md
Outdated
@@ -460,25 +460,34 @@ The following parameters can be specified to set up the controller: | |||
**`intelRdt`** (object, OPTIONAL) represents the [Intel Resource Director Technology][intel-rdt-cat-kernel-interface]. | |||
If `intelRdt` is set, the runtime MUST write the container process ID to the `<container-id>/tasks` file in a mounted `resctrl` pseudo-filesystem, using the container ID from [`start`](runtime.md#start) and creating the `<container-id>` directory if necessary. | |||
If no mounted `resctrl` pseudo-filesystem is available in the [runtime mount namespace](glossary.md#runtime-namespace), the runtime MUST [generate an error](runtime.md#errors). | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I liked having the not-set sentence in its own paragraph to make it even more obvious that the no-mounted line belonged to the if-set case. But I don't care all that much, and am fine with this PR landing with this paragraph-splitting-newline removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking
The reason why I delete this blank line is that: I found a nit in original master branch code: two or more spaces don't work at the end of "no-mounted" sentence for newline, and the "no-set" sentence is in code syntax highlighting style.
I could fix this issue to make "not-set" sentence in its own paragraph:
(without indent/four spaces at the beginning of "not-set" sentence)
If no mounted `resctrl` pseudo-filesystem is available in the [runtime mount namespace](glossary.md#runtime-namespace), the runtime MUST [generate an error](runtime.md#errors).
If `intelRdt` is not set, the runtime MUST NOT manipulate any `resctrl` pseudo-filesystems.
config-linux.md
Outdated
* **`memBwSchema`** *(string, OPTIONAL)* - specifies the schema of memory bandwidth percentage per L3 cache id. | ||
The value MUST start with `MB:` and MUST NOT contain newlines. | ||
|
||
If both `l3CacheSchema` and `memBwSchema` are set, runtimes MUST write the combined value to the `schemata` file in the <container-id> directory discussed in intelRdt. If `l3CacheSchema` contains a line beginning with `MB:`, the value written to `schemata` file MUST be the non-`MB:` line(s) from `l3CacheSchema` and the line from `memBWSchema`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd drop the indent here to make it a paragraph following the list (it's currently a paragraph inside the memBwSchema
list entry). This (and the following paragraphs) are about both l3CacheSchema
and memBwSchema
, and the memBwSchema
list entry seems like too specific a location for that information.
You're also dropping the previous backticks around <container-id>
, and we want to keep those because Markdown is a subset of HTML, so an unticked version will be invisible (as an unrecognized HTML tag).
And you have two sentences in one line, when prefer one sentence per line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wking
Thank you. I will fix these issues.
schema/config-linux.json
Outdated
}, | ||
"memBwSchema": { | ||
"id": "https://opencontainers.org/schema/bundle/linux/intelRdt/memBwSchema", | ||
"type": "string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a regexp for validation (like we do here and other places)? That would be something like:
"memBwSchema": {
"id": "https://opencontainers.org/schema/bundle/linux/intelRdt/memBwSchema",
"type": "string",
"pattern": "^MB:[^\n]*$"
}
Athough it might need \\n
instead of \n
. We can unit-test the schema on this point with an entry in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I'd like to add this.
We could use "./schema/validate" to validate the format of memBwSchema
in config.json.
I have a try, \\n
looks to be the right pattern.
I have updated this PR to commit dd78a43 It addressed these comments:
|
dd78a43
to
8b70854
Compare
@crosbymichael @vishh @mrunalp @vbatts @dqminh @philips @tianon @hqhq Hi Maintainers, This PR is part of opencontainers/runc#1596 |
8b70854
to
557b7f3
Compare
@vishh @mrunalp @vbatts @dqminh @philips @tianon @hqhq I have rebased this PR to current master branch to address conflict issues with merged #988 . This PR is runtime-spec part of proposal opencontainers/runc#1596 |
557b7f3
to
5d9aa69
Compare
Add support for Intel Resource Director Technology (RDT) / Memory Bandwidth Allocation (MBA). Add memory bandwidth resource constraints in Linux-specific configuration. In this PR, the spec for memory bandwidth (memBwSchema) keeps the same format as existed spec for L3 cache (l3CacheSchema) for consistency and compatibility in runtime-spec 1.x. Example: "linux": { "intelRdt": { "closID": "guaranteed_group", "l3CacheSchema": "L3:0=7f0;1=1f", "memBwSchema": "MB:0=20;1=70" } } This is the prerequisite of this runc proposal: opencontainers/runc#1596 For more information about Intel RDT/MBA, please refer to: opencontainers/runc#1596 Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
LGTM thanks @vbatts |
Add support for Intel Resource Director Technology (RDT) /
Memory Bandwidth Allocation (MBA). Add memory bandwidth resource
constraints in Linux-specific configuration.
In this PR, the spec for memory bandwidth (memBwSchema) keeps
the same format as existed spec for L3 cache (l3CacheSchema)
for consistency and compatibility in runtime-spec 1.x.
Example:
This is the prerequisite of this runc proposal:
opencontainers/runc#1596
For more information about Intel RDT/MBA, please refer to:
opencontainers/runc#1596
Signed-off-by: Xiaochen Shen xiaochen.shen@intel.com