libcontainer: add support for Intel RDT/CAT in runc
About Intel RDT/CAT feature:
Intel platforms with new Xeon CPU support Intel Resource Director Technology
(RDT). Cache Allocation Technology (CAT) is a sub-feature of RDT, which
currently supports L3 cache resource allocation.

This feature provides a way for the software to restrict cache allocation to a
defined 'subset' of L3 cache which may be overlapping with other 'subsets'.
The different subsets are identified by class of service (CLOS) and each CLOS
has a capacity bitmask (CBM).

More information about Intel RDT/CAT can be found in section 17.17 of the
Intel Software Developer Manual.

About Intel RDT/CAT kernel interface:
In the Linux kernel, the interface is defined and exposed via the "resource
control" filesystem, which is a "cgroup-like" interface.

Compared with cgroups, it has a similar process management lifecycle and
similar interfaces in a container. But unlike the cgroups hierarchy, it has a
single-level filesystem layout.

Intel RDT "resource control" filesystem hierarchy:
mount -t resctrl resctrl /sys/fs/resctrl
tree /sys/fs/resctrl
/sys/fs/resctrl/
|-- info
|   |-- L3
|       |-- cbm_mask
|       |-- min_cbm_bits
|       |-- num_closids
|-- cpus
|-- schemata
|-- tasks
|-- <container_id>
    |-- cpus
    |-- schemata
    |-- tasks

For runc, we can make use of `tasks` and `schemata` configuration for L3 cache
resource constraints.

The file `tasks` has a list of tasks that belong to this group (e.g., the
"<container_id>" group). Tasks can be added to a group by writing the task ID
to the "tasks" file (which will automatically remove them from the previous
group to which they belonged). New tasks created by fork(2) and clone(2) are
added to the same group as their parent. If a pid is not in any sub group, it
is in the root group.

The file `schemata` has allocation bitmasks/values for L3 cache on each socket,
which contains L3 cache id and capacity bitmask (CBM).
	Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
For example, on a two-socket machine, L3's schema line could be `L3:0=ff;1=c0`
which means L3 cache id 0's CBM is 0xff, and L3 cache id 1's CBM is 0xc0.
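
To make the schema format concrete, here is a small Go sketch that parses a
line such as `L3:0=ff;1=c0` into a map from cache id to CBM. The function name
`parseL3Schema` is hypothetical, and the sketch assumes a single `L3:` line
with hexadecimal masks, as in the example above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseL3Schema parses a schema line such as "L3:0=ff;1=c0" and returns a map
// from L3 cache id to capacity bitmask (CBM).
func parseL3Schema(line string) (map[int]uint64, error) {
	line = strings.TrimSpace(line)
	if !strings.HasPrefix(line, "L3:") {
		return nil, fmt.Errorf("schema %q does not start with \"L3:\"", line)
	}
	masks := make(map[int]uint64)
	for _, entry := range strings.Split(strings.TrimPrefix(line, "L3:"), ";") {
		parts := strings.SplitN(entry, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed entry %q", entry)
		}
		id, err := strconv.Atoi(parts[0])
		if err != nil {
			return nil, fmt.Errorf("bad cache id in %q: %v", entry, err)
		}
		// CBMs are written in hexadecimal without a "0x" prefix.
		cbm, err := strconv.ParseUint(parts[1], 16, 64)
		if err != nil {
			return nil, fmt.Errorf("bad CBM in %q: %v", entry, err)
		}
		masks[id] = cbm
	}
	return masks, nil
}
```

For the example line, the returned map holds 0xff for cache id 0 and 0xc0 for
cache id 1.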

A valid L3 cache CBM is a *contiguous set of bits*, and the number of bits set
cannot exceed the maximum CBM length. The maximum CBM length varies among
supported Intel Xeon platforms. In the Intel RDT "resource control" filesystem
layout, the CBM in a group should be a subset of the CBM in root. The kernel
checks that the CBM is valid when it is written. For example, 0xfffff in root
indicates that the maximum CBM length is 20 bits, which maps to the entire L3
cache capacity. Some valid CBM values to set in a group: 0xf, 0xf0, 0x3ff,
0x1f00, etc.
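
The two rules above (contiguous bits, subset of the root CBM) can be checked
in user space before writing. The Go sketch below is an illustration only: the
names `isContiguous` and `validCBM` are hypothetical, the kernel remains the
authority on validity, and limits such as `min_cbm_bits` are not covered.

```go
package main

import "math/bits"

// isContiguous reports whether the set bits of cbm form one unbroken run.
// An empty mask is treated as invalid.
func isContiguous(cbm uint64) bool {
	if cbm == 0 {
		return false
	}
	// Shift the run down to bit 0; a contiguous run then looks like 0b0...01...1,
	// and adding 1 to such a value clears every bit it shares with the original.
	shifted := cbm >> uint(bits.TrailingZeros64(cbm))
	return shifted&(shifted+1) == 0
}

// validCBM reports whether cbm is contiguous and a subset of rootCBM.
func validCBM(cbm, rootCBM uint64) bool {
	return isContiguous(cbm) && cbm&^rootCBM == 0
}
```

With a root CBM of 0xfffff, this accepts 0xf, 0xf0, 0x3ff and 0x1f00, and
rejects 0x5 (not contiguous) and 0x100000 (outside the root mask).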

For more information about Intel RDT/CAT kernel interface:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/x86/intel_rdt_ui.txt

An example for runc:
There are two L3 caches in the two-socket machine, the default CBM is 0xfffff
and the max CBM length is 20 bits. This configuration assigns 4/5 of L3 cache
id 0 and the whole L3 cache id 1 for the container:

"linux": {
	"resources": {
		"intelRdt": {
			"l3CacheSchema": "L3:0=ffff0;1=fffff"
		}
	}
}

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
xiaochenshen committed Feb 9, 2017
1 parent ce14e9e commit 04e7cec
Showing 14 changed files with 890 additions and 11 deletions.
25 changes: 20 additions & 5 deletions events.go
@@ -24,11 +24,12 @@ type event struct {

// stats is the runc specific stats structure for stability when encoding and decoding stats.
type stats struct {
CPU cpu `json:"cpu"`
Memory memory `json:"memory"`
Pids pids `json:"pids"`
Blkio blkio `json:"blkio"`
Hugetlb map[string]hugetlb `json:"hugetlb"`
CPU cpu `json:"cpu"`
Memory memory `json:"memory"`
Pids pids `json:"pids"`
Blkio blkio `json:"blkio"`
Hugetlb map[string]hugetlb `json:"hugetlb"`
IntelRdt intelRdt `json:"intelRdt"`
}

type hugetlb struct {
@@ -95,6 +96,12 @@ type memory struct {
Raw map[string]uint64 `json:"raw,omitempty"`
}

type intelRdt struct {
// The read-only default "schema" in root, for reference
L3CacheSchemaRoot string `json:"l3CacheSchemaRoot,omitempty"`
L3CacheSchema string `json:"l3CacheSchema,omitempty"`
}

var eventsCommand = cli.Command{
Name: "events",
Usage: "display container events such as OOM notifications, cpu, memory, and IO usage statistics",
@@ -226,6 +233,14 @@ func convertLibcontainerStats(ls *libcontainer.Stats) *stats {
for k, v := range cg.HugetlbStats {
s.Hugetlb[k] = convertHugtlb(v)
}

is := ls.IntelRdtStats
if is == nil {
return &s
}
s.IntelRdt.L3CacheSchemaRoot = is.IntelRdtRootStats.L3CacheSchema
s.IntelRdt.L3CacheSchema = is.IntelRdtStats.L3CacheSchema

return &s
}

4 changes: 4 additions & 0 deletions libcontainer/configs/cgroup_unix.go
@@ -121,4 +121,8 @@ type Resources struct {

// Set class identifier for container's network packets
NetClsClassid uint32 `json:"net_cls_classid_u"`

// Intel RDT: the schema for L3 cache id and capacity bitmask (CBM)
// Format: "L3:<cache_id0>=<cbm0>;<cache_id1>=<cbm1>;..."
IntelRdtL3CacheSchema string `json:"intel_rdt_l3_cache_schema"`
}
21 changes: 21 additions & 0 deletions libcontainer/container_linux.go
@@ -22,6 +22,7 @@ import (
"github.com/opencontainers/runc/libcontainer/cgroups"
"github.com/opencontainers/runc/libcontainer/configs"
"github.com/opencontainers/runc/libcontainer/criurpc"
"github.com/opencontainers/runc/libcontainer/intelrdt"
"github.com/opencontainers/runc/libcontainer/resourcemanager"
"github.com/opencontainers/runc/libcontainer/system"
"github.com/opencontainers/runc/libcontainer/utils"
@@ -62,6 +63,9 @@ type State struct {

// Container's standard descriptors (std{in,out,err}), needed for checkpoint and restore
ExternalDescriptors []string `json:"external_descriptors,omitempty"`

// Intel RDT "resource control" filesystem path
IntelRdtPath string `json:"intel_rdt_path"`
}

// Container is a libcontainer container object.
@@ -160,6 +164,13 @@ func (c *linuxContainer) Stats() (*Stats, error) {
if err != nil {
return stats, newSystemErrorWithCause(err, "getting container stats from cgroups")
}
if intelRdtManager, ok := c.resourceManagers["intelrdt"]; ok == true {
intelRdtStats, err := intelRdtManager.GetStats()
if err != nil {
return stats, newSystemErrorWithCause(err, "getting container's Intel RDT stats")
}
stats.IntelRdtStats = intelRdtStats.(*intelrdt.Stats)
}
for _, iface := range c.config.Networks {
switch iface.Type {
case "veth":
@@ -387,11 +398,16 @@ func (c *linuxContainer) newSetnsProcess(p *Process, cmd *exec.Cmd, parentPipe,
if err != nil {
return nil, err
}
intelRdtPath, err := intelrdt.GetIntelRdtPath(c.ID())
if err != nil {
intelRdtPath = ""
}
// TODO: set on container for process management
p.consoleChan = make(chan *os.File, 1)
return &setnsProcess{
cmd: cmd,
cgroupPaths: c.resourceManagers["cgroups"].GetPaths(),
intelRdtPath: intelRdtPath,
childPipe: childPipe,
parentPipe: parentPipe,
config: c.newInitConfig(p),
@@ -1260,6 +1276,10 @@ func (c *linuxContainer) currentState() (*State, error) {
startTime, _ = c.initProcess.startTime()
externalDescriptors = c.initProcess.externalDescriptors()
}
intelRdtPath, err := intelrdt.GetIntelRdtPath(c.ID())
if err != nil {
intelRdtPath = ""
}
state := &State{
BaseState: BaseState{
ID: c.ID(),
@@ -1269,6 +1289,7 @@ func (c *linuxContainer) currentState() (*State, error) {
Created: c.created,
},
CgroupPaths: c.resourceManagers["cgroups"].GetPaths(),
IntelRdtPath: intelRdtPath,
NamespacePaths: make(map[configs.NamespaceType]string),
ExternalDescriptors: externalDescriptors,
}
83 changes: 80 additions & 3 deletions libcontainer/container_linux_test.go
@@ -9,6 +9,7 @@ import (

"github.com/opencontainers/runc/libcontainer/cgroups"
"github.com/opencontainers/runc/libcontainer/configs"
"github.com/opencontainers/runc/libcontainer/intelrdt"
"github.com/opencontainers/runc/libcontainer/resourcemanager"
)

@@ -19,6 +20,13 @@ type mockCgroupManager struct {
paths map[string]string
}

type mockIntelRdtManager struct {
pids []int
allPids []int
stats *intelrdt.Stats
path string
}

func (m *mockCgroupManager) GetPids() ([]int, error) {
return m.pids, nil
}
@@ -51,6 +59,40 @@ func (m *mockCgroupManager) Freeze(state configs.FreezerState) error {
return nil
}

func (m *mockIntelRdtManager) GetPids() ([]int, error) {
return m.pids, nil
}

func (m *mockIntelRdtManager) GetAllPids() ([]int, error) {
return m.allPids, nil
}

func (m *mockIntelRdtManager) GetStats() (interface{}, error) {
return m.stats, nil
}

func (m *mockIntelRdtManager) Apply(pid int) error {
return nil
}

func (m *mockIntelRdtManager) Set(container *configs.Config) error {
return nil
}

func (m *mockIntelRdtManager) Destroy() error {
return nil
}

func (m *mockIntelRdtManager) GetPaths() map[string]string {
paths := make(map[string]string)
paths["intelrdt"] = m.path
return paths
}

func (m *mockIntelRdtManager) Freeze(state configs.FreezerState) error {
return nil
}

type mockProcess struct {
_pid int
started string
@@ -121,6 +163,14 @@ func TestGetContainerStats(t *testing.T) {
},
},
}
container.resourceManagers["intelrdt"] = &mockIntelRdtManager{
pids: []int{1, 2, 3},
stats: &intelrdt.Stats{
IntelRdtStats: intelrdt.IntelRdtStats{
L3CacheSchema: "L3:0=ffff0;1=fff00",
},
},
}
stats, err := container.Stats()
if err != nil {
t.Fatal(err)
@@ -131,13 +181,22 @@ func TestGetContainerStats(t *testing.T) {
if stats.CgroupStats.MemoryStats.Usage.Usage != 1024 {
t.Fatalf("expected memory usage 1024 but recevied %d", stats.CgroupStats.MemoryStats.Usage.Usage)
}
if intelrdt.IsIntelRdtEnabled() {
if stats.IntelRdtStats == nil {
t.Fatal("intel rdt stats are nil")
}
if stats.IntelRdtStats.IntelRdtStats.L3CacheSchema != "L3:0=ffff0;1=fff00" {
t.Fatalf("expected L3CacheSchema L3:0=ffff0;1=fff00 but recevied %s", stats.IntelRdtStats.IntelRdtStats.L3CacheSchema)
}
}
}

func TestGetContainerState(t *testing.T) {
var (
pid = os.Getpid()
expectedMemoryPath = "/sys/fs/cgroup/memory/myid"
expectedNetworkPath = "/networks/fd"
pid = os.Getpid()
expectedMemoryPath = "/sys/fs/cgroup/memory/myid"
expectedNetworkPath = "/networks/fd"
expectedIntelRdtPath = "/sys/fs/resctrl/myid"
)
container := &linuxContainer{
id: "myid",
@@ -170,6 +229,15 @@ func TestGetContainerState(t *testing.T) {
"memory": expectedMemoryPath,
},
}
container.resourceManagers["intelrdt"] = &mockIntelRdtManager{
pids: []int{1, 2, 3},
stats: &intelrdt.Stats{
IntelRdtStats: intelrdt.IntelRdtStats{
L3CacheSchema: "L3:0=ffff0;1=fff00",
},
},
path: expectedIntelRdtPath,
}
container.state = &createdState{c: container}
state, err := container.State()
if err != nil {
@@ -188,6 +256,15 @@ func TestGetContainerState(t *testing.T) {
if memPath := paths["memory"]; memPath != expectedMemoryPath {
t.Fatalf("expected memory path %q but received %q", expectedMemoryPath, memPath)
}
if intelrdt.IsIntelRdtEnabled() {
path := state.IntelRdtPath
if path == "" {
t.Fatal("intel rdt path should not be empty")
}
if intelRdtPath := path; intelRdtPath != expectedIntelRdtPath {
t.Fatalf("expected intel rdt path %q but received %q", expectedIntelRdtPath, intelRdtPath)
}
}
for _, ns := range container.config.Namespaces {
path := state.NamespacePaths[ns.Type]
if path == "" {
23 changes: 23 additions & 0 deletions libcontainer/factory_linux.go
@@ -18,6 +18,7 @@ import (
"github.com/opencontainers/runc/libcontainer/cgroups/systemd"
"github.com/opencontainers/runc/libcontainer/configs"
"github.com/opencontainers/runc/libcontainer/configs/validate"
"github.com/opencontainers/runc/libcontainer/intelrdt"
"github.com/opencontainers/runc/libcontainer/resourcemanager"
"github.com/opencontainers/runc/libcontainer/utils"
)
@@ -74,6 +75,19 @@ func Cgroupfs(l *LinuxFactory) error {
return nil
}

// IntelRdtFs is an option func to configure a LinuxFactory to return
// containers that use the Intel RDT "resource control" filesystem to
// create and manage Intel Xeon platform shared resources (e.g., L3 cache).
func IntelRdtFs(l *LinuxFactory) error {
l.NewIntelRdtManager = func(config *configs.Config, id string) intelrdt.Manager {
return &intelrdt.IntelRdtManager{
Config: config,
Id: id,
}
}
return nil
}

// TmpfsRoot is an option func to mount LinuxFactory.Root to tmpfs.
func TmpfsRoot(l *LinuxFactory) error {
mounted, err := mount.Mounted(l.Root)
@@ -138,6 +152,9 @@ type LinuxFactory struct {

// NewCgroupsManager returns an initialized cgroups manager for a single container.
NewCgroupsManager func(config *configs.Cgroup, paths map[string]string) cgroups.Manager

// NewIntelRdtManager returns an initialized Intel RDT manager for a single container.
NewIntelRdtManager func(config *configs.Config, id string) intelrdt.Manager
}

func (l *LinuxFactory) Create(id string, config *configs.Config) (Container, error) {
@@ -189,6 +206,9 @@ func (l *LinuxFactory) Create(id string, config *configs.Config) (Container, err
}
resourceManagers := make(map[string]resourcemanager.ResourceManager)
resourceManagers["cgroups"] = l.NewCgroupsManager(config.Cgroups, nil)
if intelrdt.IsIntelRdtEnabled() {
resourceManagers["intelrdt"] = l.NewIntelRdtManager(config, id)
}
c.resourceManagers = resourceManagers
c.state = &stoppedState{c: c}
return c, nil
@@ -220,6 +240,9 @@ func (l *LinuxFactory) Load(id string) (Container, error) {
}
resourceManagers := make(map[string]resourcemanager.ResourceManager)
resourceManagers["cgroups"] = l.NewCgroupsManager(state.Config.Cgroups, state.CgroupPaths)
if intelrdt.IsIntelRdtEnabled() {
resourceManagers["intelrdt"] = l.NewIntelRdtManager(&state.Config, id)
}
c.resourceManagers = resourceManagers
c.state = &loadedState{c: c}
if err := c.refreshState(); err != nil {
28 changes: 27 additions & 1 deletion libcontainer/factory_linux_test.go
@@ -49,6 +49,32 @@ func TestFactoryNew(t *testing.T) {
}
}

func TestFactoryNewIntelRdt(t *testing.T) {
root, rerr := newTestRoot()
if rerr != nil {
t.Fatal(rerr)
}
defer os.RemoveAll(root)
factory, err := New(root, Cgroupfs, IntelRdtFs)
if err != nil {
t.Fatal(err)
}
if factory == nil {
t.Fatal("factory should not be nil")
}
lfactory, ok := factory.(*LinuxFactory)
if !ok {
t.Fatal("expected linux factory returned on linux based systems")
}
if lfactory.Root != root {
t.Fatalf("expected factory root to be %q but received %q", root, lfactory.Root)
}

if factory.Type() != "libcontainer" {
t.Fatalf("unexpected factory type: %q, expected %q", factory.Type(), "libcontainer")
}
}

func TestFactoryNewTmpfs(t *testing.T) {
root, rerr := newTestRoot()
if rerr != nil {
@@ -163,7 +189,7 @@ func TestFactoryLoadContainer(t *testing.T) {
if err := marshal(filepath.Join(root, id, stateFilename), expectedState); err != nil {
t.Fatal(err)
}
factory, err := New(root, Cgroupfs)
factory, err := New(root, Cgroupfs, IntelRdtFs)
if err != nil {
t.Fatal(err)
}