This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

create cgroups needed by kubelet's --system-reserved and --kube-reserved flags #3915

Merged
jackfrancis merged 2 commits into master from systemd-cgroups on Oct 6, 2018


seanknox
Contributor

What this PR does / why we need it:

kubelet's --system-reserved and --kube-reserved flags reserve resources for system and kube components, respectively. The cgroups they rely on must exist on the node before the flags are enabled. This patch sets those cgroups up by writing /etc/systemd/system.conf with a JoinControllers entry.
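
For reviewers who want to see what "the cgroups must exist" could look like on a node, here is a minimal Python sketch. It assumes a cgroup v1 layout mounted under /sys/fs/cgroup, and it uses system.slice as a hypothetical reserved cgroup name. The helper name, the default root, and the cgroup name are illustrative assumptions and do not come from this PR or from the kubelet.

```python
from pathlib import Path

# Controllers taken from the JoinControllers line discussed in this PR.
CONTROLLERS = ["cpu", "cpuacct", "cpuset", "net_cls", "net_prio", "hugetlb", "memory"]


def missing_cgroup_dirs(cgroup_name, controllers=CONTROLLERS, root="/sys/fs/cgroup"):
    """Return the controllers under which `cgroup_name` has no directory.

    Assumes a cgroup v1 layout where each controller is reachable at
    <root>/<controller>. The function name and default root are
    hypothetical and are not part of acs-engine or the kubelet.
    """
    name = cgroup_name.strip("/")
    return [c for c in controllers if not (Path(root) / c / name).is_dir()]


if __name__ == "__main__":
    # "system.slice" is an assumed cgroup name, used only for illustration.
    missing = missing_cgroup_dirs("system.slice")
    if missing:
        print("system.slice missing under:", ", ".join(missing))
    else:
        print("system.slice exists under every listed controller")
```

An empty result would suggest the named cgroup is visible under every controller in the list; any controller returned is one where it is not.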

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

If applicable:

  • documentation
  • unit tests
  • tested backward compatibility (ie. deploy with previous version, upgrade with this branch)

Release note:

@jackfrancis
Member

@seanknox from a master node in the E2E run, kindly confirm it's what you want:

$ cat /etc/systemd/system.conf
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See systemd-system.conf(5) for details.

[Manager]
#LogLevel=info
#LogTarget=journal-or-kmsg
#LogColor=yes
#LogLocation=no
#DumpCore=yes
#ShowStatus=yes
#CrashChangeVT=no
#CrashShell=no
#CrashReboot=no
#CPUAffinity=1 2
JoinControllers=cpu,cpuacct,cpuset,net_cls,net_prio,hugetlb,memory
#RuntimeWatchdogSec=0
#ShutdownWatchdogSec=10min
#CapabilityBoundingSet=
#SystemCallArchitectures=
#TimerSlackNSec=
#DefaultTimerAccuracySec=1min
#DefaultStandardOutput=journal
#DefaultStandardError=inherit
#DefaultTimeoutStartSec=90s
#DefaultTimeoutStopSec=90s
#DefaultRestartSec=100ms
#DefaultStartLimitInterval=10s
#DefaultStartLimitBurst=5
#DefaultEnvironment=
#DefaultCPUAccounting=no
#DefaultBlockIOAccounting=no
#DefaultMemoryAccounting=no
#DefaultTasksAccounting=no
#DefaultTasksMax=
#DefaultLimitCPU=
#DefaultLimitFSIZE=
#DefaultLimitDATA=
#DefaultLimitSTACK=
#DefaultLimitCORE=
#DefaultLimitRSS=
#DefaultLimitNOFILE=
#DefaultLimitAS=
#DefaultLimitNPROC=
#DefaultLimitMEMLOCK=
#DefaultLimitLOCKS=
#DefaultLimitSIGPENDING=
#DefaultLimitMSGQUEUE=
#DefaultLimitNICE=
#DefaultLimitRTPRIO=
#DefaultLimitRTTIME=

@@ -15,6 +15,13 @@ write_files:
content: !!binary |
{{WrapAsVariable "sshdConfig"}}

- path: "/etc/systemd/system.conf"
Contributor

This needs to be added to the VMSS custom data as well

Contributor

my bad that's only for master custom data (there is no agent custom data for vmss)

@seanknox
Contributor Author

@jackfrancis looks good. did a worker node also get the file?

@@ -21,6 +21,13 @@ write_files:
content: !!binary |
{{WrapAsVariable "sshdConfig"}}

- path: "/etc/systemd/system.conf"
Contributor

Same here

@jackfrancis
Member

@seanknox there's one more file that needs this: kubernetesmastercustomdatavmss.yml (recently added, dedicated to VMSS master cloud-init implementation)

@codecov

codecov bot commented Sep 27, 2018

Codecov Report

Merging #3915 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3915      +/-   ##
==========================================
+ Coverage   57.43%   57.44%   +<.01%     
==========================================
  Files         109      109              
  Lines       16678    16681       +3     
==========================================
+ Hits         9579     9582       +3     
  Misses       6328     6328              
  Partials      771      771

@seanknox
Contributor Author

@jackfrancis @CecileRobertMichon added! thanks.

@jackfrancis
Member

Before:

  • #JoinControllers=cpu,cpuacct net_cls,net_prio

After:

  • JoinControllers=cpu,cpuacct,cpuset,net_cls,net_prio,hugetlb,memory

Is that the intention @seanknox?

@seanknox
Contributor Author

@jackfrancis that is correct sir.
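
To make the difference between the Before and After values quoted above concrete, here is a small Python sketch that reads the active JoinControllers setting out of system.conf text. The function name is hypothetical, and the parser is deliberately simplified: it ignores section headers such as [Manager] and treats any line starting with # as commented out.

```python
def parse_join_controllers(conf_text):
    """Return the active JoinControllers groups from system.conf text.

    Each group is a list of controller names. Returns None when the
    setting is absent or commented out, in which case the compile-time
    default described in the file header applies.
    """
    groups = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "JoinControllers":
            # Groups are space-separated; controllers within a group use commas.
            groups = [g.split(",") for g in value.split()]
    return groups


before = "#JoinControllers=cpu,cpuacct net_cls,net_prio"
after = "JoinControllers=cpu,cpuacct,cpuset,net_cls,net_prio,hugetlb,memory"

print(parse_join_controllers(before))  # None: the line is commented out
print(parse_join_controllers(after))   # one group holding all seven controllers
```

Read this way, the Before line is a commented-out default with two separate groups, while the After line is an active setting that places all seven controllers in a single group.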

@seanknox
Contributor Author

Manually tested and confirmed the changes are present in /etc/systemd/system.conf.

@jackfrancis
Member

/lgtm

@acs-bot

acs-bot commented Oct 6, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jackfrancis, seanknox

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis jackfrancis merged commit 514dfda into master Oct 6, 2018
@jackfrancis jackfrancis deleted the systemd-cgroups branch October 6, 2018 00:00
@ghost ghost removed the in progress label Oct 6, 2018
@jackfrancis jackfrancis mentioned this pull request Oct 6, 2018
3 tasks
juhacket added a commit to juhacket/acs-engine that referenced this pull request Mar 26, 2019
* add support for k8s v1.12.0-rc.1 (Azure#3872)

* Adding DeleteApp func to AzureClient and returning appObjectID in CreateApp (Azure#3869)

* Update AKS base image to 0.15.0 (Azure#3870)

* Disable AKS VHD for sovereign clouds (Azure#3874)

* disable outbound internet check (Azure#3878)

* Optimizing template conditional blocks in K8s templates (Azure#3871)

* Enforce windows password complexity requirements in acs-engine client… (Azure#3854)

* Enforce windows password complexity requirements in acs-engine client. Azure#2407

Added Windows Agent password complexity check as per guidelines in  https://docs.microsoft.com/en-us/windows/security/threat-protection/security-policy-settings/password-must-meet-complexity-requirements.

This will prevent generation of arm templates if password complexity for windows vm does not meet the complexity requirements.

* Addressing review comments:

1). Adding negative test cases to ensure passwords whose complexity is not met are getting rejected.
2). Error message reworded to convey the password complexity requirements enforced currently by the implemented regex

* Adding two test cases - password with 0 length and password same as username

* Replace deprecated Azure SDK method calls (Azure#3881)

* Adding test case for Generate Cluster ID (Azure#3879)

* Add availability zone support for masters (Azure#3864)

* use dockerhub.akscn.io in mooncake (Azure#3887)

* Update docs for AZ (Azure#3886)

* Update docs for AZ

* Address comments

* E2E - enable focused tests (Azure#3885)

* re-enable CSE 50 (Azure#3892)

* add support for k8s v1.12.0-rc.2 (Azure#3893)

See https://github.com/kubernetes/kubernetes/releases/tag/v1.12.0-rc.2

* Handle iterated subtest execution correctly (Azure#3894)

* freshen ubuntu image (Azure#3898)

* Actually allow cloudprovider rate limit / backoff disabling (Azure#3891)

* Extracting property values to make ARM output variables accessible (Azure#3877)

* Fixing unreported gosimple lints (Azure#3901)

* Update azure-sdk-for-go to v21.0.0 (Azure#3884)

* Update azure-sdk-for-go to v20.2.0

See https://github.com/Azure/azure-sdk-for-go/releases/tag/v20.2.0

* Update azure-sdk-for-go to v21.0.0

See https://github.com/Azure/azure-sdk-for-go/releases/tag/v21.0.0

* Windows dns connectivity - e2e tests (Azure#3760)

* Add shortcuts for some common command-line arguments (Azure#3904)

* Add zip package to VHD image  (Azure#3912)

* Add zip package to VHD image

* Alphabetize packages to install with apt-get

* Update AKS base image to 0.16.0 (Azure#3913)

* Stop ginkgo tests after first failure (Azure#3922)

* Perform JSON escaping of strings (Azure#3919)

* removed duplication of shellQuote function and added test cases. (Azure#3927)

* Change 'windowsVersion' to 'imageVersion' in docs for deploying specific windows version (Azure#3928)

* Add support for Kubernetes 1.9.11 (Azure#3934)

See https://github.com/kubernetes/kubernetes/releases/tag/v1.9.11

* Simplify some upgrader version cases (Azure#3924)

* Use `echo -n` to skip adding newline to external command output (Azure#3940)

* Add warning message for VMSS master deployments (Azure#3936)

* Add Kubernetes 1.12.0 to VHD image (Azure#3942)

* Migrating Get Addon By Name and Get Container Index By Name methods (Azure#3938)

* Fix accidentally shadowed variable in upgrade cluster. (Azure#3943)

* Docs Docs Docs! Adding windowsAgent apimodel parameters (Azure#3939)

* change default value for osImage (Azure#3944)

* Add support for Kubernetes 1.12.0 (Azure#3918)

* ip-masq-agent as addon (Azure#3916)

* Update AKS base image to 0.17.0 (Azure#3949)

Adds support binaries for Kubernetes 1.12.0 and 1.9.11.

* Move utility methods to the helper package (Azure#3948)

* exit 3 means resource group doesn’t exist (Azure#3954)

* AKS distro is for Kubernetes only (Azure#3951)

* use westus2 for swarm tests (Azure#3956)

* add basic distro tests for swarm, swarmmode, dcos (Azure#3957)

* Update go-dev tools image for go 1.11.1 (Azure#3947)

* Refactor VM prefix to template functions (Azure#3925)

* Migrating cloud spec config to api package (Azure#3953)

* Accelerated networking for Windows (Azure#3908)

* Add support for Kubernetes 1.12.1 (Azure#3963)

* Cleanup Packer directory after VHD build (Azure#3964)

* can't move the same file twice (Azure#3965)

* sudo sudo sudo (Azure#3967)

* retain existing AKS SNAT implementation (Azure#3966)

* create cgroups needed by kubelet's --system-reserved and --kube-reserved flags (Azure#3915)

* Dont set default distro when OSType is Windows (Azure#3950)

* vmss needs systemConf too (Azure#3970)

* update AKS VHD image to ver 0.18.0 (Azure#3969)

* Fix urls to gofi.sh (Azure#3973)

* Strengthen unit tests for cluster ID (Azure#3972)

* optimize customData payload by removing comments (Azure#3971)

* bump default from 1.8 to 1.10 (Azure#3946)

* Using local rand object to generate cluster ids (Azure#3978)

* Update node-labels to 1.6+ standard (Azure#3980)

* E2E: retry kubectl delete job (Azure#3981)

* E2E: actually fail when no InternalIP, ssh master tweaks, delete retries (Azure#3982)

* 1.12 uses coredns (Azure#3987)

* Refactor: Moving set defaults logic from package acsengine to package api (Azure#3974)

* Enable the kubelet-monitor systemd unit (Azure#3983)

* k8s component tests should happen before api tests (Azure#3991)

* add kubernetes1.12 example (Azure#3992)

* gosimple fixes (Azure#3993)

* Azure CNI v1.0.12 (Azure#3989)

* bump etcd version (Azure#3975)

* swarmm = swarm mode (Azure#3995)

* Update apiversion to make it consistent in k8s templates (Azure#3909)

* E2E: set stability iterations to 10 by default (Azure#3997)

* kube-dns 1.14.13 for k8s 1.8 and up (Azure#4004)

* update kubernetes-dashboard to 1.10 (Azure#4005)

* only schedule coredns pods on a linux node (Azure#4014)

* add coredns image reference to components versions map (Azure#3998)

* Remove redundant exechealthz references (Azure#4012)

* health-monitor script doesn’t require docker (Azure#4028)

* Updating the tag for omsagent container to use the latest production tag (Azure#4015)

* Replace docker images with the official releases. (Azure#4026)

* Fix linter errors reported by gosimple (Azure#4031)

* Split Windows setup scripts, prepare for cleanup and multiple CRI (Azure#3994)

* Image version bump (Azure#4033)

* Doc style, minor updates pass (Azure#4017)

* acsengine and deploy pass

* Clean up the main README

* pass over the kubernetes walkthrough doc - I think maybe just azure specific bits should stay here?

* Review changes

* Fix typo in prometheus-grafana-k8s extension (Azure#4039)

* Add support for Kubernetes 1.10.9 (Azure#4040)

See https://github.com/kubernetes/kubernetes/releases/tag/v1.10.9

* Add support for Kubernetes v1.13.0-alpha.1 (Azure#4036)

* Fix the Authorization and ManagedIdentity api versions  (Azure#4048)

* schedule ip-masq-agent on masters (Azure#4049)

* delay docker and kubelet health monitors for 30 mins (Azure#4050)

* Don't block on Kubernetes installation cleanup operations (Azure#4056)

* update to latest AKS VHD image (Azure#4054)

* set default masterSubnet value for custom VNET (Azure#4058)

* Updating oms agent tag to use the latest tag released (Azure#4059)

*  Don't test k8s 1.8 or 1.9 in CircleCI  (Azure#4061)

* Don't test k8s 1.8 or 1.9 in CircleCI

* Add k8s 1.13 jobs to build_and_test_master task

Also reordered the jobs so that maintainers are less likely to
forget about adding both Linux and Windows jobs.

* Azure CNI 1.0.12 should be in VHD image (Azure#4067)

* 16.04:latest by default for Ubuntu distro flows (Azure#4068)

* E2E: enable pod-svc connection test (Azure#4062)

* E2E: reuse long-running apache pod, HPA stabilization (Azure#4073)

* Update doc: keyvault-flexvol addon default flag (Azure#4072)

* use latest AKS VHD (Azure#4074)

* E2E: general hardening (Azure#4079)

* test scale down as well (Azure#4087)

* remove unused nsg for AKS (Azure#4085)

* fix error log message (Azure#4088)

* Fix issue where kubernetesDashboard params weren't being added despite enabling the dashboard addon (Azure#4084)

* use latest tag for flexvol versions (Azure#4091)

* set FailureActions for docker, kubelet, kubeproxy (Azure#3905)

* no more default stability test iterations (Azure#4095)

* Update vmss master EncryptionWithExternalKms with userassignedidentity (Azure#4082)

* update azure-npm to v1.0.13 (Azure#4094)

* apt lock hygiene (Azure#4081)

* Add userassignedidentity for EncryptionWithExternalKms (Azure#4089)

* use azk8s.cn instead of akscn.io (Azure#4099)

* Fix calico for k8s 1.12 (Azure#4090)

* enable user-configurable Azure CNI URL (Azure#4097)

* Fix standard lb with vmss master (Azure#4101)

* Don't require vm tags (Azure#4100)

* Moby container runtime (Azure#3896)

* minor template optimization in kubernetesmastervarsvmss.t (Azure#4112)

* Fix potential nil pointer dereference when VM tags are empty (Azure#4117)

* add resilience to nvidia driver install/config (Azure#4113)

* don’t timeout for apt (Azure#4121)

* only install GPU if docker-engine (Azure#4122)

* Make --profiling user configurable (Azure#4114)

* suppressing sensitive openssl output (Azure#4123)

* Configure Docker Version on Windows (Azure#4119)

Tests passed. merging. thanks!

* disable kubelet health monitor (Azure#4127)

* Add support for Kubernetes v1.13.0-alpha.2 (Azure#4128)

See https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG-1.13.md#v1130-alpha2

* Add support for Kubernetes 1.11.4 (Azure#4130)

See https://github.com/kubernetes/kubernetes/releases/tag/v1.11.4

* adjust pagefile size (Azure#4098)

* Add support for Kubernetes 1.12.2 (Azure#4131)

See https://github.com/kubernetes/kubernetes/releases/tag/v1.12.2

* Add external custom yaml for manifests (Azure#4092)

* add AKSDockerEngine distro (Azure#4120)

* Adding DOCKER_API_VERSION workaround (Azure#4141)

* Adding DOCKER_API_VERSION workaround

* Fix extra character added

* restore exechealthz references (Azure#4145)

* re-use ILB test deployment (Azure#4147)

* Enable k8s features by default (Azure#4133)

* Enable k8s features by default

* Add test

* Optimize CSE + FeatureFlags option for run in background (Azure#4104)

* add VHD images w/ k8s 1.11.4 and k8s 1.12.2 (Azure#4146)

* use china mirror in binary downloading (Azure#4137)

* Make windows binary url configurable (Azure#4103)

* Move the role assignment to the ARM template and fix api versions (Azure#4032)

* Merging kubernetesmastervarsvmss into kubernetesmastervars (Azure#4116)

* virtualNetworkName is needed for vmss masters (Azure#4159)

* vmss masters listen on firstConsecutiveStaticIP (Azure#4162)

* rationalize addons/kube-system e2e checks (Azure#4166)

* vmss masters customvnet dependson lb (Azure#4167)

* Remove unreachable NSG code (Azure#4164)

* move k8s specific params to params_k8s.go (Azure#4156)

* delete empty file (Azure#4180)

* add Skip functionality for skipped tests (Azure#4181)

* Output kernel version during VHD build (Azure#4176)

* updated VHDs for aks and aks-docker-engine distros (Azure#4178)

* distinct outbound test for mooncake clusters (Azure#4169)

* update dependencies to point to latest k8s api release (Azure#4157)

* Pre-pull hyperkube in VHD (Azure#4174)

* use gcr.azk8s.cn for ip-masq-agent on Azure China (Azure#4190)

* remove unnecessary bytes (Azure#4187)

* cleanUpContainerImages (Azure#4195)

* Add AKS container images to VHD build script (Azure#4194)

* Update ip assignment and cert gen for vmss masters (Azure#4193)

* Azure CNI v1.0.13 (Azure#4197)

* Enable upgrade to next supported Kubernetes version (Azure#3968)

* reduce customData overhead via streamlined boilerplate (Azure#4183)

* Fix 1.8 cluster config (Azure#4200)

* More validations for custom vnet and vmss masters (Azure#4199)

* fix cilium cluster config (Azure#4202)

* Calico support for azure-vnet-ipam (Azure#4154)

* Update VHD image to 2018.11.06 (Azure#4201)

* move kubeserviceCidr params to windowsparams tpl (Azure#4203)

* Only cleanup AKS container images if cluster is not a hosted master cluster (Azure#4204)

* 2 units of errata (Azure#4205)

* setting default Images in addon defaults  instead of params_k8s.go (Azure#4208)

* mount xtables lock in proxy (Azure#4210)

* test outbound for URLs that we know we need (Azure#4211)

* imagePullPolicy: IfNotPresent for all versioned containers (Azure#4212)

* don’t save _output as artifacts (Azure#4214)

* fix standard lb (Azure#4217)

* ensure N series clusters get aks-docker-engine (Azure#4221)

* ensure addon image is overwritten during upgrade (Azure#4224)

* Update to Docker 18.09 for Windows (Azure#4223)

* ensure validate-dns job doesn’t already exist before creating (Azure#4230)

* remove empty customdata yml file (Azure#4231)

* adding back in double quotes one at a time (Azure#4235)

* azure npm addon has differently named pods (Azure#4237)

* E2E: ensure long-running-apache hpa doesn’t already exist before creating (Azure#4232)

* Adding c:\tmp as needed to pass Kubernetes tests (Azure#4240)

* up image to 1108 (Azure#4239)

* Add no outbound internet feature flag (Azure#4222)

* update azure-const.sh with new location of azure constants python file (Azure#4247)

* Tigera Technical Advisory TTA-2018-001 (Azure#4244)

* Enable pre-rendering of Container addons (Azure#4218)

* Make orchestrator command Windows aware (Azure#4142)

* Enable multiple Windows vmss agent pools - refactor pool names (Azure#3907)

* consistent use of kubernetes image base (Azure#4233)

* remove extraneous sed statements for mooncake (Azure#4253)

* Add exechealthz to 1.12/13 section as 1.11 or earlier (Azure#4252)

* append bug means we aren’t cleaning up! (Azure#4255)

* *string PrincipalID needs to be nil-guarded (Azure#4258)

* fix retrycmd_if_failure: $retries should be $r (Azure#4263)

* Add DockerEngine feature flag (Azure#4262)

* updating azureconst and adding PB6 skus (Azure#4265)

* Fix outbound connection check for master VMSS (Azure#4267)

* use mcr repos and disable smb flexvol addon (Azure#4266)

* bash func definition needs () without “function” (Azure#4269)

* Replace docker engine feature flag by existing cloud spec (Azure#4270)

* remove DockerEngine FeatureFlag (Azure#4275)

* E2E: rationalize node check + kube-system check, no kms (Azure#4273)

* install gpu drivers before extracting hyperkube (Azure#4276)

* [docs] Add documentation for GPU w/ docker-engine (Azure#4268)

* Windows e2e scale up / down test Fixes #3632 (Azure#4264)

* remove dead code. (Azure#4282)

* remove one extra english paragraph in zh-cn readme. (Azure#4281)

* rollback k8s client-go deps to v7.0.0 (Azure#4291)

* Fix issue caused by updating azure.json (Azure#4279)

* enable typha and add horizontal autoscaler (Azure#4290)

* feat(perf): Invoke-WebRequest much slower than browser download (Azure#4294)

* Set progresspreference to avoid progress bar and speed up downloads (Azure#4300)

* Ensure we do have an error before testing it (Azure#4301)

* update client-go to v9 (Azure#4296)

* Update to Azure-CNI v1.0.14 (Azure#4297)

* Make AvailabilitySet profile for master use Availability Zones (Azure#4286)

* Updates from aks-engine spike (Azure#4302)

* Fix prow set up

* e2e changes

* removing openshift artifacts

* accelerated networking rationalization, with tests

* remove additional sed statements for ip-masq addons

* Update go-dev tools image for go 1.11.2

* remove unused azconst methods

* add support PB6 vm skus

* update azure_const unit test

* update tiller versions in the recent versions of kubernetes

* VSTS VHD pipeline hosted ubuntu pool

* azureconst cruft

* scale: persist scale down in api model

* Add support for Kubernetes 1.11.5

* Fix docker-engine install in VHD pipeline

* remove IsOpenShift from E2E

* replace premature aks-engine reference

* make validate-headers doesn’t exist, revert rename

* all outbound checks are retried (Azure#4304)

* fix bunch of warnings for arm templates. (Azure#4285)

* Adding doc on how to set Azure CNI versions (Azure#4293)

* Support Windows Server 2019 and make it default (Azure#4299)

* fix malformed clusterautoscaler yaml bug (Azure#4322)

* update kubernetes api to 1.12.3 (Azure#4315)

* Prune non-go files from vendoring (Azure#4320)

* Prune non-go files from vendoring

* Work around errors from gosimple linter

* Bump cluster-autoscaler to recommended version for 1.11.5 (Azure#4314)

* Add kubelet system-reserved on Windows (Azure#3999)

* Add system-reserved on Windows

* Remove extra quotes

* Add system-reserved on Windows

* Update to match usage in #69960

* Re-escape quotes

* Just add system reserved as planned at 2Gb

* Bump VHD version to 2018.11.28 (Azure#4323)

* Add test for docker based workflow (ContainerInventory) (Azure#4198)

* Add copyright headers to source files (Azure#4324)

* Add Copyright header

* Add Copyright header to more files

* Rearrange finicky package comments and enforce validate-headers in CI

* Remove some extraneous diffs

* Rename Makefile target to be more descriptive

* deprecation notice (Azure#4335)

* Use 2018.12.03 VHD images (Azure#4333)

* we need newline (Azure#4341)

* [BUG] orchestratorVersion should not get changed for ACS scale apiVersion 2017-07-01 (Azure#4346)

* Enable Azure CNI 1.0.15 (Azure#4361)

* clarified docs (Azure#4362)

* chore: add config for "stale" bot service (Azure#4364)

* chore: add config for "stale" bot service

* fix: make PRs stale after a week

This repo is deprecated and shouldn't be getting any PRs.

* fix: rename stale config to have ".yml" extension