
ACS-Engine 0.23.1 deploys clusters with non-functional CoreDns #4011

Closed
MarcPow opened this issue Oct 12, 2018 · 6 comments · Fixed by #4014

MarcPow commented Oct 12, 2018

Is this a request for help?: Yes


Is this an ISSUE or FEATURE REQUEST? (choose one): ISSUE


What version of acs-engine?: 0.23.1


Orchestrator and version (e.g. Kubernetes, DC/OS, Swarm)

Kubernetes 1.12.1

What happened:

Events:
  Type     Reason          Age                 From                   Message
  ----     ------          ----                ----                   -------
  Normal   Scheduled       55m                 default-scheduler      Successfully assigned kube-system/coredns-56684f94d6-h9btz to 35934k8s9000
  Normal   Pulling         54m (x3 over 54m)   kubelet, 35934k8s9000  pulling image "coredns/coredns:1.2.2"
  Warning  Failed          54m (x3 over 54m)   kubelet, 35934k8s9000  Failed to pull image "coredns/coredns:1.2.2": rpc error: code = Unknown desc = no matching manifest for windows/amd64 in the manifest list entries
  Normal   SandboxChanged  54m (x7 over 54m)   kubelet, 35934k8s9000  Pod sandbox changed, it will be killed and re-created.
  Warning  Failed          4m (x225 over 54m)  kubelet, 35934k8s9000  Back-off pulling image "coredns/coredns:1.2.2"
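
These events came from the standard kubectl workflow (pod name as assigned above):

# find the coredns pod and the node it landed on
kubectl -n kube-system get pods -o wide
# show the scheduling and image-pull events above
kubectl -n kube-system describe pod coredns-56684f94d6-h9btz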

What you expected to happen:

ACS-Engine deploys a cluster with working in-cluster DNS.

How to reproduce it (as minimally and precisely as possible):

Deploy a cluster with a Linux master and a Windows agent pool using ACS-Engine 0.23.1.

Anything else we need to know:


MarcPow commented Oct 12, 2018

It looks as though the coredns deployment needs to be labeled to run only on the (Linux) master in a Windows cluster. This may be an issue in Kubernetes 1.12.1 itself. Let me explore that.
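
Something like this would be the fix, I think (untested sketch; beta.kubernetes.io/os is the OS label the kubelet sets on every node in 1.12, and I haven't checked where this lands in acs-engine's addon template):

# constrain coredns to Linux nodes so the scheduler never picks a Windows agent
kubectl -n kube-system patch deployment coredns --patch \
  '{"spec":{"template":{"spec":{"nodeSelector":{"beta.kubernetes.io/os":"linux"}}}}}'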

jackfrancis (Member) commented

Hi @MarcPow, all our E2E test clusters have working coredns, so your cluster config scenario is probably different from our test clusters'. Could you share your api model input (with credentials removed)?

Thanks!


MarcPow commented Oct 12, 2018

I have a cluster with one Linux master and 3 Windows agents.

The issue is that my coredns happened to get scheduled on one of the Windows boxes (instead of the Linux master). As such, it got stuck there temporarily.

I bet it would eventually converge, but I think the coredns deployment is missing the node selector the scheduler needs to keep it off the Windows nodes.
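
You can see where it landed with (filtering by pod name, since I haven't verified which labels the deployment carries):

# show which node each coredns replica was scheduled onto
kubectl -n kube-system get pods -o wide | grep coredns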

jackfrancis (Member) commented

@MarcPow can you build from source on PR #4014?

If not, we'll test it out and report back.


MarcPow commented Oct 13, 2018

At the moment, I don't have the ability to build from source.

So instead, I pulled down the current deployment/coredns YAML, added a node selector, and re-applied it. It seems to work like a charm:

Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-6pzrg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-6pzrg
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     CriticalAddonsOnly
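
For anyone following along, the edit was roughly this (standard kubectl; deployment name as shown above):

# pull the live deployment, add the node selector, re-apply
kubectl -n kube-system get deployment coredns -o yaml > coredns.yaml
#   (edit: add "beta.kubernetes.io/os: linux" under spec.template.spec.nodeSelector)
kubectl apply -f coredns.yaml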


MarcPow commented Oct 13, 2018

Update: my manual update isn't stable. Something keeps removing my node selector; I assume that's automation around the deployment under ACS that I don't fully understand, being a newbie and all. But I'm reasonably convinced that, while it's applied, the node selector works appropriately.
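
If I had to guess (an assumption on my part, not something I've verified here): kube-addon-manager on the master reconciles addon manifests and reverts live edits to addons labeled with mode Reconcile. This would show the mode:

# a mode of "Reconcile" means live edits to the deployment get reverted
kubectl -n kube-system get deployment coredns \
  -o jsonpath='{.metadata.labels.addonmanager\.kubernetes\.io/mode}'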
