Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Rancher as a cloud provider #32419

Closed
wants to merge 2 commits into from
Closed

Conversation

wlan0
Copy link
Member

@wlan0 wlan0 commented Sep 9, 2016

This PR adds rancher as a cloud provider to Kubernetes.

Rancher is a multi cloud, multi tenant container management platform. Rancher uses Kubernetes as one of its orchestration engines. We maintained our own fork of Kubernetes to support rancher networking, load balancing and volumes in Kubernetes. Since we moved to CNI based networking and Flex volumes, we found that there is no need to maintain a separate fork anymore.

This PR specifically adds support for rancher cloud provider, and rancher credential provider to Kubernetes. Rancher cloud provider is vital for registering rancher hosts as Kubernetes nodes. Once added into the Kubernetes cluster, the nodes can leverage Rancher DNS, Rancher Load balancer using native kubectl commands. Rancher credential provider allows nodes to use Rancher managed credentials.

We've been running this code (+ other rancher specific networking code) for over a year now. It has also been deployed by many of our customers.

CC (code reviewers from Rancher Labs) @ibuildthecloud @alena1108


This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "ok to test" on its own line.

Regular contributors should join the org to skip this step.

While we transition away from the Jenkins GitHub PR Builder plugin, "ok to test" commenters will need to be on the admin list defined in this file.

@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Sep 9, 2016
@wlan0
Copy link
Member Author

wlan0 commented Sep 12, 2016

I signed it!

@googlebot
Copy link

CLAs look good, thanks!

@wlan0
Copy link
Member Author

wlan0 commented Sep 13, 2016

@dchen1107 - What are your thoughts on this PR? Most of the files in this PR are vendor files. Even though it looks large, there are only a few code files. Would it help if I divided this PR in two separate commits so that you can review the code separately from the vendor files?

@wlan0
Copy link
Member Author

wlan0 commented Sep 13, 2016

@dchen1107 I divided this PR into two commits. The first one with all the relevant changes, and the second one with the vendor files. This should make it easier to review.

  1. add rancher cloud provider [/~https://github.com/Add Rancher as a cloud provider #32419/commits/a089913f7ab17dbecd236cd9532d81311cfdaae9]
  2. add vendor files for rancher cloud provider [/~https://github.com/Add Rancher as a cloud provider #32419/commits/1368aaf76d29257f86c736790ee6b6d4531adc0b]

@wlan0 wlan0 force-pushed the upstream branch 3 times, most recently from 2598469 to f46b3a2 Compare September 16, 2016 18:33
@wlan0 wlan0 force-pushed the upstream branch 2 times, most recently from c78f7ed to 23f5e09 Compare September 22, 2016 23:07
@dchen1107
Copy link
Member

@wlan0 Thanks for sending us the pr to integrate Rancher with Kubernetes. We are currently working on the process of moving all cloud provider related stuff to a separate repo. I am adding @bgrant0607 as the reviewer for the direction on this.

@wlan0
Copy link
Member Author

wlan0 commented Sep 27, 2016

Thanks! @dchen1107. @bgrant0607 Let me know if I can do anything to ease the process of transitioning the rancher cloud provider to the new repo.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 28, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 28, 2016
@thockin
Copy link
Member

thockin commented Sep 29, 2016

So this is a tough phase to be in. We know we want to change the way CloudProvider works, but that work hasn't really started yet. Are you guys willing to be a bit of a guinea pig? In broad strokes, here's what I am thinking. Don't build this as a Cloud Provider. Instead, let's make ControllerManager understand that "external" is a valid cloud provider, and write this as a controller. Simply run it somewhere in the cluster and let it watch the various APIs it needs.

While you do this, you can start to catalog all the places that assume CloudProvider is populated, and your controller can be a starting point for the discussion of how to evict google and amazon and openstack into their own modules.

@errordeveloper

@errordeveloper
Copy link
Member

It would be mega awesome if we could do what @thockin said!

On Thu, 29 Sep 2016, 05:41 Tim Hockin, notifications@github.com wrote:

So this is a tough phase to be in. We know we want to change the way
CloudProvider works, but that work hasn't really started yet. Are you guys
willing to be a bit of a guinea pig? In broad strokes, here's what I am
thinking. Don't build this as a Cloud Provider. Instead, let's make
ControllerManager understand that "external" is a valid cloud provider, and
write this as a controller. Simply run it somewhere in the cluster and let
it watch the various APIs it needs.

While you do this, you can start to catalog all the places that assume
CloudProvider is populated, and your controller can be a starting point for
the discussion of how to evict google and amazon and openstack into their
own modules.

@errordeveloper /~https://github.com/errordeveloper


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32419 (comment),
or mute the thread
/~https://github.com/notifications/unsubscribe-auth/AAPWS26kjXFFT7jlPv9LFb6dNBfAMGawks5qu0GNgaJpZM4J5hRa
.

@bgrant0607
Copy link
Member

ref #2770

@wlan0
Copy link
Member Author

wlan0 commented Sep 29, 2016

@thockin Thanks for explaining your position. We agree to be your guinea pigs! :-)

I'll restate my understanding of your plan, to ensure we're on the same page.

I'll start off by making ControllerManager understand that an external controller is managing the cloudprovider resources if cloud-provider == external.

Then, I'll create a new controller that can work with the following types of CloudProvider resources

  • Instances
  • Load Balancers
  • Clusters
  • Routes
  • Zones

TL;DR I'll make a PR asap with a new ControllerManager, that understands "external" cloud provider and a minimal implementation of Rancher CloudProvider as a controller.

@thockin let me know if this sounds good to you.
cc @errordeveloper

@luxas
Copy link
Member

luxas commented Sep 30, 2016

That sounds great, thank you @wlan0!

@thockin
Copy link
Member

thockin commented Sep 30, 2016

On Thu, Sep 29, 2016 at 4:47 PM, Sidhartha Mani
notifications@github.com wrote:

@thockin Thanks for explaining your position. We agree to be your guinea pigs! :-)

I'll restate my understanding of your plan, to ensure we're on the same page.

I'll start off by making ControllerManager talk to an external controller if cloud-provider == external. i.e. Implement a new CloudProvider called "external" that talks to a CloudProvider controller.

I don't think controller manager needs to "talk to" anything. I
suggested "external" as a place-holder, but maybe "" (not specified)
is good enough. Just something that tells all of the built-in
controller loops which use CloudProvider to deactivate instead.

Then, I'll create a new controller that can respond to the following types of CloudProvider API calls

Instances
Load Balancers
Clusters
Routes
Zones

Your RancherController needs to do more than just respond to those
APIs - it needs to actually implement all the controllers. I think
you can actually start with kube-controller-manager and inject an
instance of CloudProvider interface. Basically, you're building a
kube-controller-manager that ONLY supports rancher.

If we do this right, it becomes easy to substitute any implementation
of cloud provider, but to only link the one you care about, rather
than all of them. We can even offer the k8s-cloud-controller as a
library that each cloud provider can (but doesn't have to!!) use.

TL;DR I'll make a PR asap with a new ControllerManager, that understands "external" cloud provider and a minimal implementation of Rancher CloudProvider as a controller.

@thockin let me know if this sounds good to you.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@wlan0
Copy link
Member Author

wlan0 commented Sep 30, 2016

Thanks for clarifying @thockin.

@wlan0
Copy link
Member Author

wlan0 commented Sep 30, 2016

@thockin I like the k8s-cloud-controller library approach. That way there could be a Kubernetes/k8s-cloud-controller repo which hosts the library code (a fork of KCM), and each cloud vendor (such as us!) could use that library to create their own {cloud-provider}-k8s-controller-manager.

Will you maintain and update two version of the KCM (the mainstream KCM in the core repo and the k8s-cloud-controller library) in two different repositories?

In trying to understand the motive behind this approach, was this done becuase this approach doubles down as breaking down Kubernetes into multiple repositories as well (#24343, #2742, #444, #24156)? 'Cos KCM is being moved out.

@thockin
Copy link
Member

thockin commented Sep 30, 2016

KCM is more than just the cloud provider parts, so we'd still have it, but
I expect it would vendor in the new library until such time as all
providers are removed from the main repo.

On Fri, Sep 30, 2016 at 3:46 PM, Sidhartha Mani notifications@github.com
wrote:

@thockin /~https://github.com/thockin I like the k8s-cloud-controller
library approach. That way there could be a Kubernetes/k8s-cloud-controller
repo which hosts the library code (a fork of KCM), and each cloud vendor
(such as us!) could use that library to create their own
{cloud-provider}-k8s-controller-manager.

Will you maintain and update two version of the KCM (the mainstream KCM in
the core repo and the k8s-cloud-controller library) in two different
repositories?

In trying to understand the motive behind this approach, was this done
becuase this approach doubles down as breaking down Kubernetes into
multiple repositories as well (#24343
#24343, #2742
#2742, #444
#444, #24156
#24156)? 'Cos KCM is
being moved out.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32419 (comment),
or mute the thread
/~https://github.com/notifications/unsubscribe-auth/AFVgVHfqxhveN_FKxp242Tll2bmLEuAIks5qvZFbgaJpZM4J5hRa
.

@bgrant0607
Copy link
Member

@wlan0 The idea is that there should be no cloudprovider API. Just separate (out-of-repo) controllers for each cloud.

@thockin
Copy link
Member

thockin commented Oct 1, 2016

...thought there might be a library that individual cloud providers might
link to cover the most common logic, so they can specialize on their own
logic.

It's sort of an inversion.

On Sep 30, 2016 7:13 PM, "Brian Grant" notifications@github.com wrote:

@wlan0 /~https://github.com/wlan0 The idea is that there should be no
cloudprovider API. Just separate (out-of-repo) controllers for each cloud.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32419 (comment),
or mute the thread
/~https://github.com/notifications/unsubscribe-auth/AFVgVJ_TXtJGi3F8wkiwBt-dWPZlXvsjks5qvcGtgaJpZM4J5hRa
.

@ibuildthecloud
Copy link
Contributor

ibuildthecloud commented Oct 2, 2016

@wlan0 The registry credential provider should really be broken out into a different PR. Being that 1) it's not related to cloud provider directly 2) the cloud provider code is going to take awhile to get merged give the refactoring approach that is now going to be taken.

@thockin
Copy link
Member

thockin commented Oct 2, 2016

To make this run, we should not try to move all the logic in one PR. We
should evaluate the things we have to move, identify the ones that are
easiest and hardest, and the ones that are most representative. Then we
can agree on a path and move incrementally.

The last thing I want is a 5K line PR to review. :)

On Sat, Oct 1, 2016 at 10:18 PM, Darren Shepherd notifications@github.com
wrote:

@wlan0 /~https://github.com/wlan0 The registry credential providers
should really be broken out into a different PR. Being that 1) it's not
related to cloud provider directly 2) the cloud provider code is going to
take awhile to get merged give the refactoring approach that is now going
to be taken.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#32419 (comment),
or mute the thread
/~https://github.com/notifications/unsubscribe-auth/AFVgVAROygfpyIzwyxmERRmElVeTGaWVks5qvz6ggaJpZM4J5hRa
.

@bgrant0607
Copy link
Member

@thockin I'm not opposed to reusable helper libraries.

@wlan0
Copy link
Member Author

wlan0 commented Oct 6, 2016

@bgrant0607 Im working on the reusable library - here's the initial PR for disabling cloud specific controller loops - #34273

@thockin
Copy link
Member

thockin commented Oct 20, 2016

For task-queue sanity I am going to close this PR, but I'm obviously VERRRRY interested in the splittling-up work :)

@thockin thockin closed this Oct 20, 2016
@dims
Copy link
Member

dims commented Oct 31, 2016

Looks like a features issue has been added - kubernetes/enhancements#88

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.