Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xds: share rds configuration among envoy #8523

Closed
wants to merge 7 commits into from

Conversation

lambdai
Copy link
Contributor

@lambdai lambdai commented Oct 7, 2019

WIP since this PR is on top of unlanded PR #8485 and I will rebase.

Context: Currently if a dynamic rds config can be referenced by multiple HCM and multiple scoped rds config. Envoy is maintaining many copies.

Description:
In this PR envoy will maintain only one copy of rds config for each resource name. Notes the rule that listener is ready only if the dependent RDS is ready is respected.

BTW there is a bug so the rule is not fully respected. Will fix in future PRs.
Fixed the bug where a init manager might see rds config ready when it's not. That's why SharedTargetImpl is introduced.

Fix #8303

@jmarantz jmarantz changed the title xds: share rds configuration among envoy WiP xds: share rds configuration among envoy Oct 9, 2019
@stale
Copy link

stale bot commented Oct 16, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Oct 16, 2019
@lambdai
Copy link
Contributor Author

lambdai commented Oct 17, 2019

Update: dependency is almost done. Will rebase now and then

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Oct 17, 2019
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai lambdai changed the title WiP xds: share rds configuration among envoy xds: share rds configuration among envoy Oct 23, 2019
@lambdai lambdai marked this pull request as ready for review October 23, 2019 07:00
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai
Copy link
Contributor Author

lambdai commented Oct 23, 2019

/assign @htuch
CC @stevenzzzz

Copy link
Contributor

@stevenzzzz stevenzzzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the new(ish) config-provider framework, I think they are doing the same thing. With the ServerFactoryContext PR merged, I think there is no blocker to migrate this into the config-provider framework.

source/common/init/target_impl.cc Show resolved Hide resolved
source/common/init/target_impl.h Show resolved Hide resolved
source/common/router/rds_impl.cc Outdated Show resolved Hide resolved
source/common/router/scoped_rds.cc Outdated Show resolved Hide resolved
test/common/init/target_impl_test.cc Outdated Show resolved Hide resolved
test/common/router/scoped_rds_test.cc Outdated Show resolved Hide resolved
test/common/router/scoped_rds_test.cc Outdated Show resolved Hide resolved
@lambdai
Copy link
Contributor Author

lambdai commented Oct 23, 2019

Thanks @stevenzzzz for the quick review! Will update soon

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
Copy link
Contributor Author

@lambdai lambdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @stevenzzzz I'd like to use the new provider in the follow up PR(s), making the future PR a pure migration. WDYT?

subscription = it->second.lock();
auto existing_provider = it->second.lock();
RELEASE_ASSERT(existing_provider != nullptr, "");
init_manager.add(existing_provider->subscription_->init_target_);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared target is used here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know that existing_provider->subscription_->init_target_ hasn't already initialized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InitManager doesn't need to know. Manager will be notified after Manager::initialize(), no matter target is initialized or not at this line.

The Target impl (which is SharedTargetImpl) maintains the internal states.

If the target need to be initialized only after all managers call initialze(): This behavior should be achieved by another instance of InitManager IMHO :)

test/common/router/rds_impl_test.cc Show resolved Hide resolved
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@lambdai lambdai mentioned this pull request Oct 24, 2019
@htuch
Copy link
Member

htuch commented Oct 24, 2019

@lambdai if your PR and the shared config provider do the same thing, then why do this PR? I'd prefer we don't review and merge code that is going to be throw away.

@lambdai
Copy link
Contributor Author

lambdai commented Oct 24, 2019

@htuch I am not sure what config provider you are referring to.
Assuming you are referring to ScopedRdsConfigProvider where srds is used:

  1. There is only very few lines tackling how a route config is shared. It's a trivial change that RouteConfigProvider user should hold shared ptr.
  2. Instead, the majority of the PR is fixing the misused assumption: owners of rds config will be signaled when the config is ready. SRDS is also the victim and that provider framework depends on the correct assumption.

@htuch
Copy link
Member

htuch commented Oct 25, 2019

@lambdai I'm talking about the base classes of ScopedRdsConfigProvider. They are basically designed to do what this PR sets out to achieve; make it safe and efficient to share immutable objects related to route config across workers and referees. I think the clean thing to do is have all the route config use this pattern.

@lambdai
Copy link
Contributor Author

lambdai commented Oct 25, 2019

ScopedRdsConfigProvider derives from MutableConfigProviderCommonBase
My understanding is that ScopedRdsConfigProvider and ScopedRdsConfigProviderManager delegate how the underlying RouteConfig creation to RouteConfigProviderManager
This PR is fixing the underlying RouteConfigProviderManager

I don't see how MutableConfigProviderCommonBase could magically resolve the issues that notifying multiple Init::Manager

If you are referring that rds config user should rely on a MutableConfigProviderCommonBase instead of "RouteConfigProvider`. I agree and I think it could be the follow up since it's a purely refactor.

@htuch
Copy link
Member

htuch commented Oct 27, 2019

@lambdai you are correct that ScopedRdsConfigProvider delegates to RouteConfigProviderManager.

The point made above is that RouteConfigProviderManager would benefit from the same design patterns as ScopedRdsConfigProvider, to solve basically the same problem you set out to achieve in this PR. @stevenzzzz can probably elaborate in much more detail, but a number of the recent PRs around the config provider have been about solving (I think) a somewhat similar problem you are tackling in this PR, namely how to share configs and subscriptions across multiple referees and worker threads safely.

It might be simple to "just use a shared pointer", but there are subtleties around this that we explored in depth in those PRs, specifically, what happens when a shared pointer gets destructed on a worker thread vs. main thread, when it happens to be the last thread standing.

Also, I'd want to understand why the same init concerns that exist here don't also exist for SRDS. Can you sync with @stevenzzzz and come up with a common pattern/framework that works for both SRDS and RDS? RDS can be referenced by multiple HCMs and SRDS, similarly SRDS might be referenced by multiple HCMs.

@lambdai
Copy link
Contributor Author

lambdai commented Oct 27, 2019

It might be simple to "just use a shared pointer", but there are subtleties around this that we explored in depth in those PRs, specifically, what happens when a shared pointer gets destructed on a worker thread vs. main thread, when it happens to be the last thread standing.

I admit that I didn't take full care of destruction path. My understanding is that my change affect only the lifetime of RDSProvider in master thread and the new TLS magic should work with the new provider.

Also, I'd want to understand why the same init concerns that exist here don't also exist for SRDS.

The bug must exist in SRDS. My observation is that since SRDS delegate the RdsConfigProvider creation to RouteConfigProviderManager. SRDS is built on top of the assumption that RouteConfigProviderManager would notify the init manager owned by SRDS. But it's not the truth. See the failing test case I added in #8781 It's a little hard for me to manipulate the SRDS test cases and fail it. I hope my test in rds ring the bell that why SRDS would fail as well

@stevenzzzz
Copy link
Contributor

It might be simple to "just use a shared pointer", but there are subtleties around this that we explored in depth in those PRs, specifically, what happens when a shared pointer gets destructed on a worker thread vs. main thread, when it happens to be the last thread standing.

I admit that I didn't take full care of destruction path. My understanding is that my change affect only the lifetime of RDSProvider in master thread and the new TLS magic should work with the new provider.

There was a previous hard-failure assertion in TLS slot destruction, which limits the destruction of SlotImpl can only happen in mainThread, it's relaxed now I added the "Bookeeper" into the thread_local_impl.h to solve the life-cycle problem of RDS Provider may gets removed by LDS while there are active lambdas in workers dispatcher queue pointing to the provider (which cause undefined behaviors). If we are talking about that, I think it's resolved now. (for RDS and SRDS only).

Also, I'd want to understand why the same init concerns that exist here don't also exist for SRDS.

The bug must exist in SRDS. My observation is that since SRDS delegate the RdsConfigProvider creation to RouteConfigProviderManager. SRDS is built on top of the assumption that RouteConfigProviderManager would notify the init manager owned by SRDS. But it's not the truth. See the failing test case I added in #8781 It's a little hard for me to manipulate the SRDS test cases and fail it. I hope my test in rds ring the bell that why SRDS would fail as well
If lambdai@ hasn't changed SRDS to also depends on ServerFactoryContext, the init-manager bug is still alive in SRDS, i.e., SRDS still depends on the per-Listener factoryContext.

Once that's done. I think we can ignore this PR and move to the config-provider framework, which instead of sharing a config provider(which contains a configImpl) across listeners, shares a subscription (just like the current RDS, but contains a config impl now) across RDS providers.
IMHO, it'd be much less change to make and consistency across all xDS is the most appealing selling point.

@lambdai
Copy link
Contributor Author

lambdai commented Oct 29, 2019

Talked to @stevenzzzz offline
Agreements:

  1. Confirmed that listener might be early initialized before rds config is ready.
  2. We might need to understand the new stats: since rds config is shared by multiple listeners but not owned by listener. It's not reasonable to build counter("listenerA.route.hit").

Steps:

  1. Let's land this PR because it fix the multi-notification bug which existing in the RDS and SRDS code
  2. Moving RDS to use ConfigProvider is on my short road map. I will be working on it in follow up.
  3. Sharing SRDS config among listeners is in the long road map.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lambdai this seems like a reasonable plan. Can you update the PR description to make it clear that the main contribution here is the shared init manager?

Re: shared init manager. I'm still trying to persuade myself this is the right thing. It's simpler than adding a pub-sub or message bus to Envoy, but it also just generally makes reasoning about init watches and handlers that much more complicated. CC @mergeconflict who is the world premier expert on our init manager :)

} else {
// Because the RouteConfigProviderManager's weak_ptrs only get cleaned up
// in the RdsRouteConfigSubscription destructor, and the single threaded nature
// of this code, locking the weak_ptr will not fail.
subscription = it->second.lock();
auto existing_provider = it->second.lock();
RELEASE_ASSERT(existing_provider != nullptr, "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a regular ASSERT I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 line below the existing_provider is the deref. Instead of a nullptr deref I prefer to throw a message here. I added a detailed error message in the follow commit.
There is no penalty if the condition is true.

subscription = it->second.lock();
auto existing_provider = it->second.lock();
RELEASE_ASSERT(existing_provider != nullptr, "");
init_manager.add(existing_provider->subscription_->init_target_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you know that existing_provider->subscription_->init_target_ hasn't already initialized?

mergeconflict
mergeconflict previously approved these changes Nov 1, 2019
Copy link
Member

@mergeconflict mergeconflict left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just reviewed the SharedTarget portion of this PR. I guess there's a real use case for registering an init target with multiple init managers, so I think I'm ok with this, but I should point out that there are two choices you could have made here and neither is obviously "more correct" than the other:

  • What was done here: the target initializes as soon as any of the managers tell it to.
  • The other option: the target waits to initialize until all of the managers tell it to.

Maybe there are more options I didn't think of... My point is that there's sort of a rabbit hole here, where you could keep adding new kinds of targets for different use cases. Maybe that's a good thing, I don't know.

Anyway, a few minor nits, but this otherwise looks fine to me.

Comment on lines +25 to +27
external_deps = [
"abseil_memory",
],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer not to add a dependency just so you can use absl::WrapUnique instead of std::unique_ptr ctor (sorry @stevenzzzz, I know you suggested it). Is this style more common in the Envoy codebase?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 occurrences. I like the simplicity of WrapUnique. I can seek to mark abseil_memory as a common deps in targets so that I don't have to introduce this deps.
And eventually we can replace WrapUnique along with better c++ stdlib

source/common/init/target_impl.h Outdated Show resolved Hide resolved
@@ -85,5 +87,49 @@ class TargetImpl : public Target, Logger::Loggable<Logger::Id::init> {
const std::shared_ptr<InternalInitalizeFn> fn_;
};

/**
* A specialized Target which can be added by multiple Manager.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managers.

source/common/init/target_impl.h Outdated Show resolved Hide resolved
// Human-readable name for logging
const std::string name_;

// Handle to the ManagerImpl's internal watcher, to call when this target is initialized
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment for multiple managers/multiple watchers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@htuch htuch added the waiting label Nov 4, 2019
@stale
Copy link

stale bot commented Nov 8, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Nov 8, 2019
@lambdai
Copy link
Contributor Author

lambdai commented Nov 8, 2019

Back to this PR. Sorry for the late response

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Nov 8, 2019
@lambdai
Copy link
Contributor Author

lambdai commented Nov 8, 2019

As an analogy to when_all(futures) and when_any(futures), initialize() might need a when-all-init-manager-init semantic.
But I also have the hunch that the when_all is done by the existing Init::Watcher. Let's discuss when there is a requirement.

@lambdai
Copy link
Contributor Author

lambdai commented Nov 8, 2019

Addressing comments...

@lambdai lambdai requested a review from htuch November 9, 2019 00:30
Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@htuch
Copy link
Member

htuch commented Nov 11, 2019

@lambdai can you update the GH top comment (and maybe even title)? It seems this PR is mostly about introducing a shared init manager. In fact, if you want to split out the shared init manager to a separate dedicated PR, that could be even better, as we could review that and this PR becomes much smaller and self-contained.

@lambdai
Copy link
Contributor Author

lambdai commented Nov 12, 2019

@htuch will do. Now that we reach the consensus such a shared_target is required I can create a new PR referring to this scenario.

@mergeconflict
Copy link
Member

@htuch will do. Now that we reach the consensus such a shared_target is required I can create a new PR referring to this scenario.

Just to be clear, I don't believe it's required. I think you can build what you need without it, by having multiple targets — one per init manager — and running the initialization logic when the first target is initialized.

@lambdai lambdai mentioned this pull request Nov 13, 2019
@htuch htuch added the waiting label Nov 15, 2019
@stale
Copy link

stale bot commented Nov 19, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Nov 19, 2019
htuch pushed a commit that referenced this pull request Nov 25, 2019
Extracted from #8523

@mergeconflict mentioned we can also maintain a collection of TargetImpl(and a initialize_ flag, std::once_flag) at the target owner.

I agree it works but a SharedTarget provides stronger cohesion.

Follow up:

* update #8523 on top of this PR
* complete fuzz test #8714

Signed-off-by: Yuchen Dai <silentdai@gmail.com>
@stale
Copy link

stale bot commented Nov 26, 2019

This pull request has been automatically closed because it has not had activity in the last 14 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale stalebot believes this issue/PR has not been touched recently waiting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rds: make RouteConfigProvider shared among ListenerImpls, or move factory_context_ outside of Listener.
4 participants