-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CORE-7996 Trial: build in an evaluation period license #23893
Conversation
/ci-repeat |
src/v/features/feature_table.cc
Outdated
if (ss::this_shard_id() == 0) { | ||
vlog(featureslog.debug, "Initializing builtin trial license."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this conditional just for the log message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, to only log on shard 0 and not all cores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe log the timestamp as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe log the timestamp as well
Yeah, and then make_builtin_trial_license
wouldn't need to log, which it does on every core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/v/redpanda/admin/server.cc
Outdated
@@ -2407,7 +2407,7 @@ void admin_server::register_features_routes() { | |||
ss::httpd::features_json::license_response res; | |||
res.loaded = false; | |||
const auto& ft = _controller->get_feature_table().local(); | |||
const auto& license = ft.get_license(); | |||
const auto& license = ft.get_license(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do want to return the evaluation license via the get_license
endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort of related to this - the evaluation license has to supersede all functional effects of license enforcement, but
- should the response from
GET /v1/features/enterprise
look like there's a real license installed? or - should we suppress the license nag during the trial period?
- should the
has_valid_license
bool in phone-home telemetry be an enum? or maybe we add an additional field for "evaluation_period"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO yes, the evaluation period should work as if a real trial license had been installed (i.e., don't nag until expiry, show that there is a trial license installed, expiry time metric > 0).
% rpk cluster license info
LICENSE INFORMATION
===================
License status: valid
License violation: false
Organization: Redpanda Built-In Evaluation Period
Type: free_trial
Expires: Dec 6 2024
should the has_valid_license bool in phone-home telemetry be an enum? or maybe we add an additional field for "evaluation_period"?
Probably easiest to add a new field for "evaluation_period". I guess this could be derived from the license hash (id_hash == 0
) but probably best to add it explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, totally missed the free trial type on the license!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The request from product is to have it return "trial" just like if it's a generated trial license. We should be able to tell that it's the evaluation by:
- The organization, and
- The empty SHA256 checksum
@@ -312,10 +313,14 @@ ss::future<> metrics_reporter::try_initialize_cluster_info() { | |||
|
|||
auto& first_cfg = batches.front(); | |||
|
|||
_cluster_info.creation_timestamp = first_cfg.header().first_timestamp; | |||
co_await _feature_table.invoke_on_all([&](features::feature_table& ft) { | |||
ft.set_builtin_trial_license(_cluster_info.creation_timestamp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do we do if creation timestamp is bogus (e.g. ntp isn't running and clocks aren't accurate)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good point. Specifically, when only the creation timestamp is bogus, but now the time is accurate, right? Surely, all bets are off if the time is bogus throughout the cluster's runtime (e.g., license checks against the timestamp in the license would break equally). Do you have any ideas? The only improvement I can think of is sanity-checking that the creation timestamp is in the past, but it's hard to determine the cluster creation time retrospectively because the prefix of the controller log is snapshotted away after ~60s.
|
|
/ci-repeat |
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/57127#0192bf96-5477-437e-b9d1-cef05cec4b52 |
if (_license) { | ||
return _license->is_expired(); | ||
} else if (_builtin_trial_license) { | ||
return _builtin_trial_license->is_expired(); | ||
} | ||
|
||
// While we are yet to initialize _builtin_trial_license on cluster | ||
// creation, be permissive | ||
return _builtin_trial_license_initialized; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (_license) { | |
return _license->is_expired(); | |
} else if (_builtin_trial_license) { | |
return _builtin_trial_license->is_expired(); | |
} | |
// While we are yet to initialize _builtin_trial_license on cluster | |
// creation, be permissive | |
return _builtin_trial_license_initialized; | |
auto license = get_license(); | |
if (!license) { | |
// ... | |
return _builtin_trial_license_initialized; | |
} | |
return license->is_expired(); |
are these 2 identical? Or am i missing an edge case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are. I'm happy to switch to that if you think that's simpler.
src/v/features/feature_table.cc
Outdated
_builtin_trial_license_initialized = true; | ||
_builtin_trial_license = make_builtin_trial_license( | ||
model::to_time_point(cluster_creation_timestamp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: while make_builtin_trial_license
doesn't seem to be able to throw, we should probably set the bool after it is initialized, mainly for sanity reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
FYI - tripped over redpanda/tests/rptest/services/redpanda.py Line 5358 in fb388a0
That check is no longer meaningful w/ a trial license. could be if license is None or not license['loaded'] or \
license['license']['type'] == 'free_trial': or something like that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good.
src/v/features/feature_table.cc
Outdated
if (ss::this_shard_id() == 0) { | ||
vlog(featureslog.debug, "Initializing builtin trial license."); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe log the timestamp as well
Yeah, and then make_builtin_trial_license
wouldn't need to log, which it does on every core.
Create a builtin fallback license that allows customers to trial Enterprise features for a trial period starting at cluster creation. This commit implements the `get_license` interface that will be used in following commits to start using this built in trial period license when no license has been configured yet.
Use the built in evaluation period trial license in the following admin API endpoints: * `GET /v1/features/license` * `PUT /v1/features/license` * `GET /v1/features/enterprise` This is to ensure that the trial license and its expiry time is visible to rpk and the customer, both for the `rpk cluster license info/set` commands and the rpk-side license nag.
This ensures that we allow major version upgrades during the evaluation period.
This ensures that the expiry of the built in trial license can be monitored.
We do not want to emit the license nag during the evaluation period.
Helper to tell whether to act on an expired license or evaluation period by restricting enterprise feature usage
Have `revoke_license` revoke the builtin trial license as well for more convenient testing.
|
Retry command for Build#57223please wait until all jobs are finished before running the slash command
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few non-blocking comments, looks good
def _disable_evaluation_period(self): | ||
self.redpanda.set_environment( | ||
{'__REDPANDA_DISABLE_BUILTIN_TRIAL_LICENSE': True}) | ||
self.redpanda.restart_nodes(self.redpanda.nodes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: maybe move this to redpanda.py
, but this isn't a blocker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW the restart is only needed if the brokers have already been started. Tests that manually start the brokers (empty setUp
method, eg. the tests in license_enforcement_test.py
) can set the env var before the first broker is started and the brokers will pick it up on startup.
In this test case, I chose to restart the nodes instead of switching to manually start them because that seemed easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is all getting ripped up in my PR anyway. i think "get the test passing and move on" is the right course of action.
self.redpanda.set_environment( | ||
{'__REDPANDA_DISABLE_BUILTIN_TRIAL_LICENSE': True}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: maybe this be a variable passed to the redpanda
constructor or a method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't add it in the constructor because I only wanted it to apply to the first two tests but not test_enterprise_cluster_bootstrap
. start
and restart
don't currently take extra environment variables; only extra configs (at the moment).
I guess I could factor out the first two tests into a separate test class, and then I could pass it into the constructor.
]) | ||
def test_get_enterprise(self, with_license): | ||
@matrix(with_license=[True, False], with_evaluation_period=[True, False]) | ||
def test_get_enterprise(self, with_license, with_evaluation_period): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might consider just marking these ok to fail temporarily and otherwise leaving exactly as is. your changes look fine but it'll be a pain to integrate with #23899
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eh, probably not worth delaying merge. i can deal with it.
if (_license) { | ||
return _license->is_expired(); | ||
} else if (_builtin_trial_license) { | ||
return _builtin_trial_license->is_expired(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it the case that you are creating a special carve-out for the trial license? it would seem that an alternative would have been to generate an actual license that would expire as expected, and re-use all of the existing machinery. is that because you can't generate a license without the private key? if so, would it have been an option to expand the semantics of the existing license abstraction as opposed to expressing a trial license explicitly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are able to create a license in the broker without the private key because internally the license is stored unencrypted, only licenses passed in from outside through the admin API are signed and validated.
We store the real license and the eval license separately (ie. have the eval license carved out) because we need a way to tell them apart at some points of the code. For example, we only recover the user-configured license during whole cluster recovery but not the evaluation period license, and we only write the user-configured license into the controller snapshot. We could represent this information differently (eg. introduce security::license_type::evaluation_period
or add a new source_type
field to security::license
), but storing it as a separate field seemed simplest to me because it guarantees by design that we can't overwrite a real license with an evaluation period license. Storing the license in memory instead of writing it into the controller log also makes it easy to change the behaviour of the trial license later if the product asks change (eg. change the evaluation period length).
(fyi I've drafted up the alternative you proposed; it would look something like this: #23945)
Let me know what you think with the above in mind. I'm open to changing it, potentially even to this alternative implementation, but the reasons above are why the evaluation period license is currently carved out into a separate field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the detailed response, it's awesome! i think what we are doing in this PR is ok. actually, it's great--not generalizing until we get a few examples (currently N=2) makes a lot of sense. but maybe you can see where it could break down in the future: product asks for a new kind of license, and then we are asked for carve-outs for different features, etc... pretty soon the carve-out approach becomes unwieldy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah makes sense. I would expect that product asks would mostly just change the way the evaluation period behaves and we would not need to introduce a new kind of license. But if we do, we can certainly think about ways of simplifying things at N=3.
/cdt |
CDT test failures:
FIPS CDT test failures:
|
auto expiry = std::chrono::duration_cast<std::chrono::seconds>( | ||
expiry_time.time_since_epoch()); | ||
|
||
if (std::getenv("__REDPANDA_DISABLE_BUILTIN_TRIAL_LICENSE") != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the double prefix underscore is odd. we have so many notations we should unify.
RPK has env variables for example as well that do not have this. @twmb - we should have a developer experience here unify these. the product flags feel disjointed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i pinged @mattschumpert on this too to make sure all the vars are consistent. this should have product input for consistency. i only care about consistency, the prefix,postfix,dot-notation,etc doesn't matter as much to me as consistency across all product lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ENV vars specifically are so internal facing and no user ever sees them unless their browsing the code that it only matters for redpanda devs I think.
I've never once looked at any of these (only a couple of documented external env vars where we don't prefix).
the '__REDPANDA' seems to be our convention for internal, and seems consistent so it makes sense to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh, internal consumption only. Should never be in documentation.
Create a builtin fallback license that allows customers to trial
Enterprise features for a trial period starting at cluster creation.
Fixes https://redpandadata.atlassian.net/browse/CORE-7996
Backports Required
Release Notes
Features