-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore data sync errors and allow custom should_capture
via Delegator pattern
#160
Conversation
9511bc8
to
b5774f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Chris, lots of great improvements here and this is shaping up great. There's a few areas where I think the suggestions I've made didn't come out too clearly so I did some hacking myself so I could try point to what I meant rather than just try describe it in words - hopefully that provides greater clarity.
I cobbled my bits together here: 81c78c5
So these are the bits that perhaps I think our wires ended up a bit crossed on.
I think we should make the data sync error side distinct from the configuration. I've tried applying that like so: 81c78c5#diff-e455739649a523940d6e5fc5353656a0R3-R36 - that also reflects a difference on should capture, I'd expect an application defining should_capture will replace a previously defined version, but always keep the data sync check.
81c78c5#diff-60bf2f76f5886144ddb9f524d7a5a8edR49-R51 represents how I'd expect that errors were configured, so they rather transparently fit into configuration rather than containing some of the logic. I also felt it didn't feel quite right to be setting our default configuration in and amongst a class, seems to raise questions about what to test or not.
I did have a quick pass at matching to the exception class by converting the names to strings here: 81c78c5#diff-e455739649a523940d6e5fc5353656a0R30-R32 to be honest I don't know if that is sufficient or not as Raven seems to have used a more complex constant resolving process. It seemed to work in my test though. I guess a thing to_s
might not cover is two constants pointing to the same object, but that seemed pretty edge case.
Hope this helps and happy to have a chat if some of my suggestions are ambiguous, there are a variety of subtle things in my hacking that I haven't explained.
9c80568
to
bd8759c
Compare
Thanks again for the review, @kevindew. Apologies it's taken a while to address your comments - other things have taken priority this week. I've administered your suggestions in the relevant commits, and then added a final commit for moving the data sync logic into the RavenDelegator. I'm aware you've opted for a |
ee02e5b
to
407fa14
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Chris, this is looking really close now. Nice job.
Regarding the areas you referenced in your previous comment.
ConfigureDefaults class, to me this isn't a class in a traditional sense all we're looking for is vessel for the configuration code, and it doesn't really behave like a class since we just use it as a function to apply side effects to the argument.
I put it in a govuk_error/configure file to reference that's it's just to configure that module. It could well go inside the block of /~https://github.com/alphagov/govuk_app_config/pull/160/files#diff-de4bdf6f6685d2a23ac3a9ea5c54b985R10-R12 instead - yeah my concerns are mostly about this being a class.
With the naming switch from GovukError::RavenDelegator to GovukError::Configuration, my logic for that was that the object is a configuration object for GovukError, whereas it does more than be a delegator to Raven. That naming thought more like it was describing the implementation than the purpose.
26acc13
to
d9dcdcd
Compare
Thanks @kevindew - some excellent suggestions there, have applied them all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great thanks Chris - I think we're through all the meat of this now, just a few minor things.
Have you given it a try out on an app? You could try install with Content Publisher I did that myself for a quick spin earlier in the process.
d9dcdcd
to
9e95ef9
Compare
661ce7c
to
92a83fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work Chris 💯 🏅 . I've got two very minor comments but this shouldn't need an extra review.
describe ".in_progress?" do | ||
it "returns false if we are outside of the time range" do | ||
data_sync = GovukError::GovukDataSync.new("22:30-8:30") | ||
at(hour: 21) { expect(data_sync.in_progress?).to eq(false) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is nice 👍
92a83fa
to
fa8cf47
Compare
This is to enable alphagov/govuk_app_config#160, which ignores certain exceptions if they occur during the data sync period. We want to set this via Puppet so that it is DRY and configured in only one place. Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
This is to enable alphagov/govuk_app_config#160, which ignores certain exceptions if they occur during the data sync period. We want to set this via Puppet so that it is DRY and configured in only one place. Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
This is to enable alphagov/govuk_app_config#160, which ignores certain exceptions if they occur during the data sync period. We want to set this via Puppet so that it is configured in only one repo. Whilst it would be nice to read the start and end times from govuk_data_sync_in_progress, there is precedent for duplicating the values - see /~https://github.com/alphagov/govuk-puppet/blob/5b1dc13ea3c7180318da284fe1dbe90a14460234/modules/monitoring/manifests/contacts.pp#L57-L59 Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
This is to enable alphagov/govuk_app_config#160, which ignores certain exceptions if they occur during the data sync period. We want to set this via Puppet so that it is configured in only one repo. Whilst it would be nice to read the start and end times from govuk_data_sync_in_progress, there is precedent for duplicating the values - see /~https://github.com/alphagov/govuk-puppet/blob/5b1dc13ea3c7180318da284fe1dbe90a14460234/modules/monitoring/manifests/contacts.pp#L57-L59 Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
This is to enable alphagov/govuk_app_config#160, which ignores certain exceptions if they occur during the data sync period. We want to set this via Puppet so that it is configured in only one repo. In the next commit I will DRY up the values so that they're only set in one place, but for now, there is precedent for duplicating the values - see /~https://github.com/alphagov/govuk-puppet/blob/5b1dc13ea3c7180318da284fe1dbe90a14460234/modules/monitoring/manifests/contacts.pp#L57-L59 We only want to define GOVUK_DATA_SYNC_PERIOD on Integration and Staging. If we hit an error on Production during the data sync period, we still want to log it to Sentry. Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
Useful utility for devs, should be included by default.
In the next few commits, we're going to be making some changes to the logic inside the Raven configuration, so we need a class we can write tests against.
Switching to the [Delegator pattern], rather than allowing apps to work with the `Raven.configuration` object directly, gives us an opportunity to perform some custom logic prior to passing the configuration to Raven. We anticipate using this to allow apps to specify `should_capture` logic without overwriting the default logic in govuk_app_config, by defining our own `should_capture=` method in ConfigureDefaults which knows how to merge the two callbacks. [Delegator pattern]: https://ruby-doc.org/stdlib-2.5.1/libdoc/delegate/rdoc/SimpleDelegator.html
This is based on the should_capture callback defined in [email-alert-api][], but differs in that it doesn't refer to the `PG::Error` constant, as that would raise errors in apps that don't use PostgreSQL. Instead, it uses introspection to compare the exception's class to a known string ("PG::Error"). We currently have a hard-coded period of 10pm - 8am ([according to puppet][puppet]) in which to ignore PostgreSQL errors in all environments. In subsequent commits I will have this read from an environment variable instead. We're using ActiveSupport's TimeHelpers utility as per https://docs.publishing.service.gov.uk/manual/conventions-for-rails-applications.html#testing-utilities Finally, I had to explicitly set the `current_environment=` to the ENV or the configuring_spec.rb would fail (environment would be "default"). I could probably change the test itself but I think it's more useful/self-documenting to have the assignment in code. [email-alert-api]: /~https://github.com/alphagov/email-alert-api/blob/d77a94be840a9995e53279b896c5634a7e24f9a6/config/initializers/govuk_error.rb [puppet]: /~https://github.com/alphagov/govuk-puppet/blob/4bcf837e0dc4f48be59716921841307dfa291ebe/modules/govuk_data_sync_in_progress/manifests/init.pp#L20-L24
Previous approach only looked 1 level deep. On @kevindew's suggestion, we can reuse Raven code to look infinitely deep: #159 (comment)
We will do some work in govuk-puppet to create this ENV variable based on the time ranges that are already defined in puppet: /~https://github.com/alphagov/govuk-puppet/blob/4bcf837e0dc4f48be59716921841307dfa291ebe/modules/govuk_data_sync_in_progress/manifests/init.pp#L20-L24 I have encapsulated the time parsing in its own class: GovukDataSync. This sticks to the sole responsibility principle. NB: As this is a lambda, we cannot use a `return` statement to exit the `should_capture` lambda early. We have to process all the logic up front and then use ruby's implicit return. https://www.reddit.com/r/ruby/comments/4z86sg/another_lambda_vs_proc_vs_blocks_article/
Apps can hook into `should_capture` behaviour by specifying their own `should_capture=(lambda)` hook. This gets merged with the default `should_capture` behaviour. An app can call `should_capture=(lambda)` multiple times if it wishes. All of the lambdas will be evaluated, and if they all return `true`, then `should_capture` will return `true`. If any of them return `false`, then it will return `false`. This seems a sensible behaviour to roll with for now, as this is all somewhat hypothetical. The only apps making use of custom `should_capture` callbacks are [content-publisher][] and [email-alert-api][], and both are currently only using it to do what we are defining as the default anyway, so will be removed soon. In the future we may want to consider supporting different lambda merge strategies, so that for example `should_capture` returns `true` if ANY of the lambdas returns `true` (as opposed to having to ALL return `true`). The framework is in place now, so it should be fairly simple to cross that bridge when we come to it. [content-publisher]: /~https://github.com/alphagov/content-publisher/blob/6b756d8e52c858abf81eff10966ba39070c9f229/config/initializers/govuk_error.rb [email-alert-api]: /~https://github.com/alphagov/email-alert-api/blob/afa38dd9087daec1f849f27dbc3941653a5d503b/config/initializers/govuk_error.rb
Prior to this commit, if an app wanted to ignore a particular type of exception that occurs during the data sync window, they'd have the facility to do so via the `should_capture=` callback but would need to recreate the GovukDataSync logic and keep that in sync with govuk-puppet. Now, they can simply append their exception to the newly added `data_sync_excluded_exceptions` property.
Rather than defining the default `should_capture` callback in the default configuration, which is a little different to the simple strings, arrays and booleans that exist in this file (although not that far removed from the `proc`s), it is cleaner to encapsulate this logic in the RavenDelegator class where we define the `should_capture=` merging strategy. We've also made the decision that a `should_capture=` call will overwrite any previous calls, except for the data sync callback. I've moved/removed some of the tests as appropriate. Co-authored-by: Kevin Dew <Kevin.Dew@digital.cabinet-office.gov.uk>
There was previously a lot of emphasis on PG::Error errors, as these were initially hard-coded. Now that it is more config driven, we should make the tests a little more abstract. I've also rearranged the tests to start with the simplest case (capture errors) and finish with the most complex (ignoring errors whose original cause was an error we've configured to ignore).
Previous class 'ConfigureDefaults' doesn't really behave like a class as all it does is use a function to apply side effects to the argument. It is cleaner to move the default configuration here (alternatively it could have been inline in govuk_app_config.rb itself but that felt like we were burdening the file with too much responsibility.) It does become very difficult to test the config (and would rely heavily upon mocks and stubbing), so I've sacrificed the 'silence_ready' tests, which were the only ones that were arguably useful. Ideally our default config should be as hard-coded and predictable as possible, so that tests aren't even necessary.
The object is a configuration object for GovukError, and does more than be a delegator to Raven. The previous naming referenced the implementation rather than the purpose. For that reason, 'Configuration' seems a better class name. Co-authored-by: Kevin Dew <Kevin.Dew@digital.cabinet-office.gov.uk>
This is to prevent any surprising behaviour should the downstream app decide to call 'GovukError.configure' twice. Prior to this commit, calling twice would keep all the Raven configuration but lose any custom configuration. I then needed to edit the 'govuk_error_spec' configure test to prevent `double("Configuration")` from leaking between tests. I also found that the change would cause configuring_spec.rb to fail, as the initialised GovukError module would no longer be looking for the SENTRY_CURRENT_ENV variable at the time the test would run. Once I changed this test to accommodate the singleton pattern, we no longer needed the `config.current_environment=` line, which was only really there to satisfy configuring_spec.rb. The line is unnecessary as the default behaviour is to delegate that to Raven, which is already being set via the ENV variable, so we don't need to create that ourselves. We also no longer need configuring_spec itself - its behaviour is now tested in govuk_error_spec.
fa8cf47
to
7c47fb7
Compare
(@kevindew I've just pushed a 'bump to 2.5.0' commit, which involved a rebase as the version had changed while this PR was open. There are no other new changes). |
As of govuk_app_config 2.5.0, excluding errors of type `PG::Error` that occur during the data sync on integration/staging now happens automatically. See alphagov/govuk_app_config#160 Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
As of govuk_app_config 2.5.0, excluding errors of type `PG::Error` that occur during the data sync on integration/staging now happens automatically. See alphagov/govuk_app_config#160 Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
As of govuk_app_config 2.5.0, excluding errors of type `PG::Error` that occur during the data sync on integration/staging now happens automatically. See alphagov/govuk_app_config#160 Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
As of govuk_app_config 2.5.0, excluding errors of type `PG::Error` that occur during the data sync on integration/staging now happens automatically. See alphagov/govuk_app_config#160 Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
We sometimes spin up temporary environments (e.g. 'test-pink-aws') while developing applications. If there are errors in those apps, we can accidentally send many thousands of errors to Sentry, which can mean we hit rate limits on our other, production, environments, as we have a single shared Sentry account. This commit introduces a `active_sentry_environments` config property, which is checked against the SENTRY_CURRENT_ENV ENV variable. If an error occurs in an environment we care about, it is reported to Sentry, otherwise it gets ignored. There is prior art for this - in #160 we added a similar check, such that if a particular error occurs during the nightly data sync, it gets ignored. I've made sure this new functionality is compatible with what's already been added. Trello: https://trello.com/c/GfFwJG0K/2180-investigate-disabling-sentry-on-environments-we-dont-care-about-timebox-2-days
In order to prevent thousands of transient errors being reported to Sentry during the data sync window (errors that occur only because databases are copied/provisioned overnight and are sometimes unavailable), we want to configure the
should_capture
logic in govuk_app_config in the same way as Content Publisher and Email Alert API, so that the fix is applied centrally. This involves setting theshould_capture
behaviour to ignore any errors of typePG::Error
, if we are in the data sync window (10pm to 8am, according to Puppet).However, we want to allow downstream apps to be able to define their own
should_capture
conditions too. Therefore, this PR:GovukError.configure
- it now yields to aRavenDelegator
, which uses the Delegator pattern as suggested by @kevindew. The pattern delegates methods to theRaven.configuration
object unless they've been explicitly overridden.should_capture=(callback)
, insideDefaultConfiguration
- but also allows downstream apps to provide their ownshould_capture=(callback)
callbacks too. In our implementation, both the default callback and the custom callback must returntrue
for the overallshould_capture
callback to returntrue
.should_capture
callback examines the error type and determines whether or not we are in the data sync window to decide if the error should be captured. It does this by looking at a globalGOVUK_DATA_SYNC_PERIOD
ENV variable of the form22:00-08:00
.PG::Error
errors that occur during this time frame will be ignored.Trello: https://trello.com/c/rAlvGlSp/2123-5-ignore-data-sync-errors-in-govukappconfig
Fixes #152. Supersedes #159 (the code has been largely recreated by hand here, as the commit history was meandering and no longer particularly valuable).