Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: add notification service design doc #414

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

wihobbs
Copy link
Member

@wihobbs wihobbs commented May 13, 2024

This is a RFC-style design document being put up to document the design we have for the flux job notification service.

I'm looking to get feedback on the clarity of this document and make sure we have as many edge cases as possible covered. If this isn't an appropriate RFC for the whole project it can be moved somewhere else, but the github PR interface provides a nice way for us to engage in a feedback loop.

I wanted to add a flow chart detailing some of the interactions but am still working on that. I think there's enough here for a first pass review, though.

@wihobbs wihobbs force-pushed the flanrfc branch 2 times, most recently from a56d23d to 60db4cf Compare May 13, 2024 15:50
Copy link
Member

@cmoussa1 cmoussa1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start @wihobbs! I've left some comments mostly just looking at grammar; feel free to take them or leave them. Sorry in advance if I am misunderstanding some of the RFC language here!

spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
Copy link
Member

@garlick garlick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

I had a few minor suggestions about the flow of the document. Feel free to ignore if it doesn't make sense to you!

index.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
Copy link
Member

@chu11 chu11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a side question, is the python driver typically going to be associated with system instance? So we'd run this python driver as a side daemon.

Or is the python driver something the user would start with their subinstance? How would we expect them to generlaly start the process? Like add a python myscript.py & in their batch file?

spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated
Comment on lines 171 to 180
The default behavior SHALL be to send a notification to the users' primary email
address, as provided by an LDAP query, when the job reaches the START and FINISH
events.
Copy link
Member

@chu11 chu11 May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially read this and thought this implementation was way too specific. But then I thought maybe you're just giving a use case example. Perhaps the words "The default behavior" threw me off. Maybe "For example, the default behavior could be ..."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default behavior is supposed to be detailing what happens when a user says flux batch --setattr=system.notify=default. The eventual goal is to support doing crazy things like --setattr=system.notify.service=slack --setattr=system.notify.events=validate,epilog-start --setattr=system.notify.include="{id.f58} {R} {eventlog}, etc. so I wanted to give the behavior in the general case (which is probably all we'll actually support in v1.)

I do think that maybe specifying LDAP is a tad specific. Though I've generally tried to be as specific as possible in this RFC since its more of a design document for a service than a traditional RFC...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may want to think through the ergonomics of requesting notifications before codifying into an RFC. Requiring a user to specify multiple --setattr options on the command line might not go over that well..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, it would be fine to have something in here, it can always be updated in the future. However, just keep in the back of your mind that users would appreciate a single --setattr so perhaps comma delimited options or other ways to provide options in a single go should be considered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For longer term, maybe a design that is readable and simple, something like --notify.service=slack I can read from left to right.

@wihobbs
Copy link
Member Author

wihobbs commented May 13, 2024

Just a side question, is the python driver typically going to be associated with system instance? So we'd run this python driver as a side daemon.

The idea was that a sysadmin would load the jobtap plugin (maybe just include it in the config?) and then start the Python driver under systemd on the management node.

Still TBD is how a user would be able to start this service themselves to get notifications for batch jobs in sub-instances (which is currently how some DATs are organized).

@wihobbs wihobbs force-pushed the flanrfc branch 2 times, most recently from e227539 to a350340 Compare May 13, 2024 22:36
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I just took a quick first pass 👍

spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated

The ``notify.enable`` request has no payload.

At initialization the python driver SHALL create a kvs subdirectory, ``notify``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the directory already exists? :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Python driver can traverse the existing directory for jobid keys.

If the jobtap plugin could insert and remove jobid keys itself, this eliminates the need for the record of jobids in the jobtap plugin ("hash table"). The jobtap plugin could have an INACTIVE callback to remove, and add the key in DEPEND. Currently the KVS is only being used to ensure uniqueness of the notification being sent.

Thoughts on this design?

spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
index.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated
- Support notification after any event of the job, where events are defined in
:doc:`spec_21`.
- Support email as the primary end user notification delivery.
- Build a driver capable of sending POST requests to chat services for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you see tokens or credentials for these posts going? I assume it's on the level of the user? So the user needs to setup their own slack app to make it work? But what about an organization or group - how would that be handled, permissions wise for the credentials?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I had envisioned credentials/tokens being stored in a config file on the management node. The only thing a user would need to pass to the jobspec would be their handle, i.e. hobbs17 on the LLNL slack. The slack admins would set up a bot to handle delivering the message to the user, and the sysadmins would store the webhooks for this bot in a toml file on the management node.

An edge case I haven't considered yet is "what if we need to support multiple 'teams' on Slack," i.e. LLNL and llnl-performance.

I should probably go read the Slack/Mattermost API docs and make sure this approach will actually work. I've built very little with MM and nothing with Slack...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any event I would probably place configuration location of third party tokens, etc. outside the scope of this document, which should focus on how the service monitors jobs for events of interest to turn them into generic notifications. (i.e. the protocol between Python service and a jobtap plugin, how restarts are handled, etc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated

.. note::
This design is intended to ensure that no double-notifications are sent upon
the restart of the Python script, the jobtap plugin, or the job-manager.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless the directory deletion is missed?

spec_44.rst Outdated
If a disconnect request is received by the jobtap plugin, this indicates the
python driver has exited. The jobtap plugin SHALL continue to add notification-
enabled jobs to its hash table as they enter the DEPEND state. When the python
driver reconnects, the jobtap plugin SHALL respond to its initial ``notify.enable``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it doesn't reconnect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"reconnect" is probably the wrong word. The Python driver if disconnected from the jobtap plugin will exit immediately and log an error. Eventually, systemd or an admin will start a new Python driver which sends a new initial notify.enable RPC request. No notifications can be sent while there's not an active Python driver process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question then is - what if the admin doesn't start a new Python driver? The hash table keeps filling up and explodes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good point. I hadn't thought of this. Maybe the jobtap plugin should remove jobids when they reach an inactive state... the notifications would be missed, though. Hmm. 🤔

There might be a way to log a warning reminding the user/admins to start the Python driver, but I'll have to look into that more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add this to the list of things to investigate. It shouldn't be an issue to keep a hash/list of all jobids with notifications, but we should not assume there is not a better way at this point in the design.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this, my question above is becoming a bit more urgent. Is there a reason to keep these things separate? If so, is there a reason the jobtap plugin shouldn't manage the lifetime of the driver as a subprocess or systemd unit?

spec_44.rst Outdated Show resolved Hide resolved
spec_44.rst Outdated
-------------------------

The webhooks and other secrets required to connect to chat services SHALL be included
in a ``config.toml`` file. The path to this file MUST be provided to the FLAN
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would either place this in a directory that has a meaningful path (for the developer to see) or give it name that is more specific to describe what it is. And I am wondering who has ownership of this file - a single user or if a center needs to take responsibility, for example, to provide a single credential for all users to use (is that a good idea)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely both use cases could be wanted (the "I am a user with my own custom plugin and private credential" and "I am a center providing this to my users."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe for the "center providing for users" case this needs an /etc/flux/notify.d/ or similar. Probably an implementation detail outside this RFC, but I had not thought about standardizing the config file path. Good idea!

spec_44.rst Outdated
In the event the job-manager crashes or is shut down the python driver SHALL exit
immediately and log an error.

Flux does not currently support restarting with running jobs. However, on a system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to restore state sounds messy - is there any reason to not recreate notifications on restart so there aren't any mismatches between state brought up and state known to the plugin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notifications would be recreated internally in the Python driver, since losing its connection to Flux is a fatal error (and that would happen in a system-instance restart).

Restoring state is messy...but I want to try for it anyway :). Here's an idea for how:

  1. Python driver crashes because job-manager is shut down.
  2. Job-manager is restarted (and so is the KVS). KVS comes back; jobtap plugin is loaded on start. All job eventlogs are replayed.
  3. All of the jobtap plugin depend callbacks are hit again. Jobtap plugin records these jobids.
  4. A "new" Python driver starts and gets a bunch of jobids sent to it. It watches the events [that may be somewhat stale.] (This was @chu11's point in a message above.) It does not add jobid folders to the KVS because they are already there.
  5. Before it sends a notification, it checks the KVS notify.<jobid> folder to see if __ (event) key is there. If the key is not there, that means a notification has already been sent. (On reflection, maybe it should be flipped -- i.e. if the key is there, that means a notification has already been sent.)
  6. Stop! Do not pass go, do not collect $200! Because we know the job is being watched, but a notification has already been sent, keep chugging and watching for others that may not have been sent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit unclear here what is meant by the "depend callbacks are hit again".

Note that there are two jobtap callbacks related to dependencies and the DEPEND state:

  • job.state.depend: One of the generic job.state.* callbacks, called when the job reaches the DEPEND state.
  • job.dependency.SCHEME: Called for each dependency SCHEME in jobspec of a submitted job

I don't think the job.state.* callbacks are called on reload. You can test this by writing a jobtap plugin that subscribes to all these states and prints what it sees on a restart.

The job.dependency.SCHEME callbacks are specifically issued on a restart for jobs with dependencies defined in jobspec.

I know we talked about this before, but I don't think I properly understood you were talking about one of these callback types and not the other.

Probably the best way for plugins to be notified of new jobs is to use the job.new callback, which is called for new jobs after they've been validated, on plugin reload, and on job manager restart.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For lack of a better place to document this (tagging @garlick as well):

It appears the job.state.priority callback is re-triggered when flux start is run in recovery mode with pending jobs, but no other job.state.* callbacks.

#include <flux/core.h>
#include <flux/jobtap.h>
#include <stdio.h>
#include <jansson.h>
#include <stdlib.h>

static int print_cb (flux_plugin_t *p,
                     const char *topic,
                     flux_plugin_arg_t *args,
                     void *arg)
{
    flux_jobid_t id;
    long unsigned int state;
    flux_t *h = flux_jobtap_get_flux(p);
    json_t *event;

    if (flux_plugin_arg_unpack(args, 
                               FLUX_PLUGIN_ARG_IN,
                               "{s:I s:i s:o}",
                               "id", &id,
                               "state", &state,
                               "entry", &event) < 0)
        return -1;
    
    if (id) {
        char* json_string = json_dumps(event, JSON_ENCODE_ANY);
	    flux_log(h, 0, "jobid %ld has entered state %s\n", id, json_string);
    }

    return 0;
}

static const struct flux_plugin_handler tab[] = {
    {"job.state.*", print_cb, NULL},
    {0},
};

int flux_plugin_init (flux_plugin_t *p) {
    if (flux_plugin_register(p, "jobtap-print", tab) < 0) {
        return -1;
    }
    return 0;
}

And on shutdown/restart:

  auk108 ~/scripts/jobtap-test $ flux start -o,--config-path=$(pwd)/config.toml
(s=1,d=0)  auk108 ~/scripts/jobtap-test $ flux submit --urgency=0 hostname
May 14 15:19:03.251059 job-manager.emerg[0]: jobid 330880253952 has entered state 140651589009410 {"timestamp": 1715725143.2404392, "name": "validate"}
May 14 15:19:03.251083 job-manager.emerg[0]: jobid 330880253952 has entered state 140651589009412 {"timestamp": 1715725143.2510674, "name": "depend"}
May 14 15:19:03.251104 job-manager.emerg[0]: jobid 330880253952 has entered state 140651589009416 {"timestamp": 1715725143.251091, "name": "priority", "context": {"priority": 0}}
ƒ9h7knoh
(s=1,d=0)  auk108 ~/scripts/jobtap-test $ flux shutdown --dump=$(pwd)/dump.tar
  auk108 ~/scripts/jobtap-test $ flux start -o,--config-path=$(pwd)/config.toml --recovery=$(pwd)/dump.tar
May 14 15:22:46.751219 job-manager.emerg[0]: jobid 330880253952 has entered state 139796890517508 {"timestamp": 1715725366.7511699, "name": "flux-restart"}
May 14 15:22:46.751270 job-manager.emerg[0]: jobid 330880253952 has entered state 139796890517512 {"timestamp": 1715725366.7512314, "name": "priority", "context": {"priority": 0}}
+-----------------------------------------------------
| Entering Flux recovery mode.
| All resources will be offline during recovery.
| Any rc1 failures noted above may result in
|  reduced functionality until manually corrected.
|
| broker.rc1_path    /home/hobbs17/flux-core/../bin/etc/flux/rc1
| broker.rc3_path    /home/hobbs17/flux-core/../bin/etc/flux/rc3
| config.path        /home/hobbs17/scripts/jobtap-test/config.toml
| statedir           changes will not be preserved
|
| Exit this shell when finished.
+-----------------------------------------------------

I accept the suggestion to tie the streaming RPC response to job.new, which is replayed on restart (tweaked the test above and verified).

We will have to make the ability to restart the service dependent on a requirement that this plugin be included in the job-manager's plugins table in the config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

job.new is also called for every active job when a plugin is loaded (on the assumption that this is the method that "introduces" jobs to jobtap plugins). This should allow the plugin to be reloaded (e.g. to get a new version) and get back to the same place it was before it was unloaded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job on the test plugin, btw. You could tweak it to register callbacks for * if you want to see everything.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad I posted this because I lost it in my home directory somewhere over the last 3 months!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to restore state sounds messy

And this is correct. A thought for the new design based on the job-manager journal of events -- I wonder always storing the most recently received event would be enough to restore state? This could be stored somewhere persistent and overwritten each time a new event was received. Then, on a restart, you could inspect the event and throw out any event that took place before it. I'll think on this more...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a dispatcher handling actual notification transmission there's an opportunity a few notifications would be missed on an abrupt shutdown, but maybe that's not so bad for a first pass.

@wihobbs
Copy link
Member Author

wihobbs commented May 14, 2024

Note that ^ push is mostly grammar/clarity fixes and I'm still working through the [excellent, but involved :) ] design feedback.

@vsoch
Copy link
Member

vsoch commented May 15, 2024

@wihobbs you've probably already exceeded most of us in knowledge of jobtap plugins, but in case you are interested we are having a little (hopefully fun) hackathon this week on Thursday - grandmaster @trws is going to show us the ropes for making plugins, and ❗ I think there might be rust involved! If you are interested, even for the fun or sharing your expertise, I can send along the invite.

@wihobbs wihobbs force-pushed the flanrfc branch 2 times, most recently from b6366cc to 2883259 Compare May 15, 2024 17:59
@wihobbs
Copy link
Member Author

wihobbs commented May 15, 2024

Tons of great feedback on this. Thanks to everybody for participating and sharing your expertise.

I tried to close or consolidate duplicate issues. I apologize in advance if something was inadvertently closed -- please reopen if so.

I'm making a high-level summary of the outstanding design todos:

  • Recovery of state on a crash. Note that there are three things that can crash, and we should possibly have a different strategy for handling each:
    • Crash of the job-manager (+Python driver+jobtap plugin) -- essentially restoring the state of everything.
    • Crash of the Python driver, but jobtap plugin and job-manager persist.
    • Crash of the jobtap plugin, but not the job-manager (+?Python driver)
  • Ensuring uniqueness of events before sending notifications (related: no double-or-worse notifications, and no missed notifications)
  • Rate limiting: why do we need it and how will we implement it?
  • Extensibility of the service via plugins (for external services such as Slack). Implementation of individual services is not a requirement.
  • Ergonomics of the "advanced use case," where we allow a user to specify services other than email, non-standard things to include, and states other than START and FINISH in which to trigger notifications.

spec_44.rst Outdated
`flux-jobtap-plugins(7) <https://flux-framework.readthedocs.io/projects/flux-core/en/latest/man7/flux-jobtap-plugins.html>`_
that streams the jobids of notification-enabled jobs to the Python driver.

The Python driver
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this documenting a thing that already exists? If not, is there a reason these two should be separate? If it's a language issue, a python-based jobtap plugin shouldn't be an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's largely because the jobtap plugin resides in the job manager and we want it to do as little as possible since it would be in the critical path.

The jobtap plugin (at the time) seemed like the most efficient way to avoid having to fetch and examine the jobspec of every job since it can just peek at the jobspec that the job manager already has in memory and send the jobid along to the Python process only when the user has requested notification.

However, I'm glad you asked because I wonder if there's a better way to do this using the job-manager journal or some other more widely applicable method. It sure would be nice if it was easy for people to develop services like this and not have to install yet another jobtap plugin to identify jobs of interest.

Something to consider...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a good point. Blocking the job manager would be unfortunate.

The journal is a good thought, or maybe even a pub-sub event from the plugin so anyone could subscribe to it? Not sure, there are definitely tradeoffs here, but it sounds like the current approach has some potential to cause lost or repeated notification as well, especially racing with the event delivery and kvs removal vs jobtap replay if flux is restarted.

@wihobbs
Copy link
Member Author

wihobbs commented Oct 10, 2024

I need to reread some of the previous comments before asking for another set of reviews. This is a pretty substantial update to the proposed design.

Some concerns will stay and some will go, but a major advantage of subscribing to the journal of events is that it absolutely cannot block the job manager, because it's a separate process completely.

I also noticed that we've ended up in a race over who gets RFC no. 44 😉😆

@wihobbs wihobbs force-pushed the flanrfc branch 2 times, most recently from b364b0d to 082ef36 Compare October 10, 2024 20:52
Problem: no design currently exists for the Flux
email service as noted in flux-framework/flux-core#4435.

Add a RFC-style document detailing this.
@wihobbs wihobbs force-pushed the flanrfc branch 2 times, most recently from d775154 to 066fc56 Compare October 17, 2024 22:37
@wihobbs wihobbs requested review from cmoussa1 and grondo October 17, 2024 22:44
@wihobbs
Copy link
Member Author

wihobbs commented Oct 17, 2024

Ideas that were discussed, but not included in the RFC:

  • Posting an event to the eventlog of the job when a notification is sent. It seems easier to keep the record of what's been sent all in one place that's quickly accessible by the driver, rather than dissociated in each job's eventlog.
  • A serialized hash table, hashing each notification before sending, so that a collision (sent notification) can be easily detected. I determined that this was an implementation detail and too deep for an RFC.
  • Using systemd freeze and thaw for storing the state of the driver. I don't think this will work in the way we thought, and we have access to the KVS, so I figured we'd just use that for storing state.

@grondo
Copy link
Contributor

grondo commented Oct 17, 2024

Posting an event to the eventlog of the job when a notification is sent. It seems easier to keep the record of what's been sent all in one place that's quickly accessible by the driver, rather than dissociated in each job's eventlog.

Haven't read through the new RFC yet, but just as a counterpoint here: There are some nice benefits to using an event in the eventlog. For example when jobs are purged those events won't appear on the next restart (the notification service will have to purge its own records otherwise, adding unnecessary complexity). Also, you're already getting all job events through the journal, so this would limit the number of sources the notification service has to deal with, again simplifying the implementation.

That being said, I think we discussed that perhaps an event posted by the notification service would appear in the job eventlog after it is needed (e.g. the start event is posted, the service sends a notification, then posts a notification event -- when restarting the service would see the start event before the notification event so it may not be helpful for preventing duplicate notifications) IMO, this is a more compelling argument to drop the suggestion that events be posted by the notification service in job eventlogs than because it seems easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants