docs: Add workload alerting (#9938)
tara-hpe authored Sep 19, 2024
1 parent cedfcfe commit eb1b0de
Showing 5 changed files with 223 additions and 112 deletions.
62 changes: 45 additions & 17 deletions docs/integrations/_index.rst
@@ -7,23 +7,51 @@
.. meta::
:description: Discover how Determined integrates with other popular machine learning ecosystem tools.

Determined is designed to easily integrate with other popular ML ecosystem tools for tasks that are
related to model training, such as ETL, ML pipelines, and model serving. It is recommended to use
the :ref:`python-sdk` to interact with Determined.

- :ref:`data-transformers`: Dive into how Determined integrates with data transformation tools such
as :ref:`pachyderm-integration`.
- :ref:`ides-index`: Determined shells can be used in the popular IDEs similarly to a common remote
SSH host.
- :ref:`notifications`: Make use of webhooks to integrate Determined into your existing workflows.
- :ref:`prometheus-grafana`: Discover how to enable a Grafana dashboard to monitor Determined
hardware and system metrics on a cloud cluster, such as AWS or Kubernetes.

Learn more:

Visit the `Works with Determined <https://github.com/determined-ai/works-with-determined>`__
repository to find examples of how to use Determined with a variety of ML ecosystem tools, including
Pachyderm, DVC, Delta Lake, Seldon, Spark, Argo, Airflow, and Kubeflow.
Determined seamlessly integrates with popular ML ecosystem tools to enhance your model training
workflow. From data transformation to monitoring and alerting, our integrations help streamline your
ML pipeline.

******************
Key Integrations
******************

- **Data Transformation**: Integrate with tools like :ref:`pachyderm-integration` to streamline
your data preprocessing.

- **Development Environments**: Use Determined shells in popular IDEs, similar to remote SSH hosts.
Learn more at :ref:`ides-index`.

- **Workload Alerting**: Set up :ref:`workload-alerting` through webhooks to stay informed about
your experiments in real-time. For a comprehensive overview of notification options, see
:ref:`notifications`.

- **Monitoring**: Enable Grafana dashboards to monitor hardware and system metrics on cloud
clusters. See :ref:`prometheus-grafana` for details.

*****************
Getting Started
*****************

To make the most of these integrations, we recommend using the :ref:`python-sdk` to interact with
Determined.

**************
Explore More
**************

Visit our `Works with Determined <https://github.com/determined-ai/works-with-determined>`__
repository for examples of using Determined with various ML ecosystem tools, including:

- Pachyderm
- DVC
- Delta Lake
- Seldon
- Spark
- Argo
- Airflow
- Kubeflow

These examples demonstrate how Determined can enhance your existing ML workflows and tools.

.. toctree::
:hidden:
118 changes: 23 additions & 95 deletions docs/integrations/notification/_index.rst
@@ -4,9 +4,14 @@
Notifications
###############

Monitoring experiment status is a vital part of working with Determined. In order to integrate
Determined into your existing workflows, you can make use of webhooks to update other systems,
receive emails, slack messages, and more when an experiment is updated.
Monitoring experiment status is crucial when working with Determined. To integrate Determined into
your existing workflows, you can use :ref:`workload-alerting` through webhooks. This feature allows
you to receive timely updates about your experiments via various channels such as email, Slack
messages, or other systems.

Workload alerting is particularly useful for real-time monitoring, debugging, and custom
notifications. For example, you can configure alerts to trigger as soon as specific events occur in
your experiments, rather than waiting for tasks to reach final states like "Completed" or "Error".

Webhooks such as tasklog webhooks are useful for real-time monitoring, debugging, custom
notifications, and integration with other systems. For example, using ``Tasklog``, you could get
@@ -127,15 +132,17 @@ Below is an example of handling a signed payload in Python.
Supported Triggers
==================

``Completed`` or ``Error`` will be triggered when an experiment in scope is completed or errored.
Determined supports the following webhook trigger types:

``COMPLETED`` or ``ERROR`` will be triggered when an experiment in scope is completed or errored.

``Tasklog`` will be triggered when a task matching regex is detected.
``TASKLOG`` will be triggered when a task log entry matching the webhook's regex is detected.

``Custom`` will only be triggered from experiment code.
``CUSTOM`` will only be triggered from experiment code.

.. code::
# Here is an example code to trigger a custom trigger.
# Example code to trigger a custom trigger.
# config.yaml
integrations:
@@ -147,99 +154,20 @@ Supported Triggers
with det.core.init() as core_context:
core_context.alert(title="some title", description="some description", level="info")
*******************
Creating Webhooks
*******************

To create a webhook, follow these steps:

- Navigate to ``/det/webhooks`` or select **Webhooks** in the left-side navigation pane.
- Choose **New Webhook**.

.. image:: /assets/images/webhook.png
:width: 100%
:alt: Webhooks interface showing New Webhook button.

.. note::

If you do not have sufficient permissions to view and create webhooks, consult with a systems
administrator.

- Workspace: Select a workspace where you have permission to create webhooks.
- Name: Supply a unique identifier for referencing the webhook in the experiment configuration.
- URL: Enter the webhook URL.
- Type: Choose either ``Default`` or ``Slack``. The ``Slack`` type automatically formats message
content for better readability on Slack.
- Trigger: Select the event you want to monitor. See the list of supported triggers in the
:ref:`supported-webhook-triggers` section.
- Triggered by: Choose whether to monitor all experiments within the workspace. For the ``Custom``
option, the trigger applies only to specific experiments.

.. code::
# Example of an experiment configuration with webhooks
integrations:
webhooks:
webhook_name:
- <webhook_name>
- Regex: If the webhook is configured to trigger on Tasklog, define a regex using `Golang Regex
Syntax <https://pkg.go.dev/regexp/syntax>`_.
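Before saving a Tasklog regex, it can help to try the pattern against a few sample log lines. The sketch below uses Python's ``re`` module only as a stand-in: the webhook itself uses Go regex syntax, which does not support lookarounds or backreferences, so the pattern is kept to syntax common to both engines. The pattern, the helper name ``matching_lines``, and the sample log lines are all hypothetical.

```python
import re

# Hypothetical pattern: flag CUDA out-of-memory errors and NaN losses.
# The inline (?i) flag is used instead of re.IGNORECASE so the same
# pattern string can be pasted into the webhook's Regex field.
PATTERN_TEXT = r"(?i)CUDA out of memory|loss[:=]\s*nan"

# Hypothetical task log lines, for illustration only.
SAMPLE_LINES = [
    "[rank=0] step 120 loss: 0.4312",
    "[rank=0] step 121 loss: NaN",
    "RuntimeError: CUDA out of memory. Tried to allocate 2.00 GiB",
]


def matching_lines(pattern_text, lines):
    """Return the lines in which the pattern matches anywhere."""
    pattern = re.compile(pattern_text)
    return [line for line in lines if pattern.search(line)]


for line in matching_lines(PATTERN_TEXT, SAMPLE_LINES):
    print("would trigger:", line)
```

A local check like this only approximates the server-side behavior; it is worth confirming the final pattern with the **Test Webhook** action or a short experiment.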

.. image:: /assets/images/webhook_modal.png
:width: 100%
:alt: Webhook user interface showing the fields you will interact with.

Once created, your webhook will automatically execute for the selected events within the specified
experiments.

******************
Testing Webhooks
******************

To test a webhook, select the more-options menu to the right of the webhook record to access
available actions.

.. image:: /assets/images/webhook_action.png
:width: 100%
:alt: Webhooks interface showing where to find the actions menu

Select **Test Webhook** to trigger a test event to be sent to the defined webhook URL with a mock
payload as stated below:
****************
Using Webhooks
****************

.. code::
{
"event_id": "b8667b8a-e14d-40e5-83ee-a64e31bdc5f4",
"event_type": "EXPERIMENT_STATE_CHANGE",
"timestamp": 1665695871,
"condition": {
"state": "COMPLETED"
},
"event_data": {
"data": "test"
}
}
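On the receiving side, the mock payload can be parsed like any other JSON body. The following is a minimal sketch of a helper that turns such a payload into a one-line summary; the function name ``summarize_event`` is hypothetical, and treating ``timestamp`` as Unix seconds in UTC is an assumption based on the sample value, not something the payload itself states.

```python
import json
from datetime import datetime, timezone

# Mock payload copied from the test event shown above.
TEST_PAYLOAD = """
{
  "event_id": "b8667b8a-e14d-40e5-83ee-a64e31bdc5f4",
  "event_type": "EXPERIMENT_STATE_CHANGE",
  "timestamp": 1665695871,
  "condition": {"state": "COMPLETED"},
  "event_data": {"data": "test"}
}
"""


def summarize_event(raw_body):
    """Build a short, human-readable summary of a webhook payload."""
    payload = json.loads(raw_body)
    # Assumption: the timestamp is expressed in Unix seconds.
    when = datetime.fromtimestamp(payload["timestamp"], tz=timezone.utc)
    # Fall back to UNKNOWN if a payload carries no state condition.
    state = payload.get("condition", {}).get("state", "UNKNOWN")
    return (
        f"{payload['event_type']} -> {state} "
        f"at {when.isoformat()} (event {payload['event_id']})"
    )


print(summarize_event(TEST_PAYLOAD))
```

A real receiver would call a helper like this from its HTTP handler after verifying the payload signature.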
*******************
Deleting Webhooks
*******************
To get started with webhooks in Determined:

To delete a webhook, select the more-options menu to the right of the webhook record to expand
available actions.
#. For step-by-step instructions on creating webhooks, see :ref:`creating-webhooks`.

******************
Editing Webhooks
******************
#. For use cases and best practices, see the :ref:`workload-alerting` guide.

To edit a webhook, select the more-options menu to the right of the webhook record to expand
available actions.

.. note::
#. For platform-specific integration guides, see:

Determined only supports editing the URL of webhooks. To modify other attributes, delete and
recreate the webhook.
- :ref:`slack-integration`
- :ref:`zapier-integration`

.. toctree::
:caption: Notification
2 changes: 2 additions & 0 deletions docs/integrations/notification/slack.rst
@@ -1,3 +1,5 @@
.. _slack-integration:

#######
Slack
#######
151 changes: 151 additions & 0 deletions docs/integrations/notification/workload-alerting.rst
@@ -0,0 +1,151 @@
.. _workload-alerting:

###################
Workload Alerting
###################

Workload alerting allows you to monitor the state of your experiments and share important
information with your team members. This feature enables proactive issue detection while maintaining
a good signal-to-noise ratio.

.. note::

To use this experimental feature, enable "Webhook Improvement" in :ref:`user settings
<web-ui-user-settings>`.

**************
Key Concepts
**************

- Webhook Trigger options: "All experiments in Workspace" and "Specific experiment(s) with matching
configuration"
- Webhook Exclusion
- Trigger Types: COMPLETED, ERROR, TASKLOG, CUSTOM
- Alert Levels: INFO, WARN, DEBUG, ERROR

For detailed information on supported triggers and example usage, see :ref:`notifications`.

.. _creating-webhooks:

*******************
Creating Webhooks
*******************

As a non-admin user with Editor or higher permissions, you can configure webhooks within your
workspace. Here's how to create webhooks:

#. Navigate to the **Webhooks** section in the WebUI.

#. Select **New Webhook**.

#. In the New Webhook dialogue:

- Select your Workspace
- Name your webhook
- Paste the webhook URL (e.g., from Zapier)
- Set Type to either Default or Slack
- Select the Trigger event (COMPLETED, ERROR, TASKLOG, or CUSTOM)
- Choose the Trigger by option: "All experiments in Workspace" or "Specific experiment(s) with
matching configuration"
- If you choose "Specific experiment(s) with matching configuration", note the Webhook Name for use
in experiment configurations

#. Click **Create Webhook**.

*******************
Deleting Webhooks
*******************

To delete a webhook, select the more-options menu to the right of the webhook record to expand
available actions.

******************
Editing Webhooks
******************

To edit a webhook, select the more-options menu to the right of the webhook record to expand
available actions.

.. note::

Determined only supports editing the URL of webhooks. To modify other attributes, delete and
recreate the webhook.

***********
Use Cases
***********

Webhooks in Determined offer versatile solutions for various monitoring and alerting needs. Let's
explore some common use cases to help you leverage this powerful feature effectively.

Case 1: Share Simple State on All Experiments in Workspace
==========================================================

This use case is ideal for teams that want to maintain a broad overview of all experiments running
in a workspace, ensuring that no important updates are missed.

#. Create a webhook with the "All experiments in Workspace" option.
#. Select the desired trigger events (COMPLETED, ERROR, TASKLOG).
#. All experiments in the workspace will now trigger this webhook unless explicitly excluded.

Case 2: Exclude Specific Experiments from Triggering Webhooks
=============================================================

During active development or debugging, you may want to prevent certain experiments from triggering
alerts to reduce noise and focus on specific tasks.

#. Edit the experiment configuration:

   .. code:: yaml

      integrations:
        webhooks:
          exclude: true

#. Run the experiment and verify that no webhooks are triggered.
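If experiment configurations are assembled in Python before submission, the exclusion flag can be added programmatically. This is a minimal sketch that works on plain dictionaries only; the helper name ``exclude_from_webhooks`` and the sample configuration values are hypothetical, and the sketch does not submit anything to Determined.

```python
import copy


def exclude_from_webhooks(experiment_config):
    """Return a copy of the config with integrations.webhooks.exclude enabled."""
    updated = copy.deepcopy(experiment_config)
    # Create the nested sections if they are missing, keeping existing keys.
    webhooks = updated.setdefault("integrations", {}).setdefault("webhooks", {})
    webhooks["exclude"] = True
    return updated


# Hypothetical base configuration for an experiment under active debugging.
base_config = {"name": "debug-run", "max_restarts": 0}
debug_config = exclude_from_webhooks(base_config)
print(debug_config["integrations"])
```

Returning a copy leaves the base configuration untouched, so the same base can still be used for runs that should trigger webhooks.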

Case 3: Customizable Monitoring for Specific Experiments
========================================================

For critical experiments or those requiring special attention, you can set up custom monitoring to
receive tailored alerts based on specific conditions or events in your code.

#. Create a webhook with the "Specific experiment(s) with matching configuration" option and
"CUSTOM" trigger type.

#. Note the Webhook Name.

#. In the experiment configuration, reference the webhook:

   .. code:: yaml

      integrations:
        webhooks:
          webhook_name:
            - <webhook_name>

#. In your experiment code, use the ``core_context.alert()`` function to trigger the webhook:

   .. code:: python

      import determined as det

      with det.core.init() as core_context:
          core_context.alert(
              title="Custom Alert",
              description="This is a custom alert",
              level="INFO",
          )

#. Run the experiment and check the event log in your webhook service for the custom data.

For more details on custom triggers, see :ref:`notifications`.
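When alerts are raised from several places in training code, a small helper can keep the arguments consistent. The sketch below validates the level against the alert levels listed under Key Concepts and returns keyword arguments for ``core_context.alert()``. The helper name ``build_alert`` and the example alert text are hypothetical, and normalizing the level to upper case is an assumption: the source does not state whether the level is case-sensitive.

```python
# Alert levels as listed in the Key Concepts section.
ALERT_LEVELS = ("INFO", "WARN", "DEBUG", "ERROR")


def build_alert(title, description, level="INFO"):
    """Validate alert fields and return keyword arguments for an alert call."""
    # Assumption: levels are normalized to upper case before sending.
    normalized = level.strip().upper()
    if normalized not in ALERT_LEVELS:
        raise ValueError(f"level must be one of {ALERT_LEVELS}, got {level!r}")
    if not title:
        raise ValueError("title must not be empty")
    return {"title": title, "description": description, "level": normalized}


alert_kwargs = build_alert(
    "Validation loss spike",
    "val_loss rose sharply between checkpoints",
    level="warn",
)
print(alert_kwargs)

# Inside a Determined task, the result would be passed along as:
#     with det.core.init() as core_context:
#         core_context.alert(**alert_kwargs)
```

Validating before the call surfaces a mistyped level in the training code itself rather than in the webhook service.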

****************
Best Practices
****************

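- Use the "All experiments in Workspace" option ("Open" subscription mode) for general monitoring of
all experiments in a workspace.
- Use the "Specific experiment(s) with matching configuration" option ("Run specific" mode) together
with custom triggers for fine-grained control over alerts for critical experiments.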
- Use webhook exclusion for experiments under active iteration to reduce noise.
- Regularly review and update your webhook configurations to ensure they remain relevant and
useful.
2 changes: 2 additions & 0 deletions docs/integrations/notification/zapier.rst
@@ -1,3 +1,5 @@
.. _zapier-integration:

########
Zapier
########