Create user-facing documentation for DAG Factory #278

Closed
tatiana opened this issue Oct 30, 2024 · 3 comments

@tatiana
Collaborator

tatiana commented Oct 30, 2024

At the moment, the only DAG Factory documentation is the README.

We should collaborate with Astronomer's docs team and @cmarteepants to decide what content to cover and how to present it.

Once this is decided, we should write the documents in Markdown or reStructuredText (reST), depending on what is agreed upon.

@tatiana tatiana added this to the DAG Factory 0.21.0 milestone Oct 30, 2024
tatiana added a commit that referenced this issue Dec 6, 2024
Implement support for [Airflow
TaskFlow](https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/taskflow.html),
available since Airflow 2.0.

# How to test

The following example defines a task that generates a list of numbers
and another that consumes this list and, using Airflow dynamic task
mapping, dynamically creates an independent task instance to double
each number.
```
example_taskflow:
  default_args:
    owner: "custom_owner"
    start_date: 2 days
  description: "Example of TaskFlow powered DAG that includes dynamic task mapping."
  schedule_interval: "0 3 * * *"
  default_view: "graph"
  tasks:

    numbers_list:
      decorator: airflow.decorators.task
      python_callable: sample.build_numbers_list

    double_number_with_dynamic_task_mapping_taskflow:
      decorator: airflow.decorators.task
      python_callable: sample.double
      expand:
          number: +numbers_list  # the prefix + tells DagFactory to resolve this value as the task `numbers_list`, previously defined
```

For the `sample.py` file below:
```
def build_numbers_list():
    return [2, 4, 6]


def double(number: int):
    result = 2 * number
    print(result)
    return result
```

In the UI, it is shown as:
![Screenshot 2024-12-06 at 11 53 04](https://github.com/user-attachments/assets/0643002a-2530-4bc1-af39-16fb3f48d4d4)

And:

![Screenshot 2024-12-06 at 11 52 28](https://github.com/user-attachments/assets/2c2ed46a-4ee8-438a-836d-3112b4737c6a)
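
As a rough, Airflow-free illustration of what the `expand` entry in `example_taskflow` amounts to, the sketch below calls `double` once per element returned by `build_numbers_list`. It is a plain-Python approximation only: in Airflow each call would be a separate mapped task instance, and the variable name `mapped_results` is an assumption made for this example.

```python
def build_numbers_list():
    return [2, 4, 6]


def double(number: int):
    result = 2 * number
    print(result)
    return result


# Approximation of `expand: {number: +numbers_list}`:
# one independent call of `double` per element of the upstream list.
# (In Airflow these would be separate mapped task instances, not a loop.)
mapped_results = [double(number=n) for n in build_numbers_list()]
```

With the three-element list above, this yields three calls, matching the three mapped task instances expected for the DAG.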

# Scope

This PR includes several use cases of [dynamic task
mapping](https://airflow.apache.org/docs/apache-airflow/2.10.3/authoring-and-scheduling/dynamic-task-mapping.html):
1. Simple mapping
2. Task-generated mapping
3. Repeated mapping
4. Adding parameters that do not expand (`partial`)
5. Mapping over multiple parameters
6. Named mapping (`map_index_template`)

The following dynamic task mapping cases were not tested but are
expected to work:
* Mapping with non-TaskFlow operators
* Mapping over the result of classic operators
* Filtering items from a mapped task

The following dynamic task mapping cases were not tested and are not
expected to work (they were considered out of scope for the current
ticket):
* Assigning multiple parameters to a non-TaskFlow operator
* Mapping over a task group
* Transforming expanding data
* Combining upstream data (aka “zipping”)

# Tests

The feature is tested by running the example DAGs introduced in this
PR, which validate various TaskFlow and dynamic task mapping scenarios
and also serve as documentation.

As with other parts of DAG Factory, we can and should improve the
overall unit test coverage.

Two example DAG files were added, containing multiple examples of
TaskFlow and dynamic task mapping. This is how they are displayed in the
Airflow UI:
<img width="1501" alt="Screenshot 2024-12-06 at 16 11 10"
src="https://github.com/user-attachments/assets/c4d12520-31f5-4b9d-b191-dd37523299e1">
<img width="1500" alt="Screenshot 2024-12-06 at 16 11 42"
src="https://github.com/user-attachments/assets/ab08749f-aedb-4c8f-9df1-8f0d0451477d">
<img width="1510" alt="Screenshot 2024-12-06 at 16 11 32"
src="https://github.com/user-attachments/assets/591e949a-49da-49f6-8d4d-1458fbb88d7f">



# Docs

This PR does not contain user-facing docs other than the README.
However, we'll address this as part of #278.

# Related issues

This PR closes two open tickets:

Closes: #302 (support named mapping, via the `map_index_template`
argument)

Example of usage of `map_index_template`:
```
    dynamic_task_with_named_mapping:
      decorator: airflow.decorators.task
      python_callable: sample.extract_last_name
      map_index_template: "{{ custom_mapping_key }}"
      expand:
        full_name:
          - Lucy Black
          - Vera Santos
          - Marks Spencer
```
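
The body of `sample.extract_last_name` is not shown here. Airflow renders `map_index_template` against the task context after the mapped task has run, so the callable is expected to place `custom_mapping_key` in that context. The sketch below is a hypothetical version of the callable: the optional `context` parameter is an assumption that keeps the example runnable without Airflow. Inside a real task, the context would instead come from `airflow.operators.python.get_current_context()`.

```python
from typing import Optional


def extract_last_name(full_name: str, context: Optional[dict] = None) -> str:
    """Hypothetical callable: split a full name and return the last name."""
    first_name, last_name = full_name.rsplit(" ", 1)
    if context is not None:
        # This key is what "{{ custom_mapping_key }}" would render to,
        # giving each mapped task instance a readable label.
        context["custom_mapping_key"] = first_name
    return last_name


# Stand-in for the Airflow task context (assumption for this sketch).
fake_context = {}
last_name = extract_last_name("Lucy Black", fake_context)
```

Which value to use as the label (first name, full name, or something else) is a design choice; the first name is used here only for illustration.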

Closes: #301 (Mapping over multiple parameters)

Example of multiple parameters:
```
    multiply_with_multiple_parameters:
      decorator: airflow.decorators.task
      python_callable: sample.multiply
      expand:
          a: +numbers_list  # the prefix + tells DagFactory to resolve this value as the task `numbers_list`, previously defined
          b: +another_numbers_list # the prefix + tells DagFactory to resolve this value as the task `another_numbers_list`, previously defined
```
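
When `expand` receives more than one parameter, Airflow maps over the cross product of the inputs, creating one task instance per combination. The plain-Python sketch below approximates that behaviour. The body of `multiply` and the values of `another_numbers_list` are not given above and are assumptions made for illustration.

```python
from itertools import product


def multiply(a: int, b: int) -> int:
    # Hypothetical body for `sample.multiply`.
    return a * b


numbers_list = [2, 4, 6]
another_numbers_list = [10, 100]  # hypothetical values

# Approximation of `expand: {a: +numbers_list, b: +another_numbers_list}`:
# one call per (a, b) combination, i.e. 3 * 2 = 6 calls in total.
combinations = list(product(numbers_list, another_numbers_list))
results = [multiply(a=a, b=b) for a, b in combinations]
```

The number of mapped task instances grows multiplicatively with the input sizes, which is worth keeping in mind when both upstream lists are large.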
@pankajastro
Collaborator

Initiated an internal discussion: https://astronomer.slack.com/archives/C015V2JFKT5/p1733904505509939

@pankajastro
Collaborator

We discussed this further with the team to finalise the technology, template, hosting solution, and the initial docs we should work on. Meeting notes: https://www.notion.so/astronomerio/DAG-Factory-Docs-Recs-15a40290af6c803fbd6ed483b6eefb7a

@tatiana tatiana closed this as completed Jan 8, 2025