Simplify pipeline configuration #739
Comments
I sketched out how the new container app might look. This would eventually replace the apps for archive, method, pipeline, transformation, some of librarian, and possibly metadata and datachecking.
After team discussion, the plan is to switch to Singularity first by completing #737, then tackle this simplification.
Also break an import cycle by removing link from librarian to method.
Add container analysis page that lets you choose a container app and creates a container run. Doesn't select inputs or actually launch anything yet. Generate migration for remaining tables.
Add more context to a test that might be flaky.
This should keep the original annotations in the new location.
Set favicon for Django REST framework API.
Also set Slurm options like memory and priority.
Add test coverage for API code. Add some more API features for MiCall.
Return JSON objects from endpoint calls. Call API client tests on Travis. Add some detail URLs and filters to server API. Add Dataset.is_purged, and include it in the API.
Catch exceptions in runcontainer command. Handle external datasets in runcontainer command.
Also start adding some missing search filters.
Also copy permissions from run to outputs.
Background
I've recently been adding support for optional arguments as part of #511 and thinking about switching from Docker to Singularity for #737. One of the main enhancements to the current milestone is #723 to make it easier to create a new method by not defining the outputs.
All of this has got me thinking about how complex the code is for configuring a new pipeline and executing a pipeline in a run. It took over a week to add support for optional arguments, and I still have to make changes to the user interface and API for launching runs. The complexity also makes Kive slow at run time, as it searches for results that it can reuse.
We have learned a lot about new tools since we started the Kive project: Slurm, GitHub/GitLab, and Docker/Singularity. Could we replace a lot of the complexity in Kive with features from these tools? We did this in the past when we moved from MPI to Slurm.
This issue is a proposal for the team to discuss - a plan to simplify Kive.
Features to Remove
These are the main sources of complexity. Are they worth it?
Features to Extract
pipeline.json file in the tar.gz file with the scripts.
/mnt/bin, plus a driver script that reads the pipeline.json file and executes the steps.
pipeline.json, and uploading it to another Kive server. A Kive container can be either a Singularity image or a tar.gz file. The tar.gz file is a layer on top of the default Singularity image.
pipeline.json, it can copy the pipeline.json from the previous revision in the container family.
Features to Keep
These features are working well, and I think they're the main value of Kive.
Features to Add
Link to Git comparison of different pipeline versions.
Define pipeline configuration in Singularity labels. That way, you could migrate a pipeline from development to test to production by copying the image file and nothing else.
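To make the label idea concrete, here is a minimal sketch of turning image labels into a pipeline configuration. It assumes the labels have already been read from the image into a dict (for example with `singularity inspect`). The `KIVE_*` label names, the `PipelineConfig` class, and the default values are all hypothetical; nothing in this proposal fixes them yet.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineConfig:
    """Pipeline settings recovered from image labels (hypothetical layout)."""
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    threads: int = 1
    memory_mb: int = 1024


def config_from_labels(labels):
    """Build a PipelineConfig from a dict of label names to string values.

    The KIVE_* label names and the defaults are assumptions for this sketch.
    """
    return PipelineConfig(
        # Argument names are stored as one space-separated label value.
        inputs=labels.get("KIVE_INPUTS", "").split(),
        outputs=labels.get("KIVE_OUTPUTS", "").split(),
        threads=int(labels.get("KIVE_THREADS", 1)),
        memory_mb=int(labels.get("KIVE_MEMORY", 1024)),
    )


# Example labels as they might appear on an image (made-up values).
example_labels = {
    "KIVE_INPUTS": "reads1 reads2",
    "KIVE_OUTPUTS": "summary.csv",
    "KIVE_MEMORY": "6000",
}
config = config_from_labels(example_labels)
print(config)
```

Because everything the server needs would live in the image itself, copying the image file to another server would carry the configuration along with it.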
Task List
ContainerArgument table, and configure on container page.
ContainerRun and related tables.
Purge sandboxes. (Janitor tasks for container runs #750)
Purge log files. (Janitor tasks for container runs #750)
Purge output datasets. (Janitor tasks for container runs #750)
Container as an archive of scripts, with pipeline configuration in a JSON file. (Run a pipeline in a container #751)
Remove old features. (Remove old runs and pipelines #752)
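For the archive-of-scripts container, the driver script that reads pipeline.json and executes the steps could be quite small. The sketch below is only an illustration: the `steps`, `driver`, and `args` keys are an assumed layout for pipeline.json, not a decided format, and `build_commands` and `run_pipeline` are hypothetical names. The scripts are assumed to be unpacked under /mnt/bin, as described above.

```python
import json
import posixpath
import subprocess


def build_commands(pipeline, bin_dir="/mnt/bin"):
    """Turn the steps of a pipeline description into argument lists.

    Assumed layout (not a decided format):
    {"steps": [{"driver": "script.py", "args": ["in.csv", "out.csv"]}]}
    """
    commands = []
    for step in pipeline.get("steps", []):
        # Scripts from the tar.gz archive are assumed to live in bin_dir.
        script = posixpath.join(bin_dir, step["driver"])
        commands.append([script] + list(step.get("args", [])))
    return commands


def run_pipeline(config_path, bin_dir="/mnt/bin"):
    """Read pipeline.json and run each step in order, stopping on failure."""
    with open(config_path) as config_file:
        pipeline = json.load(config_file)
    for command in build_commands(pipeline, bin_dir):
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(command, check=True)


# Example description with made-up script and file names.
example = {
    "steps": [
        {"driver": "trim.py", "args": ["reads.fastq", "trimmed.fastq"]},
        {"driver": "count.py", "args": ["trimmed.fastq", "counts.csv"]},
    ]
}
commands = build_commands(example)
for command in commands:
    print(" ".join(command))
```

Keeping command construction separate from execution means the configuration can be checked without launching anything inside the container.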