Add code infrastructure for pipeline configuration regression tests #12
Conversation
Superficially, which is as far as I'm capable of going with this, looks great to me. How does this work interplay, if at all, with the plan to capture all input configuration for every run? Can we use some of this to capture both input values and post configuration values to store? Or can we already extract those values just from the normal run?
This is definitely related work. My mental framework is as follows:
There are three problems there:
This work partially addresses the first problem by helping us be consistent in the rendering/translation process (although it doesn't say anything about it being right). Sufficient test coverage with this tool might convince us of the "rightness". @j2salmingo's work with uclahs-cds/pipeline-Nextflow-config#43 addresses the second problem by capturing the full rendered configuration with the pipeline results. To my knowledge, the third problem is currently unaddressed.
Thanks, I agree it would be helpful to solve all three problems.
On the first point, this definitely is not a terrible idea, and I don't think we could throw it away if we wanted to. This would be an awesome utility to take the guesswork out of the downstream effects of configuration changes, and would dramatically improve the quality of code reviews in a number of ways (speed, certainty, trackability of changes).
In terms of the top-level interface, did you compose the test JSON by hand, or did you have an automated way of constructing the JSON, or at least a skeleton of it? Some of the components seem to be JSONified portions of the schema.yaml. Overall this is amazing work, Nick! I think if this were actionified it would seriously raise the professional standard of our pipeline work. I do feel it would be worthwhile to present a demo on this work at the Nextflow WG meeting so we can stop and ask you implementation-level questions.
I hand-composed a file like this:

{
"config": [
"test/nftest.config"
],
"params_file": "test/single.yaml",
"cpus": 16,
"memory_gb": 31,
"empty_files": [
"/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta",
"/hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz",
"/hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz",
"/hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz",
"/hot/ref/tool-specific-input/GATK/GRCh38/Biallelic/hapmap_3.3.hg38.BIALLELIC.PASS.2021-09-01.vcf.gz",
"/hot/resource/SMC-HET/tumours/A-mini/bams/n1/output/S2.T-n1.bam"
],
"mapped_files": [],
"nf_params": {
"output_dir": "/tmp/outputs"
},
"envvars": {
"SLURM_JOB_ID": "851543"
},
"mocks": {
"parse_bam_header": {
"read_group": [
{
"SM": "4915723"
}
]
}
},
"dated_fields": [
"params.log_output_dir",
"report.file",
"timeline.file",
"trace.file"
],
"expected_result": {}
}

So that's all of the inputs and an expected result of nothing. When I ran the tool it complained about all of the unexpected results, and wrote them to a new file for me (I trimmed out most of the JSON from the console output below), which I turned right around and committed. That definitely makes these regression tests - I don't swear that it's the correct output, but it's consistent output.

$ ./entry.py ~/src/pipeline-recalibrate-BAM/
docker
None
{'all_group_ids': '$(for i in `id --real --groups`; do echo -n "--group-add=$i "; done)', 'enabled': True, 'runOptions': '-u $(id -u):$(id -g) $(for i in `id --real --groups`; do echo -n "--group-add=$i "; done)', 'uid_and_gid': '-u $(id -u):$(id -g)'}
------
manifest
None
{'author': 'Yash Patel', 'description': 'Nextflow pipeline to perform Indel Realignment and Base Quality Score Recalibration', 'name': 'recalibrate-BAM', 'version': '1.0.0-rc.4'}
------
params
None
[full content trimmed]
------
params_schema
None
[full content trimmed]
------
process
None
[full content trimmed]
------
report
None
{'enabled': True, 'file': '/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240215T215116Z/nextflow-log/report.html'}
------
timeline
None
{'enabled': True, 'file': '/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240215T215116Z/nextflow-log/timeline.html'}
------
trace
None
{'enabled': True, 'file': '/tmp/outputs/recalibrate-BAM-1.0.0-rc.4/TWGSAMIN000001/log-recalibrate-BAM-1.0.0-rc.4-20240215T215116Z/nextflow-log/trace.txt'}
------
workDir
None
/scratch/851543
------
yaml
None
[full content trimmed]
------
Saving updated file to /Users/nwiltsie/src/tool-Nextflow-action/run-nextflow-tests/recalibrate-bam-out.json

So yes, that
(Expanding on that - that line of code should have a
This is absolutely amazing work! Have you considered incorporating this into our nf-test? It is already published software, so we could even consider another publication of nf-test v2 in the future if more functionalities are added. Do you also expect users to write a series of test cases for different input parameters and environments? We could benefit from it when developing locally if this can be accessed from the command line directly. A couple of silly questions:
I've vaguely considered it - part of my goal in pushing this up now was to solicit feedback on how best to use it. I'm open to whatever makes the most sense!
Yes, the expectation is that we'd have a representative set of tests stored in each pipeline - different CPU counts, different inputs, etc. I agree that it should be available locally as well as in the cloud.
No, not every closure needs to be mocked (in fact, as few as possible should be mocked out). Anything not mocked will be evaluated in the same way it would be in a real pipeline run.
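To make that last point about mocks concrete, here is a purely illustrative Python sketch of the behaviour described above (the real interception happens inside the Groovy script described in the PR body; the function and variable names here are hypothetical):

def evaluate_config_method(name, args, mocks, real_methods):
    # Illustrative only: names listed in the test's "mocks" block short-circuit
    # to the canned result (e.g. "parse_bam_header" -> {"read_group": [{"SM": "4915723"}]}),
    # while every other method runs exactly as in a real pipeline launch.
    if name in mocks:
        return mocks[name]
    return real_methods[name](*args)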
#!/bin/bash
# Clone this pull request
git clone \
--branch nwiltsie-nextflow-regression-logic \
--recurse-submodules \
git@github.com:uclahs-cds/tool-Nextflow-action.git
# Clone the recalibrate-BAM pipeline (with submodules)
git clone \
--recurse-submodules \
git@github.com:uclahs-cds/pipeline-recalibrate-BAM.git
# Run the test bundled with the pull request
./tool-Nextflow-action/run-nextflow-tests/entry.py \
"$(readlink -f ./pipeline-recalibrate-BAM/)" |
uclahs-cds/pipeline-call-gSV#118 is a perfect example of where this can be useful. @Faizal-Eeman and I created the test files from cd08fa7 and a187b4e to match pipeline-call-gSV's current

$ ./entry.py ../../../pipelines/pipeline-call-gSV ./call-gsv-F32.json
.params.proc_resource_params
None
{'call_gSV_Delly': {'cpus': '1', 'memory': '30 GB', 'retry_strategy': {'memory': {'operand': '2', 'strategy': 'exponential'}}}, 'call_gSV_Manta': {'cpus': '1', 'memory': '30 GB', 'retry_strategy': {'memory': {'operand': '2', 'strategy': 'exponential'}}}, 'run_validate_PipeVal': {'cpus': '1', 'memory': '1 GB'}}
------
.process.withName:call_gSV_Delly.memory
30 GB
{'1': '30 GB', '2': '60 GB', '3': '64 GB', 'closure': 'retry_updater(30 GB, exponential, 2, $task.attempt, memory)'}
------
.process.withName:call_gSV_Manta.memory
30 GB
{'1': '30 GB', '2': '60 GB', '3': '64 GB', 'closure': 'retry_updater(30 GB, exponential, 2, $task.attempt, memory)'}
------
proc_names
None
[Ljava.lang.String;@47fc9ce
------
Saving updated file to call-gsv-F32-out.json

So the differences are:
Looks great! I think we can proceed with this version and make updates as necessary once we begin implementing across pipelines
Okay, this is a lot. None of this is in Action format yet, but I wanted to push up this minimal working version and start getting some feedback on it.
The problem this is solving
We have so much metaprogramming craziness inside the pipeline configurations. The configuration changes based on available CPUs and memory, there are 7 ways to modify parameters, and we have fancy process parameter merging and retry specification.
Given that, it's very difficult to review most pull requests changing configuration logic, as I have exceptionally little understanding of what's going on under the hood.
I want an Action that can run regression tests on our Nextflow pipeline configurations. That is, given a specific commit of a pipeline and a defined configuration (split across config files, parameter files, environment variables, and command line arguments), I want to be able to compute the final configuration in the cloud, without requiring any files outside of the repository. Then that Action can automatically run on pull requests, documenting how the code changes impacted the final rendered configuration.
This PR is the bones of that Action (and a record of my slow descent into madness).
Interface
Test File Format
Tests are defined as specially-formatted JSON files tracked within each pipeline repository, one test per file. For demonstration I've included one test for the recalibrate BAM pipeline with this PR. The configuration appears at the start:
tool-Nextflow-action/run-nextflow-tests/recalibrate-bam.json (lines 1 to 37 in c890484)
The following keys are required (unless stated otherwise all paths are relative to the pipeline root):
- config - config files (as if passed via nextflow -c <file1> -c <file2>)
- params_file - the parameters file (as if passed via nextflow -params-file <file>)
- cpus - the mocked return value of SysHelper::getAvailCpus()
- memory_gb - the mocked return value of SysHelper::getAvailMemory() (float, GB)
- empty_files - files to touch within the docker image
- nf_params - command-line parameters (as if passed via nextflow --<key>=<value>)
- envvars - environment variables (as if passed via KEY=VALUE nextflow ...)
That expected_results field looks something like this:

tool-Nextflow-action/run-nextflow-tests/recalibrate-bam.json (lines 38 to 810 in c890484)
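For illustration only (this is not code from entry.py), a test file with the keys used in the example above could be loaded and sanity-checked like this in Python:

import json

# Top-level keys used by the example test file above; treat this set as illustrative.
EXPECTED_KEYS = {
    "config", "params_file", "cpus", "memory_gb", "empty_files",
    "mapped_files", "nf_params", "envvars", "mocks", "dated_fields",
    "expected_result",
}

def load_test(path):
    # Parse the JSON test definition and flag any missing top-level keys.
    with open(path) as infile:
        test = json.load(infile)
    missing = EXPECTED_KEYS - test.keys()
    if missing:
        raise ValueError(f"{path} is missing keys: {sorted(missing)}")
    return test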
Due to our metaprogramming madness many of the memory and cpu values under process are closures that return different results depending upon task.attempt - in those cases the reported value is a map containing the closure code itself (heavy caveats) and the computed values for attempts 1, 2, and 3:

tool-Nextflow-action/run-nextflow-tests/recalibrate-bam.json (lines 605 to 612 in c890484)
Current Usage

The entry.py script compares the bundled tests against a local checkout of the recalibrate BAM pipeline. With a clean checkout that goes well (all examples are running on my mac laptop, detached from the cluster):

After I make a small change to the pipeline...
When differences are found, the script saves a complete test file as "[testfile]-out.json", which makes it easy to copy over the original if the changes were intended.
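For example (a hypothetical snippet using the file names from the run above), accepting an intended change amounts to promoting the regenerated file over the committed one:

import shutil

# Overwrite the committed test with the regenerated expectations once the
# configuration differences have been reviewed and accepted.
shutil.copy("recalibrate-bam-out.json", "recalibrate-bam.json")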
Internals
The key steps are:
- resolving all of the includeConfigs
- rendering the full configuration (similar to nextflow config -properties)
Docker Image

This tool runs in a modified version of the nextflow/nextflow:23.10.0 image, and can be updated if desired. The key changes are to download and include mockito on the classpath, and to swap the JAR entrypoint for the nextflow script from nextflow.cli.Launcher to groovy.ui.GroovyMain (what you get if you call groovy from the command line).

This is definitely hackish, but the goal is to execute an arbitrary groovy script with exactly the same JVM arguments and settings as nextflow. The nextflow script does some tricky business, caching all of those arguments in /.nextflow/tmp/launcher/nextflow-one_${NEXTFLOW_VERSION}/buildkitsandbox/classpath-${NEXTFLOW_MD5}, where ${NEXTFLOW_MD5} is the hash of a bunch of environment variables. I'd prefer to use $NXF_CLASSPATH rather than sed to splice in the extra JARs, but doing so would result in an unpredictable launcher and make it harder to swap out the entrypoint.

Runtime Container
At runtime, this tool creates a container from the image and bind-mounts:
- a NamedTemporaryFile to each "empty file"
- a temporary directory to /mnt/bl_tests

Within that temporary directory, it creates:
- docker_test.config - the temporary config file described below
- cli_params.json - A JSON file of the Nextflow command-line parameters
- test_mocks.json - A JSON file of the test-specific method mocks

The container also gets the following environment variables:
- BL_PIPELINE_DIR - Path to the pipeline
- BL_CONFIG_FILE - The container path to docker_test.config
- BL_MOCKS_FILE - The container path to test_mocks.json
- BL_CLI_PARAMS_FILE - The container path to cli_params.json
- BL_PARAMS_FILE - The container path to the "params file" (if set)
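As a rough sketch of that container setup (hypothetical Python, not the actual entry.py code; the pipeline mount point and the omission of the per-file "empty file" mounts are assumptions):

import subprocess

def launch_test_container(image, pipeline_dir, test_dir):
    # Assemble a docker run invocation with the bind-mounts and BL_* variables
    # described above (empty-file mounts omitted for brevity).
    command = [
        "docker", "run", "--rm",
        "--volume", f"{pipeline_dir}:/mnt/pipeline",
        "--volume", f"{test_dir}:/mnt/bl_tests",
        "--env", "BL_PIPELINE_DIR=/mnt/pipeline",
        "--env", "BL_CONFIG_FILE=/mnt/bl_tests/docker_test.config",
        "--env", "BL_MOCKS_FILE=/mnt/bl_tests/test_mocks.json",
        "--env", "BL_CLI_PARAMS_FILE=/mnt/bl_tests/cli_params.json",
        # BL_PARAMS_FILE would be added here when the test sets a params file.
        image,
    ]
    return subprocess.run(command, capture_output=True, text=True, check=True)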
Temporary Config File

The generated config file looks something like this:
The two key parts here:
Groovy Script
Here's the main madness. This script is a chimera of nextflow config and a portion of nextflow run with some major changes:
- The BL_* environment variables are read and injected into the nextflow.cli.CliOptions and nextflow.cli.CmdRun objects in the same way that they would have been using CLI parameters.
- The ConfigObject (where all of our config code is executed) is wrapped in an Interceptor that gets to spy on every method call. If any of those method names are in the test's "mocks", the actual method is ignored and the pre-computed result is returned.

At this point the ConfigObject is as complete as Nextflow would ever make it. However, if it were to be serialized at this point every closure would be represented like Script1E713595D312C8FDC3D2D9EDAF11D59D$_run_closure1$_closure11@65293ca5, which is not ideal.

To solve that I take one more step and walk over the process block with another Interceptor. During that process I take the following steps:
- For named scopes (e.g. withName: and withLabel: blocks within the process block), cache their name (e.g. withLabel:process_medium) in the interceptor while traversing their children.
- Each closure (which generally depends on task) is replaced with a map of closure, 1, 2, and 3 mapping to results of evaluating the closure, with some serious skullduggery to intercept ConfigObject.get("task") to inject the value [process: <saved name>, attempt: '???', cpus: '$task.cpus'] (making the closure not-so-closed). The value for cpus is the literal string '$task.cpus', but the value for attempt varies:
  - For closure, attempt is set to the literal string '$task.attempt'. While evaluating the closure any calls to methods named check_limits or retry_updater are mocked out for strings like "$name(${args.join(', ')})" (e.g. "retry_updater(12, add, 0, $task.attempt, cpus)").
  - For 1, 2, and 3, attempt is set to that integer and the closure is evaluated without any other shenanigans.

Consider cpus = { methods.check_limits( 1 * task.attempt, 'cpus' ) }: 1 * task.attempt is evaluated with the static Java method java.lang.Integer.multiply and can't be intercepted without mockito, so at this point I just bail and return "closure()".

Finally, the fully-rendered configuration is printed to stdout in Java Properties File format.
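As a concrete reference for that closure rendering, the call_gSV_Delly memory closure from the call-gSV console output above ends up serialized as a map like this (written here as a Python dict purely for illustration):

# The un-evaluable pieces survive as a descriptive string under "closure",
# while attempts 1-3 carry the concretely evaluated values.
rendered_memory = {
    "closure": "retry_updater(30 GB, exponential, 2, $task.attempt, memory)",
    "1": "30 GB",
    "2": "60 GB",
    "3": "64 GB",
}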
Parsing and Diffing
Outside of the docker container, the groovy output is split using the sentinel value and parsed into a dictionary. A lot of junk shows up here - seemingly any variable defined without a def (example, another example) bubbles up to the global namespace. For sanity I'm not parsing that json_object variable, but we should just add the def onto it.

Once parsed, I pop off any configuration-only namespaces we've defined, as they're just noise:
tool-Nextflow-action/run-nextflow-tests/configtest.py (lines 224 to 236 in f8210ca)
Finally, I generate a list of differences, using re.sub to remove any dates from the "dated_fields".
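A minimal sketch of that step (a hypothetical helper, assuming timestamps shaped like 20240215T215116Z as seen in the log paths above):

import re

# Compact ISO-8601-style timestamps, e.g. "20240215T215116Z".
DATE_PATTERN = re.compile(r"\d{8}T\d{6}Z")

def strip_dates(value):
    # Replace embedded timestamps so dated fields compare equal across runs.
    return DATE_PATTERN.sub("<date>", str(value))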
Where to go from here
As-is, this is not an Action. However, it is shockingly complicated, and before I go any further I need feedback from the lot of you. Feel free to dive into any aspect of this, but here are some specific questions:
- Should tests be stored as nf-configtest-*.json files (or something) in the repository? That would allow each test file to be independent.