Add method to store params #43

Merged
merged 7 commits into from
Dec 13, 2023

Conversation

j2salmingo
Contributor

Adds a function to dump the params object into a JSON file for later analysis.
Note: I was unable to add the method as a separate class with a static method, or to define it directly in the config file, so I added it as a closure.
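
For readers unfamiliar with the closure approach, here is a minimal sketch in plain Groovy of what such a closure could look like. It is not the code merged in this PR: the name `store_object_as_json`, its signature, and the example map are assumptions made for illustration, and a plain map stands in for the Nextflow `params` object.

```groovy
import groovy.json.JsonOutput

// Hypothetical stand-in for the closure added in this PR; the name and
// signature are assumptions for illustration only.
def store_object_as_json = { Object obj, File output_file ->
    // Create the parent directory if it does not exist yet.
    output_file.parentFile?.mkdirs()
    // Serialize to JSON and pretty-print it for readability.
    output_file.text = JsonOutput.prettyPrint(JsonOutput.toJson(obj))
}

// Usage: a plain map stands in for the Nextflow params object.
def example_params = [sample_id: 'S1', threads: 4]
def output_file = new File(System.getProperty('java.io.tmpdir'), 'parameters.json')
store_object_as_json(example_params, output_file)
```

The closure takes a `File` rather than a path string, so callers wrap the path with `new File(...)` before passing it in.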

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have reviewed the Nextflow pipeline standards.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have set up or verified the branch protection rule following the github standards before opening this pull request.

  • I have added my name to the contributors listings in the
    metadata.yaml and the manifest block in the nextflow.config as part of this pull request, am listed
    already, or do not wish to be listed. (This acknowledgement is optional.)

  • I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

  • I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)

  • I have tested the config being added or modified. Outline the tests below.

  • RNA A-mini-N2

    • sample: RNA A-mini-N2
    • config: /hot/software/pipeline/pipeline-call-RNAEditingSite/Nextflow/development/unreleased/jsalmingo-check-param-dump/call-RNAEditingSite-A-mini-n2-test.config
    • output: /hot/software/pipeline/pipeline-call-RNAEditingSite/Nextflow/development/unreleased/jsalmingo-check-param-dump/output
    • benchmarking: /hot/software/pipeline/pipeline-call-RNAEditingSite/Nextflow/development/unreleased/jsalmingo-check-param-dump/output/call-RNAEditingSite-6.0.0/CPCG0196-F1-A-mini-n2-strand-correct-false/log-call-RNAEditingSite-6.0.0-20231030T225336Z/nextflow-log/trace.txt

@j2salmingo j2salmingo self-assigned this Oct 30, 2023
Member

@nwiltsie nwiltsie left a comment

The functionality looks great!

It took me a bit to figure out that I needed to pass a File object and not a path string, though - this could use a documentation snippet, particularly if we want to standardize how we use this. Modifying a pipeline's methods.config like so worked well for me and seems like a reasonable standard:

--- a/config/methods.config
+++ b/config/methods.config
@@ -1,5 +1,6 @@
 includeConfig "${projectDir}/external/pipeline-Nextflow-config/config/methods/common_methods.config"
 includeConfig "${projectDir}/external/pipeline-Nextflow-config/config/schema/schema.config"
+includeConfig "${projectDir}/external/pipeline-Nextflow-config/config/store_params/store_params.config"

 methods {
     // Set the output and log output dirs here.
@@ -68,5 +69,10 @@ methods {
         methods.set_resources_allocation()
         methods.set_pipeline_logs()
         methods.set_env()
+
+        json_extractor.store_object_json(
+            params,
+            new File("${params.log_output_dir}/parameters.json")
+        )
     }
 }

Member

@zhuchcn zhuchcn left a comment

This is wonderful! We should have done this already! Just wondering if it's necessary to also store the resources being used? That would be the process namespace. Also, the manifest namespace could be useful too, to figure out the pipeline version being used.

@yashpatel6
Collaborator

> This is wonderful! We should have done this already! Just wondering if it's necessary to also store the resources being used? That would be the process namespace. Also, the manifest namespace could be useful too, to figure out the pipeline version being used.

We would definitely want to store those, which should be possible as long as the save function is general enough to accept any namespace to dump.
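
As a rough illustration of a namespace-agnostic dump, the sketch below writes each named object to its own JSON file. The helper name `store_namespaces`, the one-file-per-namespace layout, and the example maps are all assumptions for illustration, not the interface merged here. Plain maps stand in for the `params` and `manifest` namespaces.

```groovy
import groovy.json.JsonOutput

// Hypothetical helper: writes each named namespace to <log_dir>/<name>.json
// and returns a map of namespace name to the file that was written.
def store_namespaces = { Map<String, Object> namespaces, File log_dir ->
    log_dir.mkdirs()
    namespaces.collectEntries { name, content ->
        def output_file = new File(log_dir, "${name}.json")
        output_file.text = JsonOutput.prettyPrint(JsonOutput.toJson(content))
        [(name): output_file]
    }
}

// Usage: plain maps stand in for the Nextflow params and manifest namespaces.
def log_dir = new File(System.getProperty('java.io.tmpdir'), 'namespace-dump-example')
def written = store_namespaces(
    [params: [sample_id: 'S1'], manifest: [name: 'example-pipeline', version: '1.0.0']],
    log_dir
)
```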

@nwiltsie
Member

> This is wonderful! We should have done this already! Just wondering if it's necessary to also store the resources being used? That would be the process namespace. Also, the manifest namespace could be useful too, to figure out the pipeline version being used.

Hmm, params isn't the only namespace we care about? Where have I heard that before...?

More seriously, manifest works great but I'm running into a weird issue when trying to serialize process:

Nov-22 11:48:25.983 [main] ERROR nextflow.cli.Launcher - Unable to parse config file: '/hot/user/nwiltsie/pipelines/pipeline-filter-RNAEditingSite/./test/nftest.config'

  No signature of method: java.lang.Integer.multiply() is applicable for argument types: (ConfigObject) values: [[:]]
  Possible solutions: multiply(java.lang.Number), multiply(java.lang.Character)

groovy.lang.MissingMethodException: No signature of method: java.lang.Integer.multiply() is applicable for argument types: (ConfigObject) values: [[:]]
Possible solutions: multiply(java.lang.Number), multiply(java.lang.Character)
	at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:70)
	at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:46)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
	at ScriptC5BBA90DD6D0E6A6F0C9883ADD3598DC$_run_closure1$_closure2.doCall(ScriptC5BBA90DD6D0E6A6F0C9883ADD3598DC:6)
	at ScriptC5BBA90DD6D0E6A6F0C9883ADD3598DC$_run_closure1$_closure2.doCall(ScriptC5BBA90DD6D0E6A6F0C9883ADD3598DC)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1035)
	at groovy.lang.Closure.call(Closure.java:412)
	at groovy.lang.Closure.call(Closure.java:406)
	at groovy.json.JsonDelegate.cloneDelegateAndGetContent(JsonDelegate.java:89)
	at groovy.json.DefaultJsonGenerator.writeObject(DefaultJsonGenerator.java:214)
	at groovy.json.DefaultJsonGenerator.writeMapEntry(DefaultJsonGenerator.java:381)
	at groovy.json.DefaultJsonGenerator.writeMap(DefaultJsonGenerator.java:369)
	at groovy.json.DefaultJsonGenerator.writeObject(DefaultJsonGenerator.java:200)
	at groovy.json.DefaultJsonGenerator.writeObject(DefaultJsonGenerator.java:164)
	at groovy.json.DefaultJsonGenerator.toJson(DefaultJsonGenerator.java:98)

I'm assuming this is due to there being closures within the process namespace.

@yashpatel6
Collaborator

Hmm, probably the closures, yeah. I think we'll want to dump the process namespace before any processing that generates those closures.
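
The stack trace above passes through `JsonDelegate.cloneDelegateAndGetContent`, which is where Groovy's JSON generator calls a closure to build JSON from it. That is consistent with the closure hypothesis. Besides dumping the namespace earlier, one possible workaround (not proposed in this thread, and untested against a real `process` namespace) is to replace closures with a placeholder before serializing. The helper name `strip_closures`, the placeholder text, and the example map are assumptions for illustration.

```groovy
import groovy.json.JsonOutput

// Hypothetical helper: recursively copies a structure, replacing every
// closure with a placeholder string so the JSON generator never calls it.
def strip_closures
strip_closures = { Object value ->
    if (value instanceof Closure) {
        return '[closure]'
    }
    if (value instanceof Map) {
        return value.collectEntries { k, v -> [(k): strip_closures(v)] }
    }
    if (value instanceof Collection) {
        return value.collect { strip_closures(it) }
    }
    return value
}

// A plain map standing in for a process-like namespace containing closures.
// The closures are never invoked, so their undefined 'task' is harmless here.
def process_like = [
    cpus: 2,
    'withName:align': [memory: { 4 * task.attempt }],
    retry_codes: [1, { it }]
]
def json = JsonOutput.toJson(strip_closures(process_like))
```

The trade-off is that the dumped file records only that a closure was present, not what it would have evaluated to.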

@j2salmingo
Contributor Author

> It took me a bit to figure out that I needed to pass a File object and not a path string, though - this could use a documentation snippet, particularly if we want to standardize how we use this. Modifying a pipeline's methods.config like so worked well for me and seems like a reasonable standard:

That's a good point. I just looked at the top-level README.md and noticed that there is documentation for each config module, so I'll be sure to add it before resubmitting this PR for review.

@j2salmingo j2salmingo requested a review from nwiltsie December 2, 2023 01:05
@j2salmingo
Contributor Author

The readme has been added, and the name of the function changed to reflect the new functionality.

Member

@nwiltsie nwiltsie left a comment

Nicely done! The README usage template worked great on a clean repo.


4 participants