Issues #80 and #81: reduce 'mclapply' memory footprint; add "num_cores" option #94

warrenmcg · 2017-03-14T01:38:23Z

Hi,

NOTE: This is the same pull request as #93, but I made the mistake of using my master branch instead of a dedicated branch, and unrelated changes got included. I corrected that mistake here.

Below is the same message I included with #93:

This is to address issues #80 and #81 that I opened.

For issue #80, gene_summary uses a nested lapply, with the first one using mclapply. This meant that every full bootstrap object was copied to each core. With more than a few samples, this leads to a very large memory footprint (20 samples failed on a machine with 120+ GB of RAM). Using the blogpost I mentioned in #80 as a starting point (link), I switched the use of mclapply to the second nested lapply, so that only individual bootstraps will be sent out to each core. This significantly reduces the memory footprint.

For issue #81, I wanted to take full advantage of a machine that had more than 2 cores, so I simply added the option to gene_summary and then to sleuth_prep, updated the documentation for sleuth_prep, and added test conditions to make sure the choice of num_cores was reasonable, throwing an informative error if it is not.

Finally, I followed your contribution guidelines to make the code lintr clean. In the process of making sure my updated code passed all of your tests, I found that one of your tests ("give a design matrix") had a bug, so I fixed it to make sure it worked.

Hope this helps!
Warren

…dded sleuth_prep option to select number of cores for mclapply

…ons for num_cores to throw informative error

…trix did not have dim names that matched the formula or the sample ids

roryk · 2017-03-14T02:23:01Z

This p/r is awesome.

warrenmcg · 2017-03-14T04:50:12Z

Hi Rory! Thank you for your kind words! It means a lot to me as a longtime user but occasional and novice contributor.

As an addendum, I'll also note to the sleuth authors that I realized that the 'give a design matrix' test issue on the main branch was already fixed in the devel branch, so I just fixed mine to match the changes in the devel branch.

pimentel · 2017-03-15T22:37:38Z

@warrenmcg thank you very much for this! nice work!

the gene aggregation stuff is currently undergoing some pretty massive changes in terms of memory usage which will be done probably by tomorrow. I will review it carefully by the end of the week.

same goes for your other PR.

thanks again,

harold

warrenmcg · 2017-03-16T00:54:41Z

Thank you @pimentel!

I saw on some of the other threads that you were out for medical leave. I hope you're feeling better! Excellent work on the tool overall, and for the improvements made to it since its release!

I began to make my own changes to the code to try to accommodate the gene_aggregation and the devel branch changes for my own work, so I'm excited to hear that you're almost finished with the merge!

Best,
Warren

warrenmcg · 2017-04-22T23:28:58Z

Hi @pimentel and @roryk,

Because of my updated code, reconciling the gene aggregation and memory usage enhancements, I've decide to close this pull request in favor of that one. Harold, let me know if you wish to focus on this one, in case you have a different set of changes to reconcile the two aspects of sleuth.

roryk · 2017-04-23T14:26:21Z

Thanks for the heads up Warren.

warrenmcg added 5 commits March 13, 2017 07:31

modified mclapply usage in gene_summary to reduce memory footprint; a…

932fc47

…dded sleuth_prep option to select number of cores for mclapply

corrected default value for num_cores, and strengthened check conditi…

90b5b99

…ons for num_cores to throw informative error

fixed 'give a design matrix' test, which failed because the design ma…

161d231

…trix did not have dim names that matched the formula or the sample ids

now lintr clean

66cea3a

clean a few lints that I missed

c321f90

change 'give a design matrix' test to match changes on 'devel' branch

d6d1d2c

warrenmcg mentioned this pull request Apr 20, 2017

Reconcile Gene Aggregation Feature with Memory Footprint Overhaul #99

Merged

warrenmcg closed this Apr 22, 2017

pimentel mentioned this pull request May 29, 2017

Release v0.29.0 #110

Merged

warrenmcg deleted the issue80_81 branch June 3, 2017 03:33

cgpu mentioned this pull request May 7, 2020

Adds updated exclusion criterion for transcripts TheJacksonLaboratory/nf-core-rnaseq#3

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issues #80 and #81: reduce 'mclapply' memory footprint; add "num_cores" option #94

Issues #80 and #81: reduce 'mclapply' memory footprint; add "num_cores" option #94

warrenmcg commented Mar 14, 2017

roryk commented Mar 14, 2017

warrenmcg commented Mar 14, 2017

pimentel commented Mar 15, 2017

warrenmcg commented Mar 16, 2017 •

edited

Loading

warrenmcg commented Apr 22, 2017

roryk commented Apr 23, 2017

Issues #80 and #81: reduce 'mclapply' memory footprint; add "num_cores" option #94

Issues #80 and #81: reduce 'mclapply' memory footprint; add "num_cores" option #94

Conversation

warrenmcg commented Mar 14, 2017

Below is the same message I included with #93:

roryk commented Mar 14, 2017

warrenmcg commented Mar 14, 2017

pimentel commented Mar 15, 2017

warrenmcg commented Mar 16, 2017 • edited Loading

warrenmcg commented Apr 22, 2017

roryk commented Apr 23, 2017

warrenmcg commented Mar 16, 2017 •

edited

Loading