-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issues #80 and #81: reduce 'mclapply' memory footprint; add "num_cores" option #94
Conversation
…dded sleuth_prep option to select number of cores for mclapply
…ons for num_cores to throw informative error
…trix did not have dim names that matched the formula or the sample ids
This p/r is awesome. |
Hi Rory! Thank you for your kind words! It means a lot to me as a longtime user but occasional and novice contributor. As an addendum, I'll also note to the sleuth authors that I realized that the 'give a design matrix' test issue on the main branch was already fixed in the |
@warrenmcg thank you very much for this! nice work! the gene aggregation stuff is currently undergoing some pretty massive changes in terms of memory usage which will be done probably by tomorrow. I will review it carefully by the end of the week. same goes for your other PR. thanks again, harold |
Thank you @pimentel! I saw on some of the other threads that you were out for medical leave. I hope you're feeling better! Excellent work on the tool overall, and for the improvements made to it since its release! I began to make my own changes to the code to try to accommodate the gene_aggregation and the devel branch changes for my own work, so I'm excited to hear that you're almost finished with the merge! Best, |
Because of my updated code, reconciling the gene aggregation and memory usage enhancements, I've decide to close this pull request in favor of that one. Harold, let me know if you wish to focus on this one, in case you have a different set of changes to reconcile the two aspects of sleuth. |
Thanks for the heads up Warren. |
Hi,
NOTE: This is the same pull request as #93, but I made the mistake of using my master branch instead of a dedicated branch, and unrelated changes got included. I corrected that mistake here.
Below is the same message I included with #93:
This is to address issues #80 and #81 that I opened.
For issue #80, gene_summary uses a nested lapply, with the first one using mclapply. This meant that every full bootstrap object was copied to each core. With more than a few samples, this leads to a very large memory footprint (20 samples failed on a machine with 120+ GB of RAM). Using the blogpost I mentioned in #80 as a starting point (link), I switched the use of mclapply to the second nested lapply, so that only individual bootstraps will be sent out to each core. This significantly reduces the memory footprint.
For issue #81, I wanted to take full advantage of a machine that had more than 2 cores, so I simply added the option to gene_summary and then to sleuth_prep, updated the documentation for sleuth_prep, and added test conditions to make sure the choice of num_cores was reasonable, throwing an informative error if it is not.
Finally, I followed your contribution guidelines to make the code lintr clean. In the process of making sure my updated code passed all of your tests, I found that one of your tests ("give a design matrix") had a bug, so I fixed it to make sure it worked.
Hope this helps!
Warren