sleuth "Advanced options" #168

warrenmcg · 2018-04-18T03:29:43Z

Hi @pimentel,

Here are my suggested changes to the sleuth API and Documentation:

New arguments

Add a filter_target_id option to sleuth_prep so that users can supply a list of target_ids obtained using an independent filtering method. This is recommended when the preferred filtering method requires a matrix-wide transformation (e.g. edgeR's CPM filter) or otherwise needs to assess multiple features simultaneously, since the sleuth_prep filtering step is built to assess features only one at a time.
Add a normalize option to sleuth_prep that lets the user skip the normalization step (and all subsequent steps) when set to FALSE. This severely reduces the functionality of the sleuth object for most downstream applications, but can be useful in certain situations (e.g. quickly checking a custom filter, or quickly checking the raw data or a summary of the kallisto objects).
Add a weight_func option to sleuth_results to specify a custom weighting function that acts on the mean observations for each transcript during p-value aggregation.
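As an illustration of why a CPM-style filter needs the whole matrix, here is a minimal base-R sketch that derives a list of target IDs from a toy count matrix. The matrix, the thresholds `min_cpm` and `min_samples`, and the variable names are all made up for this example, and the sketch does not use edgeR itself.

```r
# Toy count matrix: rows are targets (transcripts), columns are samples.
counts <- matrix(
  c(500, 450, 520, 480,
      0,   1,   0,   2,
    120, 130, 110, 140,
      3,   0,   2,   1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("tx", 1:4), paste0("s", 1:4))
)

# Counts per million depend on each sample's library size, i.e. on every
# target in the sample, so this cannot be done one feature at a time.
lib_sizes <- colSums(counts)
cpm <- sweep(counts, 2, lib_sizes, "/") * 1e6

# Hypothetical thresholds, scaled up because the toy libraries are tiny.
min_cpm <- 10000
min_samples <- 2

# Keep targets that reach the CPM threshold in at least `min_samples` samples.
keep <- rowSums(cpm >= min_cpm) >= min_samples
keep_ids <- rownames(counts)[keep]
print(keep_ids)
```

The resulting character vector could then be supplied as filter_target_id to sleuth_prep, assuming the option expects the IDs of the targets that pass the filter.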

Changed API

Rename transformation_function and transformation_function_tpm to transform_fun_counts and transform_fun_tpm, respectively, for clarity and brevity.
Greatly simplify the exposed API for sleuth_prep. Move several options to "advanced options" that are passed through ... and described in the Details section of the documentation. The following options were moved: filter_target_id, filter_fun, norm_fun_counts, norm_fun_tpm, extra_bootstrap_summary, read_bootstrap_tpm, max_bootstrap, transform_fun_counts, and transform_fun_tpm.
Simplify the exposed API for sleuth_fit. Move which_var and the extra options for sliding_window_grouping to ..., with details in the Details section of the documentation.
Simplify the exposed API for sleuth_results. The weight_func option is hidden in ... but described in detail in the Details section of the documentation.
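The general pattern of hiding rarely used options behind ... can be sketched in base R as follows. This is not the sleuth implementation: `prep_sketch` is a hypothetical function, and the defaults shown are illustrative rather than taken from the package.

```r
# Hypothetical stand-in for a function that accepts advanced options via `...`.
prep_sketch <- function(...) {
  # Illustrative defaults for a few of the option names listed in this PR.
  defaults <- list(
    normalize = TRUE,
    read_bootstrap_tpm = FALSE,
    extra_bootstrap_summary = FALSE
  )

  extra_opts <- list(...)

  # Every advanced option must be passed by name.
  if (length(extra_opts) > 0 &&
      (is.null(names(extra_opts)) || any(names(extra_opts) == ""))) {
    stop("advanced options must be named")
  }

  # Reject names that are not recognized advanced options.
  unknown <- setdiff(names(extra_opts), names(defaults))
  if (length(unknown) > 0) {
    stop("unrecognized advanced option(s): ", paste(unknown, collapse = ", "))
  }

  # User-supplied values override the defaults.
  modifyList(defaults, extra_opts)
}

opts <- prep_sketch(normalize = FALSE)
str(opts)
```

Validating the names up front matters with this design, because a misspelled option passed through ... would otherwise be silently ignored.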

Other small changes:

Deprecate bs_sigma_summary, since it assumes that the bootstraps are summarized using the method in versions <= 0.28.1.
Add several sanity checks.

+ Changed API for 'transformation_fun' and 'transformation_fun_tpm' to 'transform_fun_counts' and 'transform_fun_tpm', respectively.
+ Added public API for 'norm_fun_counts' and 'norm_fun_tpm' so that users can see how the data was normalized when viewing a sleuth object.
+ Added error handling if the user attempts to change 'norm_fun_counts' or 'norm_fun_tpm' manually.
+ Added new 'normalize' boolean to skip the normalization steps, which also skips the rest of the downstream processing (bootstrap summarization, transformation, etc.).
+ Moved several of the sleuth_prep options to a new section of 'advanced options'. These are now handled by the '...' argument. This includes options for summarizing the bootstraps ('read_bootstrap_tpm', 'extra_bootstrap_summary', 'max_bootstrap'), normalizing the data ('normalize' boolean, 'norm_fun_counts', 'norm_fun_tpm'), transforming the data ('transform_fun_counts', 'transform_fun_tpm'), and the old 'gene_mode' for counts aggregation. Followed the example for advanced options used by the 'polyester' package for its 'simulate_experiment' function.
+ Added sanity checks for the mutually exclusive 'gene_mode' and 'pval_aggregate' modes for gene-level aggregation. 'pval_aggregate' is the default mode if 'aggregation_column' is set. If the user tries to change either 'gene_mode' or 'pval_aggregate' manually, they receive a warning if these two modes conflict and if 'gene_column' has not been set.
+ Changed how the sleuth object handles manual changes to 'transform_fun_counts' or 'transform_fun_tpm'. Now it throws an error if nothing has been fit, preventing the user from changing the listed transformation function. This is so users can always see how the data was transformed when viewing fits within a sleuth object.

+ Discuss the interpretation of the 'b' value from Wald test results.
+ Discuss the warning if gene aggregation is done with transcript-level target_mappings.
+ Discuss the two aggregation modes.
+ Discuss the advanced option 'weight_func' for weighting the lancaster method.
+ Add the expected columns to the specification of the results if a user does 'pval_aggregate = TRUE'.

pimentel · 2018-06-03T18:45:54Z

JFC, @warrenmcg. This is a massive PR. I didn't think you were going to go through all of this so carefully -- thanks for that. It looks great!

pimentel · 2018-06-03T18:46:52Z

PS: thanks for dealing with the import stuff -- this had been on my mental TODO list forever.

warrenmcg added 8 commits April 11, 2018 17:06

add sanity checks for the filter_fun option in sleuth_prep

537dccd

add option to filter observations just by a list of target IDs obtained using an independent filtering method

a28f6df

add weight_func option for sleuth_results:

42934b1

+ this allows the use of different weights with the observed means of transcripts for the lancaster method
+ this prevents errors when a sleuth-ALR transformation is used
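To make the role of a weighting function concrete, here is a minimal base-R sketch of Lancaster-style weighted p-value aggregation. It is not sleuth's implementation: `lancaster_sketch` and its arguments are hypothetical, and the choice of `exp` as a weighting function is only an example. The sketch assumes that the errors mentioned above come from non-positive weights, which a log-ratio transformation could produce when observed means are negative.

```r
# Hypothetical helper: combine p-values with weights derived from the
# mean observations via a user-supplied weighting function.
lancaster_sketch <- function(pvals, mean_obs, weight_func = identity) {
  weights <- weight_func(mean_obs)
  stopifnot(length(weights) == length(pvals), all(weights > 0))

  # Map each p-value to the upper-tail quantile of a gamma distribution with
  # shape w / 2 and scale 2 (a chi-squared distribution with w degrees of
  # freedom), then sum the quantiles.
  stat <- sum(qgamma(pvals, shape = weights / 2, scale = 2, lower.tail = FALSE))

  # Under the null, the sum is chi-squared with sum(w) degrees of freedom.
  pgamma(stat, shape = sum(weights) / 2, scale = 2, lower.tail = FALSE)
}

pvals <- c(0.01, 0.20, 0.50)
mean_obs <- c(5.1, 2.3, -0.4)  # toy values; one mean is negative

# `identity` would yield a negative weight here, so this example uses a
# weighting function that is always positive.
agg_p <- lancaster_sketch(pvals, mean_obs, weight_func = exp)
print(agg_p)
```

When every weight equals 2, this reduces to Fisher's method, which is a convenient sanity check for any custom weighting function.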

add sanity checks based on 'check_norm_status' for plots and 'sleuth_fit'

bc214c7

update NAMESPACE

ba3568d

create 'advanced options' section for the 'sleuth_fit' function:

ee51c06

+ Specifies documentation for the 'which_var' argument.
+ Adds explicit documentation for the additional options to 'sliding_window_grouping': 'n_bins', 'lwr', and 'upr'.

warrenmcg requested a review from pimentel April 18, 2018 03:29

warrenmcg added 4 commits April 23, 2018 10:39

avoid warning when checking if 'normalize = TRUE'

682eaac

add encoding info to description to prevent errors when updating documentation

47ba1fb

remove hard dependencies on dplyr and ggplot2

7e7662f

+ now those packages can move to the 'imports' section of the DESCRIPTION
+ this addresses issue pachterlab#56

force the 'target_mapping' to be treated as a data.frame with character columns

f8aba01

+ this prevents warnings introduced when the supplied target_mapping had factors instead
+ this handles bugs seen in issues pachterlab#76 and pachterlab#169
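The kind of coercion this commit describes can be sketched in base R as follows. The helper `coerce_to_character` and the toy mapping are hypothetical and are not taken from the sleuth source.

```r
# Toy target mapping whose columns are factors, as older R versions would
# produce by default when reading or building a data.frame.
target_mapping <- data.frame(
  target_id = c("tx1", "tx2"),
  gene_id = c("geneA", "geneA"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: return a plain data.frame in which every column is
# character. Note that this also converts any non-factor columns.
coerce_to_character <- function(df) {
  df <- as.data.frame(df, stringsAsFactors = FALSE)
  df[] <- lapply(df, as.character)
  df
}

clean_mapping <- coerce_to_character(target_mapping)
str(clean_mapping)
```

Assigning into `df[]` keeps the data.frame structure and column names while replacing the column contents.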

lynnyi closed this Apr 30, 2018

lynnyi reopened this May 8, 2018

pimentel merged commit 29c0a01 into pachterlab:devel Jun 3, 2018
