Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New package architecture #85

Merged
merged 163 commits into from
Mar 2, 2023
Merged

New package architecture #85

merged 163 commits into from
Mar 2, 2023

Conversation

joshwlambert
Copy link
Member

This pull request contains a large number of changes influencing the handling of epidemiological data, the API of the epidist class and its methods, as well as work on numerous helper functions.

Changes to the {epiparameter} library:

  • The library has several new categories
    • both disease and pathogen in library Adding disease and pathogen genus to database #63
    • to account for the various ways in which parameters, summary statistics and uncertainty can be reported.
    • There are columns for methodological criteria used when inferring the distribution (e.g. censored, right-truncated).

Changes to the epidist class:

Addition of an epiparam class:

  • In the old code the epiparameter library could be read in and viewed through several functions (epidist(), list_distributions() and pathogen_summary()), but there was no way to handle the library and manipulate it in custom ways. The epiparam class is a data structure for the library and is a subclass of a data.frame to provide printing and subsetting methods. Add epiparam class #74
  • The epiparam class follows the structure of having a constructor, helper and validator function.

Addition of a vb_epidist class:

  • In order to maintain all the functionality of the epidist class for vector-borne diseases in which there is an intrinsic and extrinsic delay (e.g. incubation period) the vb_epidist class provides a consistent interface by wrapping around the epidist class. Accomodate vector-borne diseases in database #50
  • The vb_epidist class has the same probability distribution methods as the epidist class.
  • The vb_epidist class follows the structure of having a constructor, helper and validator function.

Addition of JSON validation on library:

  • There is a new github action workflow to validate the epiparameter data.
  • There are helper functions in utils.R to help update the JSON data and JSON schema from the data dictionary yaml file and library csv file. Write data dictionary #64

Addition of conversion functions:

  • Negative binomial
  • Geometric

There are class conversion functions (as_epidist() and as_epiparam()). There is also a bind_epiparam() function that allows all classes defined in the {epiparameter} package to be bound to an existing epiparam object.

Helper functions for constructing objects have been added to aid in supplying information manually when constructing an epidist object. For example create_epidist_citation() and create_epidist_metadata().

Two exported functions, calc_dist_params() and create_prob_dist() are added which are mainly used internally but are exported in case users may find them helpful.

Added data protocol vignette that describes the process of building the {epiparameter} library. #68

The pathogen_summary() function has been removed as it's functionality was no longer deemed necessary.

Documentation and testing have been updated in line with the above-mentioned developments.

joshwlambert and others added 30 commits January 31, 2023 15:38
… and skipped tests to pass check ready for WIP refactor
Copy link
Collaborator

@pratikunterwegs pratikunterwegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joshwlambert, looks good to me overall, and happy to approve. There are a few open conversations though, which seem to be optional suggestions, and I would recommend converting them to new issues to be addressed in a different PR.

Putting down a few things I noticed which might be good to convert to issues or keep in mind for future fixes:

  1. The vignette chunks are set to eval = FALSE since the functions are out of date with the current PR - something to keep in mind.
  2. There are over 1,300 tests which is potentially unwieldy - suggest eventually compressing tests - for example, a single key-value pair comparison (via a snapshot) for the column types in the parameter database, rather than a single expect for each column.
  3. L.209 in epiparam_utils.R has a hardcoded number of columns.
  4. Might be good to revisit the function get_percentiles() - haven't tested it but getting names and converting to decimal values, as well as determining the upper and lower percentile values seems a little hacky at the moment - am I right in thinking one value has to be > 50 and one < 50? An understandable choice, but not mentioned in the documentation. These lines in the tests suggests this is the case:
    res <- get_percentiles(c(q_05 = 1, q_45 = 10))
    expect_identical(res, NA)
    .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants