The code in this repository transforms the output of dqa_library into a format that dqa_shiny can read more easily. This reduces the computation required to display visualizations on the PEDSnet Data Quality Dashboard and generates threshold-based performance measures to feed back to individual institutions.
The code was developed on R version 4.2.0 (2022-04-22). To execute the R code in this repository, users will need to install the packages named in the top lines of driver.R.
The data is expected to be in the format of the output from the dqa_library step of the PEDSnet DQA process.
- Set up the configuration for running the PEDSnet standardized R framework code in run.R and site_info.R, including the srcr .json config file needed to connect to the database containing the DQA results.
- Edit run.R:
  - `config('results_schema', 'dqa_rox')`: change 'dqa_rox' to the name of the schema containing the DQA output.
  - `config('new_site_pp', FALSE)`: if running against all sites, set to `FALSE`. If there is already output for some sites and you want to add in another site, set to `TRUE`.
  - `config('results_schema_other', NA)`: if running against all sites, set to `NA`. If there is already output for some sites and you want to add in another site, set to the name of the results schema containing the output of dqa_library for the other site.
  - `config('results_name_tag', NA)`: if the suffix on the tables output from dqa_library is anything other than the default, change to the suffix on the tables. By default, set to `NA` to expect no suffix on the table names.
  - `config('current_version', 'vxx')`: change 'vxx' to the name of the current version of the data. Should match the name assigned in the DQA library step in the column database_version in the output.
  - `config('previous_version', 'vyy')`: change 'vyy' to the name of the previous version of the data. Should match the name assigned in the DQA library step in the column database_version in the output.
- Edit site_info.R:
  - `config('db_src')` should set up the connection to the database containing the schema with the data output from dqa_library, where you will also output the results from processing.
  - You will also need to establish a connection with the database containing output from `dqa_redcap`, specifically the table `dqa_issues_redcap`. `config('db_src_prev')` will do this for you, but you need to make sure that the connection information is either in a file named `config_dqa_prev.json` or, if it is contained in a file with another name, that the environment variable `PEDSNET_DB_SRC_CONFIG_BASE_PREV` is set to the name of the file. This can be done either in the console or in your .Rprofile. It is assumed that the results from the previous DQ run are in a schema named `dqa_rox`. If this is not the case, edit the call that generates redcap_prev in driver.R.
- Either set `config('execution_mode', '')` to `production` in run.R, or set it to `development` and highlight the contents within .run{} in driver.R.
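
For the site_info.R step above, one way to point `config('db_src_prev')` at a connection file with a non-default name is to set the environment variable from R before running the drivers. This is a minimal sketch: the file name `my_dqa_prev_config` is a hypothetical placeholder, and whether the value should include a path or the .json extension depends on how your srcr setup resolves config files, which is not specified here.

```r
# Hypothetical file name; replace it with the name of your own connection file.
Sys.setenv(PEDSNET_DB_SRC_CONFIG_BASE_PREV = "my_dqa_prev_config")

# Confirm that the value is visible to the current R session.
Sys.getenv("PEDSNET_DB_SRC_CONFIG_BASE_PREV")
```

Running `Sys.setenv()` in the console applies the setting to the current session only; placing the same call in your .Rprofile applies it each time R starts.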
The processing steps are executed through driver.R, driver_thresholds.R, driver_anon.R, and driver_large_n.R, which should be run sequentially. The purpose of each file is:
- driver.R
  - Generates at least one table for each table output from dqa_library, containing a version of the data with post-processing steps applied, and outputs each table with the suffix `pp`.
  - The tables with a `pp` suffix are the ones accessed by the dashboard code.
- driver_thresholds.R
  - Establishes thresholds to apply to the DQ output based on standard PEDSnet thresholds or a site-specific threshold, if one has been established.
  - Generates a version of each of the `pp` tables in a format to which the thresholds can be applied and outputs each table with the prefix `thr`.
  - Applies the established thresholds to the generated tables and outputs violations that were not previously marked as issues to stop raising. The indicator for whether to continue or stop raising a consistent issue across cycles is pulled from the REDCap review form.
  - Generates and tracks a history of threshold values.
- driver_anon.R
  - Creates a random site identifier that is consistent across all data output and creates the columns site_anon (char) and sitenum (int) in each of the `pp` tables.
- driver_large_n.R
  - Computes summary statistics for each of the tables and outputs tables suffixed `ln` that contain the output in a readable format for the dashboard. Contains the masked identifiers if it is run downstream of driver_anon.R.
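
The threshold step described above (standard thresholds, site-specific overrides, and suppression of issues marked to stop raising) can be illustrated with a small base R sketch. Everything in it is an assumption made for illustration and is not taken from the repository: the column names (`site`, `check_name`, `value`, `threshold`, `stop_raising`), the example values, and the rule that a value above its threshold counts as a violation.

```r
# Standard thresholds, one per check (illustrative values only).
standard <- data.frame(
  check_name = c("check_a", "check_b"),
  threshold = c(0.10, 0.05),
  stringsAsFactors = FALSE
)

# Site-specific overrides, present only where one has been established.
site_specific <- data.frame(
  site = "site_1", check_name = "check_a", threshold = 0.20,
  stringsAsFactors = FALSE
)

# Hypothetical check results, one row per site and check.
results <- data.frame(
  site = c("site_1", "site_1", "site_2", "site_2"),
  check_name = c("check_a", "check_b", "check_a", "check_b"),
  value = c(0.15, 0.08, 0.15, 0.08),
  stringsAsFactors = FALSE
)

# Hypothetical reviewer decisions: TRUE means "stop raising this issue".
stop_flags <- data.frame(
  site = "site_2", check_name = "check_b", stop_raising = TRUE,
  stringsAsFactors = FALSE
)

# Attach the standard threshold, then any site-specific override.
x <- merge(results, standard, by = "check_name")
x <- merge(x, site_specific, by = c("site", "check_name"),
           all.x = TRUE, suffixes = c("_std", "_site"))

# Prefer the site-specific threshold when one exists.
x$threshold <- ifelse(is.na(x$threshold_site), x$threshold_std, x$threshold_site)

# Attach reviewer decisions; checks with no decision keep being raised.
x <- merge(x, stop_flags, by = c("site", "check_name"), all.x = TRUE)
x$stop_raising[is.na(x$stop_raising)] <- FALSE

# Keep violations that have not been marked to stop raising.
x$violation <- x$value > x$threshold
violations <- x[x$violation & !x$stop_raising,
                c("site", "check_name", "value", "threshold")]
print(violations)
```

In this toy data, site_1 passes check_a only because of its site-specific threshold, and site_2's check_b violation is dropped because it was marked to stop raising.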
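
The site masking step can be sketched in a similar way: build one random lookup covering every site, then join it to each table so that a given site receives the same masked values everywhere. Only the column names site_anon and sitenum come from the description above; the `site` column, the example tables, and the "site N" label format are assumptions for illustration.

```r
set.seed(42)  # fixed seed only so that this sketch is reproducible

# Two hypothetical tables that share a `site` column.
tbl_a <- data.frame(site = c("alpha", "beta", "gamma"), n = c(10, 20, 30),
                    stringsAsFactors = FALSE)
tbl_b <- data.frame(site = c("gamma", "alpha"), n = c(5, 7),
                    stringsAsFactors = FALSE)

# Build a single lookup covering every site seen in any table.
sites <- sort(unique(c(tbl_a$site, tbl_b$site)))
lookup <- data.frame(
  site = sites,
  sitenum = sample.int(length(sites)),  # random permutation of 1..k (integer)
  stringsAsFactors = FALSE
)
lookup$site_anon <- paste("site", lookup$sitenum)  # character label

# Joining the same lookup to every table keeps the masking consistent.
tbl_a_anon <- merge(tbl_a, lookup, by = "site")
tbl_b_anon <- merge(tbl_b, lookup, by = "site")
```

Generating the lookup once and reusing it, instead of drawing random numbers per table, is what keeps the identifier consistent across all output.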