Document and finalise datasets
MiguelRodo committed Oct 19, 2021
1 parent ff8d2fd commit 1227790
Showing 28 changed files with 5,156 additions and 9 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
@@ -1,2 +1,3 @@
^.*\.Rproj$
^\.Rproj\.user$
^LICENSE\.md$
26 changes: 19 additions & 7 deletions DESCRIPTION
@@ -1,15 +1,27 @@
Package: DataTidyGelaDURT
Type: Package
Title: What the Package Does (Title Case)
Title: Process and package data for Gela (2021)
Version: 0.1.0
Author: Who wrote it
Maintainer: The package maintainer <yourself@somewhere.net>
Description: More about what it does (maybe more than one line)
Use four spaces when indenting paragraphs within the Description.
License: What license is it under?
Author: Miguel J Rodo
Maintainer: Miguel J Rodo <rdxmig002@myuct.ac.za>
Description: Provide data in rda and csv formats
that were used to obtain results in
Gela (2021) - Effects of BCG vaccination
on donor unrestricted T cells in humans.
License: CC BY 4.0
Encoding: UTF-8
LazyData: true
Imports:
magrittr
datautils (>= 0.1.0),
dplyr,
magrittr,
purrr,
readr,
stringr,
tidyr,
zip
Depends:
R (>= 2.10)
RoxygenNote: 7.1.1
Remotes:
MiguelRodo/datautils
395 changes: 395 additions & 0 deletions LICENSE.md

Large diffs are not rendered by default.

3 changes: 2 additions & 1 deletion NAMESPACE
@@ -1 +1,2 @@
exportPattern("^[[:alpha:]]+")
# Generated by roxygen2: do not edit by hand

25 changes: 25 additions & 0 deletions R/data_tidy_clin.R
@@ -0,0 +1,25 @@
#' @rdname clinical_data
#'
#' @title Clinical data
#'
#'
#' @description Datasets containing clinical data for infants (\code{data_tidy_clin_infant})
#' and adults (\code{data_tidy_clin_adult}). The infant dataset
#' has one extra variable, \code{bcg}, indicating BCG status.
#'
#' @format Dataframes with the following variables
#' \describe{
#' \item{pid}{character. Participant ID.}
#' \item{age}{"infant" or "adult". Age category of participant.}
#' \item{race}{"black" or "non_black". Race of participant.
#' The vast majority of non-black participants
#' are mixed race.}
#' \item{sex}{"male" or "female". Sex of participant.}
#' \item{bcg}{"bcg" or "no bcg". BCG status. Infants only.}
#' }
#'
#' @aliases data_tidy_clin_infant data_tidy_clin_adult data_tidy_clin clinical_data data_clinical
"data_tidy_clin_infant"

#' @rdname clinical_data
"data_tidy_clin_adult"
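A minimal sketch of how the two clinical datasets might be combined, given that only the infant dataset carries `bcg`. The rows below are toy values invented for illustration (they are not taken from the package data), and the `*_toy` names are hypothetical. `dplyr::bind_rows()` matches columns by name and fills the absent `bcg` column with `NA` for adults.

```r
# Toy rows for illustration only; these are NOT real participant data.
clin_infant_toy <- data.frame(
  pid = c("infant_01", "infant_02"),
  age = "infant",
  race = c("black", "non_black"),
  sex = c("female", "male"),
  bcg = c("bcg", "no bcg"),
  stringsAsFactors = FALSE
)
clin_adult_toy <- data.frame(
  pid = "adult_01",
  age = "adult",
  race = "non_black",
  sex = "female",
  stringsAsFactors = FALSE
)

# bind_rows() aligns columns by name; the adult row has no `bcg`
# column, so it receives NA there.
clin_all_toy <- dplyr::bind_rows(clin_infant_toy, clin_adult_toy)
```

Keeping `bcg` as `NA` for adults, rather than inventing a level, leaves the "infants only" meaning of the variable intact.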
25 changes: 25 additions & 0 deletions R/data_tidy_freq.R
@@ -0,0 +1,25 @@
#' @rdname data_tidy_freq
#'
#' @title Major cell population frequencies
#'
#' @description Frequencies of major cell populations
#'
#' @format Dataframe with 720 rows and the following 4 columns:
#' \describe{
#' \item{pid}{character. Participant ID.}
#' \item{bcg}{"bcg", "no bcg", "before" or "after".
#' BCG status for infants ("no bcg" -> delayed BCG; "bcg" -> BCG at birth)
#' and adults ("before" -> before BCG revaccination; "after" -> 35 days
#' after BCG revaccination).}
#' \item{pop}{character. Cell type. "mait" is MAIT cells,
#' "nkt" is NKT cells, "cd1b" is CD1b GMM T cells,
#' "gem" is GEM T cells, "gd" is TCRgd T cells,
#' "cd4" is CD4 T cells and "cd4_ifng" is
#' IFNg+ CD4 T cells.}
#' \item{freq}{numeric. Frequency of parent population, where
#' the parent population is all cells for all population types
#' except IFNg+ CD4 T cells, where the parent population is
#' CD4 T cells.}
#' }
#'
"data_tidy_freq"
32 changes: 32 additions & 0 deletions R/data_tidy_freq_ifng.R
@@ -0,0 +1,32 @@
#' @rdname data_tidy_freq_ifng
#'
#' @title Frequencies of IFNg+ cells
#'
#' @description Background-subtracted frequencies of
#' IFNg+ cells.
#'
#' @format Dataframe with 284 rows and the following 5 columns:
#' \describe{
#' \item{pid}{character. Participant ID.}
#' \item{bcg}{"bcg" or "no bcg".
#' BCG status for infants ("no bcg" -> delayed BCG; "bcg" -> BCG at birth).}
#' \item{pop}{character. Major cell type. "cd26_161" is CD26+CD161+ cells,
#' and "gd" is TCRgd T cells.}
#' \item{pop_sub}{character.
#' Further phenotyping of major cell type.
#' "-" means no further phenotyping,
#' "cd8[n/p]cd4[n/p]" means
#' CD8(-/+)CD4(-/+), and
#' "trav12[n/p]" means TRAV1-2(-/+).}
#' \item{freq_ifng}{numeric.
#' Frequency relative to the parent population. For all
#' populations except "trav12[n/p]", the parent population is
#' specified by \code{pop} and \code{pop_sub}, and
#' the frequency is for all cells that are IFNg+.
#' Where the \code{pop_sub} is "trav12[n/p]",
#' the parent population is
#' CD4+CD26+CD161+IFNg+ T cells, and the frequency is
#' for all cells that are TRAV1-2(-/+).}
#' }
#'
"data_tidy_freq_ifng"
21 changes: 21 additions & 0 deletions R/data_tidy_hladr_adult.R
@@ -0,0 +1,21 @@
#' @rdname data_tidy_hladr_adult
#'
#' @title HLADR MFI by cell type for adults
#'
#' @description HLADR mean fluorescent intensity (MFI) by cell type
#' for adults.
#'
#' @format Dataframe with 315 rows and the following 4 columns:
#' \describe{
#' \item{pid}{character. Participant ID.}
#' \item{days}{integer. Days since BCG revaccination.
#' Day 0 is the day of revaccination, prior to
#' revaccination itself.}
#' \item{pop}{character. Cell type. "mait" is MAIT cells,
#' "nkt" is NKT cells, "cd1b" is CD1b GMM T cells,
#' "gd" is TCRgd T cells,
#' "cd4" is CD4 T cells.}
#' \item{hladr_mfi}{numeric. HLADR MFI.}
#' }
#'
"data_tidy_hladr_adult"
21 changes: 21 additions & 0 deletions R/data_tidy_hladr_infant.R
@@ -0,0 +1,21 @@
#' @rdname data_tidy_hladr_infant
#'
#' @title HLADR MFI by cell type for infants
#'
#' @description HLADR mean fluorescent intensity (MFI) by cell type
#' for infants.
#'
#' @format Dataframe with 315 rows and the following 4 columns:
#' \describe{
#' \item{pid}{character. Participant ID.}
#' \item{bcg}{"bcg" or "no bcg".
#' BCG status for infants ("no bcg" -> delayed BCG; "bcg" -> BCG at birth).}
#' \item{pop}{character. Cell type. "mait" is MAIT cells,
#' "nkt" is NKT cells, "cd1b" is CD1b GMM T cells,
#' "gd" is TCRgd T cells,
#' "cd4" is CD4 T cells and "cd4_ifng" is
#' IFNg+ CD4 T cells.}
#' \item{hladr_mfi}{numeric. HLADR MFI.}
#' }
#'
"data_tidy_hladr_infant"
23 changes: 23 additions & 0 deletions R/data_tidy_mem.R
@@ -0,0 +1,23 @@
#' @rdname data_tidy_mem
#'
#' @title Frequencies of memory phenotypes
#'
#' @description Frequency of memory phenotype relative
#' to the parent population.
#'
#' @format Dataframe with 284 rows and the following 6 columns:
#' \describe{
#' \item{pid}{character. Participant ID.}
#' \item{age}{"infant" or "adult". Age category of participant.}
#' \item{grp}{character.
#' BCG status for infants ("no bcg" -> delayed BCG; "bcg" -> BCG at birth)
#' and adults ("0", "21", "35" and "365" days after BCG revaccination).}
#' \item{pop}{character.
#' Major cell type. "mait" is MAIT cells,
#' "nkt" is NKT cells, "gd" is TCRgd T cells,
#' "cd4" is CD4 T cells.}
#' \item{pop_sub}{character. Phenotype.}
#' \item{freq}{numeric. Frequency of parent population (specified
#' by \code{pop}).}
#' }
"data_tidy_mem"
95 changes: 94 additions & 1 deletion data-raw/data_tidy_prep.Rmd
@@ -14,6 +14,12 @@ library(magrittr)
library(stringr)
```

```{r }
dir_save_csv <- here::here("inst", "extdata")
if (!dir.exists(dir_save_csv)) dir.create(dir_save_csv,
recursive = TRUE)
```

```{r , include = FALSE}
unlink(here::here("data"))
dir_data_raw <- file.path(
@@ -288,7 +294,8 @@ data_tidy_freq <- purrr::map_df(sheet_vec, function(sheet) {
data_out
}) %>%
tibble::as_tibble()
tibble::as_tibble() %>%
dplyr::select(-pid_sheet)
```

```{r }
@@ -302,6 +309,13 @@ usethis::use_data(data_tidy_freq,
overwrite = TRUE)
```

```{r }
readr::write_csv(
x = data_tidy_freq,
file = file.path(dir_save_csv, "data_tidy_freq.csv")
)
```
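Because each dataset is shipped both as `.rda` and as `.csv`, a quick round-trip check can confirm that the CSV copy reads back with the same values. The following is a sketch only: `freq_toy` and its values are invented for illustration, and the file goes to a temporary directory rather than `inst/extdata`.

```r
# Toy stand-in for data_tidy_freq; values are illustrative, not real data.
freq_toy <- data.frame(
  pid = c("infant_01", "adult_01"),
  bcg = c("no bcg", "before"),
  pop = c("mait", "cd4"),
  freq = c(0.125, 41.5),
  stringsAsFactors = FALSE
)

path_toy <- file.path(tempdir(), "freq_toy.csv")
readr::write_csv(x = freq_toy, file = path_toy)

# Read back with explicit column types (three character, one double)
# so that nothing is guessed.
freq_back <- readr::read_csv(path_toy, col_types = "cccd")

# Compare column by column; all.equal() tolerates tiny numeric differences.
stopifnot(
  identical(names(freq_back), names(freq_toy)),
  identical(freq_back$pid, freq_toy$pid),
  identical(freq_back$bcg, freq_toy$bcg),
  isTRUE(all.equal(freq_back$freq, freq_toy$freq))
)
```

Stating `col_types` explicitly guards against columns such as `pid` being parsed as something other than character.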

## HLADR MFI: Infants (Figure 3)

```{r , include = FALSE}
@@ -442,6 +456,14 @@ usethis::use_data(
)
```

```{r }
readr::write_csv(
x = data_tidy_hladr_infant,
file = file.path(dir_save_csv, "data_tidy_hladr_infant.csv")
)
```


## HLADR MFI: Adults (Figure 4)

```{r , include = FALSE}
@@ -510,6 +532,13 @@ usethis::use_data(data_tidy_hladr_adult,
overwrite = TRUE)
```

```{r }
readr::write_csv(
x = data_tidy_hladr_adult,
file = file.path(dir_save_csv, "data_tidy_hladr_adult.csv")
)
```

## IFN$\gamma$+ frequency: Infants (Figure 5)

```{r }
@@ -604,6 +633,14 @@ usethis::use_data(
)
```

```{r }
readr::write_csv(
x = data_tidy_freq_ifng,
file = file.path(dir_save_csv, "data_tidy_freq_ifng.csv")
)
```


## Memory profile (Figure 6)

```{r , include = FALSE}
@@ -734,6 +771,13 @@ usethis::use_data(
)
```

```{r }
readr::write_csv(
x = data_tidy_mem,
file = file.path(dir_save_csv, "data_tidy_mem.csv")
)
```

## Clinical data - Infants

### Copy of delayed BCG samples for Anele_.xlsx
@@ -1058,6 +1102,13 @@ data_tidy_clin_infant <- data_tidy_clin_infant %>%
dplyr::mutate(age = "infant")
```

##### Remove unused and very descriptive columns.

```{r }
data_tidy_clin_infant <- data_tidy_clin_infant %>%
dplyr::select(-c(visit_age, gest_age, weight, length, head_circ))
```
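`dplyr::select(-c(...))` stops with an error if any named column is absent, for example when the chunk is re-run on an already trimmed data frame. A possible alternative, sketched on invented toy data, wraps the names in `dplyr::any_of()` so that missing columns are ignored.

```r
# Toy data frame for illustration; only `weight` of the columns to drop exists.
clin_toy <- data.frame(
  pid = c("infant_01", "infant_02"),
  age = "infant",
  weight = c(3.1, 2.9),
  stringsAsFactors = FALSE
)

cols_drop <- c("visit_age", "gest_age", "weight", "length", "head_circ")

# any_of() selects whichever of these names are present, so the negated
# selection drops them without failing on the ones that are missing.
clin_toy_small <- dplyr::select(clin_toy, -dplyr::any_of(cols_drop))
```

The strict form used in the commit is still a reasonable choice when a missing column should be treated as a sign that the raw data changed.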

##### View

```{r }
@@ -1071,6 +1122,14 @@ usethis::use_data(data_tidy_clin_infant,
overwrite = TRUE)
```

```{r }
readr::write_csv(
x = data_tidy_clin_infant,
file = file.path(dir_save_csv, "data_tidy_clin_infant.csv")
)
```


## Clinical data - Adults

### TBRU samples for Anele .xlsx
@@ -1380,6 +1439,13 @@ data_tidy_clin_adult <- data_tidy_clin_adult %>%
dplyr::select(pid, age, sex, race, everything())
```

##### Remove unused and very descriptive columns.

```{r }
data_tidy_clin_adult <- data_tidy_clin_adult %>%
dplyr::select(-c(age_in_years, hgbval, height, weight, ppd))
```

#### View

```{r }
@@ -1392,3 +1458,30 @@ datautils::view_cols(
usethis::use_data(data_tidy_clin_adult,
overwrite = TRUE)
```

```{r }
readr::write_csv(
x = data_tidy_clin_adult,
file = file.path(dir_save_csv, "data_tidy_clin_adult.csv")
)
```

## Zip files together

```{r }
wd <- getwd()
setwd(dirname(wd))
file_vec <- list.files(dir_save_csv, full.names = FALSE,
pattern = "csv$")
file_vec <- paste0("inst/extdata/", file_vec)
file_zip <- file.path("inst", "extdata",
"data_tidy_gela_durt.zip")
if (file.exists(file_zip)) unlink(file_zip, recursive = TRUE)
zip::zip(
zipfile = file_zip,
files = file_vec
)
setwd(wd)
```
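The chunk above changes the working directory and restores it on its last line, so an error in `zip::zip()` would leave the session in the parent directory. One way to make the restore unconditional is to wrap the step in a function and use `on.exit()`. This is a sketch under assumptions: `zip_from_root()` is a hypothetical helper, not part of the package, and a temporary directory with a toy CSV stands in for the project root.

```r
# Hypothetical helper: zip `files` into `zipfile` (both given relative to
# `root`), restoring the previous working directory even if zipping fails.
zip_from_root <- function(root, zipfile, files) {
  old_wd <- setwd(root)
  on.exit(setwd(old_wd), add = TRUE)
  if (file.exists(zipfile)) unlink(zipfile)
  zip::zip(zipfile = zipfile, files = files)
  invisible(file.path(root, zipfile))
}

# Toy project root in a temporary directory, for illustration only.
root_toy <- file.path(tempdir(), "zip_demo")
dir_csv_toy <- file.path(root_toy, "inst", "extdata")
dir.create(dir_csv_toy, recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(pid = "adult_01"),
          file.path(dir_csv_toy, "toy.csv"), row.names = FALSE)

# Paths relative to the root, as in the chunk above.
file_vec_toy <- file.path(
  "inst", "extdata",
  list.files(dir_csv_toy, pattern = "\\.csv$")
)

wd_before <- getwd()
zip_path_toy <- zip_from_root(
  root = root_toy,
  zipfile = file.path("inst", "extdata", "toy.zip"),
  files = file_vec_toy
)
```

Passing the root explicitly also removes the reliance on `dirname(getwd())` being the project root, which holds only when the document is knitted from `data-raw`.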
Binary file modified data/data_tidy_clin_adult.rda
Binary file not shown.
Binary file modified data/data_tidy_clin_infant.rda
Binary file not shown.
Binary file modified data/data_tidy_freq.rda
Binary file not shown.
26 changes: 26 additions & 0 deletions inst/extdata/data_tidy_clin_adult.csv
@@ -0,0 +1,26 @@
pid,age,sex,race
adult_01,adult,female,non_black
adult_02,adult,female,non_black
adult_03,adult,female,non_black
adult_04,adult,male,non_black
adult_05,adult,female,non_black
adult_06,adult,female,non_black
adult_07,adult,male,black
adult_08,adult,male,non_black
adult_09,adult,male,non_black
adult_10,adult,female,non_black
adult_11,adult,female,non_black
adult_12,adult,female,black
adult_13,adult,female,non_black
adult_14,adult,female,black
adult_15,adult,male,non_black
adult_16,adult,male,non_black
adult_17,adult,female,non_black
adult_18,adult,male,non_black
adult_19,adult,female,non_black
adult_20,adult,female,black
adult_21,adult,female,non_black
adult_22,adult,female,non_black
adult_23,adult,female,non_black
adult_24,adult,female,black
adult_25,adult,female,non_black