generated from statOmics/Rmd-website
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path05_1_diabetes.Rmd
169 lines (109 loc) · 4.23 KB
/
05_1_diabetes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
---
title: "Exercise 5.1: Hypothesis testing on the diabetes example"
author: "Lieven Clement, Jeroen Gilis and Milan Malfait"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
# Aims
In this exercise, you will revisit the basics of statistical hypothesis testing.
You will acquire the skills
1. to assess the assumptions of a one-sample and a paired t-test in a data exploration.
2. to conduct a one-sample t-test in R and to interpret the results.
3. to conduct a paired t-test in R and to interpret the results.
# The diabetes dataset
The `diabetes` dataset holds information on a small experiment with
8 patients that are subjected to a glucose tolerance test.
Patients had to fast for eight hours before the test.
When the patients entered the hospital their baseline glucose level was measured (mmol/l).
Patients then had to drink 250 ml of a syrupy glucose solution containing 100
grams of sugar. Two hours later, their blood glucose level was measured again.
The data consist of three variables:
- `before`: glucose concentration upon 8 hours of fasting (mmol/l)
- `after`: glucose concentration 2 hours after drinking glucose solution (mmol/l).
- `patient`: identifier for the patient
## Research questions
The goal is to answer two research questions;
1. Is the average glucose level after the sugar intake different from the average glucose level before sugar intake?
2. Is the average glucose level of the patients two hours after the sugar intake higher than 7.8 mmol/L?
## Import the data
First, load the required R libraries:
```{r, message=FALSE, warning=FALSE}
library(tidyverse)
```
```{r, eval=FALSE}
```
# Data exploration
We will start with a data exploration.
Have a first look at the raw data. How is the data structured?
Is this data *tidy*?
```{r, eval=FALSE}
```
# Question 1
Is the glucose level after 8 hours of fasting on average different from the glucose level two hours after intake of 100g of glucose?
As the data are paired, we can expect that the measurements before and after the glucose intake are correlated.
We can illustrate this with a scatterplot.
```{r, eval=FALSE}
```
## Check the assumptions
State the assumptions that you have to check and include the diagnostic plots to
assess the assumption.
## Hypothesis test
If all assumptions are met, we may continue with
performing the paired two-sample t-test.
```{r, eval=FALSE}
paired_t <- t.test(..., ..., paired = ...)
paired_t
```
## Conclusion
Formulate a conclusion based on the output.
# Alternative solution: One-sample t-test on the difference
Since the data is paired, we can also simply calculate the differences in
glucose level before and after sugar intake for each patient. We can then
perform a one-sample t-test on these differences, testing whether they are
significantly different from zero. This is equivalent to the paired t-test we
performed above.
We can verify this with the analysis below
```{r, eval=FALSE}
t.test(... ~ 1, data = ..., mu = ...)
```
***
# Question 2
Is the average glucose level two hours after sugar intake
higher than the threshold of 7.8 mmol/l for pre-diabetes?
We can test this hypothesis using a **one-sample t-test**.
Indeed we are interested to compare the average glucose level to a known threshold for pre-diabetes.
## Assess the assumptions
Before we can perform a one sample t-test, we must check that the required
assumptions are met!
1. The observations are independent
2. The glucose levels two hours after the treatment are normally distributed
```{r, eval=FALSE}
## First filter the data on the desired group
diabetes_after <- ... %>%
filter(... == "...")
```
```{r, eval=FALSE}
... %>%
...() +
geom_qq() +
geom_qq_line()
```
What do you observe?
## Hypothesis test
Here, we will test if mean glucose level 2 hours after sugar intake is significantly
higher than the threshold for pre-diabetes of 7.8 mmol/l. More specifically, we will test the null hypothesis;
$H_0:$ ...
versus the alternative hypothesis;
$H_1:$ ...
```{r, eval=FALSE}
t_test_after <- t.test(... ~ 1,
data = ...,
mu = ...,
alternative = "...",
conf.level = ...
)
t_test_after
```
## Conclusion
When writing a conclusion on your research hypothesis,
it is very important to be precise, concise, and complete.
...