generated from statOmics/Rmd-website
-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path06_2_FEV.Rmd
98 lines (67 loc) · 2.84 KB
/
06_2_FEV.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
---
title: "Exeercise 6.2: Linear regression on the FEV dataset"
author: "Lieven Clement and Jeroen Gilis"
date: "statOmics, Ghent University (https://statomics.github.io)"
---
# The FEV dataset
The FEV, which is an acronym for forced expiratory volume,
is a measure of how much air a person can exhale (in liters)
during a forced breath. In this dataset, the FEV of 606 children,
between the ages of 6 and 17, were measured. The dataset
also provides additional information on these children:
their `age`, their `height`, their `gender` and, most
importantly, whether the child is a smoker or a non-smoker.
The overarching goal of this experiment was to find out if
smoking has an effect on the FEV of children.
# Load the required libraries
```{r, message = FALSE}
library(tidyverse)
```
# Import the data
```{r}
fev <- read_tsv("https://raw.githubusercontent.com/statOmics/PSLSData/main/fev.txt")
head(fev)
```
# Tidy the data
There are a few things in the formatting of the
data that can be improved upon:
1. Both the `gender` and `smoking` can be transformed to
factors.
2. The `height` variable is written in inches. Assuming that
this audience is mainly Portuguese/Belgian, inches are hard to
interpret. Let's add a new column, `height_cm`, with the values
converted to centimeters using the `mutate` function.
```{r}
```
# Data Exploration
Explore the data. Visualise the FEV for smokers versus non-smokers:
```{r}
```
Did you expect these results? Can you explain what we observe (and why)?
Additionally, can you provide an even better visualisation of the data, taking
into account more useful explanatory variables with respect
to the FEV?
```{r}
```
# Analysis
As stated above, the overarching goal of this experiment was to assess the
impact of smoking on the FEV of children.
In principle, we have multiple variables that can affect the FEV.
We have, however, not learned yet how to model the response based on multiple
predictors. To answer this research question properly, we will need some
more advanced modelling techniques. In the tutorial on `multiple regression`,
we will learn those and come back to this dataset!
For now, we can already assess other (less complex) research questions:
1. Is there a linear association between the `FEV` and the `height` of
`non-smoking females`?
2. Is there a linear association between the `FEV` and the `age` of
`non-smoking females`?
3. Is there a linear association between the `FEV` and the `height` of
`non-smoking males`?
4. Is there a linear association between the `FEV` and the `age` of
`non-smoking males`?
For each of these short research questions, you should:
1. Check the assumptions of the linear model, and analyse the data accordingly.
2. Interpret the output, focusing on the interpretation of the intercept and the
slope parameters of the model.
3. Formulate a conclusion for each research hypothesis.