-
Notifications
You must be signed in to change notification settings - Fork 7
/
Copy pathchapter8.Rmd
834 lines (568 loc) · 23.4 KB
/
chapter8.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
---
title: Estimation
description: 'Get your brackets ready for diving in the world of statistical inference!'
---
## Exploring estimation with R
```yaml
type: NormalExercise
key: 44d1f23e51
lang: r
xp: 100
skills: 1
```
Welcome to chapter 8: **Estimation**. In this chapter we will focus on point and interval estimation. Interval estimation is based on the important concept of the *sampling distribution*, which is a somewhat theoretical concept and therefore takes a while to get used to.
The sampling distribution is a kind of mind game: If we could collect our random sample again and again and again and calculate the mean for each of the samples, the resulting distribution of these sample means would be the sampling distribution (of the sample mean).
It turns out that for a lot of important point estimates, the sampling distribution can be approximated with the normal distribution with certain known parameters! This is due to a fascinating phenomenom called the *central limit theorem*.
To explore these concepts, we will start this chapter with some technical tools we will need for constructing *loops* and *simulations*.
`@instructions`
- Here in DataCamp, when a single expression continues over multiple rows, the whole chunk of code needs to be executed at once. This can be done by first painting (selecting) all the lines in the expression with a mouse and then pressing `Ctrl + Enter` as normal.
- Execute the 3 rows long for-loop expression
- Click submit answer to move on to the next exercise
`@hint`
- If you cannot paint (select) the expression for some reason, you can just click submit answer instead.
`@pre_exercise_code`
```{r}
# pre exercise code here
```
`@sample_code`
```{r}
# Execute me!
for(i in 1:6) {
print("I am a loop")
}
```
`@solution`
```{r}
# Execute me!
for(i in 1:6) {
print("I am a loop")
}
```
`@sct`
```{r}
test_error()
success_msg("Good work!")
```
---
## Indices and brackets
```yaml
type: NormalExercise
key: 23abc9f934
lang: r
xp: 100
skills: 1
```
Vectors in R store multiple values of the same data type. The values in vectors have indices: the first value in a vector has a index `1`, the second value `2` and so on.
You can set or get a single value from a vector by using indices and brackets `[ ]`. Using an index number inside the brackets gives access to a single value from the vector. Using a vector of indices gives access to multiple values (another vector).
It is also possible to rearrange the values in a vector by using indices. Look at the example code to see how indices work.
`@instructions`
- Create the `names` vector.
- See how the indices work by executing the example lines.
- Use brackets and indices on `names` to create a new vector `girls` with values "Liisa" and "Anna" (in that order).
- Use brackets and indices on `names` to create a new vector `boys` with values "Pekka" and "Heikki" (in that order).
`@hint`
- Note that space between the vector object and bracets produces an error.
- Index vectors `c(1,2)` and `c(2,1)` do not produce the same outcome. The order of the values is different.
`@pre_exercise_code`
```{r}
```
`@sample_code`
```{r}
# Let's create a vector
names <- c("Matti", "Pekka", "Liisa", "Anna")
# Acess the first value of the vector
names[1]
# Change the first value of the vector
names[1] <- "Heikki"
# Acess the 1. and 3. value of the vector
names[c(1, 3)]
# Use indices and brackets to separate the names vector into two vectors of the specified ordering
girls <-
boys <-
```
`@solution`
```{r}
# Let's create a vector
names <- c("Matti", "Pekka", "Liisa", "Anna")
# Acess the first value of the vector
names[1]
# Change the first value of the vector
names[1] <- "Heikki"
# Acess the 1. and 3. value of the vector
names[c(1, 3)]
# Use indices and brackets to separate the names vector into two vectors of the specified ordering
girls <- names[c(3, 4)]
boys <- names[c(2, 1)]
```
`@sct`
```{r}
```
---
## Easy vectors
```yaml
type: NormalExercise
key: 8e0ac16859
lang: r
xp: 100
skills: 1
```
Sometimes you need a very long index vector. In that case it would be a lot of work to create the vector by combining integer values with `c()`. Luckily, there are convenient ways to create long vectors in R.
For integer values, a specially useful one is the method `start:end`, which creates an integer vector with all the values from start to end. The following two lines hence produce identical results
1:5
c(1,2,3,4,5)
`@instructions`
- Create an integer vector with values from 1 to 5.
- Create an integer vector with even values from 2 to 10.
- Create object `attitude` and give the values in it names matching the indices.
- Access index values 1-5 of `attitude`
- Use `:` to create an integer vector with the values 1, 2, ..., 10
- Access every second value of the `attitude` vector, starting from the 2. value until the 20th value. These values correspond to the even numbered indeces of the vector: 2, 4, .. , 20
`@hint`
- First you will need an index vector with values 2, 4, .. , 20. The example shows how to create such a vector
- You can then use the index vector together with brackets (`[ ]`) to complete the task
`@pre_exercise_code`
```{r}
learning2014 <- read.table("http://www.helsinki.fi/~kvehkala/JYTmooc/learning2014.txt", sep = "\t", header = TRUE)
```
`@sample_code`
```{r}
# learning2014 is available
# Create an integer vector with values 1,2,..,5
1:5
# Create an integer vector with even values 2, 4, .. , 10
(1:5)*2
# Create object attitude and give the data points names 1, 2, ..
attitude <- learning2014$attitude
names(attitude) <- 1:length(attitude)
# Access the values 5 - 10 of attitude
attitude[5:10]
# Create an integer vector with values 1,2,..,10
# Access every second value of attitude from 2. to the 20th index
```
`@solution`
```{r}
# learning2014 is available
# Create an integer vector with values 1,2,..,5
1:5
# Create an integer vector with even values
(1:5)*2
# Create object attitude and give the data points names 1, 2, ..
attitude <- learning2014$attitude
names(attitude) <- 1:length(attitude)
# Access the values 5 - 10 of attitude
attitude[5:10]
# Create an integer vector with values 1,2,..,10
1:10
# Access every second value of attitude from 2. to the 20th index
attitude[(1:10)*2]
```
`@sct`
```{r}
# submission correctness tests
test_student_typed("1:10", not_typed_msg = "Did you use `:` to create and print out the specified integer vector?")
test_output_contains("attitude[(1:10)*2]")
# test if the students code produces an error
test_error()
# Final message the student will see upon completing the exercise
success_msg("Great work!")
```
---
## Looping
```yaml
type: NormalExercise
key: a26cc4ab51
lang: r
xp: 100
skills: 1
```
In programming, often there is the need to repeat an action multiple times. In statistical programming you might for example have a theory you wish to explore by simulation.
The **for-loop** is perhaps the most common loop in programming. The structure of a for-loop is:
```
for (counter in vector) {
commands
more commands
}
```
The for-loop iterates trough the values of a vector by changing the value of the counter one at a time. The counter first takes on the first value of the vector, then the second and so on.
Inside the curly braces is the *body* of the loop, which can contain regular commands such as function calls. The current value of the counter can be used there. All the commands inside the body are repeated until the counter has taken all the values of the vector.
`@instructions`
- **Please note!** Here in DataCamp, executing a command over multiple lines is done by selecting (painting) the lines with a mouse first and then hitting `Ctrl+Enter` normally. Alternatively you could also click submit to execute the whole script.R
- Loop though the character values using the examle code.
- Loop through integer values using the example code.
- In an honor of the legendary electronic duo *Daft Punk* and their 2001 album *Discovery*, construct a for-loop that prints out "One more time!" 27 times.
- Hints: (1) Remember that `1:n` creates an integer vector of length n. (2) Inside the loop body you have to use the `print()` function to print.
`@hint`
- It does not matter what name you give to your counter. You can for example use `i` as is done in the second example.
- In this exercise you don't need to use the values of your counter.
- Remember to use quotation marks to create and print characters.
`@pre_exercise_code`
```{r}
# no pec
```
`@sample_code`
```{r}
# Loop through and print the characters a, b, c, d
for(letter in c("a","b","c","d")) {
print(letter)
}
# Loop through integers 1, 2, 3, 4, 5
for(i in 1:5) {
print(i + 5)
}
# Write a for-loop that prints out "One more time!" 27 times
```
`@solution`
```{r}
# loop through characters a,b,c,d
for(letter in c("a","b","c","d")) {
print(letter)
}
# loop through integers 1, 2, 3, 4, 5
for(i in 1:5) {
print(i + 5)
}
# Write a for-loop that prints out "One more time!" 27 times
for(i in 1:27) {
print("One more time!")
}
```
`@sct`
```{r}
# submission correctness tests
test_output_contains("'One more time!'", times = 27, incorrect_msg = "Did you write a for-loop that prints out 'One more time!' 27 times?")
# test if the students code produces an error
test_error()
# Final message the student will see upon completing the exercise
success_msg("Excellent work! Repetition is the key to success :)")
```
---
## Point estimation
```yaml
type: NormalExercise
key: 78e301b70a
lang: r
xp: 100
skills: 1
```
Let us now get back to statistical analysis and estimation. When given a sample of data from a population, the goal of statistical analysis is to use that sample to draw conclusions about the population.
In a simple case we are interested in a single variable (such as students points in an exam) and want to estimate the *expected value* of that variable. The expected value is the average value in the target population - the population mean.
We can't directly calculate the population average because we only have a sample, so we have to **estimate** it. Obtaining a point estimate of a population parameter is rather easy: just use the corresponding sample statistic.
population parameter | estimate
--------------------------------------- | --------
expected value $\mu$ | sample mean $\bar{x}$
population standard deviation $\sigma$ | sample standard deviation $s$
<hr>
In statistics, estimates are often denoted with a hat. So, for example, we also use the notation $\hat{\mu} = \bar{x}$, and refer to "mu_hat", respectively.
`@instructions`
- Create object `points`
- Estimate the expected value of `points`. If you can't remember the name of the R function you need, use your favourite search engine or take a hint.
- Estimate the population standard deviation of `points`.
- Combine the estimates to the `estimates` vector (replace `NA`). Notice how `c()` can be used to give names to the values.
- Print out the `estimates` vector, with the values rounded to 2 digits.
`@hint`
- The table above shows the relationships between the sample statistics and the population parameters.
- Use the table to figure out which operation you could use to produce the estimate.
- Mean can be computed with `mean()`
`@pre_exercise_code`
```{r}
learning2014 <- read.table("http://www.helsinki.fi/~kvehkala/JYTmooc/learning2014.txt", sep = "\t", header = TRUE)
```
`@sample_code`
```{r}
# learning2014 is available
# create object points using the learning2014 data.frame
points <- learning2014$points
# estimate the expected value of points
mu_hat <-
# estimate the population standard deviation of points
sigma_hat <- sd(points)
# Combine the estimates to a named vector and print out the rounded values
estimates <- c("mu_hat" = NA, "sigma_hat" = NA)
round(estimates, digits = 2)
```
`@solution`
```{r}
# learning2014 is available
# create object points using the learning2014 data.frame
points <- learning2014$points
# estimate the expected value of points
mu_hat <- mean(points)
# estimate the population standard deviation of points
sigma_hat <- sd(points)
# Print out the estimated values
estimates <- c("mu_hat" = mu_hat, "sigma_hat" = sigma_hat)
round(estimates, digits = 2)
```
`@sct`
```{r}
# submission correctness tests
test_function("mean", args=c("x"))
test_object("mu_hat", incorrect_msg = "Please create the object mu_hat. Use the correct function on the points vector.")
test_output_contains("round(estimates, 2)", incorrect_msg = "Please insert your hat objects to the estimates vector and do not remove the row with `round()`")
# test if the students code produces an error
test_error()
# Final message the student will see upon completing the exercise
success_msg("Very nice! You get full points for point estimation.")
```
---
## Interval estimation
```yaml
type: NormalExercise
key: 1df061a41a
lang: r
xp: 100
skills: 1
```
Clearly there is uncertainty involved in a single sample mean (our point estimate). But how much uncertainty? This is where we need statistical theory.
One way to approach this question is to try to come up with an interval around the point estimate to describe our uncertainty. It turns out that the following formula produces an interval which contains the true population mean $100 \cdot (1-\alpha)$% of the time:
$$\bar{x} \pm z_{\alpha / 2} \cdot \frac{s}{\sqrt{n}}$$
where $s$ is the sample standard deviation, $n$ is the sample size and $z_{\alpha / 2}$ is a critical (quantile) value from the $N(0,1)$ distribution. The part
$$\frac{s}{\sqrt{n}}$$
is called the *standard error* of the sample mean $\bar{x}$.
I know what you're thinking. It's confusing: where did the normal distribution come from and what's with all that $z_{\alpha/2}$ stuff? The answers lie in the **sampling distribution** which we will soon get to.
`@instructions`
- Create objects `points`, `n`, `mu_hat` and `s`
- Adjust the code: Compute the standard error of `mu_hat` (the sample mean) and save t to object `error`. `sqrt()` computes the square root.
- Compute the critical value z using a 99% confidence level (alpha = 0.01) and create object `z`
- Compute the lower and upper limits of the confidence interval (CI) by applying the formula given above
- Combine the point estimate and the lower and upper CI values to the object `interval_estimate`.
- Print the values in `interval_estimate`, rounded to 1 digit (hint: `round()`)
`@hint`
- The formula for the standard error is s/sqrt(n)
`@pre_exercise_code`
```{r}
learning2014 <- read.table("http://www.helsinki.fi/~kvehkala/JYTmooc/learning2014.txt", sep = "\t", header = TRUE)
```
`@sample_code`
```{r}
# learning2014 is available.
# Create object points
points <- learning2014$points
# Number of observations
n <- length(points)
# Sample mean estimates the expected value (mu)
mu_hat <- mean(points)
# Sample standard deviation
s <- sd(points)
# Compute the standard error of mu_hat
error <- NA
# Compute the critical value z
z <- qnorm(0.01/2, lower.tail = F)
# Compute the confidence interval
lower_ci <- mu_hat - NA
upper_ci <- mu_hat + NA
# Combine the point and interval estimates to a named vector
interval_estimate <- c("estimate" = mu_hat, "lower99%" = lower_ci, "upper99%" = upper_ci)
# Round and print out the point and interval estimates
```
`@solution`
```{r}
# learning2014 is available.
# Create object points
points <- learning2014$points
# Number of observations
n <- length(points)
# Sample mean estimates the expected value (mu)
mu_hat <- mean(points)
# Sample standard deviation
s <- sd(points)
# Compute the standard error of mu_hat
error <- s/sqrt(n)
# Compute the critical value z
z <- qnorm(0.01/2, lower.tail = F)
# compute the confidence interval and print out the results
lower_ci <- mu_hat - z*error
upper_ci <- mu_hat + z*error
# Combine the point and interval estimates to a named vector
interval_estimate <- c("estimate" = mu_hat, "lower99%" = lower_ci, "upper99%" = upper_ci)
# Round and print out the point and interval estimates
round(interval_estimate, digits = 1)
```
`@sct`
```{r}
# submission correctness tests
test_object("error")
test_object("lower_ci")
test_object("upper_ci")
test_output_contains("round(interval_estimate, digits = 1)")
# test if the students code produces an error
test_error()
# Final message the student will see upon completing the exercise
success_msg("You are awsome with 99% confidence!")
```
---
## The sampling distribution
```yaml
type: NormalExercise
key: 65d85c42aa
lang: r
xp: 100
skills: 1
```
Remember that you can think of the sampling distribution as a kind of mind game: If we could collect our random sample again and again and calculate a sample mean each time, the result would be the sampling distribution of the mean.
In reality though, we can't produce this distribution, because we only have the one sample. Or can we? Let's pretend for a moment that the data we have *is* the target population. Let's see what happens if we repeatedly
1. take random samples from the population
2. calculate the sample mean each time and
3. store the results.
The resulting distribution of the multiple sample means satisfies the idea of the sampling distribution, meaning that we can actually simulate the distribution! This method (called [bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics))) is widely used in statistics.
`@instructions`
- Create `N` and `means` and print out the empty `means` vector
- Execute the for-loop. What does it do?
- Print out the `means` vector again
- Compute basic summary statistics of `means` using the appropriate function
- Draw a histogram displaying the sampling disribution of the means
- See the help page of the `hist()` function and make it a density histogram
`@hint`
- `summary()` computes basic summary statistics
- `hist()` draws a histogram and the argument `freq = F` can be used to draw a density histogram
- ?hist opens the help page of `hist()`
`@pre_exercise_code`
```{r}
learning2014 <- read.table("http://www.helsinki.fi/~kvehkala/JYTmooc/learning2014.txt", sep = "\t", header = TRUE)
# give a seed value to the random number generator of R
set.seed(888)
```
`@sample_code`
```{r}
# learning2014 is available
# Create an empty vector of length N
N <- 100
means <- numeric(N)
# Print out the empty vector
# Repeat N times:
# (1) draw a random sample of n = 50 exam points
# (2) compute the mean of the sampled exam points and
# (3) store it in the means vector
for(i in 1:N) {
points_sample <- sample(learning2014$points, size = 50, replace = F)
means[i] <- mean(points_sample)
}
# Print out the means vector
# Compute basic summary statistics
# Visualize the distribution of the means with a histogram
```
`@solution`
```{r}
# learning2014 is available
# Create an empty vector of length N (size of our experiment)
N <- 100
means <- numeric(N)
# Print out the empty vector
means
# Repeat N times:
# (1) draw a random sample of n = 50 exam points
# (2) compute the mean of the sampled exam points and
# (3) store it in the means vector
for(i in 1:N) {
points_sample <- sample(learning2014$points, size = 50, replace = F)
means[i] <- mean(points_sample)
}
# Print out the means vector
means
# Compute basic summary statistics
summary(means)
# Visualize the distribution of the means with a histogram
hist(means, freq = F)
```
`@sct`
```{r}
# submission correctness tests
test_output_contains("numeric(N)")
test_output_contains("means")
test_function("summary", args=c("object"))
test_function("hist", args=c("x","freq"))
# test if the students code produces an error
test_error()
# Final message the student will see upon completing the exercise
success_msg("Very good work!")
```
---
## The central limit theorem
```yaml
type: NormalExercise
key: 38bf66cd9c
lang: r
xp: 100
skills: 1
```
Let's now get back to the confidence interval for the population mean:
$$\bar{x} \pm z_{\alpha / 2} \cdot \frac{s}{\sqrt{n}}$$
where $z$ is a quantile point from the $N(0,1)$ distribution. What is the relationship between the normal distribution and the (sampling) distribution of the sample mean?
A mean is a scaled sum, so according to the [central limit theorem](https://en.wikipedia.org/wiki/Central_limit_theorem) (CLT), its distribution is approximately normal.
The confidence interval above is based on a normal approximation (for the sampling distribution of the sample mean) and it is a good approximation in theory, according to the CLT. In this exercise we'll explore if that approximation seems to work in practise.
`@instructions`
- Create object `n` and take a random sample of 50 students' exam points and create object `points_sample`
- Create objects `mu_hat`, `s` and `error`
- Print out `mu_hat` and it's standard error, rounded to 2 digits
- Inside `dnorm()`, set the argument `mean` = the estimate of the population mean of the exam points, based on the random sample of size 50
- Inside `dnorm()` set the argument `sd` = the standard error (the estimate of the standard deviation) of the mean, based on the random sample of size 50
- Execute the line with `hist()` and `curve()`, which visualize (1) the previously simulated sampling distribution of the means, and (2) a normal approximation to that distribution based on the random sample of size 50. (Techically it's two lines of code: `;` is code for line change)
- Does the normal approximation seem useful?
`@hint`
- `mu_hat` is the sample mean (the estimate of the population mean)
- `error` is the standard error of the sample mean
`@pre_exercise_code`
```{r}
learning2014 <- read.table("http://www.helsinki.fi/~kvehkala/JYTmooc/learning2014.txt", sep = "\t", header = TRUE)
set.seed(888)
# Create an empty vector of length N (size of the simulation)
N <- 100
means <- numeric(N)
# Repeat N times:
# (1) draw a random sample of n = 50 exam points
# (2) compute the mean of the sampled exam points and
# (3) store it in the means vector
for(i in 1:N) {
points_sample <- sample(learning2014$points, size = 50, replace = F)
means[i] <- mean(points_sample)
}
# Visualize the distribution of the means with a histogram
hist(means, freq = F)
```
`@sample_code`
```{r}
# learning214 and means are available
# Draw a single sample of n = 50 exam points
n <- 50
points_sample <- sample(learning2014$points, size = n, replace = F)
# Compute the sample mean
mu_hat <- mean(points_sample)
# Compute the sample standard deviation
s <- NA
# Compute the standard error of the mean
error <- NA
# Round and print
c(mu_hat, error)
# (1) Histogram of the previously simulated sample means and
# (2) The normal approximation of that distribution, based on a single sample
hist(means, freq = F); curve(dnorm(x, mean = NA, sd = NA), add = T)
```
`@solution`
```{r}
# learning214 and means are available
# Draw a single sample of n = 50 exam points
n <- 50
points_sample <- sample(learning2014$points, size = n, replace = F)
# Compute the sample mean
mu_hat <- mean(points_sample)
# Compute the sample standard deviation
s <- sd(points_sample)
# Compute the standard error of the mean
error <- s/ sqrt(n)
# print
c(mu_hat, error)
# (1) Histogram of the previously simulated sample means and
# (2) The normal approximation of that distribution, based on a single sample
hist(means, freq = F); curve(dnorm(x, mean = mu_hat, sd = error), add = T)
```
`@sct`
```{r}
# submission correctness tests
test_object("s")
test_object("error")
test_function("dnorm", args = c("mean","sd"))
# test if the students code produces an error
test_error()
# Final message the student will see upon completing the exercise
success_msg("Very well done!! You are making really good progress!")
```