# Indexing
```{r indexsetup, include=FALSE, eval=TRUE, cache=FALSE}
knitr::opts_chunk$set(eval=FALSE)
```
<div class="" style="text-align: center;">
<i class="fas fa-table fa-5x" style = 'color:#990024CC'></i>
</div>
<br>
What follows is mostly a refresher for those who have already used R quite a bit, and presumably you've had enough R exposure to be aware of some of it. Because much of data processing involves data frames, or other tables of mixed data types, most of the time here is spent on slicing and dicing data frames. Even so, it would be impossible to use R effectively without knowing how to handle basic data types.
## Slicing Vectors
Taking individual parts of a vector of values is straightforward and something you'll likely need to do a lot. The basic idea is to provide the indices of the elements you want to extract.
```{r vectorSlice, eval=TRUE}
letters[4:6] # lower case letters a-z
letters[c(13, 10, 3)]
```
## Slicing Matrices/data.frames
With 2-d objects we can specify rows and columns. Rows are indexed to the left of the comma, columns to the right.
```{r matrixSlice}
myMatrix[1, 2:3] # matrix[rows, columns]
```
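The chunk above assumes a `myMatrix` object already exists. As a minimal sketch, assuming a small 3 x 4 matrix built here purely for illustration, the same row/column pattern looks like this:

```{r matrixSliceSketch}
# Hypothetical 3 x 4 matrix for illustration; 1:12 fills it column by column
myMatrix = matrix(1:12, nrow = 3, ncol = 4)

first_row_cols = myMatrix[1, 2:3]        # row 1, columns 2 and 3: 4 7
second_col     = myMatrix[, 2]           # all rows of column 2: 4 5 6
sub_matrix     = myMatrix[2:3, c(1, 4)]  # rows 2-3, columns 1 and 4 (still a matrix)
```

Selecting a single row or column returns a plain vector, while selecting several of each keeps the matrix structure.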
## Label-based Indexing
We can index by row and column names if they are available.
```{r dfindexlabel}
mydf['row1', 'b']
```
## Position-based Indexing
Otherwise we can index by number.
```{r dfindexpos}
mydf[1, 2]
```
## Mixed Indexing
Even both!
```{r dfindexmix}
mydf['row1', 2]
```
If the row/column value is empty, all rows/columns are retained.
```{r dfindexslice}
mydf['row1', ]
mydf[, 'b']
```
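The indexing chunks above refer to a `mydf` that is never constructed in this section. As a sketch, assuming a hypothetical three-row data frame with row names `row1` to `row3` and columns `a` and `b`, the label, position, and mixed forms all reach the same cell:

```{r dfindexSketch}
# Hypothetical data frame; the names and values are assumptions for illustration
mydf = data.frame(
  a = 1:3,
  b = c('x', 'y', 'z'),
  row.names = c('row1', 'row2', 'row3')
)

by_label    = mydf['row1', 'b']   # label-based
by_position = mydf[1, 2]          # position-based
by_both     = mydf['row1', 2]     # mixed

# All three forms return the same value
identical(by_label, by_position) && identical(by_position, by_both)

first_row = mydf['row1', ]   # empty column index: all columns kept
column_b  = mydf[, 'b']      # empty row index: all rows kept
```

Note that `mydf['row1', ]` comes back as a one-row data frame, whereas `mydf[, 'b']` is simplified to a plain vector because only one column is selected.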
## Non-contiguous
Note that the indices supplied do not have to be in order or in sequence.
```{r dfindexnoncont}
mydf[c(1, 3), ]
```
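As a quick sketch with a hypothetical data frame (its contents are an assumption), rows come back in the order requested, so the same mechanism can skip rows or reorder them:

```{r dfindexnoncontSketch}
# Hypothetical data frame for illustration
mydf = data.frame(a = 1:4, b = c('w', 'x', 'y', 'z'))

skipped   = mydf[c(1, 3), ]   # rows 1 and 3, skipping row 2
reordered = mydf[c(3, 1), ]   # the same rows, in the order requested

skipped$a     # 1 3
reordered$a   # 3 1
```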
## Boolean
Boolean indexing requires a `TRUE`/`FALSE` indicator for each row. In the following, a row where column `a` has a value greater than or equal to 2 is `TRUE` and is selected. Otherwise it is `FALSE` and is dropped.
```{r dfindexbool}
mydf[mydf$a >= 2, ]
```
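It can help to look at the logical vector on its own before using it as an index. A sketch, assuming a hypothetical data frame with a numeric column `a`:

```{r dfindexboolSketch}
# Hypothetical data frame for illustration
mydf = data.frame(a = 1:4, b = c('w', 'x', 'y', 'z'))

keep      = mydf$a >= 2    # FALSE TRUE TRUE TRUE, one value per row
kept_rows = mydf[keep, ]   # only the rows where keep is TRUE

nrow(kept_rows)   # 3
```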
## List/data.frame Extraction
We have a couple of ways to get at elements of a list, and likewise for data frames, as they are also lists.
<span class="func">[</span> : grab a slice of elements/columns (the result is still a list or data frame)
<span class="func">[[</span> : grab a single element/column, by name or position
<span class="func">$</span> : grab a single element/column by name
<span class="func">@</span> : extract a slot from an S4 object
```{r dflistslice}
my_list_or_df[2:4]
```
```{r dflistextract}
my_list_or_df[['name']]
```
```{r dflistextract2}
my_list_or_df$name
```
```{r dflistextract3}
my_s4_object@name  # hypothetical S4 object; @ does not apply to plain lists
```
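The difference between these operators is easiest to see from what each returns. The following sketch uses a hypothetical list and a hypothetical S4 class (`Person`, defined here only for illustration):

```{r dflistextractSketch}
# Hypothetical list for illustration
my_list = list(name = 'Ada', scores = c(90, 85), active = TRUE)

sliced    = my_list[1:2]        # `[` keeps the container: a list of 2 elements
by_name   = my_list[['name']]   # `[[` returns the element itself
by_dollar = my_list$name        # `$` also returns the element, by name

class(sliced)    # "list"
class(by_name)   # "character"

# Hypothetical S4 class; slots are accessed with @ rather than $
setClass('Person', representation(name = 'character'))
person = new('Person', name = 'Ada')
person@name
```

Because a data frame is a list of columns, `[` on a data frame likewise returns a data frame, while `[[` and `$` return the column itself.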
<div class='note'>
In general, position-based indexing should be avoided, except in the case of iterative programming of the sort that will be covered later. The reason is that these become *magic numbers* when not commented, such that no one will know what they refer to. In addition, any change to the rows/columns of data will render the numbers incorrect, where labels would still be applicable.
<img class='img-note' src="img/R.ico" style="display:block; margin: 0 auto;">
</div>
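To make the point in the note concrete, here is a sketch with a hypothetical data frame to which a column is later added at the front. The position-based index silently returns a different column, while the label still works:

```{r dfindexmagicSketch}
# Hypothetical data frame for illustration
mydf = data.frame(a = 1:3, b = c(10, 20, 30))

by_position_before = mydf[, 2]     # column b
by_label_before    = mydf[, 'b']   # column b

# Suppose an id column is later added at the front
mydf = cbind(id = c('x', 'y', 'z'), mydf)

by_position_after = mydf[, 2]     # now column a, with no error or warning
by_label_after    = mydf[, 'b']   # still column b
```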
## Indexing Exercises
The following is a refresher of base R indexing only.
Here is a <span class="objclass">matrix</span>, a <span class="objclass">data.frame</span> and a <span class="objclass">list</span>.
```{r ixex0}
mymatrix = matrix(rnorm(100), 10, 10)
mydf = cars
mylist = list(mymatrix, thisdf = mydf)
```
### Exercise 1
For the <span class="objclass">matrix</span>, in separate operations, take a slice of rows, a selection of columns, and a single element.
```{r ixex1, echo=F}
mymatrix[1:5, ]
mymatrix[, 1:5]
mymatrix[1, 2]
```
### Exercise 2
For the <span class="objclass">data.frame</span>, grab a column in 3 different ways.
```{r ixex2, echo=F}
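mydf$dist      # cars has columns speed and dist
mydf[, 2]      # dist is the second column
mydf['dist']   # returns a one-column data frame rather than a vector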
```
### Exercise 3
For the <span class="objclass">list</span>, grab an element by number and by name.
```{r ixex3, echo=F}
mylist[[2]]    # the element itself; mylist[2] would return a one-element list
mylist$thisdf
```
## Python Indexing Notebook
[Available on GitHub](https://github.com/m-clark/data-processing-and-visualization/blob/master/jupyter_notebooks/indexing.ipynb)