The run_analysis.R script prepares the data as defined in the course project.
The downloaded file contains data collected from the accelerometers of the Samsung Galaxy S smartphone. It was saved as projectfiles_UCI_HAR_Dataset and extracted into a folder called UCI HAR Dataset.
For each record, the dataset provides:
- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
- Triaxial angular velocity from the gyroscope.
- A 561-feature vector with time and frequency domain variables.
- Its activity label.
- An identifier of the subject who carried out the experiment.
The features selected for this database come from the accelerometer and gyroscope 3-axial raw signals tAcc-XYZ and tGyro-XYZ.
- features: List of all features.
- activity_labels: Links the class labels with their activity names.
The following variables are available for both the training and the test data. Only the training versions are described below; the test versions (subject_test, X_test and y_test) are equivalent.
- subject_train: Each row identifies the subject who performed the activity for each window sample.
- X_train: Training set, with the feature names used as column names.
- y_train: Training labels, given as activity codes.
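A minimal sketch of how the feature names can become the column names of X_train, assuming both files are whitespace-separated text read with base R's read.table(). The inline strings passed through text = are hypothetical stand-ins for features.txt and X_train.txt, and the column names index and name are illustrative; the actual script may read the files differently.

```r
# Hypothetical stand-in for features.txt: an index and a feature name per line.
features <- read.table(text = "1 tBodyAcc-mean()-X
2 tBodyAcc-std()-X", col.names = c("index", "name"))

# Hypothetical stand-in for X_train.txt: one row per window sample.
# as.character() guards against older R versions returning a factor.
X_train <- read.table(text = " 0.28 -0.99
 0.27 -0.98", col.names = as.character(features$name))

# read.table() passes the names through make.names(), so "-", "(" and ")"
# are replaced by dots: "tBodyAcc.mean...X", "tBodyAcc.std...X".
print(names(X_train))
```

Because of this name mangling, any later pattern matching on column names has to match the dotted form, unless the data is read with check.names = FALSE.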
- X_data: Created by merging X_train and X_test.
- y_data: Created by merging y_train and y_test.
- subject_data: Created by merging subject_train and subject_test.
Note: Merging is done by row binding.
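Row binding can be sketched in base R with rbind(), here on toy data frames standing in for X_train and X_test; the same call would apply to the y_* and subject_* pairs. This assumes the training rows come first, which the merge order above suggests but does not state.

```r
# Hypothetical toy training and test sets; both must have the same columns.
X_train <- data.frame(a = c(1, 2), b = c(3, 4))
X_test  <- data.frame(a = 5, b = 6)

# rbind() stacks the test rows underneath the training rows.
X_data <- rbind(X_train, X_test)
print(dim(X_data))
```

Keeping the same order for X, y and subject data matters: the three merged tables are later combined column-wise, so row i must refer to the same window sample in each.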
Only the mean and standard deviation of each measurement are kept.
- X_mean_std: Created by selecting the mean and std (standard deviation) columns for each measurement.
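The exact selection pattern used by the script is not stated, so the following is only a sketch with a hypothetical regular expression. It keeps columns whose name contains -mean() or -std(), which deliberately excludes meanFreq() columns; a looser pattern such as "mean|std" would keep them. It assumes the original feature names (with parentheses) are still intact.

```r
# Hypothetical merged data with feature names in the features.txt style.
X_data <- data.frame(matrix(1:8, nrow = 2))
names(X_data) <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X")

# Match "-mean()" or "-std()" literally; "-meanFreq()" does not match
# because "mean" is followed by "F" rather than "(".
keep <- grepl("-(mean|std)\\(\\)", names(X_data))

# drop = FALSE keeps a data frame even if only one column survives.
X_mean_std <- X_data[, keep, drop = FALSE]
print(names(X_mean_std))
```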
In y_data, the activity codes are replaced by their respective activity names, taken from the activity_labels variable.
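A minimal sketch of the code-to-name lookup. The six labels are the activities listed in the dataset's activity_labels.txt; the column names code and activity, and the toy y_data, are hypothetical.

```r
# activity_labels: a numeric code and the matching activity name.
activity_labels <- data.frame(
  code = 1:6,
  activity = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
               "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Hypothetical y_data holding one activity code per window sample.
y_data <- data.frame(code = c(5, 1, 6))

# match() finds each code's row in activity_labels, so the lookup does not
# rely on the codes being sorted or contiguous.
y_data$activity <- activity_labels$activity[match(y_data$code, activity_labels$code)]
y_data$code <- NULL
print(y_data$activity)
```

Using match() rather than indexing activity_labels directly by the code value keeps the lookup correct even if the label table were reordered.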
In the data set X_mean_std, the measures were renamed with more descriptive and complete names: the code uses regular expressions to find the abbreviations in the measure names and replace them with their full names (e.g. Acc becomes Accelerometer).
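A sketch of the renaming with gsub(). Only Acc to Accelerometer is documented above; the other pairs (^t, ^f, Gyro, Mag) are hypothetical examples of the same idea, and the helper name expand_names is illustrative.

```r
# Hypothetical helper: expand abbreviations in measure names.
# Only "Acc" -> "Accelerometer" comes from the codebook; the rest are examples.
expand_names <- function(x) {
  replacements <- c("^t"   = "Time",        # leading t: time domain
                    "^f"   = "Frequency",   # leading f: frequency domain
                    "Acc"  = "Accelerometer",
                    "Gyro" = "Gyroscope",
                    "Mag"  = "Magnitude")
  # Apply each regular expression in turn to every name.
  for (pattern in names(replacements)) {
    x <- gsub(pattern, replacements[[pattern]], x)
  }
  x
}

print(expand_names(c("tBodyAcc-mean()-X", "fBodyGyro-std()-Z", "tBodyAccMag-mean()")))
```

Applied to the data set, this would be names(X_mean_std) <- expand_names(names(X_mean_std)). The anchors ^t and ^f matter: without them, every lowercase t or f inside a name would be replaced.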
From the data set in step 4, the script creates a second, independent tidy data set with the average of each variable for each activity and each subject.
- all_data: Contains the entire data set; it is created by merging X_mean_std, y_data and subject_data column-wise with the bind_cols() function.
- tidy_data: Created by grouping all_data by activity and subject, then summarizing it by taking the mean of each variable for each activity and each subject.
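The column-wise merge and the grouped averages can be sketched in base R so that no packages are required: cbind() plays the role of bind_cols(), and aggregate() replaces the group-and-summarize step. The toy data and the column name TimeBodyAccelerometerMeanX are hypothetical.

```r
# Hypothetical toy pieces; all three must have the same number of rows.
subject_data <- data.frame(subject = c(1, 1, 2, 2))
y_data       <- data.frame(activity = c("WALKING", "WALKING", "WALKING", "SITTING"),
                           stringsAsFactors = FALSE)
X_mean_std   <- data.frame(TimeBodyAccelerometerMeanX = c(1, 3, 5, 7))

# Column-wise merge: base-R analogue of bind_cols().
all_data <- cbind(subject_data, y_data, X_mean_std)

# Mean of every remaining column within each subject/activity pair.
tidy_data <- aggregate(. ~ subject + activity, data = all_data, FUN = mean)
print(tidy_data)
```

With dplyr, the same step is roughly all_data %>% group_by(subject, activity) %>% summarise(across(everything(), mean)). Here subject 1 walking averages to 2, subject 2 walking to 5, and subject 2 sitting to 7.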
The final data set (tidy_data) is exported to the tidy_data.csv file.
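The export call itself is not shown in this codebook; one plausible sketch uses base R's write.csv() with row.names = FALSE, which is an assumption here. To stay self-contained, it writes to a temporary directory instead of the working directory and reads the file back as a check.

```r
# Hypothetical small tidy_data; the real one has one row per subject/activity pair.
tidy_data <- data.frame(subject = c(1L, 2L),
                        activity = c("WALKING", "SITTING"),
                        TimeBodyAccelerometerMeanX = c(2.5, 7.5),
                        stringsAsFactors = FALSE)

# row.names = FALSE avoids writing an extra unnamed index column.
out_file <- file.path(tempdir(), "tidy_data.csv")
write.csv(tidy_data, out_file, row.names = FALSE)

# Reading the file back should reproduce the same table.
round_trip <- read.csv(out_file, stringsAsFactors = FALSE)
print(all.equal(round_trip, tidy_data))
```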