| title | author | date | output |
|---|---|---|---|
| CodeBook.md | Anuj Parashar | | final_summarised_dataset.txt |
The purpose of this project is to demonstrate your ability to collect, work with, and clean a data set. The goal is to prepare tidy data that can be used for later analysis.
##Study design and data processing
###Collection of the raw data
The original data is already available at the following location: https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip
###Notes on the original (raw) data
For a detailed understanding of how the data was captured and organized, see the "README.txt" file included with the data (in the zip file above).
##Creating the tidy datafile
###Guide to create the tidy data file
- Download the above zip file and extract its content.
- Open the run_analysis.R file and change the working directory path to the directory where you extracted the zip file.
- Source the R script. It creates an output file, "final_summarised_dataset.txt", in the working directory set in the previous step.
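The download and extraction steps above can also be scripted. The following is a minimal sketch, not part of run_analysis.R: the helper name `fetch_raw_data`, the `data` directory and the local zip file name are assumptions made for illustration.

```r
# Sketch only: fetch the archive and unpack it locally.
# `fetch_raw_data`, `data_dir` and the zip file name are illustrative assumptions.
data_url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"

fetch_raw_data <- function(url = data_url, data_dir = "data") {
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  zip_path <- file.path(data_dir, "UCI_HAR_Dataset.zip")
  if (!file.exists(zip_path)) {
    # mode = "wb" keeps the binary archive intact on Windows
    download.file(url, destfile = zip_path, mode = "wb")
  }
  unzip(zip_path, exdir = data_dir)
  invisible(normalizePath(data_dir))
}

# Usage (not run here, since it needs network access):
# extracted_dir <- fetch_raw_data()
# Then point the working directory path in run_analysis.R at the
# extracted data and run: source("run_analysis.R")
```

Which directory level run_analysis.R expects as its working directory is not stated here, so check the path in the script before sourcing it.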
###Cleaning of the data
Here are the five steps involved in data cleaning:
1. Merges the training and the test sets to create one data set.
2. Extracts only the measurements on the mean and standard deviation for each measurement.
3. Uses descriptive activity names to name the activities in the data set.
4. Appropriately labels the data set with descriptive variable names.
5. From the data set in step 4, creates a second, independent tidy data set with the average of each variable for each activity and each subject.
For a more detailed description of how this has been achieved in the script, please refer to the readme file at the following location: https://github.com/promisinganuj/getting-and-cleaning-data/blob/master/README.md
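To make the five cleaning steps concrete, here is a small sketch that applies them to a toy data set. It is not the code from run_analysis.R: the toy values, the helper names `make_set` and `clean_names`, and the extra `activityName` column are assumptions for illustration. Only the feature naming pattern (`tBodyAcc-mean()-X` and so on) follows the raw data.

```r
# Toy stand-ins for the raw files; the real data has 561 feature columns.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
make_set <- function(subject, activity, values) {
  df <- data.frame(subject, activity, values, values * 2, values * 3)
  names(df) <- c("subjectId", "activityId", features)
  df
}
train <- make_set(c(1, 1, 2), c(1, 1, 2), c(0.1, 0.3, 0.5))
test  <- make_set(c(3, 3),    c(1, 2),    c(0.2, 0.4))

# Step 1: merge the training and test sets.
merged <- rbind(train, test)

# Step 2: keep only mean() and std() measurements plus the id columns.
id_cols <- c("subjectId", "activityId")
keep <- grepl("mean\\(\\)|std\\(\\)", names(merged)) | names(merged) %in% id_cols
selected <- merged[, keep]

# Step 3: descriptive activity names, kept in a separate column (an assumption).
activity_names <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                    "SITTING", "STANDING", "LAYING")
selected$activityName <- factor(selected$activityId, levels = 1:6,
                                labels = activity_names)

# Step 4: descriptive variable names, e.g. tBodyAcc-mean()-X -> timeBodyAccMeanX.
clean_names <- function(x) {
  x <- sub("^t", "time", x)
  x <- sub("^f", "freq", x)
  x <- sub("-mean\\(\\)", "Mean", x)
  x <- sub("-std\\(\\)", "StdDev", x)
  gsub("-", "", x)
}
is_measure <- !(names(selected) %in% c(id_cols, "activityName"))
names(selected)[is_measure] <- clean_names(names(selected)[is_measure])

# Step 5: average of each variable for each activity and each subject.
tidy <- aggregate(selected[is_measure],
                  by = list(activityId = selected$activityId,
                            subjectId = selected$subjectId),
                  FUN = mean)
print(tidy)
```

The pattern in step 2 requires the literal `mean()` or `std()`, so features such as `meanFreq()` are left out, which is consistent with the 66 measurement columns described below.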
##Description of the variables in the final_summarised_dataset.txt file
###Dimensions of the dataset:
'data.frame': 180 obs. of 68 variables
###Summary of the data:
As mentioned in step 5 above, this dataset contains the average of each variable (feature) for each activity (6 in total) and each subject (30 in total). So, in total there are 30 * 6 = 180 observations in this dataset. There are 66 recorded variables for each activity and subject, which together with the two identifier columns gives 66 + 2 = 68 columns.
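The arithmetic behind these dimensions can be checked with a short sketch. The variable names below are illustrative assumptions. The grid only enumerates the activity and subject combinations and contains no measurements.

```r
# One row per (activity, subject) pair; the first factor varies fastest.
n_activities <- 6
n_subjects   <- 30
n_measures   <- 66

groups <- expand.grid(activityId = seq_len(n_activities),
                      subjectId  = seq_len(n_subjects))

n_rows <- nrow(groups)               # number of observations
n_cols <- ncol(groups) + n_measures  # identifier columns plus measurements
cat(n_rows, "obs. of", n_cols, "variables\n")
```

The first rows of `groups` follow the same ordering as the `activityId` and `subjectId` values in the variable listing: activities 1 to 6 for subject 1, then activities 1 to 6 for subject 2, and so on.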
###Variables:
$ activityId : int 1 2 3 4 5 6 1 2 3 4 ...
$ subjectId : int 1 1 1 1 1 1 2 2 2 2 ...
$ timeBodyAccMeanX : num 0.277 0.255 0.289 0.261 0.279 ...
$ timeBodyAccMeanY : num -0.01738 -0.02395 -0.00992 -0.00131 -0.01614 ...
$ timeBodyAccMeanZ : num -0.1111 -0.0973 -0.1076 -0.1045 -0.1106 ...
$ timeBodyAccStdDevX : num -0.284 -0.355 0.03 -0.977 -0.996 ...
$ timeBodyAccStdDevY : num 0.11446 -0.00232 -0.03194 -0.92262 -0.97319 ...
$ timeBodyAccStdDevZ : num -0.26 -0.0195 -0.2304 -0.9396 -0.9798 ...
$ timeGravityAccMeanX : num 0.935 0.893 0.932 0.832 0.943 ...
$ timeGravityAccMeanY : num -0.282 -0.362 -0.267 0.204 -0.273 ...
$ timeGravityAccMeanZ : num -0.0681 -0.0754 -0.0621 0.332 0.0135 ...
$ timeGravityAccStdDevX : num -0.977 -0.956 -0.951 -0.968 -0.994 ...
$ timeGravityAccStdDevY : num -0.971 -0.953 -0.937 -0.936 -0.981 ...
$ timeGravityAccStdDevZ : num -0.948 -0.912 -0.896 -0.949 -0.976 ...
$ timeBodyAccJerkMeanX : num 0.074 0.1014 0.0542 0.0775 0.0754 ...
$ timeBodyAccJerkMeanY : num 0.028272 0.019486 0.02965 -0.000619 0.007976 ...
$ timeBodyAccJerkMeanZ : num -0.00417 -0.04556 -0.01097 -0.00337 -0.00369 ...
$ timeBodyAccJerkStdDevX : num -0.1136 -0.4468 -0.0123 -0.9864 -0.9946 ...
$ timeBodyAccJerkStdDevY : num 0.067 -0.378 -0.102 -0.981 -0.986 ...
$ timeBodyAccJerkStdDevZ : num -0.503 -0.707 -0.346 -0.988 -0.992 ...
$ timeBodyGyroMeanX : num -0.0418 0.0505 -0.0351 -0.0454 -0.024 ...
$ timeBodyGyroMeanY : num -0.0695 -0.1662 -0.0909 -0.0919 -0.0594 ...
$ timeBodyGyroMeanZ : num 0.0849 0.0584 0.0901 0.0629 0.0748 ...
$ timeBodyGyroStdDevX : num -0.474 -0.545 -0.458 -0.977 -0.987 ...
$ timeBodyGyroStdDevY : num -0.05461 0.00411 -0.12635 -0.96647 -0.98773 ...
$ timeBodyGyroStdDevZ : num -0.344 -0.507 -0.125 -0.941 -0.981 ...
$ timeBodyGyroJerkMeanX : num -0.09 -0.1222 -0.074 -0.0937 -0.0996 ...
$ timeBodyGyroJerkMeanY : num -0.0398 -0.0421 -0.044 -0.0402 -0.0441 ...
$ timeBodyGyroJerkMeanZ : num -0.0461 -0.0407 -0.027 -0.0467 -0.049 ...
$ timeBodyGyroJerkStdDevX : num -0.207 -0.615 -0.487 -0.992 -0.993 ...
$ timeBodyGyroJerkStdDevY : num -0.304 -0.602 -0.239 -0.99 -0.995 ...
$ timeBodyGyroJerkStdDevZ : num -0.404 -0.606 -0.269 -0.988 -0.992 ...
$ timeBodyAccMagMean : num -0.137 -0.1299 0.0272 -0.9485 -0.9843 ...
$ timeBodyAccMagStdDev : num -0.2197 -0.325 0.0199 -0.9271 -0.9819 ...
$ timeGravityAccMagMean : num -0.137 -0.1299 0.0272 -0.9485 -0.9843 ...
$ timeGravityAccMagStdDev : num -0.2197 -0.325 0.0199 -0.9271 -0.9819 ...
$ timeBodyAccJerkMagMean : num -0.1414 -0.4665 -0.0894 -0.9874 -0.9924 ...
$ timeBodyAccJerkMagStdDev : num -0.0745 -0.479 -0.0258 -0.9841 -0.9931 ...
$ timeBodyGyroMagMean : num -0.161 -0.1267 -0.0757 -0.9309 -0.9765 ...
$ timeBodyGyroMagStdDev : num -0.187 -0.149 -0.226 -0.935 -0.979 ...
$ timeBodyGyroJerkMagMean : num -0.299 -0.595 -0.295 -0.992 -0.995 ...
$ timeBodyGyroJerkMagStdDev : num -0.325 -0.649 -0.307 -0.988 -0.995 ...
$ freqBodyAccMeanX : num -0.2028 -0.4043 0.0382 -0.9796 -0.9952 ...
$ freqBodyAccMeanY : num 0.08971 -0.19098 0.00155 -0.94408 -0.97707 ...
$ freqBodyAccMeanZ : num -0.332 -0.433 -0.226 -0.959 -0.985 ...
$ freqBodyAccStdDevX : num -0.3191 -0.3374 0.0243 -0.9764 -0.996 ...
$ freqBodyAccStdDevY : num 0.056 0.0218 -0.113 -0.9173 -0.9723 ...
$ freqBodyAccStdDevZ : num -0.28 0.086 -0.298 -0.934 -0.978 ...
$ freqBodyAccJerkMeanX : num -0.1705 -0.4799 -0.0277 -0.9866 -0.9946 ...
$ freqBodyAccJerkMeanY : num -0.0352 -0.4134 -0.1287 -0.9816 -0.9854 ...
$ freqBodyAccJerkMeanZ : num -0.469 -0.685 -0.288 -0.986 -0.991 ...
$ freqBodyAccJerkStdDevX : num -0.1336 -0.4619 -0.0863 -0.9875 -0.9951 ...
$ freqBodyAccJerkStdDevY : num 0.107 -0.382 -0.135 -0.983 -0.987 ...
$ freqBodyAccJerkStdDevZ : num -0.535 -0.726 -0.402 -0.988 -0.992 ...
$ freqBodyGyroMeanX : num -0.339 -0.493 -0.352 -0.976 -0.986 ...
$ freqBodyGyroMeanY : num -0.1031 -0.3195 -0.0557 -0.9758 -0.989 ...
$ freqBodyGyroMeanZ : num -0.2559 -0.4536 -0.0319 -0.9513 -0.9808 ...
$ freqBodyGyroStdDevX : num -0.517 -0.566 -0.495 -0.978 -0.987 ...
$ freqBodyGyroStdDevY : num -0.0335 0.1515 -0.1814 -0.9623 -0.9871 ...
$ freqBodyGyroStdDevZ : num -0.437 -0.572 -0.238 -0.944 -0.982 ...
$ freqBodyAccMagMean : num -0.1286 -0.3524 0.0966 -0.9478 -0.9854 ...
$ freqBodyAccMagStdDev : num -0.398 -0.416 -0.187 -0.928 -0.982 ...
$ freqBodyBodyAccJerkMagMean : num -0.0571 -0.4427 0.0262 -0.9853 -0.9925 ...
$ freqBodyBodyAccJerkMagStdDev : num -0.103 -0.533 -0.104 -0.982 -0.993 ...
$ freqBodyBodyGyroMagMean : num -0.199 -0.326 -0.186 -0.958 -0.985 ...
$ freqBodyBodyGyroMagStdDev : num -0.321 -0.183 -0.398 -0.932 -0.978 ...
$ freqBodyBodyGyroJerkMagMean : num -0.319 -0.635 -0.282 -0.99 -0.995 ...
$ freqBodyBodyGyroJerkMagStdDev: num -0.382 -0.694 -0.392 -0.987 -0.995 ...
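The listing above has the format that `str()` prints for a data frame. As a hedged sketch, the file could be read back and inspected as follows. This assumes the file was written with a header row (for example with `write.table(..., row.names = FALSE)`), which is not confirmed here. The helper name `load_tidy` and the two-row demo file are illustrative only.

```r
# Hypothetical helper: read the summarised file back into a data frame.
# Assumes a header row and whitespace-separated values.
load_tidy <- function(path = "final_summarised_dataset.txt") {
  read.table(path, header = TRUE)
}

# Demo on a two-row stand-in file so the snippet runs without the real data.
demo_path <- tempfile(fileext = ".txt")
demo <- data.frame(activityId = 1:2,
                   subjectId = c(1L, 1L),
                   timeBodyAccMeanX = c(0.277, 0.255))
write.table(demo, demo_path, row.names = FALSE)

tidy_demo <- load_tidy(demo_path)
str(tidy_demo)
```

With the real file in the working directory, `str(load_tidy())` should produce a listing of the same shape as the one above.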
###activityId
- The activity performed by the subjects.
- Six Unique values: 1 WALKING, 2 WALKING_UPSTAIRS, 3 WALKING_DOWNSTAIRS, 4 SITTING, 5 STANDING, 6 LAYING
- Unit: NA
###subjectId
- The id of the volunteers who performed these activities.
- Thirty Unique values: 1 - 30
- Unit: NA
###The rest of the variables are average values for each subject and each activity.
Each variable name is built from the following abbreviations:
- time: Time domain signal
- freq: Frequency domain signal
- BodyAcc: Body acceleration signals
- GravityAcc: Gravity acceleration signals
- Gyro: Gyroscope reading
- Jerk: Jerk signal
- Mag: Magnitude
- Mean: Mean value
- StdDev: Standard deviation
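
As an illustration of how a column name breaks down into these abbreviations, here is a small sketch. The function names `split_variable` and `describe_variable` are hypothetical. Anything not in the list above, such as a trailing axis letter, is ignored.

```r
# Abbreviations from the list above, mapped to their meanings.
abbreviations <- c(
  time = "time domain signal",
  freq = "frequency domain signal",
  BodyAcc = "body acceleration",
  GravityAcc = "gravity acceleration",
  Gyro = "gyroscope reading",
  Jerk = "jerk signal",
  Mag = "magnitude",
  Mean = "mean value",
  StdDev = "standard deviation"
)

# Return the known abbreviations found in a variable name, in order.
split_variable <- function(name) {
  pattern <- paste(names(abbreviations), collapse = "|")
  regmatches(name, gregexpr(pattern, name))[[1]]
}

# Join the meanings of those abbreviations into one description.
describe_variable <- function(name) {
  paste(abbreviations[split_variable(name)], collapse = ", ")
}

print(split_variable("timeBodyAccJerkMagStdDev"))
print(describe_variable("freqBodyGyroMeanX"))
```

For example, `timeBodyAccJerkMagStdDev` splits into time, BodyAcc, Jerk, Mag and StdDev: the standard deviation of the magnitude of the body acceleration jerk signal in the time domain.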