Sunbelt Computer Software

Introduction

This is the ReadMe file for Coursera's Getting and Cleaning Data Course Project. It describes the steps taken in the R script run_analysis.R (in this GitHub repository) to process the given dataset and fulfill the project's requirements.

The data for the project was downloaded from:

Coursera Repository

The project requirement was to create one R script called run_analysis.R that does the following:

Merges the training and the test sets to create one data set.
Extracts only the measurements on the mean and standard deviation for each measurement.
Uses descriptive activity names to name the activities in the data set.
Appropriately labels the data set with descriptive variable names.
Creates a second, independent tidy data set with the average of each variable for each activity and each subject.

Files used

README.txt: General information about the data and files - MUST read!!!
features.txt: contains the variable names for the training and test sets.
features_info.txt: explains the variables and how their names are constructed.
activity_labels.txt: descriptive names for the activity identification numbers in y_test.txt and y_train.txt
subject_test.txt: subject identification for each row of X_test.txt
X_test.txt: measurement data for the test subjects.
y_test.txt: activity identification number for each row of X_test.txt
subject_train.txt: subject identification for each row of X_train.txt
X_train.txt: measurement data for the train subjects.
y_train.txt: activity identification number for each row of X_train.txt

Files not used

The files in the "Inertial Signals" folder were not processed. After careful examination of the Inertial data, none of these files contained any variable on the mean or standard deviation of measurements, so the data would have been read only to be discarded when subsetting the merged dataset.

Steps Taken

Download the dataset in an appropriate folder and unzip its contents.
Explore the files and folders structure to find relevant information files.
Study "README.txt", "activity_labels.txt", "features_info.txt" and "features.txt".
Read "features.txt" and process its data into a character vector to be used as column names for the test and train datasets.
Read and process "activity_labels.txt" to obtain a vector with the descriptive names for the activity values in "y_test.txt" and "y_train.txt".
process test dataset:

Read "y_test.txt" and replace the activity numbers with descriptive activity names, stored as a factor variable.
Read "subject_test.txt" and store the participant ids of the test dataset as a factor variable.
Read the test set measurements in "X_test.txt", creating the 'testset' data frame.
Assign the column names in the 'features' vector to 'testset', then bind the subject and activity columns to it.
The test dataset 'testset' is now ready to be merged.

process train dataset:

Read "y_train.txt" and replace the activity numbers with descriptive activity names, stored as a factor variable.
Read "subject_train.txt" and store the participant ids of the train dataset as a factor variable.
Read the train set measurements in "X_train.txt", creating the 'trainset' data frame.
Assign the column names in the 'features' vector to 'trainset', then bind the subject and activity columns to it.
The train dataset 'trainset' is now ready to be merged.
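
The reading steps above are the same for both datasets, so they can be sketched as one helper. This is a minimal sketch and not necessarily how run_analysis.R is written: the function name `load_split`, the `data_dir` argument, and the assumption that the split files sit in "test" and "train" subfolders of the unzipped dataset are all illustrative.

```r
# Hypothetical helper: builds the 'testset' or 'trainset' data frame.
# data_dir is assumed to be the unzipped dataset folder; split is "test" or "train".
load_split <- function(data_dir, split) {
  # Second column of features.txt holds the measurement names.
  features <- read.table(file.path(data_dir, "features.txt"),
                         stringsAsFactors = FALSE)[[2]]
  # activity_labels.txt maps activity numbers (column 1) to names (column 2).
  labels <- read.table(file.path(data_dir, "activity_labels.txt"),
                       stringsAsFactors = FALSE)

  split_dir <- file.path(data_dir, split)  # assumed folder layout
  y <- read.table(file.path(split_dir, paste0("y_", split, ".txt")))[[1]]
  subject <- read.table(file.path(split_dir, paste0("subject_", split, ".txt")))[[1]]
  x <- read.table(file.path(split_dir, paste0("X_", split, ".txt")))

  # Use the feature names as column names for the measurements.
  names(x) <- features

  # Bind subject and activity information to the measurements.
  # check.names = FALSE keeps the original names, including "()" and "-".
  data.frame(
    subject_id = factor(subject),
    activity = factor(y, levels = labels[[1]], labels = labels[[2]]),
    x,
    check.names = FALSE
  )
}
```

With such a helper, `testset <- load_split(data_dir, "test")` and `trainset <- load_split(data_dir, "train")` would give two data frames with identical columns, which is what the later rbind step needs.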

Project Course requirements section

1. Used rbind on the 'testset' and 'trainset' datasets to obtain the merged dataset 'train_test_data'.
2. Extracted only the measurements on the mean and standard deviation for each measurement by grepping for "mean()" and "std()" in the variable names.
3. Already done in the earlier steps: the activity ids were replaced by descriptive activity names when each dataset was read.
4. Relabeled the variables only lightly, because in my opinion the original names were already quite descriptive and further processing would make them uncomfortably long. Removed "()" from the variable names and replaced "-" with "." to improve compatibility with R syntax.
5. Created a second, independent tidy data set 'avg_data' with the average of each variable for each activity and each subject, by aggregating the rows with the function 'mean()' with respect to activity name and subject id.
6. Ordered the resulting 'avg_data' dataset by activity and subject_id in numeric order and generated the output file 'tinydata.txt' in the 'data' folder.
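
The requirement steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the actual contents of run_analysis.R: the function name `tidy_average` is hypothetical, and the id columns are assumed to be called 'subject_id' and 'activity'. The sketch matches "mean()" and "std()" literally (fixed = TRUE); read as a regular expression, "mean()" would likely also match names such as meanFreq, so the real script may select a different set of columns.

```r
# Hypothetical function covering steps 1, 2, 4, 5 and the ordering in step 6.
tidy_average <- function(testset, trainset) {
  id_cols <- c("subject_id", "activity")  # assumed id column names

  # Step 1: merge the two datasets by stacking their rows.
  train_test_data <- rbind(testset, trainset)

  # Step 2: keep the id columns plus variables whose names contain
  # the literal text "mean()" or "std()".
  vars <- names(train_test_data)
  keep <- grepl("mean()", vars, fixed = TRUE) | grepl("std()", vars, fixed = TRUE)
  selected <- train_test_data[, vars %in% id_cols | keep]

  # Step 4: drop "()" and replace "-" with "." in the variable names.
  names(selected) <- gsub("()", "", names(selected), fixed = TRUE)
  names(selected) <- gsub("-", ".", names(selected), fixed = TRUE)

  # Step 5: average every measurement per activity and subject.
  measures <- setdiff(names(selected), id_cols)
  avg_data <- aggregate(
    selected[measures],
    by = list(activity = selected$activity, subject_id = selected$subject_id),
    FUN = mean
  )

  # Step 6 (ordering): subject_id is a factor, so convert it through
  # character to integer to sort numerically rather than by level order.
  subject_num <- as.integer(as.character(avg_data$subject_id))
  avg_data[order(avg_data$activity, subject_num), ]
}
```

The result could then be written out with something like `write.table(avg_data, file.path("data", "tinydata.txt"), row.names = FALSE)`, where `row.names = FALSE` is an assumption about the desired output format and the 'data' folder must already exist.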
