Project Overview

This is the capstone project for the HarvardX PH526x: Using Python for Research course.

Created a tool that predicts the type of physical activity (e.g., walking, climbing stairs) from tri-axial smartphone accelerometer data with a precision of 93.74% (error ~ 0.1618 degrees).
Engineered features from time series signals in Python.
Optimized Logistic Regression, Gradient Boosting, Random Forest, Nearest Neighbors, and Decision Tree classifiers using GridSearchCV to find the best-performing model with a reasonable training time.
Built a client-facing API using Flask.

Code and Resources Used

Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, flask, json, pickle, siml, detecta, imblearn, pprint, collections.

Dataset

The input data used for training in this project consists of two files. The first file, train_time_series.csv, contains the raw accelerometer data collected using the Beiwe research platform. It has the following format: timestamp, UTC time, accuracy, and measurements of linear acceleration along each of the three orthogonal axes x, y, z.

The second file, train_labels.csv, contains the activity labels numbered with integers:

1 = standing
2 = walking
3 = stairs down
4 = stairs up

Because the accelerometers are sampled at high frequency, the labels in train_labels.csv are provided only for every 10th observation in train_time_series.csv.
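Since only every 10th observation carries a label, one possible way to pair labels with signal rows is a join on the timestamp. The following is a minimal sketch on synthetic data, assuming both files share a `timestamp` column and the labels file has a `label` column. These column names, the millisecond spacing, and the offset of the labeled rows are assumptions for illustration, not confirmed by the data files.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train_time_series.csv (column names are assumed).
rng = np.random.default_rng(0)
n = 50
signals = pd.DataFrame({
    "timestamp": np.arange(n) * 100,   # hypothetical millisecond timestamps
    "x": rng.normal(size=n),
    "y": rng.normal(size=n),
    "z": rng.normal(size=n),
})

# Synthetic stand-in for train_labels.csv: one label per 10th observation.
labels = signals.loc[9::10, ["timestamp"]].copy()
labels["label"] = [1, 2, 2, 3, 4]

# A left join keeps every signal row; unlabeled rows get NaN in "label".
merged = signals.merge(labels, on="timestamp", how="left")

# Keep only the rows that carry a label.
labeled = merged.dropna(subset=["label"])
print(len(merged), len(labeled))
```

With real data the two files may not share exact timestamps, in which case a nearest-timestamp join such as `pandas.merge_asof` could be a better fit.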

Analysis

Check for Imbalanced class

The data isn't well balanced: there are considerably more data points for the walking (2) activity, followed by stairs down (3). This might be related to the lifestyle of the study subjects. However, it should be considered when building the model to avoid any bias towards the more dominant classes.
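The imbalance can be quantified as class shares. This short sketch uses the class counts reported for the original dataset in the resampling section of this README. The `names` mapping follows the label table in the Dataset section.

```python
from collections import Counter

# Class counts as reported for the original (pre-resampling) dataset.
counts = Counter({2: 213, 3: 88, 4: 47, 1: 27})
names = {1: "standing", 2: "walking", 3: "stairs down", 4: "stairs up"}

total = sum(counts.values())

# Share of each class, ordered from most to least frequent.
shares = {names[k]: v / total for k, v in counts.most_common()}
for name, share in shares.items():
    print(f"{name:12s} {share:.1%}")
```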

Variable analysis

Moving and stationary activities behave differently: the magnitude of linear acceleration for stationary activities stays in a lower range than that for moving activities.

Mean of magnitude of acceleration:

Featurization

Calculation of the magnitude of each vector from its components using the Pythagorean theorem, added to the dataset as column "m".
Parsing of 'UTC Time' column from string to timestamp.
Sampling of time and frequency for time series signals and labels.
Application of frequency transformation functions (Fast Fourier Transform (FFT), Power Spectral Density (PSD), and autocorrelation) to transform the signals from the time domain to the frequency domain and extract features from them (the frequencies at which oscillations occur and their corresponding amplitudes), following Ahmet Taspinar's article referenced above.
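The magnitude and FFT steps above can be sketched roughly as follows. This is an illustration on a synthetic signal, not the project's actual feature code: the sampling rate `fs`, the helper `fft_features`, and the choice of keeping the three strongest peaks are all assumptions.

```python
import numpy as np

fs = 10.0                  # assumed sampling rate in Hz (hypothetical)
t = np.arange(100) / fs    # 10 seconds of samples

# Synthetic tri-axial signal: a 2 Hz oscillation on x, constants on y and z.
x = np.sin(2 * np.pi * 2.0 * t)
y = np.full_like(t, 0.1)
z = np.full_like(t, 1.0)

# Magnitude of each acceleration vector (the "m" column).
m = np.sqrt(x**2 + y**2 + z**2)


def fft_features(signal, fs, n_peaks=3):
    """Return the n_peaks strongest (frequency, amplitude) pairs of a signal."""
    centered = signal - signal.mean()               # drop the DC component
    amps = 2.0 / len(centered) * np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(len(centered), d=1 / fs)
    top = np.argsort(amps)[::-1][:n_peaks]          # largest amplitudes first
    return freqs[top], amps[top]


freqs, amps = fft_features(x, fs)
print(freqs[0], round(amps[0], 3))
```

The same pattern (transform, then keep the strongest peaks) can be repeated for the PSD and autocorrelation of each of x, y, z, and m to build one feature vector per window.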

Resampling for class imbalance

Because the most dominant class contains over 50% of the total data, Imbalanced-learn's Adaptive Synthetic (ADASYN) oversampling algorithm was used.

This algorithm generates a different number of samples depending on an estimate of the local distribution of the class to be oversampled.

Original dataset shape Counter({2: 213, 3: 88, 4: 47, 1: 27})
Resampled dataset shape Counter({3: 231, 2: 213, 1: 212, 4: 209})

Models

Split the training data into training and validation sets with a validation size of 20%.
Scikit-learn is used for all 5 algorithms (Logistic Regression, Gradient Boosting, Random Forest, Nearest Neighbors, and Decision Tree) to select the best one for hyperparameter tuning, taking training time into consideration.

The hyperparameters of the Random Forest model were tuned with grid search CV. Although it was the second-best performer, it achieved high accuracy on the validation data in considerably less time than the Gradient Boosting algorithm.
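The comparison and tuning workflow described in this section could be sketched as below. The data is synthetic, and the parameter grid, `cv=3`, and the `random_state` values are illustrative assumptions rather than the settings used in the project.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 4-class data standing in for the extracted features.
X, y = make_classification(n_samples=400, n_features=12, n_informative=6,
                           n_classes=4, random_state=0)

# Hold out 20% of the data for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

models = {
    "Logistic": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Nearest Neighbors": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Record validation accuracy and training time for each classifier.
results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start
    results[name] = (model.score(X_val, y_val), train_time)

# Tune the Random Forest over a small, illustrative grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_val, y_val))
```

Keeping accuracy and training time side by side in `results` makes the trade-off explicit when a slightly less accurate model trains much faster.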

Productionization

Built a Flask API endpoint hosted on a local web server, following the TDS and Ken Jee tutorials referenced in the section above. The endpoint takes a request containing a vector of frequency and amplitude features (derived from the x, y, z components and the magnitude of the linear acceleration measurements) and predicts the type of physical activity (e.g., walking, climbing stairs).
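A minimal sketch of what such an endpoint might look like. The route name `/predict`, the JSON key `input`, the response fields, and the `DummyModel` class are all hypothetical; in the real project the trained classifier would presumably be loaded from a pickle file instead.

```python
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)


class DummyModel:
    """Hypothetical stand-in for the trained classifier loaded from disk."""

    def predict(self, X):
        # Always predicts class 2 (walking); a real model would use X.
        return np.full(len(X), 2)


model = DummyModel()
LABELS = {1: "standing", 2: "walking", 3: "stairs down", 4: "stairs up"}


@app.route("/predict", methods=["POST"])
def predict():
    # Expect JSON like {"input": [f1, f2, ...]} holding one feature vector.
    payload = request.get_json()
    features = np.array(payload["input"], dtype=float).reshape(1, -1)
    label = int(model.predict(features)[0])
    return jsonify({"label": label, "activity": LABELS[label]})


# Exercise the endpoint in-process with Flask's test client.
with app.test_client() as client:
    response = client.post("/predict", json={"input": [0.5, 1.2, 3.4]})
    result = response.get_json()
print(result)
```

To serve the endpoint on a local web server instead of the test client, `app.run()` can be called from the script's entry point.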

vanessadlafp/Human_Activity_Recognition
