The general setup for the problem is a common one: we have a single table of sensor observations over time. Now that collecting information is easier than ever, most industries already face time-series problems simply because of how they store data. As such, it is crucial to be able to handle data in this form. Thankfully, built-in functionality from Featuretools handles time-varying data well.
We'll demonstrate an end-to-end workflow using the Turbofan Engine Degradation Simulation Data Set from NASA. This notebook demonstrates a rapid way to predict the Remaining Useful Life (RUL) of an engine, starting from a single dataframe of time-series data. The notebook has three sections:
- Understand the Data
- Generate features
- Make predictions with Machine Learning
If you're running this notebook yourself, note that the Challenge Dataset will be downloaded into the data folder in this repository. If you'd prefer to download the data yourself, download and unzip the file from https://ti.arc.nasa.gov/c/13/.
- Quickly build an end-to-end workflow using time-series data
- Find interesting automatically generated features
- An advanced notebook using custom primitives and hyperparameter tuning
The main notebook can be found here. To run that notebook, you will need to install Featuretools with
pip install featuretools
and download the turbofan data from NASA. The function load_data in utils.py takes the path to the text file and returns a pandas dataframe. With the notebook as written, we expect the path to the train data first.
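To give a rough idea of what such a loader does, here is a minimal sketch. It is not the load_data from utils.py: the name load_data_sketch and all column names are assumptions, chosen to match the 26-column layout commonly described for the turbofan text files (engine id, cycle count, three operational settings, 21 sensor readings). The sample rows are synthetic, not NASA data.

```python
import io

import pandas as pd

# Assumed column layout: 2 id columns, 3 operational settings, 21 sensors.
COLUMNS = (
    ["engine_no", "time_in_cycles"]
    + [f"operational_setting_{i}" for i in range(1, 4)]
    + [f"sensor_measurement_{i}" for i in range(1, 22)]
)


def load_data_sketch(path_or_buffer):
    """Read a whitespace-delimited text file without a header row into a DataFrame."""
    return pd.read_csv(path_or_buffer, sep=r"\s+", header=None, names=COLUMNS)


# Tiny synthetic sample: two rows of 26 whitespace-separated values each.
sample = io.StringIO(
    " ".join(["1", "1"] + ["0.5"] * 24) + "\n"
    + " ".join(["1", "2"] + ["0.7"] * 24) + "\n"
)
df = load_data_sketch(sample)
print(df.shape)
```

In practice you would pass the path to the downloaded text file instead of the in-memory sample; the real load_data may name or post-process the columns differently.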
Featuretools was created by the developers at Feature Labs. If building impactful data science pipelines is important to you or your business, please get in touch.