
SOFTENG 751 - Parallel Machine Learning (Group 9)

Description

This project assesses the performance of four hyperparameter tuning algorithms:

  • Grid Search
  • Random Search
  • Successive Halving Algorithm (SHA)
  • Asynchronous Successive Halving Algorithm (ASHA)

Grid and random search are implemented using the scikit-learn machine learning package, while SHA and ASHA are implemented from scratch, following this research paper. All implementations run in parallel by default: grid search, random search and SHA use process-based parallelism, and ASHA uses asynchronous function calls to an AWS Lambda function.
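
To illustrate the ASHA worker path, the sketch below shows one way a tuner can dispatch configurations to an AWS Lambda function concurrently and read back their scores. The Lambda name, payload fields and response format here are assumptions for illustration; the actual interface is defined by tuning/asha.py and lambda/run_xgboost.py.

import json
from concurrent.futures import ThreadPoolExecutor

import boto3  # AWS SDK for Python

lambda_client = boto3.client("lambda")

def evaluate_config(config, rounds):
    # Hypothetical payload schema; lambda/run_xgboost.py defines the real one.
    payload = {"params": config, "num_boost_round": rounds}
    response = lambda_client.invoke(
        FunctionName="run_xgboost",        # assumed Lambda function name
        InvocationType="RequestResponse",  # each worker waits for its own result
        Payload=json.dumps(payload),
    )
    return json.loads(response["Payload"].read())["error"]

# Many configurations can be evaluated at once: each thread simply blocks on
# its own Lambda invocation, so parallelism scales with the number of workers.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(evaluate_config, {"max_depth": d}, 4) for d in (3, 5, 7)]
    errors = [f.result() for f in futures]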

The dataset used can be found here.

Results obtained from the scripts in the benchmarking/ folder can be found in results.xlsx. This spreadsheet compares the performance of all four tuning algorithms, and also contains data specific to ASHA and the effects of its various input parameters.

Setup

  • Install Python 3
  • Clone this repository
  • Run pip install -r requirements.txt
  • Note
    • If you want to run the ASHA algorithm, you will need to modify tuning/asha.py to access your own AWS Lambda. The code for this Lambda function can be found in lambda/run_xgboost.py. You will also need to place your AWS credentials in ~/.aws/credentials, and set a default region in ~/.aws/config.
    • Additionally, you will need to include a copy of your training data in the lambda/ folder when uploading to AWS.
    • More details about the credential and configuration files can be found here; an example layout is shown below.
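
For reference, the two AWS files typically look like the following (the key values and region shown here are placeholders to replace with your own):

~/.aws/credentials

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

~/.aws/config

[default]
region = us-east-1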

Usage

  • python main.py -a [algorithm] -p [parameters]

Algorithms and Parameters

Grid Search

  • -a grid

Parameters

{
    "n_workers": number of configurations to evaluate in parallel (-1 => use all available cores),
    "max_samples": max. number of samples in each hyperparameter list (increases total combinations),
    "cv": number of folds for cross-validation
}

Example

python main.py -a grid -p '{"n_workers": -1, "max_samples": 4, "cv": 3}'
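
Under the hood, grid search with these parameters corresponds roughly to scikit-learn's GridSearchCV, as in the sketch below; the parameter grid and the XGBoost estimator are illustrative, not the repository's actual search space.

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Illustrative grid; "max_samples" caps how many values each list like this may hold.
param_grid = {
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.1, 0.2, 0.3],
}
search = GridSearchCV(
    estimator=XGBClassifier(),
    param_grid=param_grid,
    n_jobs=-1,  # "n_workers": -1 => use all available cores
    cv=3,       # "cv": number of cross-validation folds
)
# search.fit(X_train, y_train) evaluates every combination in parallel.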


Random Search

  • -a random

Parameters

{
    "n_workers": number of configurations to evaluate in parallel (-1 => use all available cores),
    "n_iter": number of random configurations (iterations) to evaluate,
    "cv": number of folds for cross-validation
}

Example

python main.py -a random -p '{"n_workers": -1, "n_iter": 10, "cv": 3}'
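
Random search maps onto scikit-learn's RandomizedSearchCV in much the same way; the search space below is again illustrative.

from scipy.stats import uniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

# Illustrative search space: lists are sampled uniformly, distributions via rvs().
param_distributions = {
    "max_depth": [3, 5, 7, 9],
    "learning_rate": uniform(0.01, 0.3),
}
search = RandomizedSearchCV(
    estimator=XGBClassifier(),
    param_distributions=param_distributions,
    n_iter=10,  # "n_iter": number of random configurations to try
    n_jobs=-1,  # "n_workers"
    cv=3,       # "cv"
)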


Successive Halving Algorithm (SHA)

  • -a sha

Parameters

{
    "n_workers": number of configurations to evaluate in parallel (-1 => use all available cores),
    "n_configs": number of configurations to evaluate,
    "min_r": minimum resources (boosting rounds) given to each configuration,
    "max_r": maximum resources (boosting rounds) given to each configuration,
    "reduction_factor": amount of configurations to be dropped per iteration (2 = reduce by half)
    "cv": number of folds for cross-validation
}

Example

python main.py -a sha -p '{"n_workers": -1, "n_configs": 32, "min_r": 1, "max_r": 32, "reduction_factor": 2, "cv": 3}'
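
To make the interaction between these parameters concrete, the sketch below prints the standard successive-halving schedule implied by the example command; the repository's implementation may differ in details such as how ties are broken.

# Standard successive-halving schedule for the example command above.
n_configs, r, max_r, eta = 32, 1, 32, 2   # n_configs, min_r, max_r, reduction_factor
while n_configs >= 1 and r <= max_r:
    survivors = max(n_configs // eta, 1)
    print(f"rung: {n_configs} configs x {r} boosting rounds -> keep top {survivors}")
    n_configs //= eta
    r *= eta
# Prints rungs of 32x1, 16x2, 8x4, 4x8, 2x16 and 1x32 boosting rounds.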


Asynchronous Successive Halving Algorithm (ASHA)

  • -a asha

Parameters

{
    "n_workers": number of workers allocated to evaluate the configurations in parallel,
    "min_r": minimum resources (boosting rounds) given to each configuration,
    "max_r": maximum resources (boosting rounds) given to each configuration,
    "reduction_factor": amount of configurations to be dropped per iteration (2 = reduce by half),
    "early_stopping_rounds": number of rounds in which the error must decrease
    "cv": number of folds for cross-validation
}

Example

python main.py -a asha -p '{"n_workers": 100, "min_r": 1, "max_r": 64, "reduction_factor": 4, "cv": 3}'
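
For intuition, with min_r = 1, max_r = 64 and reduction_factor = 4 as above, the rungs use 1, 4, 16 and 64 boosting rounds. Following the referenced paper, ASHA promotes a configuration to the next rung as soon as it ranks in the top 1/4 of the results completed so far at its rung, so idle Lambda workers never wait for a whole rung to finish. The helper below is a rough sketch of that promotion check, not the repository's actual bookkeeping.

# Rough sketch of ASHA's promotion rule (top 1/eta of completed results at a rung).
def get_promotable(rung_results, eta=4):
    """rung_results: list of dicts like {"config": ..., "error": float, "promoted": bool}."""
    k = len(rung_results) // eta                      # how many results qualify so far
    best = sorted(rung_results, key=lambda r: r["error"])[:k]
    return next((r for r in best if not r["promoted"]), None)  # None => start a new config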


Note

If you are running the example commands in the Windows command line, you will need to escape the double quotes, for example:

python main.py -a random -p '{"n_iter": 10, "cv": 3}'

Becomes

python main.py -a random -p '{\"n_iter\": 10, \"cv\": 3}'
