This document has instructions for running ResNet50 v1.5 FP32 training using Intel-optimized TensorFlow.
These ResNet50 v1.5 examples use the ImageNet dataset. Download and preprocess the ImageNet dataset using the instructions here. After running the conversion script, you should have a directory containing the ImageNet dataset in the TF records format. Set `DATASET_DIR` to point to this directory when running ResNet50 v1.5.
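
Before starting a long run, it can help to confirm that the directory actually contains TF record shards. The following is a minimal sketch, not part of the model zoo. The helper names `count_shards` and `check_dataset_dir` are hypothetical, and the `train-*` and `validation-*` file name prefixes are an assumption about what the conversion script produces; adjust them if your shards are named differently.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a directory of TF record shards.
# Assumption: shards are named train-* and validation-* (not confirmed by this document).

# Print how many regular files in directory $1 start with "<prefix>-".
count_shards() {
  local dir="$1" prefix="$2"
  find "$dir" -maxdepth 1 -type f -name "${prefix}-*" | wc -l | tr -d ' '
}

# Print shard counts for directory $1; fail if it is missing or has no training shards.
check_dataset_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "error: '$dir' is not a directory" >&2
    return 1
  fi
  local train val
  train=$(count_shards "$dir" train)
  val=$(count_shards "$dir" validation)
  echo "train shards: $train, validation shards: $val"
  [ "$train" -gt 0 ]
}
```

With the functions loaded in your shell, `check_dataset_dir "$DATASET_DIR"` prints the shard counts and returns a non-zero status if no training shards are found.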
| Script name | Description |
|---|---|
| `fp32_training_demo.sh` | Executes a short run using small batch sizes and a limited number of steps to demonstrate the training flow. |
| `fp32_training_1_epoch.sh` | Executes a test run that trains the model for 1 epoch and saves checkpoint files to an output directory. |
| `fp32_training_full.sh` | Trains the model using the full dataset, runs until convergence (90 epochs), and saves checkpoint files to an output directory. Note that this will take a considerable amount of time. |
| `multi_instance_training_demo.sh` | Uses `mpirun` to execute 2 processes with 1 process per socket with a batch size of 256 for 50 steps. |
| `multi_instance_training.sh` | Uses `mpirun` to execute 2 processes with 1 process per socket with a batch size of 256. Checkpoint files and logs for each instance are saved to the output directory. Note that this will take a considerable amount of time. |
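
Because the multi-instance scripts place one process on each of two sockets, it may be worth checking the socket count of the machine first. The sketch below parses `lscpu`-style output. The helper names `socket_count` and `has_two_sockets` are hypothetical, and the requirement of at least two sockets is an assumption inferred from the descriptions above rather than something the scripts are known to enforce.

```bash
#!/usr/bin/env bash
# Sketch: read lscpu-style text on stdin and report the number of CPU sockets.
# Assumption: the output contains an English "Socket(s):" line, as lscpu prints by default.

socket_count() {
  # Split on ':' and print the value of the first "Socket(s)" line with whitespace removed.
  awk -F: '/^Socket\(s\)/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Succeed only if stdin reports at least two sockets (assumed requirement).
has_two_sockets() {
  local sockets
  sockets=$(socket_count)
  [ -n "$sockets" ] && [ "$sockets" -ge 2 ]
}
```

A possible use is `lscpu | has_two_sockets || echo "fewer than 2 sockets detected"` before launching `multi_instance_training_demo.sh`.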
Set up your environment using the instructions below, depending on whether you are using AI Kit:
| Setup using AI Kit | Setup without AI Kit |
|---|---|
| To run using AI Kit you will need: | To run without AI Kit you will need: |
After finishing the setup above, set environment variables for the path to your `DATASET_DIR` for ImageNet and an `OUTPUT_DIR` where log files and checkpoints will be written.
Navigate to your model zoo directory and then run a quickstart script.
```bash
# cd to your model zoo directory
cd models
export DATASET_DIR=<path to the ImageNet TF records>
export OUTPUT_DIR=<directory where log files and checkpoints will be written>
./quickstart/image_recognition/tensorflow/resnet50v1_5/training/cpu/fp32/<script name>.sh
```
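
The steps above can also be wrapped in a small helper that fails early when a variable is missing. This is a sketch under stated assumptions, not part of the model zoo: the function name `run_quickstart` and the variable `QUICKSTART_DIR` are hypothetical, the function must be called from the model zoo directory, and creating `OUTPUT_DIR` up front is a convenience that the quickstart scripts may or may not need.

```bash
#!/usr/bin/env bash
# Sketch: validate the environment, then run one quickstart script by name.
# Assumption: the current directory is the model zoo root.

QUICKSTART_DIR="quickstart/image_recognition/tensorflow/resnet50v1_5/training/cpu/fp32"

run_quickstart() {
  local script="$1"
  # DATASET_DIR must be set and point to an existing directory.
  if [ -z "${DATASET_DIR:-}" ] || [ ! -d "$DATASET_DIR" ]; then
    echo "error: DATASET_DIR is unset or not a directory" >&2
    return 1
  fi
  # OUTPUT_DIR must be set; create it if it does not exist yet.
  if [ -z "${OUTPUT_DIR:-}" ]; then
    echo "error: OUTPUT_DIR is unset" >&2
    return 1
  fi
  mkdir -p "$OUTPUT_DIR"
  # Refuse to continue if the requested script is missing or not executable.
  if [ ! -x "$QUICKSTART_DIR/$script" ]; then
    echo "error: $QUICKSTART_DIR/$script not found or not executable" >&2
    return 1
  fi
  "./$QUICKSTART_DIR/$script"
}
```

For example, after exporting both variables, `run_quickstart fp32_training_demo.sh` runs the short demo script listed in the table above.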
- To run more advanced use cases, see the instructions here for calling the `launch_benchmark.py` script directly.
- To run the model using Docker, please see the oneContainer workload container: https://software.intel.com/content/www/us/en/develop/articles/containers/resnet50v1-5-fp32-training-tensorflow-container.html.