Here is the final output of the project:
- Creates SageMaker notebook instance, IAM role, S3 bucket for output
```bash
aws cloudformation create-stack --stack-name ShipDetection --template-body file://cloudformation/infrastructure.yaml --capabilities CAPABILITY_NAMED_IAM
```
- Check the status until it returns `CREATE_COMPLETE`
```bash
aws cloudformation describe-stacks --stack-name ShipDetection --query 'Stacks[0].StackStatus'
```
- Keep track of the S3 bucket name
```bash
aws cloudformation describe-stacks --stack-name ShipDetection --query 'Stacks[0].Outputs[0].OutputValue'
```
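If you'd rather script this, here's a minimal sketch that waits for the stack to finish and captures the bucket name into a shell variable. It assumes the bucket name is the first (or only) stack output, matching the query above.

```bash
# Block until the stack reaches CREATE_COMPLETE (errors out on rollback)
aws cloudformation wait stack-create-complete --stack-name ShipDetection

# Capture the bucket name from the stack outputs (assumes it's the first output)
BUCKET_NAME=$(aws cloudformation describe-stacks \
  --stack-name ShipDetection \
  --query 'Stacks[0].Outputs[0].OutputValue' \
  --output text)
echo "S3 bucket: ${BUCKET_NAME}"
```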
TODO:
- Figure out a clever way to give consumers of this project read access to the S3 bucket in my account so I don't need to toggle public access manually (one possible approach is sketched below).
- Add this to-be-created repo to the startup of the notebook so all of the files (e.g. train_ship_segmentations_v2.csv) are already there for future users.
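One possible approach for the first TODO, as a sketch only: attach a bucket policy that grants anonymous read on the objects instead of toggling public access by hand. The bucket name below is a placeholder, and the bucket's (and account's) Block Public Access settings have to permit public policies for this to take effect.

```bash
# Hypothetical example: substitute the bucket name from the stack output
aws s3api delete-public-access-block --bucket your-bucket-name
aws s3api put-bucket-policy --bucket your-bucket-name --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadForProjectConsumers",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::your-bucket-name/*"
  }]
}'
```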
Shortcut: Continue to Step 3
OR
Long way: Continue to the DIY steps
Once the CloudFormation code sets up your SageMaker notebook and S3 bucket, you are free to continue. This step leverages my cleaned, transformed, and hosted data.
- You'll need the name of the S3 bucket you created here for SageMaker output.
- Open `notebooks/ship-object-detection.ipynb` and run through the steps.
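If you prefer the CLI to the console for opening the notebook, something like the following works. The instance name is whatever the CloudFormation template assigned, so the placeholder below is just that: a placeholder.

```bash
# Find the notebook instance the stack created
aws sagemaker list-notebook-instances --query 'NotebookInstances[].NotebookInstanceName' --output text

# Generate a presigned URL for it (substitute the name from the listing above)
aws sagemaker create-presigned-notebook-instance-url --notebook-instance-name <your-notebook-instance-name>
```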
QUESTION:
- It's odd that Kaggle provides test data for this competition but it doesn't seem to be used here. I wonder if it should be combined with the training data and then split out again later. Maybe I'm missing something.
The long way? Good for you! Remember, you'll only need to follow these steps if you want to start completely from scratch. I am hosting training and validation data in my S3 bucket to simplify the process. That's not your style, though, is it?
Added time: 6 hours (mostly waiting)
- Spin up an EC2 instance or follow these instructions locally
- Install python3 & pip3
- Install the Kaggle API
- Provide your API credentials locally (see the setup sketch after this list)
- Download the data
```bash
kaggle competitions download -c airbus-ship-detection
```
- Unzip the training data (the archive name is assumed to be `train_v2.zip`; adjust to whatever the Kaggle download actually produced)
```bash
mkdir training && unzip -qo train_v2.zip -d training
```
- Unzip the test data (again assuming the archive is named `test_v2.zip`)
```bash
mkdir test && unzip -qo test_v2.zip -d test
```
- Upload training data
```bash
aws s3 sync training s3://your-bucket-name/training/
```
- Upload test data
```bash
aws s3 sync test s3://your-bucket-name/test/
```
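A minimal setup sketch for the install and credential steps above, assuming you already have a Kaggle account and have downloaded an API token (kaggle.json) from your account page:

```bash
# Install the Kaggle CLI
pip3 install kaggle

# Put the API token where the CLI expects it and lock down its permissions
mkdir -p ~/.kaggle
cp /path/to/kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
```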
Expect to download and transfer about 29 GB.
|  | Total size | Total objects |
|---|---|---|
| Training | 27.0 GB | 192556 |
| Test | 2.2 GB | 15606 |
This step shows you how to use a cited notebook to convert masks to bounding boxes.
Added time: 1-2 hours
- Open `notebooks/from-masks-to-bounding-boxes.ipynb` and run through the steps.
TODO:
- Possibly move `data/train_ship_segmentations_v2.csv` from the repo to S3 (sketched below)
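If that move happens, it should be a one-liner; the bucket name here is a placeholder:

```bash
aws s3 cp data/train_ship_segmentations_v2.csv s3://your-bucket-name/data/
```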
This step walks you through modifying the data to create annotation files in JSON format.
Added time: 1-2 hours
- Open `notebooks/data-prep.ipynb` and run through the steps.
- Tear down the CloudFormation stack (see the note below if the delete gets stuck)
```bash
aws cloudformation delete-stack --stack-name ShipDetection
```
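Note that CloudFormation can't delete a bucket that still contains objects, so if the stack delete stalls, empty the output bucket first and then wait for the deletion to finish. The bucket name is the one reported in the stack outputs.

```bash
# Empty the output bucket so CloudFormation can remove it
aws s3 rm s3://your-bucket-name --recursive

# Block until the stack is gone
aws cloudformation wait stack-delete-complete --stack-name ShipDetection
```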
- Remove the SageMaker endpoint (leaving it running will rack up your bill!)
- Make sure you only have the one endpoint
```bash
aws sagemaker list-endpoints
```
- Assuming you do, list the endpoints again and pass the first one into a delete command
```bash
aws sagemaker list-endpoints --query 'Endpoints[0].EndpointName' --output text | xargs -I {} aws sagemaker delete-endpoint --endpoint-name {}
```
- Remove the SageMaker model
- Make sure you only have the one model
```bash
aws sagemaker list-models --query 'Models[].ModelName' --output text
```
- Assuming you do, list the models again and pass the first one into a delete command
```bash
aws sagemaker list-models --query 'Models[0].ModelName' --output text | xargs -I {} aws sagemaker delete-model --model-name {}
```
- Remove the SageMaker endpoint configs
- Make sure you only have the one config
```bash
aws sagemaker list-endpoint-configs --query 'EndpointConfigs[].EndpointConfigName' --output text
```
- Assuming you do, list the configs again and pass the first one into a delete command
```bash
aws sagemaker list-endpoint-configs --query 'EndpointConfigs[0].EndpointConfigName' --output text | xargs -I {} aws sagemaker delete-endpoint-config --endpoint-config-name {}
```
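As a final sanity check, a quick sweep to confirm nothing billable is left behind; all of these should come back empty once cleanup is done.

```bash
aws sagemaker list-endpoints --query 'Endpoints[].EndpointName' --output text
aws sagemaker list-models --query 'Models[].ModelName' --output text
aws sagemaker list-endpoint-configs --query 'EndpointConfigs[].EndpointConfigName' --output text
aws sagemaker list-notebook-instances --query 'NotebookInstances[].NotebookInstanceName' --output text
```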