VISTA: Vision Transformer enhanced by U-Net and Image Colorfulness Frame Filtration for Automatic Retail Checkout
Code for our CVPR 2022 Workshop paper "VISTA: Vision Transformer enhanced by U-Net and Image Colorfulness Frame Filtration for Automatic Retail Checkout". The method achieved 3rd place in the AI City Challenge 2022 Track 4: Multi-Class Product Counting & Recognition for Automated Retail Checkout. See here for details.
[arXiv]
Figure 1. Illustration of the overall segmentation and classification pipeline
This code requires Python 3.8.12 and PyTorch 1.8.2. Run `pip install -r requirements.txt` to install all the dependencies.
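To confirm that an environment matches the versions stated above, a small check like the following can be run after installation. This is a minimal sketch and not part of the repository: the helper names (`parse_version`, `installed_torch_version`, `check_environment`) are hypothetical, and it assumes PyTorch is installed under the distribution name `torch`.

```python
import re
import sys
from importlib import metadata

# Versions stated in this README.
EXPECTED_PYTHON = (3, 8, 12)
EXPECTED_TORCH = (1, 8, 2)


def parse_version(text):
    """Turn a version string such as '1.8.2+cu111' into (1, 8, 2)."""
    core = text.split("+", 1)[0]  # drop a local build suffix like '+cu111'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def installed_torch_version():
    """Return the installed torch version as a tuple, or None if it is missing."""
    try:
        return parse_version(metadata.version("torch"))
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    python_version = tuple(sys.version_info[:3])
    if python_version != EXPECTED_PYTHON:
        problems.append(
            "Python %s found, README states %s"
            % (".".join(map(str, python_version)), ".".join(map(str, EXPECTED_PYTHON)))
        )
    torch_version = installed_torch_version()
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif torch_version != EXPECTED_TORCH:
        problems.append(
            "PyTorch %s found, README states %s"
            % (".".join(map(str, torch_version)), ".".join(map(str, EXPECTED_TORCH)))
        )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

A mismatch is reported as a warning rather than an error, since other versions may still work; the README only states the versions the code was developed with.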
See training/segmentation for details.
See training/classification for details.
After steps 2a and 2b, make sure both segmentation and classification models are present in the test/models directory. Then see README.md for details.
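Before running the test step, it can help to verify that the models directory actually contains the trained files. The sketch below is an illustration only: the function names are hypothetical, and because this README does not state the model file names or extensions, it simply counts regular files in `test/models` and expects at least two (one segmentation model, one classification model).

```python
from pathlib import Path


def list_model_files(models_dir="test/models"):
    """Return the sorted names of non-hidden regular files in models_dir."""
    directory = Path(models_dir)
    if not directory.is_dir():
        return []
    return sorted(
        path.name
        for path in directory.iterdir()
        if path.is_file() and not path.name.startswith(".")
    )


def models_ready(models_dir="test/models", minimum=2):
    """Assumption: at least `minimum` files means both models were copied over."""
    return len(list_model_files(models_dir)) >= minimum


if __name__ == "__main__":
    found = list_model_files()
    print("Files in test/models:", found)
    if not models_ready():
        print("WARNING: expected both a segmentation and a classification model")
```

Run it from the repository root so that the relative path `test/models` resolves correctly.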
will be added here.
We thank the AICITY 22 organizers for making the data available for use. We also thank Giga Tech Ltd. for funding this work.