Continuous-Arts Example

This example demonstrates the use of a Continuous Integration toolkit based on a machine learning program, as described in James Coupe's article A Continuous Integration Toolkit for Artists.

It is a Python program that downloads the images linked in images.json, uses Amazon Textract to perform OCR on them, and prints the recognized text to the console.
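The flow described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual main.py: it assumes images.json holds a flat JSON list of URL strings, that the boto3 library is installed, and that an AWS_REGION variable (defaulting to us-east-1) selects the region. The helper names load_urls, extract_lines, and ocr_image are hypothetical.

```python
import json
import os
import urllib.request


def load_urls(path="images.json"):
    """Read image URLs. Assumes the file is a flat JSON list of strings."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def extract_lines(response):
    """Collect the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def ocr_image(client, url):
    """Download one image and return the lines Textract recognizes in it."""
    with urllib.request.urlopen(url) as resp:
        image_bytes = resp.read()
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)


def main():
    import boto3  # imported here so the helpers above work without boto3

    client = boto3.client(
        "textract",
        aws_access_key_id=os.environ["AWS_ACCESS_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_KEY"],
        region_name=os.environ.get("AWS_REGION", "us-east-1"),  # assumed
    )
    for url in load_urls():
        print(url)
        for line in ocr_image(client, url):
            print("  " + line)
```

Calling main() with the credentials set would print each URL followed by its recognized lines. Filtering on LINE blocks keeps the output readable, since a Textract response also carries PAGE and WORD blocks.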

This is primarily intended to be run in a CI pipeline, but can also be executed locally to check and verify the results.

Running locally

A Linux or macOS system is assumed. See this to set up a Python virtual environment on Windows.

  • Make sure Python 3.7+ is installed
  • Create a directory ~/.virtualenvs
  • Create a virtual environment (called arts, for example) by running python3 -m venv ~/.virtualenvs/arts
  • Switch to it by running source ~/.virtualenvs/arts/bin/activate
  • Set two environment variables so the program can use Amazon Textract (see this to make a key pair):
    • AWS_ACCESS_ID: The ID of the access key
    • AWS_SECRET_KEY: The secret of the key
  • Run pip install -r requirements.txt to install the dependencies
  • Run python3 main.py to obtain the output in the console
  • Edit the URLs in images.json to change which images are downloaded
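Before running main.py, the prerequisites in the list above can be checked with a short script. This is only a sketch: the function name missing_prerequisites is hypothetical and is not part of the repository.

```python
import os
import sys

# Variable names taken from the setup steps above.
REQUIRED_VARS = ("AWS_ACCESS_ID", "AWS_SECRET_KEY")


def missing_prerequisites(environ=os.environ, version=sys.version_info):
    """Return a list of human-readable problems; empty means ready to run."""
    problems = []
    if tuple(version[:2]) < (3, 7):
        problems.append("Python 3.7+ is required")
    for name in REQUIRED_VARS:
        if not environ.get(name):
            problems.append("environment variable " + name + " is not set")
    return problems
```

Printing each entry of missing_prerequisites() inside the activated virtual environment shows what is still missing before the OCR step can reach AWS.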

Seeing the CI Runs

This project uses GitHub Actions for CI. AWS credentials are pre-configured as Actions secrets, which the pipeline uses to connect to AWS and perform the OCR.

The status of the last pipeline runs can be viewed here. You can expand the pipeline stage called extract text and view the outcome of the OCR.

To execute the runs yourself, fork this repo. The fork gets its own copy of the pipeline, where you can set your own AWS credentials; pushing new commits to your fork triggers a run.