Resize: Scale down image to reduce computational overhead (I used width of 1000 px, height based on aspect ratio, INTER_AREA interpolation method to preserve image quality).
Threshold: Convert to greyscale and using OpenCv's threshold method (pixels above certain value become white(255), then invert img so text = white, background = black). This will format image for dilation.
Dilation: Extend white regions to merge nearby digits together, allowing machine to detect which digits belong together in a term. This gives meaning to the numbers. Pixels are dilated with a kernel (see below); to identify terms, a kernel with a large width is ideal in case digits are spaced out.
Contour Points: Use OpenCV's findContours() function to identify bounding edges of the terms + operators in the equation. This function will operate on the dilated image, using RETR_EXTERNAL to omit any inner contours and CHAIN_APPROX_NONE to store all points of the contour for a more accurate result. OpenCV uses Suzuki's algorithm to calculate contour points.
Bounding Rectangle: Use boundingRect() to identify smallest rectangle that encloses the contour points. This is to split image into smaller images of terms and operators.
Repeat: Repeat the process for each sub image to differentiate terms into digits, with a smaller kernel since digits are close to each other, and store values in a 3D array
Dataset: I used a dataset of 27,000 28x28 pixel images of digits 0-9, and symbols plus, minus, dot, multiply, slash for division, & variables w, y, z. The dataset is from wblachowski, licensed under the MIT License. There will be an 80:20 train:test split.
Format: Convert to grayscale. Now resize the image to have height of 28 pixels, and aspect ratio accordinly (do not stretch image by forcing both dimensions to be 28 pixels!). Then append image onto center of 28x28 black canvas so input image is 28x28 pixels. Increase intensity of pixels (reducing their grayscale value if they meet a certain threshold to darken the handwriting.
Model: Consists of 3 layers, inner layers use Relu activation, and output layer uses softmax. Image to String: Iterate through the array of images, use the trained model to predict the digit/symbol it contains, append to a string representing the equation, and use python's built in eval() function to evalute the equation.