Merge pull request #112 from getomni-ai/mark/enable-streaming
Stream OCR result by page & code restructure
annapo23 authored Dec 18, 2024
2 parents fab0c9f + 777da09 commit aa3d881
Showing 14 changed files with 609 additions and 484 deletions.
README.md: 26 changes (18 additions, 8 deletions)
@@ -15,7 +15,7 @@ The general logic:
- Pass each image to GPT and ask nicely for Markdown
- Aggregate the responses and return Markdown
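As a rough illustration of that flow (a minimal sketch, not Zerox's actual implementation), a per-page call to a vision model with the OpenAI Node SDK could look like the following; the prompt wording and helper names are assumptions:

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // assumes OPENAI_API_KEY is set in the environment

// Ask the model to transcribe one page image into Markdown.
async function pageToMarkdown(imageUrl: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Convert this page to Markdown. Return only the Markdown." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  return response.choices[0].message.content ?? "";
}

// Aggregate: run every page and join the per-page Markdown.
async function documentToMarkdown(pageImageUrls: string[]): Promise<string> {
  const pages = await Promise.all(pageImageUrls.map(pageToMarkdown));
  return pages.join("\n\n");
}
```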

Try out the hosted version here: https://getomni.ai/ocr-demo
Try out the hosted version here: <https://getomni.ai/ocr-demo>

## Getting Started

@@ -76,9 +76,13 @@ const result = await zerox({
cleanup: true, // Clear images from tmp after run.
concurrency: 10, // Number of pages to run at a time.
correctOrientation: true, // True by default, attempts to identify and correct page orientation.
errorMode: ErrorMode.IGNORE, // ErrorMode.THROW or ErrorMode.IGNORE, defaults to ErrorMode.IGNORE.
maintainFormat: false, // Slower but helps maintain consistent formatting.
maxRetries: 1, // Number of retries to attempt on a failed page, defaults to 1.
maxTesseractWorkers: -1, // Maximum number of tesseract workers. Zerox will start with a lower number and only reach maxTesseractWorkers if needed.
model: 'gpt-4o-mini' // Model to use (gpt-4o-mini or gpt-4o).
model: "gpt-4o-mini", // Model to use (gpt-4o-mini or gpt-4o).
onPostProcess: async ({ page, progressSummary }) => Promise<void>, // Callback function to run after each page is processed.
onPreProcess: async ({ imagePath, pageNumber }) => Promise<void>, // Callback function to run before each page is processed.
outputDir: undefined, // Save combined result.md to a file.
pagesToConvertAsImages: -1, // Page numbers to convert to image as array (e.g. `[1, 2, 3]`) or a number (e.g. `1`). Set to -1 to convert all pages.
tempDir: "/os/tmp", // Directory to use for temporary files (default: system temp directory).
@@ -132,7 +136,23 @@ Request #3 => page_2_markdown + page_3_image
'**Terms:** \n' +
'Order ID : CA-2012-AB10015140-40974 ',
page: 1,
contentLength: 747
contentLength: 747,
status: 'SUCCESS',
}
]
],
summary: {
failedPages: 0,
successfulPages: 1,
totalPages: 1,
},
}
```
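Putting the new options and output shape together, here is a hedged sketch (not from the README) of streaming per-page results via `onPostProcess` and reading the new `summary` object; the import path and the `filePath` / `openaiAPIKey` option names are assumptions not shown in this diff:

```ts
import { zerox } from "zerox"; // assumed import path

const result = await zerox({
  filePath: "path/to/file.pdf", // assumed option name
  openaiAPIKey: process.env.OPENAI_API_KEY, // assumed option name
  model: "gpt-4o-mini",
  onPreProcess: async ({ imagePath, pageNumber }) => {
    console.log(`Starting page ${pageNumber} (${imagePath})`);
  },
  onPostProcess: async ({ page, progressSummary }) => {
    // Each page arrives as soon as it finishes, so results can be streamed
    // downstream instead of waiting for the whole document.
    console.log(`Page ${page.page}: ${page.status}`, progressSummary);
  },
});

console.log(
  `${result.summary.successfulPages}/${result.summary.totalPages} pages succeeded, ` +
    `${result.summary.failedPages} failed`
);
```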

## Python Zerox

(Python SDK: supports vision models from different providers such as OpenAI, Azure OpenAI, Anthropic, and AWS Bedrock.)

### Installation:
### Installation

- Install **poppler** on the system; it should be available in the PATH variable. See the [pdf2image documentation](https://pdf2image.readthedocs.io/en/latest/installation.html) for instructions by platform.
- Install py-zerox:
@@ -285,7 +295,7 @@ Returns
- ZeroxOutput:
Contains the markdown content generated by the model, along with some metadata (see below).

### Example Output (Output from "azure/gpt-4o-mini"):
### Example Output (Output from "azure/gpt-4o-mini")

`Note: The output is manually wrapped in this documentation for better readability.`

@@ -340,7 +350,7 @@ ZeroxOutput(
)
````

## Supported File Types:
## Supported File Types

We use a combination of `libreoffice` and `graphicsmagick` for document => image conversion. For non-image, non-PDF files, we use `libreoffice` to convert the file to a PDF first, and then to an image.

@@ -373,7 +383,7 @@

## Credits

- [Litellm](https://github.com/BerriAI/litellm): https://github.com/BerriAI/litellm | This powers our python sdk to support all popular vision models from different providers.
- [Litellm](https://github.com/BerriAI/litellm): <https://github.com/BerriAI/litellm> | This powers our Python SDK, supporting all popular vision models from different providers.

### License
