Merge pull request #112 from getomni-ai/mark/enable-streaming
Stream OCR result by page & code restructure
annapo23 authored Dec 18, 2024
2 parents fab0c9f + 777da09 commit aa3d881
Showing 14 changed files with 609 additions and 484 deletions.
README.md: 26 changes (18 additions, 8 deletions)
@@ -15,7 +15,7 @@ The general logic:
- Pass each image to GPT and ask nicely for Markdown
- Aggregate the responses and return Markdown
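As a rough illustration of that flow (a minimal sketch, not Zerox's actual implementation), a per-page call to a vision model with the OpenAI Node SDK could look like the following; the prompt wording and helper names are assumptions:

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // assumes OPENAI_API_KEY is set in the environment

// Ask the model to transcribe one page image into Markdown.
async function pageToMarkdown(imageUrl: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Convert this page to Markdown. Return only the Markdown." },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  return response.choices[0].message.content ?? "";
}

// Aggregate: run every page and join the per-page Markdown.
async function documentToMarkdown(pageImageUrls: string[]): Promise<string> {
  const pages = await Promise.all(pageImageUrls.map(pageToMarkdown));
  return pages.join("\n\n");
}
```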

Try out the hosted version here: https://getomni.ai/ocr-demo
Try out the hosted version here: <https://getomni.ai/ocr-demo>

## Getting Started

@@ -76,9 +76,13 @@ const result = await zerox({
cleanup: true, // Clear images from tmp after run.
concurrency: 10, // Number of pages to run at a time.
correctOrientation: true, // True by default, attempts to identify and correct page orientation.
errorMode: ErrorMode.IGNORE, // ErrorMode.THROW or ErrorMode.IGNORE, defaults to ErrorMode.IGNORE.
maintainFormat: false, // Slower but helps maintain consistent formatting.
maxRetries: 1, // Number of retries to attempt on a failed page, defaults to 1.
maxTesseractWorkers: -1, // Maximum number of tesseract workers. Zerox will start with a lower number and only reach maxTesseractWorkers if needed.
model: 'gpt-4o-mini' // Model to use (gpt-4o-mini or gpt-4o).
model: "gpt-4o-mini", // Model to use (gpt-4o-mini or gpt-4o).
onPostProcess: async ({ page, progressSummary }) => Promise<void>, // Callback function to run after each page is processed.
onPreProcess: async ({ imagePath, pageNumber }) => Promise<void>, // Callback function to run before each page is processed.
outputDir: undefined, // Save combined result.md to a file.
pagesToConvertAsImages: -1, // Page numbers to convert to image as array (e.g. `[1, 2, 3]`) or a number (e.g. `1`). Set to -1 to convert all pages.
tempDir: "/os/tmp", // Directory to use for temporary files (default: system temp directory).
@@ -132,7 +136,23 @@ Request #3 => page_2_markdown + page_3_image
'**Terms:** \n' +
'Order ID : CA-2012-AB10015140-40974 ',
page: 1,
contentLength: 747
contentLength: 747,
status: 'SUCCESS',
}
]
],
summary: {
failedPages: 0,
successfulPages: 1,
totalPages: 1,
},
}
```
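Putting the new options and output shape together, here is a hedged sketch (not from the README) of streaming per-page results via `onPostProcess` and reading the new `summary` object; the import path and the `filePath` / `openaiAPIKey` option names are assumptions not shown in this diff:

```ts
import { zerox } from "zerox"; // assumed import path

const result = await zerox({
  filePath: "path/to/file.pdf", // assumed option name
  openaiAPIKey: process.env.OPENAI_API_KEY, // assumed option name
  model: "gpt-4o-mini",
  onPreProcess: async ({ imagePath, pageNumber }) => {
    console.log(`Starting page ${pageNumber} (${imagePath})`);
  },
  onPostProcess: async ({ page, progressSummary }) => {
    // Each page arrives as soon as it finishes, so results can be streamed
    // downstream instead of waiting for the whole document.
    console.log(`Page ${page.page}: ${page.status}`, progressSummary);
  },
});

console.log(
  `${result.summary.successfulPages}/${result.summary.totalPages} pages succeeded, ` +
    `${result.summary.failedPages} failed`
);
```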

## Python Zerox

(Python SDK: supports vision models from different providers such as OpenAI, Azure OpenAI, Anthropic, and AWS Bedrock.)

### Installation:
### Installation

- Install **poppler** on the system; it should be available in the PATH variable. See the [pdf2image documentation](https://pdf2image.readthedocs.io/en/latest/installation.html) for instructions by platform.
- Install py-zerox:
@@ -285,7 +295,7 @@ Returns
- ZeroxOutput:
Contains the markdown content generated by the model, along with some metadata (see below).

### Example Output (Output from "azure/gpt-4o-mini"):
### Example Output (Output from "azure/gpt-4o-mini")

`Note: The output is manually wrapped in this documentation for better readability.`

@@ -340,7 +350,7 @@ ZeroxOutput(
)
````

## Supported File Types:
## Supported File Types

We use a combination of `libreoffice` and `graphicsmagick` for document => image conversion. For non-image, non-PDF files, we use `libreoffice` to convert the file to a PDF first, and then to an image.

@@ -373,7 +383,7 @@

## Credits

- [Litellm](https://github.com/BerriAI/litellm): https://github.com/BerriAI/litellm | This powers our python sdk to support all popular vision models from different providers.
- [Litellm](https://github.com/BerriAI/litellm): <https://github.com/BerriAI/litellm> | This powers our Python SDK, supporting all popular vision models from different providers.

### License
