
Msalamaumd/Extracting_Text_OCR_Archive


Extracting Text from OCR Old Archives


In this straightforward manual, I will guide you through the process of extracting text from old archives with OCR (optical character recognition), specifically from scanned newspapers published over the past two centuries. These archived news pages are stored as PNG images.

The goal is to learn how to retrieve textual content from old archived newspapers.

Learning Objectives

Essential Tools and Libraries
You will learn about the essential packages and libraries for handling OCR, images, and text extraction.
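This README does not name the packages it uses. A common Python stack for this job, assumed here rather than taken from the original, is `pytesseract` (a wrapper around the Tesseract OCR engine), Pillow for loading PNG scans, and pandas for the dataframe. The sketch below only checks which of them are importable, via `importlib.util.find_spec`, so nothing heavy is actually loaded.

```python
from __future__ import annotations

import importlib.util

# Assumed stack (not named in this README): pip name -> import name.
ASSUMED_PACKAGES = {
    "pytesseract": "pytesseract",  # wrapper around the Tesseract OCR engine
    "Pillow": "PIL",               # image loading (PNG scans)
    "pandas": "pandas",            # tabular storage of the extracted text
}


def check_packages(packages: dict[str, str] = ASSUMED_PACKAGES) -> dict[str, bool]:
    """Report which packages can be imported, without importing them."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in packages.items()
    }


if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name:12} {'installed' if ok else 'missing'}")
```

Note that Pillow is installed as `pillow` but imported as `PIL`, which is why the mapping keeps both names.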

Setting Up Your Environment
Discover how to install and configure the additional libraries and packages required when you run your code on a cloud-based platform such as Google Colab, and follow the exact steps to replicate this setup on your end.
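A minimal sketch of such a setup, assuming a Debian/Ubuntu runtime like Google Colab where the Tesseract engine comes from `apt-get` and the Python packages from `pip`. The package list (`tesseract-ocr`, `pytesseract`, `pillow`, `pandas`) is an assumption, not the author's confirmed list. By default the function only returns the commands, so you can inspect them before running anything.

```python
from __future__ import annotations

import shutil
import subprocess
import sys

# Assumed setup for a Debian/Ubuntu runtime such as Google Colab:
# the OCR engine comes from apt, the Python packages from pip.
SETUP_COMMANDS = [
    ["apt-get", "update"],
    ["apt-get", "install", "-y", "tesseract-ocr"],
    [sys.executable, "-m", "pip", "install", "pytesseract", "pillow", "pandas"],
]


def tesseract_available() -> bool:
    """True if the tesseract binary is already on PATH."""
    return shutil.which("tesseract") is not None


def run_setup(dry_run: bool = True) -> list[str]:
    """Return the setup commands as strings; execute them when dry_run is False."""
    if not dry_run:
        for cmd in SETUP_COMMANDS:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return [" ".join(cmd) for cmd in SETUP_COMMANDS]
```

In a Colab notebook the same steps are usually typed directly into a cell as `!apt-get install -y tesseract-ocr` and `!pip install pytesseract`; calling `run_setup(dry_run=False)` only when `tesseract_available()` is False avoids reinstalling on every run.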

Single Image Text Extraction
Learn how to extract text from a single image in just two simple, direct steps.
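The two steps can be sketched as: (1) open the PNG page, (2) run OCR on it. This assumes Pillow's `Image.open` and `pytesseract.image_to_string`, which are common choices but not confirmed by this README. Both steps are passed in as parameters, so you can swap the OCR engine, or test the logic without Tesseract installed.

```python
from __future__ import annotations

from typing import Any, Callable


def load_image(path: str) -> Any:
    """Step 1: open the scanned PNG page (requires Pillow)."""
    from PIL import Image
    return Image.open(path)


def ocr_image(image: Any) -> str:
    """Step 2: run Tesseract on the image (requires pytesseract and tesseract)."""
    import pytesseract
    return pytesseract.image_to_string(image)


def extract_text(
    path: str,
    loader: Callable[[str], Any] = load_image,
    ocr: Callable[[Any], str] = ocr_image,
) -> str:
    """Load one page and return its OCR text with surrounding whitespace trimmed."""
    image = loader(path)
    return ocr(image).strip()
```

For example, `extract_text("page_001.png")` (a hypothetical file name) returns the page text as one string. The final `.strip()` also removes the trailing form-feed character that Tesseract often appends.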

Multiple Images Text Extraction
Explore two methods for extracting text from multiple images: a simple method, which may not be advisable depending on your needs, and a more efficient method that extracts the text into a dataframe. The dataframe output is ideal for further preprocessing before you run any advanced textual analysis.
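The two methods above can be sketched side by side. This is an illustration under assumptions, not the author's code: `extract` stands for any function that maps an image path to its text (for example a wrapper around `pytesseract.image_to_string`), and the helper names and the `file`/`text` column names are hypothetical. Method 1 concatenates everything into one string, so you can no longer tell which page a passage came from; method 2 keeps one row per page.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Iterable

Extractor = Callable[[str], str]  # maps an image path to its OCR text


def list_pages(folder: str, pattern: str = "*.png") -> list[Path]:
    """Collect the scanned pages in a stable, sorted order."""
    return sorted(Path(folder).glob(pattern))


def extract_all_simple(paths: Iterable[Path], extract: Extractor) -> str:
    """Method 1: one long string; the file name of each page is not kept."""
    return "\n\n".join(extract(str(p)) for p in paths)


def extract_all_records(paths: Iterable[Path], extract: Extractor) -> list[dict[str, str]]:
    """Method 2: one record per page, keeping the file name next to its text."""
    return [{"file": Path(p).name, "text": extract(str(p))} for p in paths]


def to_dataframe(records: list[dict[str, str]]):
    """Turn the records into a pandas DataFrame (requires pandas)."""
    import pandas as pd
    return pd.DataFrame(records, columns=["file", "text"])
```

A typical call would be `to_dataframe(extract_all_records(list_pages("scans"), extract))`, where `"scans"` is a hypothetical folder of PNG pages; the resulting `text` column can then be cleaned and tokenized before analysis.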

If you have any inquiries or suggestions for alternative methods, please share them with me in the discussion section.

Thanks,

Mohamed Salama

Reach out at [email protected]
