Extracting Text from OCR Old Archives

In this tutorial, I will guide you through using OCR to extract text from old archives, specifically scanned newspapers from the past two centuries. The archived newspaper pages are stored as PNG images.

The goal is to learn how to retrieve textual content from old archived newspapers.

Learning Objectives

Essential Tools and Libraries
You will learn about the essential packages and libraries for handling OCR, images, and text extraction.

Setting Up Your Environment
Learn how to download and set up the additional libraries and packages you need when running the code on a cloud-based platform such as Google Colab. Follow the steps exactly to replicate this setup on your end.
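
As a rough illustration of the setup step, the sketch below checks which pieces of a typical OCR stack are missing before you install anything. This README does not name the libraries, so `pytesseract`, Pillow (`PIL`), `pandas`, and the `tesseract` executable are assumptions here, and `missing_requirements` is a hypothetical helper rather than part of the notebook.

```python
import importlib.util
import shutil


def missing_requirements(modules=("pytesseract", "PIL", "pandas"),
                         binaries=("tesseract",)):
    """Return the names of Python modules and executables that cannot be found.

    The default names are assumptions about a typical OCR setup,
    not a list taken from this repository.
    """
    # find_spec returns None when a top-level module is not installed.
    missing = [name for name in modules
               if importlib.util.find_spec(name) is None]
    # shutil.which returns None when an executable is not on the PATH.
    missing += [name for name in binaries if shutil.which(name) is None]
    return missing


# An empty list means everything in the assumed stack was found.
print(missing_requirements())
```

On Google Colab, anything reported as missing can usually be installed from a notebook cell, for example with `!apt-get install -y tesseract-ocr` for the OCR engine and `!pip install pytesseract pillow pandas` for the Python packages, assuming that stack matches what the notebook uses.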

Single Image Text Extraction
Learn how to extract text from a single image in two simple steps.
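
The two steps could look roughly like the sketch below: open the image, then pass it to an OCR engine. It assumes `pytesseract` and Pillow, which this README does not confirm, and the names `tesseract_ocr` and `extract_text` are hypothetical. The OCR function is passed in as a parameter so that a different engine can be swapped in.

```python
from pathlib import Path


def tesseract_ocr(image_path, lang="eng"):
    """OCR one image with Tesseract (assumed engine, not confirmed by the README)."""
    # Imported lazily so this file still loads where the OCR stack is absent.
    import pytesseract
    from PIL import Image

    # Step 1: open the scanned page.
    with Image.open(image_path) as page:
        # Step 2: run OCR on it and return the recognized text.
        return pytesseract.image_to_string(page, lang=lang)


def extract_text(image_path, ocr=tesseract_ocr):
    """Return the text of one image, with surrounding whitespace trimmed."""
    return ocr(Path(image_path)).strip()
```

A call such as `extract_text("page_001.png")` would then return the page text as a single string, where `page_001.png` is a placeholder file name.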

Multiple Images Text Extraction
Explore two methods for extracting text from multiple images: a simple method, which may not suit every need, and a more efficient method that extracts the text into a dataframe. The dataframe is ideal for further preprocessing before running any advanced textual analysis.
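
A minimal sketch of the dataframe approach follows. It is not necessarily the notebook's own method: it assumes `pytesseract`, Pillow, and `pandas`, and the function names and the `filename` and `text` column names are hypothetical. Each PNG in a folder becomes one row.

```python
from pathlib import Path


def tesseract_ocr(image_path, lang="eng"):
    """OCR one image with Tesseract (assumed engine, not confirmed by the README)."""
    # Imported lazily so this file still loads where the OCR stack is absent.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as page:
        return pytesseract.image_to_string(page, lang=lang)


def ocr_folder(folder, ocr=tesseract_ocr, pattern="*.png"):
    """Return one {'filename', 'text'} record per matching image, sorted by path."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        rows.append({"filename": path.name, "text": ocr(path).strip()})
    return rows


def to_dataframe(rows):
    """Turn the records into a pandas DataFrame with one row per image."""
    import pandas as pd

    return pd.DataFrame(rows, columns=["filename", "text"])
```

Calling `to_dataframe(ocr_folder("scans"))`, where `scans` is a placeholder folder name, would give a table with one row per page. Keeping the file name next to the text makes it easier to trace a row back to its scan during later preprocessing.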

If you have any questions or suggestions for alternative methods, please share them with me in the discussion section.

Thanks,

Mohamed Salama

Reach out at [email protected]

Repository contents:

0.Key Libraries Used in This Tutorial
1.ExtractingJustTextFromOneImageOnly
2.ExtractingTextFromMultipleImagesMethodOne
3.ExtractingTextFromMultipleImagesMethodTwo
README.md
Salama's_Extraction_Script.ipynb

Msalamaumd/Extracting_Text_OCR_Archive
