Extract Text from a PDF page (PDFContentExtractText)

ImageGear can extract text directly from the content of a PDF page, which is useful when the text needs to be copied out of the document.

Text is extracted from a range of PDF pages using the ExtractText function. You can specify whether the text is extracted in left-to-right, top-to-bottom order, or in the order in which it appears within the PDF file.
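The difference between the two extraction orders can be illustrated with a small, library-agnostic sketch. This is not ImageGear code and does not use the ImageGear API: the `Word` type, the `in_reading_order` helper, the line tolerance, and the sample coordinates are all hypothetical, and the y coordinate is assumed to be measured from the top of the page for simplicity. It only shows why text stored in one order inside a PDF file may read differently once it is sorted by position.

```python
from typing import NamedTuple


class Word(NamedTuple):
    """A hypothetical word with its position on the page."""
    text: str
    x: float  # left edge, in points from the left of the page
    y: float  # baseline, in points from the top of the page (assumed convention)


def in_reading_order(words, line_tolerance=2.0):
    """Return words top to bottom, then left to right within each line.

    Words whose baselines differ by at most line_tolerance from the first
    word of the current line are treated as part of that line.
    """
    lines = []  # each entry is a list of words sharing (roughly) one baseline
    for word in sorted(words, key=lambda w: w.y):
        if lines and abs(lines[-1][0].y - word.y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [w for line in lines for w in sorted(line, key=lambda w: w.x)]


# Words listed in the order they appear in a hypothetical PDF content stream.
stream_order = [
    Word("world", 60, 100.5),
    Word("page", 60, 120),
    Word("Hello", 10, 100),
    Word("PDF", 10, 120),
]

file_order_text = " ".join(w.text for w in stream_order)
reading_order_text = " ".join(w.text for w in in_reading_order(stream_order))

print(file_order_text)
print(reading_order_text)
```

In this sketch the file-order string follows the content stream as written, while the reading-order string groups words into lines by baseline and sorts each line by horizontal position. Which order suits an application depends on whether it needs human-readable text or a faithful record of how the PDF was authored.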

For more information about the ImageGear .NET API, please refer to the ImageGear .NET Online Documentation.

System Requirements

For a list of the system and development software necessary to build and run these samples, please refer to the ImageGear .NET Online Documentation.

Building the Sample

Starting with ImageGear v26.0, ImageGear supports .NET Core.

All samples can be built using Microsoft Visual Studio 2022.

To build this sample:

1. Open the .sln file in the project directory using Visual Studio 2022.
2. Select a Solution Configuration (Debug or Release) and an available Solution Platform (x64 or Any CPU).
3. Build the solution using Build Solution in the Build menu.

To build this sample under Linux:

1. Install the Microsoft .NET SDK for your Linux distribution.
2. Run dotnet build PDFContentExtractText.sln. By default this will build the Debug Solution Configuration.

Running the Sample

Building the sample produces a console application executable in the bin subdirectory. Run this application by double-clicking the application icon, or run it from Command Prompt (cmd.exe), PowerShell, or a similar shell. The working directory must be the directory containing the sample executable, because in all of these samples the input image(s) and the output directory are specified relative to the location of the application.

To run this sample under Linux, run the sample from "bin/Debug/net6.0/" or "bin/x64/Debug/net6.0/" (depending on the solution platform) using ./PDFContentExtractText.

NOTE: ImageGear .NET runs in evaluation mode if started without a license. In evaluation mode, documents and images will be watermarked when exported or displayed. If you would like to work with a full-featured evaluation of the product, please contact Accusoft at info@accusoft.com.

PDF Support in ImageGear .NET

ImageGear .NET is a robust, multi-platform, multi-language PDF solution. For more information on PDF support in ImageGear .NET, please visit us at Accusoft.
