diff --git a/docs/_static/gifs/auto-detect.gif b/docs/_static/gifs/auto-detect.gif
index 2fda09b..bbfb414 100755
Binary files a/docs/_static/gifs/auto-detect.gif and b/docs/_static/gifs/auto-detect.gif differ
diff --git a/docs/_static/gifs/download.gif b/docs/_static/gifs/download.gif
index f43191b..1be01cf 100755
Binary files a/docs/_static/gifs/download.gif and b/docs/_static/gifs/download.gif differ
diff --git a/docs/_static/gifs/saved-rule.gif b/docs/_static/gifs/saved-rule.gif
index f356ead..6545927 100755
Binary files a/docs/_static/gifs/saved-rule.gif and b/docs/_static/gifs/saved-rule.gif differ
diff --git a/docs/_static/gifs/table-and-column.gif b/docs/_static/gifs/table-and-column.gif
index 2c93285..9e9f6ae 100755
Binary files a/docs/_static/gifs/table-and-column.gif and b/docs/_static/gifs/table-and-column.gif differ
diff --git a/docs/_static/gifs/upload.gif b/docs/_static/gifs/upload.gif
index 1dfd6ad..15648ab 100755
Binary files a/docs/_static/gifs/upload.gif and b/docs/_static/gifs/upload.gif differ
diff --git a/public/index.html b/public/index.html
index c9d006f..af9b074 100644
--- a/public/index.html
+++ b/public/index.html
@@ -96,28 +96,28 @@

About

-

+

-
Extracting tables from PDFs is hard
-

The Portable Document Format (PDF) was not designed for tabular data. Sadly, a lot of open data is shared as PDFs and getting tables out for analysis and record-keeping is a pain. Excalibur makes PDF table extraction very easy. You can download the extracted tables as CSVs or an Excel spreadsheet. All data remains on your machine.

+
The Portable Document Format
+

A PDF file defines instructions to place characters at precise x,y coordinates relative to the bottom-left corner of the page. Words are simulated by placing some characters closer together than others, and spaces are simulated by placing words relatively far apart. Finally, tables are simulated by placing words as they would appear in a spreadsheet. The format has no internal representation of a table structure.
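The following minimal Python sketch illustrates the idea above: when characters carry only positions, words can be recovered by splitting wherever the horizontal gap between neighbouring characters is large. The `Char` class, the `group_into_words` helper and the 1-point default gap are hypothetical and chosen for illustration; this is not how Excalibur or Camelot actually group text.

```python
from dataclasses import dataclass


@dataclass
class Char:
    """One positioned character (hypothetical type for this sketch)."""
    text: str
    x0: float  # left edge, in points from the left side of the page
    x1: float  # right edge, in points


def group_into_words(chars, max_gap=1.0):
    """Join the characters of a single text line into words.

    A new word starts whenever the gap between one character's right
    edge and the next character's left edge exceeds max_gap points.
    """
    words = []
    current = ""
    prev_x1 = None
    for ch in sorted(chars, key=lambda c: c.x0):  # read left to right
        if prev_x1 is not None and ch.x0 - prev_x1 > max_gap:
            words.append(current)  # large gap: treat it as a space
            current = ""
        current += ch.text
        prev_x1 = ch.x1
    if current:
        words.append(current)
    return words


line = [
    Char("y", 25.0, 30.0), Char("H", 10.0, 15.0), Char("i", 15.5, 20.0),
    Char("o", 30.0, 35.0), Char("u", 35.0, 40.0),
]
print(group_into_words(line))  # ['Hi', 'you']
```

Because the characters are sorted by their left edge first, the order in which the file happens to draw them does not matter.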

-

+

-
Why another tool?
-

There are both open and closed-source tools that are widely used for PDF table extraction. They either give a nice output or fail miserably. Excalibur is powered by Camelot (written by one of the authors) which gives users complete control over table extraction. If you don't get the desired output with default settings, you can tweak them and get the job done!

+
Extracting tables from PDFs is hard
+

The Portable Document Format was not designed for tabular data. Sadly, a lot of open data is shared as PDFs, and getting tables out for analysis is a pain. A simple copy-and-paste doesn't work. Excalibur makes PDF table extraction very easy by automatically detecting tables in PDFs and letting you save them as CSV files or Excel spreadsheets through a web interface.

-

+

-
Automate your workflow
-

Excalibur can detect tables in your PDFs automatically. For cases where it doesn't, you can tweak table extraction settings, save them as presets and then apply them on different PDFs with similar table structures. After v0.5.0, Excalibur will have a web API which can be used to start table extraction jobs and download extracted tables when jobs finish.

+
Why another tool?
+

There are both open and closed-source tools that are widely used for PDF table extraction. They either give a nice output or fail miserably. Excalibur is powered by Camelot which gives users additional settings to tweak table extraction and get the best results. You can see how it performs better than other open-source tools and libraries in this comparison.

@@ -125,8 +125,8 @@
Automate your workflow

-
Built for scale
-

Excalibur can be configured with MySQL and Celery to execute table extraction jobs in a parallel and distributed manner. By default, jobs are executed sequentially. You can check out the documentation at https://excalibur-py.readthedocs.io for more details.

+
Secure and built for scale
+

You get complete control over your data, since all file storage and processing happens on your own local or remote machine. Excalibur can also be configured with MySQL and Celery to execute table extraction jobs in a parallel and distributed manner. By default, jobs are executed sequentially.

@@ -146,8 +146,9 @@

Usage

-

Upload your PDF

-

You can upload your PDF using the web interface. You can also see previous uploads. All file storage and processing happens on your own local or remote machine, which means that you have complete control over your data.

+

Upload a PDF

+

You can upload a PDF using the web interface. You can also interact with previous uploads.

@@ -157,8 +158,8 @@

Upload your PDF

-

Auto-detect table areas

-

You don't need to draw table areas and column separators in most cases, because Excalibur can do that automatically.

+

Autodetect tables

+

Excalibur can automatically detect tables in your PDF.

@@ -169,7 +170,7 @@

Auto-detect table areas

Or draw table areas and/or column separators

-

You can draw table areas and also add column separators in cases where the tables are buried deep inside the text on the page.

+

You can guide the tool by drawing table areas and column separators in cases where the tables are buried deep inside the text and autodetection fails.
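Column separators help because a table row in a PDF is just a set of positioned words. As a hypothetical sketch (not Excalibur's or Camelot's actual code), each separator can be treated as an x-coordinate, and each word assigned to the column that contains its horizontal midpoint. The `assign_to_columns` helper, the `(text, x0, x1)` word tuples and the midpoint rule are assumptions made for illustration.

```python
from bisect import bisect_right


def assign_to_columns(words, separators):
    """Place the words of one table row into cells (hypothetical helper).

    words: (text, x0, x1) tuples for a single row, x in points.
    separators: sorted x-coordinates of user-drawn column separators;
    n separators split the row into n + 1 columns.
    """
    row = [""] * (len(separators) + 1)
    for text, x0, x1 in sorted(words, key=lambda w: w[1]):  # left to right
        midpoint = (x0 + x1) / 2
        col = bisect_right(separators, midpoint)  # column holding the midpoint
        row[col] = f"{row[col]} {text}".strip()  # join words in the same cell
    return row


words = [("2019", 120, 150), ("Total", 20, 50), ("tax", 55, 70), ("1,200", 230, 260)]
print(assign_to_columns(words, separators=[100, 200]))  # ['Total tax', '2019', '1,200']
```

The midpoint rule puts a word that straddles a separator into the cell where more than half of its width lies.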

@@ -179,8 +180,8 @@

Or draw table areas and/or column separators

-

Or load a saved table extraction rule

-

Each new table extraction rule (table areas, column separators and other settings) is saved by default. You can load it next time you see a PDF with a similar table structure.

+

Or load saved settings

+

You can save table extraction settings for a PDF once and apply them to new PDFs that contain tables with similar structures.

@@ -190,8 +191,8 @@

Or load a saved table extraction rule

-

Download extracted tables in structured formats

-

You can view the extracted tables and then download them as CSVs or an Excel spreadsheet. Excalibur also supports JSON and HTML.

+

View and download data

+

Finally, you can view the extracted tables and download them as CSV files or Excel spreadsheets. Excalibur also supports JSON and HTML.