Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allows for multiple tables per page #31

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

SteveLane
Copy link

This edit allows for multiple tables per page to be read using list_matrices
method. If there is only one table on a page, a matrix is returned, else a list
of matrices are returned.

This edit allows for multiple tables per page to be read using list_matrices
method. If there is only one table on a page, a matrix is returned, else a list
of matrices are returned.
@SteveLane
Copy link
Author

@leeper I haven't added any tests for this - it's just the bare change. I can add tests, or anything else you'd like - just let me know.

Returns a list of strings if there are multiple tables per page.
Now produces a list of data frames if there's multiple tables per page.
The java import using 'asis' thinks there's two tables present. This wasn't an
issue in the previous version of extract_tables (as it only extracted the first
table), but it is now when we extract all tables.
@SteveLane
Copy link
Author

SteveLane commented Dec 22, 2016

Of course, list_characters and list_data_frames needed updating as well, sorry for the extra commits.
Also - whilst testing, the non-western table check failed. The extract_tables function actually pulls in two tables. Not sure if it's the package bug, or a tabula-java bug - I'll open an issue anyway (Issue #32).

@codecov-io
Copy link

codecov-io commented Dec 22, 2016

Codecov Report

❗ No coverage uploaded for pull request base (master@b2e7791). Click here to learn what that means.
The diff coverage is 95.12%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master      #31   +/-   ##
=========================================
  Coverage          ?   57.82%           
=========================================
  Files             ?       12           
  Lines             ?      569           
  Branches          ?        0           
=========================================
  Hits              ?      329           
  Misses            ?      240           
  Partials          ?        0
Impacted Files Coverage Δ
R/output.R 97.11% <95.12%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2e7791...88df806. Read the comment docs.

@leeper
Copy link
Member

leeper commented Jan 15, 2017

Sorry, @SteveLane, for the delay on this. I will try to get to it as soon as I can.

R/output.R Outdated
for (j in seq_len(ncol(out[[n]]))) {
out[[n]][i, j] <- tab$getCell(i-1L, j-1L)$getText()
outTab <- list()
for(nTabs in seq_len(nxt$size())){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces between brackets.

R/output.R Outdated
outTab[[nTabs]] <- matrix(NA_character_,
nrow = tab$getRows()$size(),
ncol = tab$getCols()$size())
for (i in seq_len(nrow(outTab[[nTabs]]))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comments indicating what's going on here?

R/output.R Outdated
if (!is.null(encoding)) {
Encoding(out[[n]]) <- encoding
## Put outTab into out, depending on size
if(nxt$size() == 1L){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces between brackets.

R/output.R Outdated
@@ -79,18 +89,34 @@ list_matrices <- function(tables, encoding = NULL, ...) {
list_characters <- function(tables, sep = "\t", encoding = NULL, ...) {
m <- list_matrices(tables, encoding = encoding, ...)
lapply(m, function(x) {
paste0(apply(x, 1, paste, collapse = sep), collapse = "\n")
if(inherits(x, "matrix")){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces between brackets.

R/output.R Outdated
o <- try(read.delim(text = x, stringsAsFactors = stringsAsFactors, ...))
if (inherits(o, "try-error")) {
return(x)
if(inherits(x, "character")){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add spaces between brackets.

R/output.R Outdated
if (inherits(o, "try-error")) {
return(x)
if(inherits(x, "character")){
o <- try(read.delim(text = x, stringsAsFactors = stringsAsFactors, ...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this try() be silent?

R/output.R Outdated
} else {
return(o)
lapply(x, function(y){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add space between brackets.

R/output.R Outdated
} else {
return(o)
lapply(x, function(y){
o <- try(read.delim(text = y, stringsAsFactors = stringsAsFactors, ...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this try() be silent?

@pachadotdev
Copy link
Contributor

let me test how these changes go with tabula 1.0.5

@pachadotdev
Copy link
Contributor

hi @SteveLane
I will need to review this manually and add bits of it after the multiple refactors I made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants