Allows for multiple tables per page #31
This edit allows multiple tables per page to be read using the list_matrices method. If there is only one table on a page, a matrix is returned; otherwise, a list of matrices is returned.
@leeper I haven't added any tests for this - it's just the bare change. I can add tests, or anything else you'd like - just let me know.
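A minimal sketch of the return shape described above, in plain R with no Java dependency. The helper `collapse_page()` and the sample matrices are hypothetical, made up for illustration; this is not the code in the PR.

```r
# Hypothetical helper: `page_tables` is a list of character matrices,
# one per table detected on a single page.
collapse_page <- function(page_tables) {
  if (length(page_tables) == 1L) {
    # Exactly one table on the page: return the bare matrix
    page_tables[[1L]]
  } else {
    # Several tables: keep them as a list of matrices
    page_tables
  }
}

# Made-up inputs standing in for extracted tables
one_table  <- list(matrix(c("a", "b", "c", "d"), nrow = 2))
two_tables <- list(matrix("x", nrow = 1, ncol = 2),
                   matrix("y", nrow = 3, ncol = 1))

single <- collapse_page(one_table)   # a matrix
multi  <- collapse_page(two_tables)  # a list of two matrices
```

Because the return type varies by page, a caller would likely need to check `is.matrix()` on each element before processing it.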
Returns a list of strings if there are multiple tables per page.
Now produces a list of data frames if there are multiple tables per page.
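A rough sketch of how the data frame step might handle both cases, assuming each page arrives either as one delimited string or as a list of such strings. The names `to_data_frames()` and `parse_one()` and the sample input are hypothetical; only `read.delim()`, `try()` and `inherits()` are real base R calls.

```r
# Hypothetical helper: `pages` is a list with one element per page.
# Each element is either a single delimited string (one table) or a
# list of delimited strings (several tables on that page).
to_data_frames <- function(pages, stringsAsFactors = FALSE, ...) {
  parse_one <- function(txt) {
    o <- try(read.delim(text = txt, stringsAsFactors = stringsAsFactors, ...))
    # Fall back to the raw text when parsing fails
    if (inherits(o, "try-error")) txt else o
  }
  lapply(pages, function(x) {
    if (inherits(x, "character")) {
      parse_one(x)            # one table on this page
    } else {
      lapply(x, parse_one)    # several tables on this page
    }
  })
}

# Made-up tab-delimited input: page 1 has one table, page 2 has two
pages <- list(
  "id\tname\n1\tfoo\n2\tbar",
  list("k\tv\na\t1", "p\tq\nb\t2\nc\t3")
)
res <- to_data_frames(pages)
```

In this sketch `res[[1]]` is a data frame, while `res[[2]]` is a list of two data frames.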
The Java import using 'asis' thinks there are two tables present. This wasn't an issue in the previous version of extract_tables (which only extracted the first table), but it is now that we extract all tables.
Of course, |
Codecov Report
@@ Coverage Diff @@
## master #31 +/- ##
=========================================
Coverage ? 57.82%
=========================================
Files ? 12
Lines ? 569
Branches ? 0
=========================================
Hits ? 329
Misses ? 240
Partials ? 0
Sorry, @SteveLane, for the delay on this. I will try to get to it as soon as I can.
R/output.R
for (j in seq_len(ncol(out[[n]]))) {
out[[n]][i, j] <- tab$getCell(i-1L, j-1L)$getText()
outTab <- list()
for(nTabs in seq_len(nxt$size())){
Add spaces between brackets.
R/output.R
outTab[[nTabs]] <- matrix(NA_character_,
nrow = tab$getRows()$size(),
ncol = tab$getCols()$size())
for (i in seq_len(nrow(outTab[[nTabs]]))) {
Can you add comments indicating what's going on here?
R/output.R
if (!is.null(encoding)) {
Encoding(out[[n]]) <- encoding
## Put outTab into out, depending on size
if(nxt$size() == 1L){
Add spaces between brackets.
R/output.R
@@ -79,18 +89,34 @@ list_matrices <- function(tables, encoding = NULL, ...) {
list_characters <- function(tables, sep = "\t", encoding = NULL, ...) {
m <- list_matrices(tables, encoding = encoding, ...)
lapply(m, function(x) {
paste0(apply(x, 1, paste, collapse = sep), collapse = "\n")
if(inherits(x, "matrix")){
Add spaces between brackets.
R/output.R
o <- try(read.delim(text = x, stringsAsFactors = stringsAsFactors, ...))
if (inherits(o, "try-error")) {
return(x)
if(inherits(x, "character")){
Add spaces between brackets.
R/output.R
if (inherits(o, "try-error")) {
return(x)
if(inherits(x, "character")){
o <- try(read.delim(text = x, stringsAsFactors = stringsAsFactors, ...))
Should this try() be silent?
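For context on the question above, a small sketch of what `silent = TRUE` changes in base R's `try()`. The wrapper `safe_call()` and its inputs are hypothetical and only illustrate the pattern; they are not part of this package.

```r
# Hypothetical wrapper: run `f`, returning `fallback` if it errors.
# With silent = TRUE the error message is not printed; with
# silent = FALSE it is printed but execution still continues.
# Either way try() returns an object of class "try-error" on failure.
safe_call <- function(f, fallback, silent = TRUE) {
  o <- try(f(), silent = silent)
  if (inherits(o, "try-error")) fallback else o
}

# A contrived failure and a contrived success
failed <- safe_call(function() stop("cannot parse table"),
                    fallback = "raw text")
ok <- safe_call(function() 1 + 1, fallback = NA)
```

Since the calling code already checks `inherits(o, "try-error")` and falls back to the raw text, silencing the message would not change the returned value, only what is printed.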
R/output.R
} else {
return(o)
lapply(x, function(y){
Add space between brackets.
R/output.R
} else {
return(o)
lapply(x, function(y){
o <- try(read.delim(text = y, stringsAsFactors = stringsAsFactors, ...))
Should this try() be silent?
Let me test how these changes go with tabula 1.0.5.
hi @SteveLane |