Handling for postgres schemas? #12

Open
khondula opened this issue Aug 9, 2018 · 8 comments
khondula commented Aug 9, 2018

Would it be possible to support Postgres databases that have tables in schemas? I'm running into problems, I think because table names have to be prefixed by the schema name in the SQL.

# try archiving just tables in specified schema
odm2_tables <- dbGetQuery(db, 
  "SELECT * FROM information_schema.tables 
  WHERE table_schema = 'odm2'")

ark(db, dir, lines = 50000, tables = odm2_tables$table_name)

Error:

Exporting actionannotations in 50000 line chunks:
Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  relation "actionannotations" does not exist
LINE 1: SELECT * FROM actionannotations LIMIT 0
                      ^
)
@cboettig (Member)

Good question! I actually don't have much experience with Postgres schemas. Here's the query that we use to select the "table" to be archived:

https://github.com/cboettig/arkdb/blob/295f0bc354d932aa8d75a1164e58a910744c6cbe/R/ark.R#L130-L132

It looks like we'll want to support some mechanism for a user to specify the schema, rather than just the table name, to indicate how the table should be archived. What should the corresponding dbGetQuery call look like in this case?

@khondula (Author)

That would be great! The table name has to be referred to as a "qualified" name, i.e. schema.tablename. So perhaps something like this?

# look up which schema the table lives in
query <- paste0("SELECT table_schema 
                 FROM information_schema.tables 
                 WHERE table_name = '", table, "'")
schema <- DBI::dbGetQuery(db, query)$table_schema

# then query the table by its qualified name
query <- paste0("SELECT * FROM ", schema, ".", table, " LIMIT 0")
DBI::dbGetQuery(db, query)

I'm not sure what happens when a table is not in a schema... the last line of the docs does not fill me with hope!
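
(For what it's worth, my understanding, which I haven't verified against the docs, is that a Postgres table created without an explicit schema ends up in the default public schema, so the lookup above should still return a row. One could check which schema unqualified names resolve to with current_schema():)

# which schema unqualified table names resolve to on this connection
DBI::dbGetQuery(db, "SELECT current_schema()")   # typically "public"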

@cboettig (Member)

hmm... can you just manually construct the appropriate <schemaname>.<tablename> in the existing tables argument?

@khondula (Author)

Oh, good call. This worked:

schema_tables <- dbGetQuery(db, sqlInterpolate(db,
  "SELECT table_name FROM information_schema.tables 
   WHERE table_schema = ?schema", schema = "schema_name"))

ark(db, dir, tables = paste0("schema_name", ".", schema_tables$table_name))

@cboettig (Member)

Nice, thanks. I'll add that example into the documentation. Does that sound like a sufficient strategy here?

@ilarischeinin

This doesn't work for me. Maybe an update to DBI, RPostgres/RPostgreSQL, or arkdb has broken it at some point.

Below is a reprex (with a PostgreSQL server run inside Docker). Before that, I do want to say that this schema specification is, or has been, a bit of a mess. Depending on the case (SQL/DBI/RPostgres/RPostgreSQL/dbplyr), you need one of these options (sketched right after the list):

  • "schema.table"
  • c("schema", "table")
  • DBI::Id(schema = "schema", table = "table")
  • dbplyr::in_schema(schema = "schema", table = "table")
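
Roughly, assuming a live connection con and a table "x" in a schema "foobar" (the same names as in the reprex below), these map to:

# raw SQL: a dot-qualified string inside the statement
DBI::dbGetQuery(con, "SELECT * FROM foobar.x")

# DBI with RPostgres: an Id object
DBI::dbReadTable(con, DBI::Id(schema = "foobar", table = "x"))

# DBI with RPostgreSQL: a plain named character vector
DBI::dbReadTable(con, c(schema = "foobar", table = "x"))

# dbplyr: in_schema()
dplyr::tbl(con, dbplyr::in_schema("foobar", "x"))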

Reprex

Run the server:

docker pull postgres:12.0
docker run \
  --detach \
  --env POSTGRES_PASSWORD=reprex \
  --name reprex \
  --publish 5432:5432 \
  --rm \
  postgres:12.0

The R reprex:

# open connection
con <-
  DBI::dbConnect(
    RPostgres::Postgres(),
    host = "127.0.0.1",
    dbname = "postgres",
    user = "postgres",
    password = "reprex"
  )

# generate simple dummy data
x <- data.frame(i = 1:10)

# insert that into the database
DBI::dbExecute(con, "CREATE SCHEMA foobar;")
DBI::dbWriteTable(con, DBI::Id(schema = "foobar", table = "x"), x)

# check that it is there and we can read it
DBI::dbReadTable(con, DBI::Id(schema = "foobar", table = "x"))
DBI::dbGetQuery(con, "SELECT * FROM foobar.x")

# but none of these work
arkdb::ark(con, dir = tempdir(), tables = "x")
arkdb::ark(con, dir = tempdir(), tables = "foobar.x")
arkdb::ark(con, dir = tempdir(), tables = c("foobar", "x"))
arkdb::ark(con, dir = tempdir(), tables = DBI::Id(schema = "foobar", table = "x"))
arkdb::ark(con, dir = tempdir(), tables = dbplyr::in_schema(schema = "foobar", table = "x"))

As a side note, even though they give errors, the commands above still generate output files. I think they shouldn't.
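
(Not a suggestion about arkdb's actual internals, just a sketch of the kind of guard I mean, with a made-up helper name:)

# hypothetical wrapper: drop the partially written file if the export fails
safe_export <- function(path, export_fun) {
  tryCatch(
    export_fun(path),
    error = function(e) {
      unlink(path)  # remove the partial output
      stop(e)
    }
  )
}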

Clean up

docker stop reprex
docker rmi postgres:12.0

RPostgres vs. RPostgreSQL

If you're using RPostgreSQL instead of RPostgres as above, instead of

DBI::dbReadTable(con, DBI::Id(schema = "foobar", table = "x"))

you'd need to do

DBI::dbReadTable(con, c(schema = "foobar", table = "x"))

Session info

─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       macOS Catalina 10.15.2      
 system   x86_64, darwin15.6.0        
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Europe/Helsinki             
 date     2019-12-18                  

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source        
 arkdb         0.0.5   2018-10-31 [1] CRAN (R 3.6.0)
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.0)
 backports     1.1.5   2019-10-02 [1] CRAN (R 3.6.0)
 bit           1.1-14  2018-05-29 [1] CRAN (R 3.6.0)
 bit64         0.9-7   2017-05-08 [1] CRAN (R 3.6.0)
 blob          1.2.0   2019-07-09 [1] CRAN (R 3.6.0)
 callr         3.4.0   2019-12-09 [1] CRAN (R 3.6.0)
 cli           2.0.0   2019-12-09 [1] CRAN (R 3.6.0)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.0)
 DBI           1.1.0   2019-12-15 [1] CRAN (R 3.6.2)
 dbplyr        1.4.2   2019-06-17 [1] CRAN (R 3.6.0)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.0)
 devtools      2.2.1   2019-09-24 [1] CRAN (R 3.6.0)
 digest        0.6.23  2019-11-23 [1] CRAN (R 3.6.0)
 dplyr         0.8.3   2019-07-04 [1] CRAN (R 3.6.0)
 ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.0)
 fansi         0.4.0   2018-10-05 [1] CRAN (R 3.6.0)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.6.0)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.6.0)
 hms           0.5.2   2019-10-30 [1] CRAN (R 3.6.1)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.0)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.0)
 pillar        1.4.2   2019-06-29 [1] CRAN (R 3.6.0)
 pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 3.6.0)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.0)
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.6.0)
 processx      3.4.1   2019-07-18 [1] CRAN (R 3.6.0)
 progress      1.2.2   2019-05-16 [1] CRAN (R 3.6.0)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.6.0)
 purrr         0.3.3   2019-10-18 [1] CRAN (R 3.6.0)
 R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.0)
 Rcpp          1.0.3   2019-11-08 [1] CRAN (R 3.6.0)
 remotes       2.1.0   2019-06-24 [1] CRAN (R 3.6.0)
 rlang         0.4.2   2019-11-23 [1] CRAN (R 3.6.0)
 RPostgres     1.1.3   2019-12-07 [1] CRAN (R 3.6.0)
 RPostgreSQL   0.6-2   2017-06-24 [1] CRAN (R 3.6.0)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.0)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.0)
 testthat      2.3.1   2019-12-01 [1] CRAN (R 3.6.0)
 tibble        2.1.3   2019-06-06 [1] CRAN (R 3.6.0)
 tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.6.0)
 usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.0)
 vctrs         0.2.0   2019-07-05 [1] CRAN (R 3.6.0)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.0)
 zeallot       0.1.0   2018-01-28 [1] CRAN (R 3.6.0)

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

cboettig reopened this Dec 23, 2019
@cboettig (Member)

@ilarischeinin Thanks for the reprex. Yeah, as you can see, arkdb doesn't have any explicit logic for Postgres schemas at the moment; it is largely just counting on the DBI abstractions to be portable. I'd welcome a PR for this, though I imagine many Postgres users may already be happy with using COPY to do a bulk export to .csv, which is probably faster than arkdb's chunk iteration. (Having arkdb attempt native methods is also on the to-do list, #24.)
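
For reference, the COPY route I have in mind is roughly the following; just a sketch, since the output path lives on the database server and writing it needs the appropriate server-side privileges:

# native Postgres bulk export of one table to CSV (illustrative path)
DBI::dbExecute(con, "COPY foobar.x TO '/tmp/x.csv' WITH (FORMAT csv, HEADER)")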

1beb (Contributor) commented Sep 23, 2021

A note here: dbplyr has a built-in method for accessing a schema:

con <- DBI::dbConnect(...)
my_table <- dplyr::tbl(con, dbplyr::in_schema("schema", "table"))
