Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbGetQuery handles empty recordsets in a problematic manner #51

Open
GoogleCodeExporter opened this issue Nov 3, 2015 · 4 comments
Open

Comments

@GoogleCodeExporter
Copy link

What steps will reproduce the problem?
1. Create a table with 2 or more columns:
    dbGetQuery(con, "CREATE TABLE junk (col1 INTEGER, col2 VARCHAR(100))");

2. Use dbGetQuery() to retrieve a data.frame with the contents of this table:
    a <- dbGetQuery(con, "SELECT * FROM junk");

3. I would expect to receive back a data.frame with 0 rows and 2 columns named 
"col1" and "col2".  Instead, I get back:
    print(a)
    data frame with 0 columns and 0 rows

If I try to use this data.frame naively by commands like:
    print(a[1])
I get errors like:
    Error in .subset2(x, i, exact = exact) : subscript out of bounds

4. In contrast, if the table is not empty, I get a data.frame with the 
appropriate number of rows and two columns:
    dbGetQuery(con, "INSERT INTO junk VALUES (1, 'wowee')");
    a <- dbGetQuery(con, "SELECT * FROM junk");
    print(a)

      col1  col2
    1    1 wowee

Now the returned data.frame behaves in the expected manner:
    print(a[1])

      col1
    1    1

What is the expected output? What do you see instead?

I would expect to get back a data.frame with a predictable structure from a 
dbGetQuery() regardless of whether or not the recordset is empty.  With the 
current design, I need to check every return from dbGetQuery() to see if it is 
empty before I attempt to access columns or use the return result in a merge() 
or match() function.  A common code construct for my application looks like:
    Lookup <- dbGetQuery(con, "SELECT id, description FROM lookup_table");
    F <- merge(my_data, Lookup, by.x="id", by.y="id", all.x=TRUE, all.y=FALSE);

but the merge fails if the lookup table happens to be empty.

Note that the R database interface to SQLite3 returns a zero-row data.frame 
with the correct columns when the dbGetQuery() returns no rows.  The problem 
therefore is not intrinsic to DBI, but seems to be specific to RPostgreSQL.

What version of the product are you using? On what operating system?
R 2.15.2, RPostgreSQL 0.4 compiled on R 2.15.3, Windows 7 64-bit

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 3 Apr 2013 at 8:52

@GoogleCodeExporter
Copy link
Author

Looking through the source code, it seems likely that the system being used to 
detect whether the statement was a SELECT is failing when the result set has 
zero length, and this causes the return of a data.frame with 0 rows and 0 
columns.

Original comment by [email protected] on 24 May 2013 at 3:42

@GoogleCodeExporter
Copy link
Author

i glanced quickly through the source and believe the easiest fix would be at 
the R-code level.
in PostgreSQL.R, DBIResult class methods that return result set data frames 
(e.g. "fetch"), have this check:

  out <- postgresqlFetch(res, n, ...)
  if(is.null(out))
    out <- data.frame(out)
  out

the casting of the null "out" to a data frame is where the 0-row-0-col data 
frame is created.
instead of that casting, i'd use a function specifically designed to return a 
"smart" empty data frame, like so:

  emptyResultDataFrame <- function(res)
    {
      klasses <- dbColumnInfo(res)$Sclass
      names(klasses) <- dbColumnInfo(res)$name
      data.frame(sapply(klasses, function(klass) eval(parse(text = sprintf("%s(0)", klass))), simplify = FALSE))
    }


and thus alter the existing code blocks that look like so:

  out <- postgresqlFetch(res, n, ...)
  if(is.null(out))
    out <- data.frame(out)
  out

to:

  out <- postgresqlFetch(res, n, ...)
  if(is.null(out))
    out <- emptyResultDataFrame(res)
  out


NOTE 1: i'm pretty sure there's a better way to initialize an empty vector from 
a class name stored as a string (saving the trouble of my eval/parse trick 
above), so people more familiar with low-level expressions in R can probably 
make an improvement there.

NOTE 2: there's also the chance for a bug here, since i'm *assuming* that the 
Sclass field in the getColumnInfo() table returns valid R classes/types which 
have 'empty' constructors that follow the 'numeric(0)' or 'character(0)' or 
'logical(0)' etc. paradigm.
there's probably also a safer way to initialize an object of zero-length with 
the class name as a string, as mentioned in NOTE 1, and this would also help 
prevent the possible bug introduced by my assumption.

NOTE 2b: i'm aware that "vector(klass, 0)" will work, but only for base 
('mode') types... so i'm not sure if the Sclass field can be populated by 
high-level types in R that might break the "vector()" function, which is why i 
opted for the eval/parse trick instead.

anyhow, i rolled this fix into a forked version of RPostgreSQL on my system and 
it works like a charm to fix the annoying 0-row-0-column bug, so feel free to 
include some version of this in the next release.

cheers,

-murat

Original comment by [email protected] on 3 Jul 2013 at 9:09

@GoogleCodeExporter
Copy link
Author

Original comment by [email protected] on 16 Jul 2013 at 12:16

  • Added labels: Priority-High
  • Removed labels: Priority-Medium

@GoogleCodeExporter
Copy link
Author

Any update on this? We're still seeing this. The result of dbGetQuery might or 
might not have column names, depending on if the result is empty. Is there any 
way to get the column names from the empty result without querying for them?

Original comment by [email protected] on 7 Jul 2014 at 9:06

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant