Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 168: Handle empty blobs as bind parameters #167

Closed

Conversation

msfstef
Copy link

@msfstef msfstef commented Apr 9, 2024

Fixes issue #168 where empty bytearrays/blobs were being encoded as NULLs when using the bind_blob method.

There was already a test covering empty blob writing and reading but not using bind parameters. I've added an additional test for blobs with bind parameters similar to the one for the 64-bit integers that fails without the suggested fix.

An alternative approach would be to write

const ptr = byteLength === 0 ?
        Module._sqlite3_malloc(1) :
        Module._sqlite3_malloc(byteLength);

but I find using the UTF8 utility to allocate an "empty buffer" to be clearer in terms of intent.

Similar issue with similar resolution in better-sqlite3:
WiseLibs/better-sqlite3#651
Prinzhorn/better-sqlite3@3cbe134

Checklist

  • I grant to recipients of this Project distribution a perpetual,
    non-exclusive, royalty-free, irrevocable copyright license to reproduce, prepare
    derivative works of, publicly display, sublicense, and distribute this
    Contribution and such derivative works.
  • I certify that I am legally entitled to grant this license, and that this
    Contribution contains no content requiring a license from any third party.

@msfstef msfstef changed the title Fix: Handle empty blobs as bind parameters Fix 168: Handle empty blobs as bind parameters Apr 9, 2024
@sgbeal
Copy link

sgbeal commented Apr 9, 2024

FWIW, sqlite does not really differentiate between an empty blob and NULL values, in that sqlite3_column_blob() will return NULL for a length-0 blob. Converting a length-0 blob to a length-1 blob for binding purposes goes against how the C library handles those.

@msfstef
Copy link
Author

msfstef commented Apr 9, 2024

FWIW, sqlite does not really differentiate between an empty blob and NULL values, in that sqlite3_column_blob() will return NULL for a length-0 blob. Converting a length-0 blob to a length-1 blob for binding purposes goes against how the C library handles those.

Huh, you're right, I have been looking through the spec to figure out what the expected behaviour is. That being said, currently writing INSERT INTO t VALUES (X''); and then reading it out will return an empty array, whereas doing INSERT INTO t VALUES (?); with bind value new Uint8Array(0) will return null so there is inconsistent behaviour. Other sqlite drivers seem to also be inconsistent in what they do.

Semantically I find it useful to be able to differentiate the two (empty but present vs not present) as long as it doesn't cause conflicts in the otherwise expected behaviour.

Some more examples of other SQLite drivers handling this:
dotnet/efcore#17494
ericsink/SQLitePCL.raw@e324598

Also short discussion on SQLite forums with answer by one of the most prolific users there:
https://sqlite.org/forum/forumpost/f0f092c0bb?t=h&hist
https://sqlite.org/forum/reports?type=f&view=byuser

@sgbeal
Copy link

sgbeal commented Apr 9, 2024

Semantically I find it useful to be able to differentiate the two...

FWIW, i also prefer that, but that's not how the C API works. There was a long discussion on this distinction late last year in the forum:

https://sqlite.org/forum/forumpost/668405ab8459d671

@msfstef
Copy link
Author

msfstef commented Apr 9, 2024

https://sqlite.org/forum/forumpost/668405ab8459d671

Thank you for the link! What have I gotten myself into...

My understanding is that we agree that having an empty bytearray and NULL be different is semantically useful, and the current API already differentiates the two since using blob string literals with INSERT INTO table VALUES (X'') and then selecting it will return an empty array. It's already in the test suite, and even in the SQLite test suite.

Providing a pointer to a non-empty bytearray but specifying that 0 bytes be copied from it ultimately results in setting an empty string literal that is flagged as a UTF-8 encoded blob, which results in the same behaviour of using string literals directly rather than using bind parameters.

Given that we agree on this behaviour being useful, and that it's already part of the API but not consistently, and that a lot of other popular SQLite libraries do the same, could you please elaborate on your disagreement?

You said that's not how the C API works but this is about circumventing the sqlite_bind_blob not being able to create the same statements that we can do directly using string literals? The suggested fix has the expected behaviour, are you saying that it has unintended consequences as well?

edit: path followed
sqlite3_bind_blob: https://github.com/sqlite/sqlite/blob/1278ac411c04d3956380f5bcc241c2d5f7d1b57c/src/vdbeapi.c#L1703-L1714
bind_text with encoding 0 and nData 0: https://github.com/sqlite/sqlite/blob/1278ac411c04d3956380f5bcc241c2d5f7d1b57c/src/vdbeapi.c#L1681-L1683
sqlite3VdbeMemSetStr with n 0 and enc 0: https://github.com/sqlite/sqlite/blob/1278ac411c04d3956380f5bcc241c2d5f7d1b57c/src/vdbemem.c#L1158-L1251

@sgbeal
Copy link

sgbeal commented Apr 9, 2024

could you please elaborate on your disagreement?

i just generally believe that wrappers should not deviate from the C API's behavior for such low-level details. That's not a strong disagreement, though, and i can sympathize with wrappers feeling that they should change this, but it's been my experience that making such changes is an attempt to make the wrapper "too clever," a design decision which often backfires. The closer a wrapper stays to the C API, the less room there is for surprise and/or incompatibility for users. (Granted, the C API's behavior is somewhat surprising here, but its behavior is well-established and won't ever change.)

That said: i have no say-so whatsoever in wa-sqlite's development, i'm just expressing an old man's opinion based on prior experience using and writing sqlite3 wrappers. Please don't take my objection too seriously!

@msfstef
Copy link
Author

msfstef commented Apr 9, 2024

The closer a wrapper stays to the C API, the less room there is for surprise and/or incompatibility for users.

I agree completely, and as you said I'm surprised the specification and API do not have a clear approach for this which has led me to have to check each wrapper implementation independently! I could use a "reserved" bytearray as an "empty bytearray" symbol at the application level to not have ambiguity but I'd really like to avoid that, and I'm seeing several other wrappers using this approach.

That said: i have no say-so whatsoever in wa-sqlite's development, i'm just expressing an old man's opinion based on prior experience using and writing sqlite3 wrappers. Please don't take my objection too seriously!

I did not check but I definitely thought you were a maintainer! Thank you so much for your insight and for taking the time to discuss this then, I started out hoping this was a simple wrapper implementation bug but it seems there's more to it than I expected.

Ultimately my concern is that the behaviour between using literals and bind parameters is inconsistent, which is both unexpected and hard to explain as a "limitation".

@sgbeal
Copy link

sgbeal commented Apr 9, 2024

I did not check but I definitely thought you were a maintainer!

Not of this project. i'm just a spectator here, but i'm "the JS/WASM guy" for the sqlite project and i pay close attention to what Roy does because he's demonstrably a genius in all things related to JS, WASM, and the sqlite VFS API. All of us in the sqlite project tend to towards old-school/conservative development, whereas Roy is out there pushing new-school boundaries, so keeping tabs on what he's doing is both highly informative and inspiring.

@rhashimoto
Copy link
Owner

@msfstef, thanks for taking the time to investigate this and submit a PR and include a test.

After reading the enlightening discussion and links (I didn't actually know about this behavior), my take is this is a bug in the SQLite C API - a zero-length blob should be distinct from NULL - and the question is whether to "fix" it with the JavaScript implementation. I'm finding @sgbeal's position, drawn from experience with many wrappers, sensible and persuasive. I'll take a few days to see if I change my mind, but that's how I'm leaning.

@msfstef
Copy link
Author

msfstef commented Apr 11, 2024

@rhashimoto thank you for the feedback!

I agree that this would ultimately be a workaround and the more I look at it the more I believe that this should be something that SQLite itself should handle more gracefully, so I completely understand if you'd rather stick with staying close to the SQLite implementation.

I've opened yet another thread on the SQLite forum about this, hopefully it leads somewhere :)

But just for good measure, to advocate for this change one last time, I personally find this JS level workaround to be simple and innocuous enough to be removed if and when SQLite addresses this in any way, and it results in a more consistent behaviour. Ultimately it's not a complete abuse of the sqlite3_bind_blob API as it still specifies that 0 bytes of the given data should be bound, so as long as that parameter is respected it should continue to work (?).

@rhashimoto rhashimoto closed this Apr 14, 2024
msfstef added a commit to electric-sql/electric that referenced this pull request Apr 16, 2024
After going down a rabbit-hole in SQLite-land, my conclusion is that
even though inserting empty blobs as string literals works seamlessly,
the issue of not being able to bind "empty blobs" (0-length byte arrays)
is an issue with the SQLite C API itself and is not wrapper-specific.

SQLite wrappers can implement a "work around" the `sqlite3_bind_blob` C
API to enable empty blob binding but because the fundamental issue is
with the underlying API there will always be inconsistency between
wrappers when it comes to binding empty blobs.

Ultimately being able to differentiate between a `NULL` and a 0-length
byte-array is an edge case, even though the two are semantically
different and the SQLite C API can already do that for blob literals,
and given that this is not a wrapper issue as much as it is a general
SQLite issue I think it's unnecessary to draw attention to it in our
docs.

Since we also transform values going in and out of SQLite through our TS
client, we _could_ technically always convert empty byte arrays to
`NULL` for consistency, but I think it's probably better to accept that
this inconsistency is going to be present until (and if) the SQLite
project does something about it.

You can follow along these discussions for more details:
PR I initially opened to "fix" `wa-sqlite`:
rhashimoto/wa-sqlite#167
Thread I started on the SQLite forums:
https://sqlite.org/forum/forumpost/aac9343c98

@balegas what are your thoughts?

Also, after fixing this we see that only one driver we have does not
support blobs (the expo-sqlite one), we might eventually want to
deprecate that and recommend the `expo-sqlite/next` only but I suppose
Expo will do that themselves at some point.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants