Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RangeError in server_messages.dart #387

Open
ryanheise opened this issue Nov 13, 2024 · 6 comments
Open

RangeError in server_messages.dart #387

ryanheise opened this issue Nov 13, 2024 · 6 comments

Comments

@ryanheise
Copy link

I just noticed this in my logs today:

Zone error: RangeError: Value not in range: -1603639493
#0      _rangeCheck (dart:typed_data-patch/typed_data_patch.dart:5113)
#1      _ByteBuffer.asUint8List (dart:typed_data-patch/typed_data_patch.dart:1928)
#2      new Uint8List.view (dart:typed_data:903)
#3      ByteDataReader.read (package:buffer/buffer.dart:407)
#4      new DataRowMessage.parse (package:postgres/src/messages/server_messages.dart:186)
#5      _messageTypeMap.<anonymous closure> (package:postgres/src/message_window.dart:23)
#6      MessageFramer.addBytes (package:postgres/src/message_window.dart:80)
#7      _readMessages.<anonymous closure>.<anonymous closure>.handleChunk (package:postgres/src/v3/protocol.dart:73)
#8      _rootRunUnary (dart:async/zone.dart:1407)
#9      _CustomZone.runUnary (dart:async/zone.dart:1308)
#10     _CustomZone.runUnaryGuarded (dart:async/zone.dart:1217)
#11     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:365)
#12     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:297)
#13     _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:784)
#14     _StreamController._add (dart:async/stream_controller.dart:658)
#15     _StreamController.add (dart:async/stream_controller.dart:606)
#16     _Socket._onData (dart:io-patch/socket_patch.dart:2455)
#17     _rootRunUnary (dart:async/zone.dart:1415)
#18     _CustomZone.runUnary (dart:async/zone.dart:1308)
#19     _CustomZone.runUnaryGuarded (dart:async/zone.dart:1217)
#20     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:365)
#21     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:297)
#22     _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:784)
#23     _StreamController._add (dart:async/stream_controller.dart:658)
#24     _StreamController.add (dart:async/stream_controller.dart:606)
#25     new _RawSocket.<anonymous closure> (dart:io-patch/socket_patch.dart:1942)
#26     _NativeSocket.issueReadEvent.issue (dart:io-patch/socket_patch.dart:1385)
#27     _microtaskLoop (dart:async/schedule_microtask.dart:40)
#28     _startMicrotaskLoop (dart:async/schedule_microtask.dart:49)
#29     _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:118)
#30     _RawReceivePort._handleMessage (dart:isolate-patch/isolate_patch.dart:185)

After this happens, the connection is essentially hung on the query forever. If I go into psql from the console and select from pg_stat_activity I can see all the queries are in the idle state and nothing is blocking them. This is not recoverable unless I manually kill the idle processes with pg_terminate_backend.

I can only see this exception by listening for zone errors so I can't directly catch it at the point in my code where the exception was triggered. I guess the exception should not happen in the first place, but if it does happen, it would be nice if server_messages.dart handled the exception and forwarded it back to the original call site.

I believe there might also be some other scenarios where the connection can hang, and the only way is to kill the idle processes manually. However, this is the only time when I've been given an actual exception to report. Since those other cases are still unknown, I am hoping there is an explanation in general for why this package might be hanging on idle processes where nothing is blocking them (I'm assuming it's something like the processes are remaining in the idle state because this package hasn't yet closed the cursors or something). As yet, I haven't worked out a way to reproduce it other than letting the application run for long enough until by chance the DB connection hangs. The stacktrace above, however, has only ever happened once, so I'm not sure if I'm likely to ever see it again.

This exception occurred with release 3.4.2.

@isoos
Copy link
Owner

isoos commented Nov 13, 2024

@ryanheise thanks for reporting this, it is worth to investigate what we can improve here. Why postgresql version are you using? Any special extension or data type that could have been part of the result?

@ryanheise
Copy link
Author

Here is the output from within psql:

postgres=# select version();
                                                       version
---------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.4 (Debian 15.4-2.pgdg120+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

postgres=# \dx
                                    List of installed extensions
  Name   | Version |   Schema   |                            Description
---------+---------+------------+-------------------------------------------------------------------
 amcheck | 1.3     | pg_catalog | functions for verifying relation integrity
 pg_trgm | 1.6     | public     | text similarity measurement and index searching based on trigrams
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
 vector  | 0.5.1   | public     | vector data type and ivfflat and hnsw access methods

Although I did not log the specific SQL query, there are queries based on the pgvector extension, both to insert/update embeddings and also to query embeddings. In each of these cases, I am passing the vectors in their string/text representation (I wasn't able to get it to work directly as an array).

@ryanheise
Copy link
Author

I actually do have the idle query saved from my psql session. I can't give the entire schema, but mytable has columns with the following types:

......     boolean                                 
myhash     bytea                           not null
......     integer                                 
......     integer                         not null
......     text                                    
......     text                            not null
......     text[]                                  
......     text[]                          not null
......     timestamp(3) with time zone             
......     timestamp(3) with time zone     not null
......     vector(512)                             

And the idle query was long these lines:

SELECT m.*, encode(m.myhash, 'base64') AS myhashBase64
FROM mytable m
WHERE ...

@isoos
Copy link
Owner

isoos commented Nov 13, 2024

@ryanheise these were helpful, thank you! I did not explore the actual parsing error yet, but I have found a missing part where the protocol-level bug was not bubbling up, and I wrote a test for it #388. I've published a new version, hopefully it will expose the actual protocol-level bug with better details.

@ryanheise
Copy link
Author

I believe I may have encountered the problematic query again, and with the newer version of this package (3.4.3), I captured this error message:

Severity.error Socket error: Bad state: Message parser consumed more bytes than expected. type=68 expectedLength=3012.

Unfortunately, I had forgotten to print the stack trace, although it appears from the source that this would have an unambiguous stack trace.

Interestingly, I did learn that the query matched a large number of rows (around 5000 with close to 50 columns) on the occasion when it produced this error, so maybe that is something. The same query does work correctly in psql.

@isoos
Copy link
Owner

isoos commented Jan 13, 2025

@ryanheise: please upgrade the version, as 3.4.5 should have already fixed it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants