[ADAP-564] [Regression] unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0) #456

Closed
2 tasks done
pdebelak opened this issue May 18, 2023 · 18 comments · Fixed by #601
Labels
bug Something isn't working regression

Comments

@pdebelak

pdebelak commented May 18, 2023

Is this a regression in a recent version of dbt-redshift?

  • I believe this is a regression in dbt-redshift functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

Some models intermittently fail with the error unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0), which may be related to the issue in aws/amazon-redshift-python-driver#142

Expected/Previous Behavior

Models run correctly.

Steps To Reproduce

  1. Use redshift-connector instead of psycopg2 to connect to Redshift

Relevant log output

16:19:06  Runtime Error in model my_model (models/my_model.sql)
16:19:06    unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0)

Environment

- OS: debian 11.7
- Python: 3.8.16
- dbt-core (working version): 1.4.5
- dbt-redshift (working version): 1.4.0
- dbt-core (regression version): 1.5.0
- dbt-redshift (regression version): 1.5.1

Additional Context

No response

@pdebelak pdebelak added bug Something isn't working regression triage labels May 18, 2023
@github-actions github-actions bot changed the title [Regression] unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0) [ADAP-564] [Regression] unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0) May 18, 2023
@dbeatty10
Contributor

@pdebelak thank you for reaching out!

And thanks for linking to aws/amazon-redshift-python-driver#142

Is there a specific dbt project setup (model, seed, etc) that I can use to reproduce this error? Or is it really dependent upon something on the Redshift side of things that will be hard to reproduce?

My gut says that this is a redshift_connector issue that we won't be able to do much about rather than a dbt-redshift issue that we can fix -- any additional info you can give will be helpful to determine one from the other.

@pdebelak
Author

@dbeatty10 I haven't figured out a reproducible example for this, but I agree that this seems to be an issue with redshift-connector. I opened the issue here because this is another one of the breaking changes caused by the switch from psycopg2 to redshift-connector, and I think the solution is to either revert #251 or at least make the driver configurable. I opened a discussion about this yesterday as well, since it's a big change and would be breaking (much as the switch from psycopg2 to redshift-connector was a large, breaking change).

@dbeatty10
Contributor

@pdebelak Would you be willing to run the following and share the output?

dbt --log-format json run ...

Thank you for opening that Discussion -- I'll respond there.

@dbeatty10
Contributor

Root cause

When a network connection is lost unexpectedly, the error message doesn't help the user realize it's really a network connection issue.

Background

Prior to Mar 27, 2021, the README of pg8000 (predecessor to redshift_connector) said:

Occasionally, the network connection between pg8000 and the server may go
down. If pg8000 encounters a problem writing to a socket it raises
BrokenPipeError: [Errno 32] Broken pipe. If pg8000 encounters a problem
reading from a socket it raises struct.error: unpack_from requires a buffer of at least 5 bytes.
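A minimal sketch of how this error can arise, using only the Python standard library. It assumes the driver reads a 5-byte message header (a 1-byte code plus a 4-byte length) from the socket and unpacks it with struct; the read_header helper is hypothetical and is not taken from pg8000 or redshift_connector.

```python
import socket
import struct

# Assumed header layout for illustration: 1-byte message code + 4-byte big-endian length.
HEADER = struct.Struct("!ci")


def read_header(sock):
    """Hypothetical helper: read and unpack one message header."""
    data = sock.recv(HEADER.size)  # recv returns b"" once the peer has closed
    return HEADER.unpack_from(data)


message = ""
left, right = socket.socketpair()
right.close()  # simulate the server side of the connection going away
try:
    read_header(left)
except struct.error as exc:
    # The empty read surfaces as a struct error, not as a network error.
    message = str(exc)
    print(message)
finally:
    left.close()
```

The empty read from the closed connection is what produces the "actual buffer size is 0" part of the message.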

Potential solution

The solution is likely two parts:

  1. Within redshift_connector, raise a more descriptive exception
  2. Within dbt-redshift, handle that Exception as desired
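The two parts could look roughly like the sketch below. Everything here is an assumption for illustration: LostConnectionError, parse_header, and run_model are hypothetical names, not part of redshift_connector or dbt-redshift, and the 5-byte header layout is assumed.

```python
import struct

# Assumed header layout for illustration: 1-byte message code + 4-byte big-endian length.
HEADER = struct.Struct("!ci")


class LostConnectionError(Exception):
    """Hypothetical exception type; not part of redshift_connector."""


def parse_header(data: bytes):
    """Part 1 (driver side): turn the low-level struct error into a network error."""
    try:
        return HEADER.unpack_from(data)
    except struct.error as exc:
        raise LostConnectionError("Lost database connection") from exc


def run_model(read_bytes):
    """Part 2 (adapter side): handle the clearer exception as desired."""
    try:
        return parse_header(read_bytes())
    except LostConnectionError as exc:
        return f"Database connection lost: {exc}"


print(run_model(lambda: b""))  # simulated empty read from a dropped connection
```

Chaining with "from exc" keeps the original struct.error in the traceback, so no debugging detail is lost while the user sees a message that points at the network.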

These commits in pg8000 could be used as inspiration for how to catch this error within redshift_connector and re-raise it as a network error instead:

Related issues

Here's a listing of related issues across both redshift_connector and pg8000:

pg8000

redshift_connector

@dataders
Contributor

dataders commented Jun 6, 2023

closing in favor of: aws/amazon-redshift-python-driver#164

@pdebelak I have confidence that we can get this fixed on the driver once and for all! thanks again for all your work reporting this

@dataders dataders closed this as not planned Jun 6, 2023
@nischay-merkle

I'm facing the same issue with redshift_connector:
struct.error: unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0)
It is affecting our production setup.

@dataders
Contributor

@nischay-merkle sorry to hear that! What version of the dbt-redshift adapter are you using? I am confident that this should be resolved in the latest 1.5 patch release, 1.5.8.

@nischay-merkle

@dataders I'm currently using redshift-connector 2.0.903. It's a Python client library (https://pypi.org/project/redshift-connector/) for Redshift.

@dbeatty10
Contributor

@nischay-merkle are you experiencing a problem with dbt-redshift? If so, can you share the output from this command?

dbt --version

If you are using redshift_connector without using dbt-redshift, then you should subscribe to this issue instead:

TLDR

This error message happens when a network connection is lost unexpectedly, and here are Amazon's recommendations to avoid it.

@fix-a-thing

I'm facing the same issue. And like in this comment, it happens at 10 minutes.

It does not affect production, but development. We run 1.2 in production and develop in the same. Here we get the error: SSL SYSCALL error: EOF detected. Again it happens after 10 minutes.

As we are planning to upgrade to 1.6 soon, I just got the latest rc version using:

  • Running with dbt=1.6.0-rc1
  • Registered adapter: redshift=1.6.0-rc1
  • dbt-redshift==1.6.0rc1

And here I'm getting: unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0)
I guess it is some connection timeout? It used to work, but now it doesn't, not sure what changed.

@dbeatty10
Contributor

I guess it is some connection timeout?

Yep, that error message comes from a networking issue.

There's an issue open here to improve the error message, and here are Amazon's recommendations to avoid it.

@BeltranCunef

BeltranCunef commented Aug 2, 2023

I'm facing the same issue, although in my case I'm not using redshift-connector directly. My dbt versions are the following:

dbt-core 1.5.4
dbt-postgres = 1.5.4
dbt-redshift = 1.5.8

The error described earlier in this thread is raised at random, regardless of the table. I'm just running some SQL scripts to deploy changes to my databases. It only happens on GitHub; the same jobs run correctly on dbt Cloud.

@nickagel

I'm also facing this issue with 1.6

@dbeatty10
Contributor

@BeltranCunef and @nickagel

As proposed here, it would be more helpful if the error message said something like this instead:

BrokenPipeError: Lost database connection

I have run into this error message myself several times, and I frequently forget what it means! 😰

That error message means that the database connection was lost while processing. It comes from redshift_connector which is the database driver that dbt-redshift uses for both 1.5.x and 1.6.x. That repo is maintained by AWS and not dbt Labs.

There's an issue open to improve the error message, and here are Amazon's recommendations to avoid it.

I'm going to leave this issue as closed since the best place to solve this is via aws/amazon-redshift-python-driver#164. Please feel free to upvote or comment on that issue.

@dnascimento

dnascimento commented Sep 5, 2023

What can I do to avoid losing the database connection? Is there a parameter for that? Our model takes 30 minutes to build in Python.

@davyto

davyto commented Sep 5, 2023

We are also getting this error with dbt-core 1.6.1 and dbt-redshift 1.6.1.

@Sairam90

Sairam90 commented Dec 6, 2024

Getting this error when using redshift-connector==2.1.1 and working with multiprocessing. Any ideas?

@dbeatty10
Contributor

dbeatty10 commented Dec 6, 2024

@Sairam90 dbt-core doesn't support safe parallel execution for multiple invocations in the same process. See below for more details:

If you are experiencing a problem with a supported feature of dbt-core / dbt-redshift, please open a new issue (and reference this one).

If you are getting an error message similar to "unpack_from requires a buffer ..." it is most likely a bug report that is appropriate to open in the amazon-redshift-python-driver repo.
