Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] dbt-postgres transaction management: extraneous BEGIN when listing schemas #9434

Open
2 tasks done
Michael-cd30 opened this issue Jan 24, 2024 · 10 comments
Open
2 tasks done
Labels
bug Something isn't working help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Team:Adapters Issues designated for the adapter area of the code transactions Issues relating to the adapter's transaction behavior

Comments

@Michael-cd30
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

"dbt run" on postgres tries to open transactions while another is already in progress.
As a result, PostgreSQL displays warnings.

A run on a single model raises 24 warnings.

Expected Behavior

"dbt run" should not try to run neested transactions on PostgreSQL.

Steps To Reproduce

  • run a "dbt run" command on one model
  • look at PG logs
  • you can also set log_statement to all to see the "BEGIN" statements sent to PG

Relevant log output

2024-01-24 12:09:27 2024-01-24 11:09:27.210 UTC [8955] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.224 UTC [8956] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.224 UTC [8957] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.225 UTC [8954] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.268 UTC [8958] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.282 UTC [8961] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.300 UTC [8959] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.300 UTC [8960] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.327 UTC [8963] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.331 UTC [8962] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.351 UTC [8965] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.362 UTC [8964] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.374 UTC [8966] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.393 UTC [8967] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.448 UTC [8970] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.449 UTC [8968] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.449 UTC [8969] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.499 UTC [8971] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.501 UTC [8972] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.502 UTC [8973] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.549 UTC [8974] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.570 UTC [8974] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.654 UTC [8975] WARNING:  there is already a transaction in progress
2024-01-24 12:09:27 2024-01-24 11:09:27.767 UTC [8976] WARNING:  there is already a transaction in progress

Environment

- OS: docker / debian
- Python: 3.11
- dbt: 1.7.4
- dbt-postgres: 1.7.4

Which database adapter are you using with dbt?

postgres

Additional Context

No response

@Michael-cd30 Michael-cd30 added bug Something isn't working triage labels Jan 24, 2024
@dbeatty10 dbeatty10 added the Team:Adapters Issues designated for the adapter area of the code label Jan 24, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out @Michael-cd30 !

I can see why you'd want to avoid extraneous warning messages in your PG logs. However, unless it it causing errors while building data sets in dbt this is unlikely to be a priority for us. Accordingly, I'm going to label this as "help wanted" for a motivated community member to solve.

@dbeatty10 dbeatty10 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed triage labels Jan 24, 2024
@alison985
Copy link

@Michael-cd30 How many threads do you have your dbt project configured for? If you configured dbt for 24 threads but there's a limit on the db user to 1 thread, I could see this scenario happening.

@Michael-cd30
Copy link
Author

Michael-cd30 commented Feb 1, 2024

Hi @alison985

The used profile (targetdev in screenshot) is configured to run 4 threads.
And, for the test I've run only one model so I don't think the problem deals with concurrency.

image

@alison985
Copy link

@Michael-cd30 Technically a database user account could be running queries outside of dbt simultaneously. Also, technically, in some places you can limit total concurrent threads on the database regardless of user. However, I assume you already accounted for those.

I also agree 1 model creation, with no dependencies or tests, having 24 warning messages is strange. It would be great to see

I would have expected to start seeing an exponential back-off timestamps for something that's trying to get a thread. Other things that stand out to me as differences from my normal work(and therefore may be related) are:

  • You seem to be running Microsoft
  • It's odd to me that the dbt-postgres and dbt versions line up exactly. I would have expected to see a difference.
  • Some of the milliseconds on the warning timestamps are the same for two rows consecutively.
  • You're running a dbt run --select to a filename instead of the name of the model

Any chance that model has sub-queries or CTEs in it? Or a loop or macro it's calling?

Though honestly, at this point I'd personally go to the database server logs instead of the dbt logs. Figure out what is happening there and then figure out what dbt must be doing to cause that message.

@Michael-cd30
Copy link
Author

@alison985 - We first saw the problem on our servers : a dedicated PostgreSQL server + a Docker server from which Apache Airflow runs DBT. Both are Linux.

In order to write this issue, I've tested in a simpler context, directly from my Windows workstation. In this simple case, the PostgreSQL service runs locally, so I'm sure that no other command is running in parallel.

The query uses CTE as recommended, but remains very simple and loop-free.

The dbt-postgres connector is maintained by dbt-labs and the code is in the same repository as the core, which probably explains why the version is identical, see : https://github.com/dbt-labs/dbt-core/tree/main/plugins/postgres/dbt/adapters/postgres

I've also looked at PostgreSQL logs activating log_statement = 'all' in order to see commands received by the server. It seems that DBT starts by analyzing the contents of all schemas. This analysis starts with two "BEGIN" for each schema. One would be enough and, because the analysis is a simple SELECT, it doesn't even seem useful to open transactions.

Here's an extract:

2024-02-01 23:21:11.910 CET [25336] LOG: instruction : BEGIN
2024-02-01 23:21:11.910 CET [25336] LOG: instruction : BEGIN
2024-02-01 23:21:11.910 CET [25336] ATTENTION: une transaction est déjà en cours
2024-02-01 23:21:11.912 CET [25336] LOG: instruction : /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "taradata", "target_name": "localdev", "connection_name": "list_taradata_wrh_administratif"} */
select
'taradata' as database,
tablename as name,
schemaname as schema,
'table' as type
from pg_tables
where schemaname ilike 'wrh_administratif'
union all
select
'taradata' as database,
viewname as name,
schemaname as schema,
'view' as type
from pg_views
where schemaname ilike 'wrh_administratif'
union all
select
'taradata' as database,
matviewname as name,
schemaname as schema,
'materialized_view' as type
from pg_matviews
where schemaname ilike 'wrh_administratif'

2024-02-01 23:21:11.917 CET [9672] LOG: instruction : BEGIN
2024-02-01 23:21:11.917 CET [9672] LOG: instruction : BEGIN
2024-02-01 23:21:11.917 CET [9672] ATTENTION: une transaction est déjà en cours
2024-02-01 23:21:11.918 CET [23792] LOG: instruction : BEGIN
2024-02-01 23:21:11.919 CET [23792] LOG: instruction : BEGIN
2024-02-01 23:21:11.919 CET [23792] ATTENTION: une transaction est déjà en cours
2024-02-01 23:21:11.919 CET [9672] LOG: instruction : /* {"app": "dbt", "dbt_version": "1.7.4", "profile_name": "taradata", "target_name": "localdev", "connection_name": "list_taradata_stg_data_laregion_fr"} */
select
'taradata' as database,
tablename as name,
schemaname as schema,
'table' as type
from pg_tables
where schemaname ilike 'stg_data_laregion_fr'
union all
select
'taradata' as database,
viewname as name,
schemaname as schema,
'view' as type
from pg_views
where schemaname ilike 'stg_data_laregion_fr'
union all
select
'taradata' as database,
matviewname as name,
schemaname as schema,
'materialized_view' as type
from pg_matviews
where schemaname ilike 'stg_data_laregion_fr'

2024-02-01 23:21:11.919 CET [22488] LOG: instruction : BEGIN
2024-02-01 23:21:11.919 CET [22488] LOG: instruction : BEGIN

and so on...

I haven't tested but I'm quite sure the more "schema" my dbt-project will have, the more warning I'll receive.

Regards,

@manticoreroko
Copy link

We have a similar problem in greenplum during dbt execution.
WARNING ERROR_TEXT: there is already a transaction in progress.
Example dbt execute: dbt build --models folder.chain_folder.*

@dbeatty10 dbeatty10 changed the title [Bug] dbt-postgres transaction management [Bug] dbt-postgres transaction management: extraneous BEGIN when listing schemas Feb 2, 2024
@dbeatty10
Copy link
Contributor

Thanks for that info @Michael-cd30 and @manticoreroko !

We're unable to be prioritize this ourselves right now, but we'd be willing to give a look at a PR submitted by a community member.

Here's a couple places to start looking:

Theoretically, the code is already supposed to check if there's already a known transaction that is open or not, but it must not be working somehow.

I'd suggest setting threads: 1 within profiles.yml just to be sure multiple threads aren't somehow related.

@Michael-cd30
Copy link
Author

Same with 1 thread.

image

@thisisandreeeee
Copy link

@Michael-cd30 were you able to find a fix for this?

@rm-minus-r-star
Copy link

rm-minus-r-star commented Dec 4, 2024

UPDATE: Turns out the below is misleading. I just tried again from the command line with threads = 1, and it failed. So the problem is that the behaviour is random. There appears to in fact be no dependence or logic based on threads and container vs command line. Why does dbt apparently sometimes (and in the case below, almost always) throw transactions at the DB too frequently?


I am having a similar, puzzling experience.

Running with dbt=1.8.8
Registered adapter: postgres=1.8.2

I am ultimately needing dbt to run within a docker container. The container runs everything fine except snapshot on my largest new set of data. That is, on smaller new data ingest sets, snapshot runs fine in the same container. Everything else within the container -- all other seeds and runs -- work fine. This is the case irrespective of whether I have threads set to 4 or 1 (which I tried on the advice in the thread above).

The failure on the client side is

21:15:57    Database Error in snapshot snapshot_dataset (snapshots/snapshot_dataset.sql)
  server closed the connection unexpectedly
  	This probably means the server terminated abnormally
  	before or while processing the request.
  server closed the connection unexpectedly
  	This probably means the server terminated abnormally
  	before or while processing the request.

After two weeks of badgering my postgresql host, I've finally got information from the logs corresponding to this failure:

1732050632811 PG:2024-11-19 21:10:32.036 GMT ttid=... sqlstate=00000 [840] LOG:  connection received: host=.... port=5968
1732050632812 PG:2024-11-19 21:10:32.039 GMT ttid=... sqlstate=00000 [840] LOG:  connection authenticated: identity="..." method=md5 (/var/db/postgres/compute/pgdata/pg_hba.conf:100)
1732050632812 PG:2024-11-19 21:10:32.039 GMT ttid=... sqlstate=00000 [840] LOG:  connection authorized: user=... database=... application_name=dbt
1732050632812 PG:2024-11-19 21:10:32.370 GMT ttid=... sqlstate=25001 [840] WARNING:  there is already a transaction in progress
1732050632982 PG:2024-11-19 21:10:32.938 GMT ttid=... sqlstate=00000 [840] LOG:  disconnection: session time: 0:00:00.902 user=... database=... host=.... port=5968

From the command line (dbt snapshot ...), instead of from the container, the failing snapshot works with threads = 1, but with threads = 4 I get the same response from the server. As that error response is uninformative and it's difficult to get help from the host, it's unknown whether it is the same "there is already a transaction in progress" or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors Team:Adapters Issues designated for the adapter area of the code transactions Issues relating to the adapter's transaction behavior
Projects
None yet
Development

No branches or pull requests

6 participants