[Bug] TypeError: can not serialize 'Undefined' object dbt=1.8.2 and databricks 1.8.0 #10270

Closed
2 tasks done
thuymo87 opened this issue Jun 7, 2024 · 7 comments
Labels
awaiting_response, bug (Something isn't working)

Comments

@thuymo87

thuymo87 commented Jun 7, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

(myenv) monguyen@LNZM-C02DV0L3ML data_platform % dbt run -s published_accounts_history
03:15:11 Running with dbt=1.8.2
using legacy validation callback
03:15:14 Registered adapter: databricks=1.8.0
03:15:14 Unable to do partial parsing because saved manifest not found. Starting full parse.
03:15:17 Encountered an error:
can not serialize 'Undefined' object
03:15:17 Traceback (most recent call last):
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/cli/requires.py", line 138, in wrapper
result, success = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/cli/requires.py", line 101, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/cli/requires.py", line 218, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/cli/requires.py", line 247, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/cli/requires.py", line 294, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/cli/requires.py", line 320, in wrapper
ctx.obj["manifest"] = parse_manifest(
^^^^^^^^^^^^^^^
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/parser/manifest.py", line 1898, in parse_manifest
manifest = ManifestLoader.get_full_manifest(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/parser/manifest.py", line 330, in get_full_manifest
manifest = loader.load()
^^^^^^^^^^^^^
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/parser/manifest.py", line 525, in load
self.write_manifest_for_partial_parse()
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/parser/manifest.py", line 811, in write_manifest_for_partial_parse
manifest_msgpack = self.manifest.to_msgpack(extended_mashumaro_encoder)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<string>", line 3, in __mashumaro_to_msgpack__
File "<string>", line 85, in __mashumaro_to_msgpack__
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/dbt/parser/manifest.py", line 150, in extended_mashumaro_encoder
return msgpack.packb(data, default=extended_msgpack_encoder, use_bin_type=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/lib/python3.12/site-packages/msgpack/__init__.py", line 36, in packb
return Packer(**kwargs).pack(o)
^^^^^^^^^^^^^^^^^^^^^^^^
File "msgpack/_packer.pyx", line 294, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 300, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 297, in msgpack._cmsgpack.Packer.pack
File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
File "msgpack/_packer.pyx", line 231, in msgpack._cmsgpack.Packer._pack
[Previous line repeated 1 more time]
File "msgpack/_packer.pyx", line 264, in msgpack._cmsgpack.Packer._pack
File "msgpack/_packer.pyx", line 291, in msgpack._cmsgpack.Packer._pack
TypeError: can not serialize 'Undefined' object
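
The last frames show msgpack.packb rejecting a value whose type is named 'Undefined' somewhere inside the manifest being written for partial parsing. A minimal sketch for locating such a value, assuming the data has already been reduced to nested dicts and lists. The helper find_by_type_name, the stand-in Undefined class, and the sample structure are all hypothetical and are not part of dbt, mashumaro, or msgpack:

```python
from typing import Any, Iterator


def find_by_type_name(data: Any, type_name: str, path: str = "$") -> Iterator[str]:
    """Yield a path for every value whose class is named `type_name`."""
    if type(data).__name__ == type_name:
        yield path
    elif isinstance(data, dict):
        for key, value in data.items():
            yield from find_by_type_name(value, type_name, f"{path}.{key}")
    elif isinstance(data, (list, tuple)):
        for index, value in enumerate(data):
            yield from find_by_type_name(value, type_name, f"{path}[{index}]")


class Undefined:
    """Stand-in for whatever class the error message refers to (assumption)."""


# Hypothetical, heavily simplified manifest-like structure.
sample = {
    "nodes": {
        "model.published_accounts_history": {
            "config": {"alias": "accounts_history", "meta": Undefined()},
            "tags": ["published_analytics"],
        }
    }
}

paths = list(find_by_type_name(sample, "Undefined"))
print(paths)
```

Matching on the class name rather than on a specific import keeps the sketch independent of which library the offending object comes from; the reported path would point at the config key or field to inspect in the project.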

Expected Behavior

The command dbt run -s published_accounts_history should complete without errors and create the table.

Steps To Reproduce

(1) In a Python virtual environment, run "dbt debug"; all checks pass:
03:14:35 Running with dbt=1.8.2
03:14:35 dbt version: 1.8.2
03:14:35 python version: 3.12.3
03:14:35 python path: /Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/myenv/bin/python
03:14:35 os info: macOS-14.5-x86_64-i386-64bit
using legacy validation callback
03:14:37 Using profiles dir at /Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform
03:14:37 Using profiles.yml file at /Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/profiles.yml
03:14:37 Using dbt_project.yml file at /Users/monguyen/Documents/GitHub/data-platform-airflow-v2/dp_airflow/dbt/data_platform/dbt_project.yml
03:14:37 adapter type: databricks
03:14:37 adapter version: 1.8.0
03:14:38 Configuration:
03:14:38 profiles.yml file [OK found and valid]
03:14:38 dbt_project.yml file [OK found and valid]
03:14:38 Required dependencies:
03:14:38 - git [OK found]

03:14:38 Connection:
03:14:38 host:.........
03:14:38 http_path: /sql/1.0/warehouses/500xx
03:14:38 catalog: main
03:14:38 schema: test_mo
03:14:38 Registered adapter: databricks=1.8.0
03:14:48 Connection test: [OK connection ok]

03:14:48 All checks passed!

(2) Run "dbt run -s published_accounts_history"; it fails with this error:
"03:15:14 Unable to do partial parsing because saved manifest not found. Starting full parse.
03:15:17 Encountered an error:
can not serialize 'Undefined' object"

Relevant log output

(myenv) monguyen@LNZM-C02DV0L3ML data_platform % pip list
Package                   Version
------------------------- -----------
agate                     1.9.1
annotated-types           0.7.0
attrs                     23.2.0
Babel                     2.15.0
cachetools                5.3.3
certifi                   2024.6.2
charset-normalizer        3.3.2
click                     8.1.7
colorama                  0.4.6
daff                      1.3.46
databricks-sdk            0.17.0
databricks-sql-connector  3.1.2
dbt-adapters              1.1.1
dbt-common                1.0.4
dbt-core                  1.8.2
dbt-databricks            1.8.0
dbt-extractor             0.5.1
dbt-postgres              1.8.0
dbt-semantic-interfaces   0.5.1
dbt-spark                 1.8.0
et-xmlfile                1.1.0
google-auth               2.30.0
idna                      3.7
importlib-metadata        6.11.0
isodate                   0.6.1
jaraco.classes            3.4.0
jaraco.context            5.3.0
jaraco.functools          4.0.1
Jinja2                    3.1.4
jsonschema                4.22.0
jsonschema-specifications 2023.12.1
keyring                   25.2.1
leather                   0.4.0
Logbook                   1.5.3
lz4                       4.3.3
MarkupSafe                2.1.5
mashumaro                 3.13
minimal-snowplow-tracker  0.0.2
more-itertools            10.2.0
msgpack                   1.0.8
networkx                  3.3
numpy                     1.26.4
oauthlib                  3.2.2
openpyxl                  3.1.3
packaging                 24.0
pandas                    2.1.4
parsedatetime             2.6
pathspec                  0.12.1
pip                       24.0
protobuf                  4.25.3
psycopg2-binary           2.9.9
pyarrow                   14.0.2
pyasn1                    0.6.0
pyasn1_modules            0.4.0
pydantic                  2.7.3
pydantic_core             2.18.4
python-dateutil           2.9.0.post0
python-slugify            8.0.4
pytimeparse               1.1.8
pytz                      2024.1
PyYAML                    6.0.1
referencing               0.35.1
requests                  2.32.3
rpds-py                   0.18.1
rsa                       4.9
six                       1.16.0
sqlparams                 6.0.1
sqlparse                  0.5.0
text-unidecode            1.3
thrift                    0.16.0
typing_extensions         4.12.1
tzdata                    2024.1
urllib3                   2.2.1

Environment

- OS: macOS 14.5
- Python: 3.12.3
- dbt: 1.8.2
- Database adapter: Databricks (dbt-databricks 1.8.0)
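
Version details like these can be read from the active virtual environment instead of being typed by hand. A minimal sketch using only the Python standard library; the helper name installed_versions and the choice of packages to report are illustrative assumptions, not a dbt feature:

```python
import platform
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages chosen because they appear in the traceback and pip list above.
report = {
    "python": platform.python_version(),
    "os": platform.platform(),
    **installed_versions(["dbt-core", "dbt-databricks", "mashumaro", "msgpack"]),
}
for key, value in report.items():
    print(f"{key}: {value}")
```

Run it with the same interpreter that runs dbt (the one inside the virtual environment) so the reported versions match the failing run.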

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

Running dbt model from local to create table on Databricks

@thuymo87 thuymo87 added the bug (Something isn't working) and triage labels Jun 7, 2024
@dbeatty10
Contributor

Thanks for reaching out @thuymo87.

Could you share a simple set of dbt project files to help us try to reproduce the issue you are seeing? That way we can try to isolate where that error is coming from.

Something like this:

models/published_accounts_history.sql

select 1 as id

@dbeatty10 dbeatty10 changed the title from "[Bug] <Unable to do partial parsing because saved manifest not found dbt=1.8.2 and databricks 1.8.0>" to "[Bug] TypeError: can not serialize 'Undefined' object dbt=1.8.2 and databricks 1.8.0" Jun 7, 2024
@thuymo87
Author

thuymo87 commented Jun 8, 2024

Hi dbeatty10,
I resolved the issue by downgrading to Python 3.11.9. Is it just a compatibility problem with Python 3.12?
Cheers
Mo

@dbeatty10
Contributor

Glad to hear that you were able to get it to work (even if it required downgrading to Python 3.11).

Our Python compatibility matrix indicates that v1.8 supports Python 3.12, but there might be a specific edge case that you ran into within your published_accounts_history model.

If you are able to create a simplified model that replicates the issue in Python 3.12 and share your code, that would help us fix such an edge case.

I don't know if they are related to your situation at all, but here are two other cases that produced a similar error message (though presumably neither was using Python 3.12):

@thuymo87
Author

thuymo87 commented Jun 10, 2024

Hi dbeatty10,

This is published_accounts_history.sql:

{{ config(
    alias = 'accounts_history',
    materialized='view',
    tags=["published_analytics"]
) }}

SELECT
  account_id,
  first_currency_selection,

FROM {{ source('transformed_customers', 'accounts_history') }}

And this is sources.yml:

version: 2
sources:
  - name: transformed_customers
    database: athena_production
    schema: transformed_customers
    tables:
      - name: accounts
      - name: accounts_history

Cheers
Mo

@dbeatty10
Contributor

Thanks for providing that detail, Mo.

I tried to reproduce the error you got by switching to Python 3.12.3 using pyenv and then installing dbt-databricks~=1.8.0, but I wasn't able to trigger that error.

Do you get the same error if you simplify your models/published_accounts_history.sql like below? If it does still give an error, it will help simplify the example we are working with to narrow down what is causing the issue.

{{ config(
    alias = 'accounts_history',
    materialized='view',
    tags=["published_analytics"]
) }}

SELECT
    1 as account_id,
    'NZD' as first_currency_selection

Contributor

github-actions bot commented Sep 9, 2024

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale (Issues that have gone stale) label Sep 9, 2024
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Sep 16, 2024
@github-actions github-actions bot removed the stale (Issues that have gone stale) label Sep 16, 2024