Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

duckdb 1.1 delta reader ignore azure secret #83

Closed
djouallah opened this issue Sep 9, 2024 · 14 comments · Fixed by #88
Closed

duckdb 1.1 delta reader ignore azure secret #83

djouallah opened this issue Sep 9, 2024 · 14 comments · Fixed by #88

Comments

@djouallah
Copy link

djouallah commented Sep 9, 2024

azure secret works fine when using parquet but are not passed when using delta_scan

this works

token =notebookutils.credentials.getToken('storage')
storage_options={"bearer_token": token, "use_fabric_endpoint": "true"}
import duckdb
duckdb.sql(f"""
INSTALL azure ;
LOAD azure;
CREATE or replace SECRET secret4 (
    TYPE AZURE,
    PROVIDER ACCESS_TOKEN,
    ACCESS_TOKEN '{token}' ,
    ACCOUNT_NAME 'onelake'
);
""")
duckdb.sql(" select * from parquet_scan('abfss://[email protected]/ETL.Lakehouse/Tables/dbo/results/*.parquet') ")

this does not works ( although it give wrong results as it is not using the delta log to scan only relevant parquet files)

token =notebookutils.credentials.getToken('storage')
storage_options={"bearer_token": token, "use_fabric_endpoint": "true"}
import duckdb
duckdb.sql(f"""
INSTALL azure ;
LOAD azure;
CREATE or replace SECRET secret4 (
    TYPE AZURE,
    PROVIDER ACCESS_TOKEN,
    ACCESS_TOKEN '{token}' ,
    ACCOUNT_NAME 'onelake'
);
""")
duckdb.sql(" select * from delta_scan('abfss://[email protected]/ETL.Lakehouse/Tables/dbo/results') ")

IOException: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 'abfss://[email protected]/ETL.Lakehouse/Tables/dbo/results/'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: Generic MicrosoftAzure error: Client error with status 401 Unauthorized: <?xml version="1.0" encoding="utf-8"?><Error><Code>NoAuthenticationInformation</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:f47557aa-c01e-0091-2a10-03480f000000
Time:2024-09-09T23:33:20.7177725Z</Message></Error>)

@djouallah djouallah changed the title duckdb 1.1 delta reader don't use azure secret duckdb 1.1 delta reader ignore azure secret Sep 9, 2024
@gdubya
Copy link
Contributor

gdubya commented Sep 10, 2024

It works for me using the credential chain / CLI. Maybe it's something with the token auth?

EDIT: I'm also testing against a normal ADLS account, not Fabric. I'll test against Fabric later this evening.

@Tikavdm
Copy link

Tikavdm commented Sep 10, 2024

I agree with @djouallah I cannot get it work with with delta_scan, but seems like parquet_scan works:

Simple example of trying to use the new token-based auth with Azure extension. This works exactly like this with a CONNECTION STRING, but using the new token-based auth I get an error, which I'm not sure how to figure out if it is a security policy on my side or something not working with the latest extension. I'm using Duckdb 1.1.0

credential = AzureMLOnBehalfOfCredential()
os.environ['AZURE_ACCESS_TOKEN'] = credential.get_token("https://storage.azure.com/").token

conn = duckdb.connect(':memory:')
conn.install_extension("delta")
conn.load_extension("delta")
conn.install_extension("azure")
conn.load_extension("azure")
conn.execute("""SET azure_transport_option_type =  curl""")
conn.execute(f"""CREATE SECRET secret1 (
    TYPE AZURE,
    PROVIDER ACCESS_TOKEN,
    ACCESS_TOKEN '{os.getenv('AZURE_ACCESS_TOKEN')}',
    ACCOUNT_NAME '{os.getenv('AZURE_STORAGE_ACCOUNT')}'
);""")

full_path=f'abfss://{CONTAINER_NAME}/{delta_table_path}'
query = f"SELECT * FROM delta_scan('{full_path}') LIMIT 3"
conn.execute(query).fetchdf()

Error I get:

Traceback (most recent call last) Cell In [43], line 37 35 full_path=f'abfss://{CONTAINER_NAME}/{delta_table_path}' 36 query = f"SELECT * FROM delta_scan('{full_path}') LIMIT 3" ---> 37 conn.execute(query).fetchdf() IOException: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 'abfss://xxx'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: Generic MicrosoftAzure error: Client error with status 401 Unauthorized: <?xml version="1.0" encoding="utf-8"?><Error><Code>NoAuthenticationInformation</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.

@Tikavdm
Copy link

Tikavdm commented Sep 10, 2024

@djouallah, I'm also just checking: Does using the parquet_scan still provide the correct output, even though it is a delta_table? This can then at least be a workaround (I know you will probably lose some delta table functionality, but as long as it reads the latest version of the data, it is potentially a workaround for now).

@djouallah
Copy link
Author

@Tikavdm no, you should never ever use parquet scan to read a delta table, you will get wrong results if the table had delete or compact, optimize operation, I used that example only to show the issue with authentication

@gdubya
Copy link
Contributor

gdubya commented Sep 10, 2024

Confirmed i get the same problem using an ACCESS_TOKEN against my storage account, so it sounds like a problem with that feature.

@gdubya
Copy link
Contributor

gdubya commented Sep 10, 2024

Oh, that's right. This code needs to be updated to handle access tokens since that is a new feature for the azure extension.

duckdb/duckdb-azure#64

@djouallah
Copy link
Author

still same issue, using deltalake extension 0.2

IOException: IO Error: Hit DeltaKernel FFI error (from: While trying to read from delta table: 'abfss://[email protected]/ETL.Lakehouse/Tables/T2/chdb/'): Hit error: 8 (ObjectStoreError) with message (Error interacting with object store: The operation lacked valid authentication credentials for path ETL.Lakehouse/Tables/T2/chdb/_delta_log/_last_checkpoint: Client error with status 401 Unauthorized: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidAuthenticationInfo</Code><Message>Server failed to authenticate the request. Please refer to the information in the www-authenticate header.
RequestId:d2946d3c-601e-001f-7194-0407ae000000
Time:2024-09-11T21:48:26.8575298Z</Message><AuthenticationErrorDetail>Issuer validation failed. Issuer did not match.</AuthenticationErrorDetail></Error>)

@gdubya
Copy link
Contributor

gdubya commented Sep 12, 2024

@djouallah have you tried the nightly?

@djouallah
Copy link
Author

@gdubya no but i am using the latest delta extension which 0.2, it should have the fix right

@gdubya
Copy link
Contributor

gdubya commented Sep 12, 2024

Ah, yes, you're right (according to https://github.com/duckdb/duckdb_delta/releases/tag/v0.2.0). Weird. I'll test it again.

@hirsimaki-markus
Copy link

hirsimaki-markus commented Sep 12, 2024

Is this a related issue #91? And if there is some workaround that applies to #91 it would be great

@gdubya
Copy link
Contributor

gdubya commented Sep 12, 2024

@hirsimaki-markus no, that's a different issue. This one is specifically related to access tokens, but you're using a service principal.

@Tikavdm
Copy link

Tikavdm commented Sep 12, 2024

For the record mine works now!! Thanks @gdubya . So token-based authentication works with the latest duckdb 1.1.0 and the updated extension of delta.

@gdubya
Copy link
Contributor

gdubya commented Sep 12, 2024

@djouallah It looks like is now a Fabric issue, so you can reopen #43

I tested by copying a Delta table from Fabric to a regular ADLS account. ADLS works, Fabric does not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants