Pipeline fails when reading gzipped files in GCS #786

Open
benjamin-awd opened this issue Nov 13, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@benjamin-awd
Contributor

Came across a weird issue trying to read gzipped NDJSON files in GCS -- the pipeline fails with:

{"message":"Error: Generic GCS error: Header: Content-Length Header missing from response. Retrying..."},"target":"arroyo_storage"}

Sending a HEAD request to the actual object shows that the Content-Length header is indeed missing:

curl -I \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://storage.googleapis.com/$BUCKET_PATH/1731283272-ac141e8d-2350-47d0-9b97-74faf53bb407.log.gz"

> HTTP/2 200 
cache-control: private, max-age=0
last-modified: Mon, 11 Nov 2024 00:01:12 GMT
vary: Accept-Encoding
x-goog-generation: 1731283272904269
x-goog-metageneration: 1
x-goog-stored-content-encoding: gzip
x-goog-stored-content-length: 4719827
content-type: application/x-ndjson

I managed to bypass the issue by modifying the upstream object_store crate to fall back to x-goog-stored-content-length when Content-Length is absent, but I'm not sure that's the correct way to go about it:

let content_length = headers
        .get(CONTENT_LENGTH)
        .or_else(|| headers.get("x-goog-stored-content-length"))
        .context(MissingContentLengthSnafu)?;
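
For anyone who wants to try the fallback logic outside the crate, here is a minimal standalone sketch of the same idea. It is not the object_store implementation: `resolve_length` and `LengthSource` are hypothetical names, and a plain `HashMap` with lowercase keys stands in for the real header map. One caveat, as far as I understand GCS decompressive transcoding: for objects stored with gzip encoding, x-goog-stored-content-length is the compressed size, so it may not match the number of bytes actually streamed back. The sketch therefore reports which header the value came from.

```rust
use std::collections::HashMap;

/// Which header the length was taken from (hypothetical type, for illustration).
#[derive(Debug, PartialEq)]
enum LengthSource {
    /// The standard `content-length` header.
    ContentLength,
    /// The GCS `x-goog-stored-content-length` header (size as stored).
    StoredLength,
}

/// Hypothetical helper: prefer `content-length`, otherwise fall back to
/// `x-goog-stored-content-length`. Assumes header names are lowercase keys.
/// Returns `None` if neither header is present or the value is not a number.
fn resolve_length(headers: &HashMap<String, String>) -> Option<(u64, LengthSource)> {
    if let Some(value) = headers.get("content-length") {
        return value
            .trim()
            .parse::<u64>()
            .ok()
            .map(|n| (n, LengthSource::ContentLength));
    }
    headers
        .get("x-goog-stored-content-length")
        .and_then(|value| value.trim().parse::<u64>().ok())
        .map(|n| (n, LengthSource::StoredLength))
}
```

With the headers from the HEAD response above, this would resolve to 4719827 tagged as `StoredLength`, which a caller could treat as a hint rather than an exact body size.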

Is there something in Arroyo that can handle this?

(This could potentially be a similar issue to apache/libcloud#1544.)

@mwylde mwylde added the bug Something isn't working label Nov 18, 2024