Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for legacy in-memory storage behavior. In memory storage - persists encoding for upload / download #69

Closed
wants to merge 2 commits into from

Conversation

piiraa
Copy link
Contributor

@piiraa piiraa commented Sep 25, 2023

    def download
      # Here we a create buffer with default encoding,
      # but with old in-memory behavior, we just read the string value (encoding wasn't lost)
      buffer = StringIO.new
      stream.download(file: buffer).tap do |result|
        result.delete(:file)
        result[:content] = buffer.string
      end
    end

@piiraa piiraa force-pushed the file-streaming--encoding-fix branch from 62771e1 to 62300b5 Compare September 25, 2023 14:47
@ivgiuliani
Copy link
Contributor

Can we add a test that demonstrates the issue? Ideally one that fails before this change, and passes afterwards

Copy link

@stephenbinns stephenbinns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general looks good, but can we add a spec for the behaviour?

@piiraa
Copy link
Contributor Author

piiraa commented Sep 25, 2023

Test

RSpec.describe BucketStore, :integration do
  # ...
  
  context "inmemory storage" do
    let(:bucket_store) { described_class.for("#{base_bucket_uri}/#{temp_filename}") }
    let(:base_bucket_uri) { "inmemory://bucket" }
    let(:temp_filename) { "temp-file" }

    shared_examples "encoding persistance" do |encoding|
      it "persists encoding between upload/download for #{encoding}", :aggregate_failures do
        upload_data = String.new("some string", encoding: encoding)
        bucket_store.upload!(upload_data)
        download = bucket_store.download
        download_data = download.fetch(:content)

        expect(upload_data.encoding).to eq(encoding)
        expect(download_data.encoding).to eq(encoding)
        expect(upload_data).to eq(download_data)
      end
    end

    include_examples "encoding persistance", Encoding::UTF_8
    include_examples "encoding persistance", Encoding::ISO_8859_1
  end
end

v0.5.0 (before streaming)

➜  bucket-store git:(v0.5.0) ✗ bundle exec rspec --tag integration spec/bucket_store_integration_spec.rb   
Run options: include {:integration=>true}

Randomized with seed 42711
..

Finished in 0.00298 seconds (files took 0.7375 seconds to load)
2 examples, 0 failures

Randomized with seed 42711

v0.6.0 (streaming)

➜  bucket-store git:(v0.6.0) bundle exec rspec --tag integration spec/bucket_store_integration_spec.rb
Run options: include {:integration=>true}

Randomized with seed 39485
.F

Failures:

  1) BucketStore inmemory storage persists encoding between upload/download for ISO-8859-1
     Failure/Error: expect(download_data.encoding).to eq(encoding)
     
       expected: #<Encoding:ISO-8859-1>
            got: #<Encoding:UTF-8>
     
       (compared using ==)
     
       Diff:
       @@ -1 +1 @@
       -#<Encoding:ISO-8859-1>
       +#<Encoding:UTF-8>
       
     Shared Example Group: "encoding persistance" called from ./spec/bucket_store_integration_spec.rb:48
     # ./spec/bucket_store_integration_spec.rb:41:in `block (4 levels) in <top (required)>'

Finished in 0.01427 seconds (files took 0.8374 seconds to load)
2 examples, 1 failure

Failed examples:

rspec './spec/bucket_store_integration_spec.rb[1:1:2]' # BucketStore inmemory storage persists encoding between upload/download for ISO-8859-1

Randomized with seed 39485

Current PR

➜  bucket-store git:(file-streaming--encoding-fix) bundle exec rspec --tag integration spec/bucket_store_integration_spec.rb
Run options: include {:integration=>true}

Randomized with seed 53800
..

Finished in 0.00406 seconds (files took 0.81981 seconds to load)
2 examples, 0 failures

Randomized with seed 53800

cc @ivgiuliani

Comment on lines +193 to +196
upload_data = String.new("some string", encoding: encoding)
bucket_store.upload!(upload_data)
download = bucket_store.download
download_data = download.fetch(:content)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With let(:upload_data) string encoding bleeds out into the next test and fails 🤷

@piiraa piiraa requested a review from stephenbinns September 25, 2023 16:20
Copy link
Contributor

@ivgiuliani ivgiuliani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to the in memory adapter are fine, but we want to ensure that there's consistent behaviour between the different adapters. After this change the in-memory adapter will store the encoding but other adapters won't, which means that depending which adapter we're using, for the same input we'll get a different output. You can test this by moving the spec in the shared example group and test it against the other adapters (you can run integration tests with rspec --tag integrations, there's instructions in the README).

Practically it means we need to do something equivalent to what we have here for all the other adapters (gcs, s3 and disk). I had a brief look at GCS and they allow specifying an encoding on upload, but we need to look into the rest of the adapters...

Comment on lines +186 to +203
context "inmemory storage" do
let(:bucket_store) { described_class.for("#{base_bucket_uri}/#{temp_filename}") }
let(:base_bucket_uri) { "inmemory://bucket" }
let(:temp_filename) { "temp-file" }

shared_examples "encoding persistance" do |encoding|
it "persists encoding between upload/download for #{encoding}", :aggregate_failures do
upload_data = String.new("some string", encoding: encoding)
bucket_store.upload!(upload_data)
download = bucket_store.download
download_data = download.fetch(:content)

expect(upload_data.encoding).to eq(encoding)
expect(download_data.encoding).to eq(encoding)
expect(upload_data).to eq(download_data)
expect(upload_data.object_id).to_not eq(download_data.object_id)
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be part of the shared examples and not specific of the in-memory adapter as otherwise we can't guarantee that the adapters behave in the same way.

@piiraa
Copy link
Contributor Author

piiraa commented Sep 26, 2023

The changes to the in memory adapter are fine, but we want to ensure that there's consistent behaviour between the different adapters. After this change the in-memory adapter will store the encoding but other adapters won't, which means that depending which adapter we're using, for the same input we'll get a different output. You can test this by moving the spec in the shared example group and test it against the other adapters (you can run integration tests with rspec --tag integrations, there's instructions in the README).

Practically it means we need to do something equivalent to what we have here for all the other adapters (gcs, s3 and disk). I had a brief look at GCS and they allow specifying an encoding on upload, but we need to look into the rest of the adapters...

#create_file, content_encoding ?
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding
I think this is not related to character encoding, but to the file overall

We could add something like

    BucketStore.for("inmemory://bucket/path/x", encoding: Encoding::ISO_8859_1).download
    # or
    BucketStore.for("inmemory://bucket/path/y").download(encoding: Encoding::ISO_8859_1)

Because right now we can influence download encoding with

buffer = StringIO.new do |io|
    io.set_encoding(Encoding::ISO_8859_1)
    BucketStore.for("inmemory://bucket/path/z").stream.download(file: io)
end

But then again, we can use this example if we need other encodings

Fixed issues from the implementation side 👍

@piiraa piiraa closed this Sep 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants