Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug-1827506: Implement the storage interface for GCS. #2957

Merged
merged 19 commits into from
Jul 29, 2024

Conversation

smarnach
Copy link
Contributor

@smarnach smarnach commented Jul 9, 2024

This pull request implements the storage interface for Google Cloud Storage and makes the code backend-agnostic.

  • All tests that make use of storage backends are run for both S3 and GCS.
  • Each test now uses a dedicated bucket. This ensures test isolation, but most importantly the tests no longer interfere with the running devstack, which I found to be quite annoying. This is mainly achieved via the parametrized symbol_storage fixture.
  • The s3_helper and botomock fixtures are no longer needed and have been removed.
  • I switched the development environment to use GCS by default.

Next steps:

  • Decide what to do about the different Content-Encoding behaviour between S3 and GCS, as discussed in bug-1827506. I'm currently still waiting for a response from Google's support.
  • Decide what additional tests we need.
  • Update the documentation.
  • Add new environment variables to stage before merging this PR.

@smarnach smarnach requested a review from a team as a code owner July 9, 2024 13:07
@smarnach smarnach force-pushed the storage-interface-gcs branch from 6c31908 to 72bd4ec Compare July 9, 2024 13:19
@smarnach
Copy link
Contributor Author

smarnach commented Jul 9, 2024

Original comment

The issue with accessing the GCS emulator in the local dev environment has been resolved. I needed to introduce an nginx container to simulate a CDN anyway.

When testing this locally, there is a little inconvenience. For S3, we have this hack to make testing from the browser (and from the command line outside of the container) convenient. For the GCS emulator, the same hack doesn't work: We have to specify the host name it listens for public download requests on, and we can only specify a single host name. That single host name has to be gcs-emulator for the tests to work. Rewriting this to localhost doesn't help, since the GCS emulator will reject public requests for localhost.

On my system, I worked around the issues by adding gcs-emulator to my /etc/hosts file:

127.0.0.1 localhost gcs-emulator

If that also works on macOS, maybe that's good enough?

Some other solutions I can think of:

  • Patch the GCS emulator to allow multiple public hosts.
  • Run nginx in front of the GCS emulator and rewrite requests.
  • Proxy Docker's internal DNS resolver to the host computer.

All of these alternative solutions are more complex than I like.

@smarnach smarnach force-pushed the storage-interface-gcs branch 2 times, most recently from 9e74534 to 5ba4051 Compare July 15, 2024 09:09
requirements.in Outdated Show resolved Hide resolved
frontend/src/Common.js Outdated Show resolved Hide resolved
tecken/settings.py Outdated Show resolved Hide resolved
Base automatically changed from storage-interface to main July 16, 2024 08:50
@smarnach smarnach force-pushed the storage-interface-gcs branch from 5ba4051 to c23101d Compare July 16, 2024 09:33
@smarnach smarnach changed the base branch from main to remove-bucket-endpoint-url July 16, 2024 09:52
@smarnach
Copy link
Contributor Author

I split out all commits that are for other bugs into separate PRs. Many of the PRs needed to be filed against other branches, so we now have to merge the commits in this order:

The other PRs I filed are completely orthogonal to everything else and can be merged at any time.

@smarnach
Copy link
Contributor Author

Local testing is easier using the code in #2964 than this branch (see the comment above).

@smarnach smarnach force-pushed the remove-bucket-endpoint-url branch from 777a60d to 7a4ff40 Compare July 17, 2024 08:31
Base automatically changed from remove-bucket-endpoint-url to main July 17, 2024 10:04
@smarnach smarnach force-pushed the storage-interface-gcs branch from c23101d to f07eee1 Compare July 17, 2024 11:09
@smarnach smarnach marked this pull request as draft July 18, 2024 13:21
@smarnach smarnach force-pushed the storage-interface-gcs branch 3 times, most recently from d579d69 to ca471c8 Compare July 19, 2024 11:45
@smarnach smarnach marked this pull request as ready for review July 19, 2024 12:12
@smarnach smarnach force-pushed the storage-interface-gcs branch from ca471c8 to df1fc5e Compare July 19, 2024 13:16
Copy link
Contributor

@willkg willkg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is pretty substantial and makes some big changes to the codebase to make it possible to implement GCS storage. We knew going into this that it would be hard to review and require multiple back-and-forths. That's how big PRs go.

I left a few comments on minor things we should change that are worth doing in this PR.

There are two comments that probably require bigger changes:

The first is how StorageError is structured. I'm pretty sure the structure in this PR will cause us a lot of pain when looking at problems in Sentry. We should do something about that in this PR.

The second is how upload data is handled in tests. I think I want us to change that before I review the rest of the test code and validate the changes.

@@ -96,8 +96,8 @@ def possible_upload_urls(request):
context = {
"urls": [
{
"url": upload_backend.url,
"bucket_name": upload_backend.name,
"bucket_name": upload_backend.bucket,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets used by the "upload now" ui view. I bet we could remove this ui view and the API endpoints it uses. I wrote up a bug to consider doing that.

https://bugzilla.mozilla.org/show_bug.cgi?id=1909013

tecken/ext/gcs/storage.py Outdated Show resolved Hide resolved
tecken/ext/gcs/storage.py Outdated Show resolved Hide resolved
tecken/ext/gcs/storage.py Outdated Show resolved Hide resolved
tecken/ext/gcs/storage.py Outdated Show resolved Hide resolved
tecken/libstorage.py Outdated Show resolved Hide resolved
tecken/settings.py Show resolved Hide resolved
tecken/settings.py Show resolved Hide resolved
tecken/settings.py Show resolved Hide resolved
tecken/tests/utils.py Show resolved Hide resolved
@willkg
Copy link
Contributor

willkg commented Jul 25, 2024

We talked about this on Zoom. I'm going to do a pass to minimally fix the StorageError issue and part of the issue with the tests that I had. Then I'll finish up the review.

@willkg willkg force-pushed the storage-interface-gcs branch from afb7052 to a324a95 Compare July 26, 2024 00:28
@willkg
Copy link
Contributor

willkg commented Jul 26, 2024

^^^ That rebases against main tip and addresses my concerns with StorageError and the tests.

@smarnach Do the last 3 commits look ok to you? If so, then we have a couple of __repr__ suggestions and then I can review this again.

@smarnach
Copy link
Contributor Author

@smarnach Do the last 3 commits look ok to you? If so, then we have a couple of __repr__ suggestions and then I can review this again.

Thanks a lot! The commits look good to me. I particularly like that the symfile tests and uploads now use valid data.

Copy link
Contributor

@willkg willkg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In testing, everything looked fine.

Validation:

  1. ran unit tests 👍
  2. ran system tests 👍
  3. looked at upload data
    1. regular uploads are marked correctly 👍
    2. try uploads are marked correctly 👍
  4. looked at file upload data
    1. compressed vs. uncompressed looks good 👍
    2. try vs. non try have the right public url 👍
    3. at the bottom of the upload file for try, it says "Try Symbols: No" which is wrong--but this is broken in stage, too. I wrote up bug 1910180 😒
    4. public url is something like http://localhost:8000/TestBaseProfiler/F9E38D3857BEE30A4339CC3B15892CF00/TestBaseProfiler.sym?try and clicking on it redirects to https://gcs-emulator:8001/publicbucket/try/v1/TestBaseProfiler/F9E38D3857BEE30A4339CC3B15892CF00/TestBaseProfiler.sym which fails. I think this gets fixed in pull bug-1905455: Enable anonymous access to storage buckets. #2965. 🤷
  5. i uploaded a pdb sym file
    1. verified sym file header extraction works 👍
    2. verified codeinfo lookup works 👍

We dropped checking the Location value in the download tests. I'm not sure why we did that. I kind of think we should be checking the Location value. After we land this, we can talk about it and maybe implement it in a followup PR.

I think this is good. Let's land it!

tecken/libsym.py Show resolved Hide resolved
tecken/tests/conftest.py Show resolved Hide resolved

assert parsed.path == (
"/publicbucket/v1/xul.pdb/44E4EC8C2F41492B9369D6B9A059577C2/xul.sym"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should be testing the HTTP 302 response Location. Something like this:

def test_client_happy_path(client, db, symbol_storage, metricsmock):
    upload = UPLOADS["compressed"]
    upload.upload(symbol_storage)

    bucket = symbol_storage.upload_backend.bucket
    prefix = symbol_storage.upload_backend.prefix

    url = reverse(
        "download:download_symbol",
        args=(upload.debug_file, upload.debug_id, upload.sym_file),
    )

    response = client.get(url)
    assert response.status_code == 302

    parsed = urlparse(response["location"])
    assert parsed.netloc in ["localstack:4566", "gcs-emulator:8001"]
    assert parsed.path == (
        f"/{bucket}/{prefix}/{upload.debug_file}/{upload.debug_id}/{upload.sym_file}"
    )

    metricsmock.assert_histogram_once(
        "tecken.symboldownloader.file_age_days",
        value=0,
        tags=["storage:regular", "host:testnode"],
    )

    response = requests.get(response.headers["location"])
    assert response.status_code == 200
    assert response.content == upload.original_body

    response = client.head(url)
    assert response.status_code == 200
    assert response["Access-Control-Allow-Origin"] == "*"
    assert response["Access-Control-Allow-Methods"] == "GET"

We can add this in a follow-up PR unless there's some compelling reason I'm not seeing for not doing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Determining the actual redirect location gets more complex with the next PR that adds a potential CDN. At that point, we'd also need to check whether the storage backend has public_url set and take that into account. It's possible, but the question is how useful it is – we'd basically be re-implementing some logic from the application code in the tests and check that both implementations have the same result.

For this reason I added the code to actually download the symbols file and compare the body to what we uploaded. If we get the correct file back, that seems to be a much stronger indication that the redirect location is correct.

smarnach and others added 16 commits July 29, 2024 13:38
This adds the chained exception message to the StorageError message.
This will prevent Sentry from coalescing unconnected StorageErrors and
make it easier to understand what happened from the exception without
having to look up the exception chain.
This removes a file that wasn't used and moves the other test files into
a data/ subdirectory.

This adds a __init__.py making the test directory a Python package.
This moves extract_sym_file_header to a new tecken/libsym.py where it
can be used by the upload view and the tests.

This reworks the tests to use well-formed sym files in the tests/data/
directory.
@smarnach smarnach force-pushed the storage-interface-gcs branch from e807085 to ea0ff03 Compare July 29, 2024 11:47
@smarnach
Copy link
Contributor Author

@willkg There were two outstanding comments from your first review that I addressed in the last two commits. I think this is ready to land now – could you take a final look at the last two PRs?

@@ -11,23 +12,25 @@
from tecken.libstorage import StorageError


logger = logging.getLogger("tecken")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1910363 to rename the loggers in Tecken.

Copy link
Contributor

@willkg willkg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@smarnach smarnach merged commit b08c54c into main Jul 29, 2024
2 checks passed
@smarnach smarnach deleted the storage-interface-gcs branch July 29, 2024 13:49
@smarnach
Copy link
Contributor Author

Quoting from the description of this PR:

Add new environment variables to stage before merging this PR.

I forgot to do that. I'll file a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants