Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce same-region storage for new repositories #7847

Merged
merged 11 commits into from
Jun 17, 2024

Conversation

itaigilo
Copy link
Contributor

Closes #7846.

Change Description

Disable (by default) the option to create a repository in a cloud region different than the region in which the block storage resides in.

It can be enabled by a server configuration option.

Background

Currently, users can create repositories using storage that resides in a region different than the region of the block store.
This hurts both performance and costs, and should be disabled by default.

Testing Details

Tested manually -
Unit tests are TBD (working on it now).

Breaking Change?

This changes the default behavior -
But I don't think it should be considered breaking-change.
If it's important, this can easily be solved by defaulting the config value to true.

Additional info

Example of the error:
Screenshot 2024-06-10 at 13 33 32

Copy link

github-actions bot commented Jun 10, 2024

♻️ PR Preview c0e873c has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

@itaigilo itaigilo added new-feature Issues that introduce new feature or capability area/block-adapter include-changelog PR description should be included in next release changelog AWS labels Jun 10, 2024
@itaigilo itaigilo requested a review from a team June 10, 2024 10:40
Copy link

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itaigilo Thank you,

This is a partial review, if you intend on adding unit tests as part of this PR I would prefer if you finish with the PR before submitting it for review.
It makes it harder for me as a reviewer to provide partial reviews of code and keep track of added logic

@@ -34,6 +34,7 @@ func setDefaults(cfgType string) {
viper.SetDefault("auth.encrypt.secret_key", DefaultAuthSecret)
viper.SetDefault(BlockstoreTypeKey, "local")
}
viper.SetDefault("installation.allow_inter_region_storage", true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏽

@@ -144,6 +144,11 @@ func (a *Adapter) BlockstoreType() string {
return block.BlockstoreTypeTransient
}

func (a *Adapter) BlockstoreMetadata(ctx context.Context) block.BlockstoreMetadata {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to also add this to mock adapter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks.

@@ -832,6 +832,13 @@ func (a *Adapter) BlockstoreType() string {
return block.BlockstoreTypeS3
}

func (a *Adapter) BlockstoreMetadata(ctx context.Context) block.BlockstoreMetadata {
region, _ := a.clients.GetBucketRegionDefault(ctx, "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You must check err and return err from this function as necessary.
Even if the current implementation will never actually return an error. You can't know when this implementation will change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense. Updating this.

@@ -144,6 +144,11 @@ func (a *Adapter) BlockstoreType() string {
return block.BlockstoreTypeTransient
}

func (a *Adapter) BlockstoreMetadata(ctx context.Context) block.BlockstoreMetadata {
// not implemented at the moment
return block.BlockstoreMetadata{Region: nil}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should return an error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, RuntimeStats doesn't return an error when un-implemented...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking again, it makes more sense to return block.BlockstoreMetadata, error.
Changing this.

@@ -305,6 +305,7 @@ An object describing the local (on-disk) cache of metadata from permanent storag
* `installation.user_name` `(string : )` - When specified, an initial admin user will be created when the server is first run. Works only when `database.type` is set to local. Requires `installation.access_key_id` and `installation.secret_access_key`.
* `installation.access_key_id` `(string : )` - Admin's initial access key id (used once in the initial setup process)
* `installation.secret_access_key` `(string : )` - Admin's initial secret access key (used once in the initial setup process)
* `installation.allow_inter_region_storage` `(bool : false)` - Allow storage in a different region than the one the server is running in.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the default (true)

Comment on lines 2114 to 2115
storeRegion := c.BlockAdapter.BlockstoreMetadata(ctx).Region
if storeRegion == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
storeRegion := c.BlockAdapter.BlockstoreMetadata(ctx).Region
if storeRegion == nil {
region := c.BlockAdapter.BlockstoreMetadata(ctx).Region
if region == nil {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer being explicit, and explain what I'm actually using (which is the block-store region).

@@ -583,6 +583,11 @@ func (a *Adapter) BlockstoreType() string {
return block.BlockstoreTypeAzure
}

func (a *Adapter) BlockstoreMetadata(ctx context.Context) block.BlockstoreMetadata {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not implementing it for Azure as well?

@@ -630,6 +630,11 @@ func (a *Adapter) BlockstoreType() string {
return block.BlockstoreTypeGS
}

func (a *Adapter) BlockstoreMetadata(ctx context.Context) block.BlockstoreMetadata {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not implementing it for GCS as well?

@@ -861,6 +868,15 @@ func (a *Adapter) ResolveNamespace(storageNamespace, key string, identifierType
return block.DefaultResolveNamespace(storageNamespace, key, identifierType)
}

func (a *Adapter) GetRegion(ctx context.Context, storageNamespace string) (string, error) {
bucket, found := strings.CutPrefix(storageNamespace, "s3://")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this validation is required - we should have already validated the namespace before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about validating -
It's about sending the bucket name (for example aaa) to the AWS API, and not the namespace (s3://aaa).
(And I also tried sending s3://aaa to the API - it didn't work.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a GetSchema method to the block adapter or at least use the BlockstoreTypeS3 const.

@@ -1946,6 +1947,14 @@ func (c *Controller) CreateRepository(w http.ResponseWriter, r *http.Request, bo
return
}

allowInterRegionStorage := c.Config.Installation.AllowInterRegionStorage
if !allowInterRegionStorage {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this check should be inside validateInterRegionStorage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting.
I don't really mind actually, both seem ok to me 😄

Copy link
Contributor Author

@itaigilo itaigilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-o-Z thanks for your review.

Will add impl for Azure and GCS,
As well as tests.

The point of sending for review this before adding tests, is to validate the direction is solid first. Will update once it's done, so we can reduce review iterations, of course.

@@ -1946,6 +1947,14 @@ func (c *Controller) CreateRepository(w http.ResponseWriter, r *http.Request, bo
return
}

allowInterRegionStorage := c.Config.Installation.AllowInterRegionStorage
if !allowInterRegionStorage {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting.
I don't really mind actually, both seem ok to me 😄

Comment on lines 2114 to 2115
storeRegion := c.BlockAdapter.BlockstoreMetadata(ctx).Region
if storeRegion == nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer being explicit, and explain what I'm actually using (which is the block-store region).

@@ -861,6 +868,15 @@ func (a *Adapter) ResolveNamespace(storageNamespace, key string, identifierType
return block.DefaultResolveNamespace(storageNamespace, key, identifierType)
}

func (a *Adapter) GetRegion(ctx context.Context, storageNamespace string) (string, error) {
bucket, found := strings.CutPrefix(storageNamespace, "s3://")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not about validating -
It's about sending the bucket name (for example aaa) to the AWS API, and not the namespace (s3://aaa).
(And I also tried sending s3://aaa to the API - it didn't work.)

@@ -144,6 +144,11 @@ func (a *Adapter) BlockstoreType() string {
return block.BlockstoreTypeTransient
}

func (a *Adapter) BlockstoreMetadata(ctx context.Context) block.BlockstoreMetadata {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks.

@@ -144,6 +144,11 @@ func (a *Adapter) BlockstoreType() string {
return block.BlockstoreTypeTransient
}

func (a *Adapter) BlockstoreMetadata(ctx context.Context) block.BlockstoreMetadata {
// not implemented at the moment
return block.BlockstoreMetadata{Region: nil}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, RuntimeStats doesn't return an error when un-implemented...

@@ -1946,6 +1947,13 @@ func (c *Controller) CreateRepository(w http.ResponseWriter, r *http.Request, bo
return
}

if !c.Config.Installation.AllowInterRegionStorage {
if err := block.ValidateInterRegionStorage(r.Context(), c.BlockAdapter, body.StorageNamespace); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to @arielshaqed 's suggestion, extracted this into a separate file.
It encapsulates it from this huge file, and make it easier to test.

@@ -4576,7 +4575,7 @@ func TestController_PrepareGarbageCollectionCommitted(t *testing.T) {
}

func TestController_ClientDisconnect(t *testing.T) {
handler, deps := setupHandlerWithWalkerFactory(t, store.NewFactory(nil))
handler, deps := setupHandler(t)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started from an attempt to add my tests here -
Ended up not to, but on the way I did a small cleanup, which I think worth keeping (mainly in server_test.go.

@@ -0,0 +1,165 @@
package testutil
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved mockAdapter into testutil, so it can be used by other tests.

Copy link
Contributor Author

@itaigilo itaigilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-o-Z done some refactoring and add tests,
PTAL again.

And regarding implementing for Azure and CGS -
Opening now a separate issue, and will start working on it right after this one.

@itaigilo itaigilo requested a review from N-o-Z June 13, 2024 14:09
Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks,
Some more comments

@@ -6,6 +6,7 @@ import (
"crypto/md5" //nolint:gosec
"crypto/rand"
"fmt"
"github.com/treeverse/lakefs/pkg/testutil"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gofmt imports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make validate-fmt says Your code formatting is according to gofmt standards.

Am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

region, _ := a.clients.GetBucketRegionDefault(ctx, "")
return block.BlockstoreMetadata{
Region: &region,
func (a *Adapter) BlockstoreMetadata(ctx context.Context) (block.BlockstoreMetadata, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to return a pointer + err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -583,6 +583,10 @@ func (a *Adapter) BlockstoreType() string {
return block.BlockstoreTypeAzure
}

func (a *Adapter) BlockstoreMetadata(ctx context.Context) (block.BlockstoreMetadata, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue discusses adding support for same-region storage on all block adapters. Please open followup issues to support this on Azure and GCS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #7868 .

@@ -861,6 +868,15 @@ func (a *Adapter) ResolveNamespace(storageNamespace, key string, identifierType
return block.DefaultResolveNamespace(storageNamespace, key, identifierType)
}

func (a *Adapter) GetRegion(ctx context.Context, storageNamespace string) (string, error) {
bucket, found := strings.CutPrefix(storageNamespace, "s3://")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets add a GetSchema method to the block adapter or at least use the BlockstoreTypeS3 const.

package block_test

import (
"context"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go fmt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make validate-fmt says Your code formatting is according to gofmt standards.

Am I missing something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It ignores it since this is a test file but goimports requires separating standard library from 3rd party imports (this will fail on non-test files).
We exclude test files from lint and fmt due to coding style failures we have there for testing reasons but we still would like to maintain some of the formatting conventions.
If you run goimports on the file you will see the errors

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's important - why not apply only imports validation on the test files?

Anyway, fixed.

"testing"
)

func TestController_ValidateInterRegionStorage(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use test cases to remove code repetition.


func ValidateInterRegionStorage(ctx context.Context, adapter Adapter, storageNamespace string) error {
blockstoreMetadata, err := adapter.BlockstoreMetadata(ctx)
if errors.Is(err, ErrOperationNotSupported) || blockstoreMetadata.Region == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think it's a bug to check for blockstoreMetadata.Region == nil
    In which situations are we expected to get a nil value?
    It seems to me we either get an err or the Region field should be populated
  2. Please add a debug log in this case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to make BlockstoreMetadata more general.
So for example, in the future, it will also include other properties.
In this case, BlockstoreMetadata may return the other props and not region, which says it won't be an error, but just not implemented.

If you find this more confusing, I can remove this check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is, not whether Metadata stores other information regarding the blockstore but rather if there is any real life scenario where the Region property will be nil.
If it makes sense - leave this as it is.
If there's no current scenario where this can actually be valid - we should remove this

func (a *Adapter) BlockstoreMetadata(ctx context.Context) (block.BlockstoreMetadata, error) {
region, err := a.clients.GetBucketRegionDefault(ctx, "")
if err != nil {
return block.BlockstoreMetadata{}, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil, err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

@itaigilo itaigilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-o-Z all your comments have been fixed or addressed - PTAL again.

package block_test

import (
"context"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make validate-fmt says Your code formatting is according to gofmt standards.

Am I missing something?

@@ -6,6 +6,7 @@ import (
"crypto/md5" //nolint:gosec
"crypto/rand"
"fmt"
"github.com/treeverse/lakefs/pkg/testutil"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make validate-fmt says Your code formatting is according to gofmt standards.

Am I missing something?


func ValidateInterRegionStorage(ctx context.Context, adapter Adapter, storageNamespace string) error {
blockstoreMetadata, err := adapter.BlockstoreMetadata(ctx)
if errors.Is(err, ErrOperationNotSupported) || blockstoreMetadata.Region == nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to make BlockstoreMetadata more general.
So for example, in the future, it will also include other properties.
In this case, BlockstoreMetadata may return the other props and not region, which says it won't be an error, but just not implemented.

If you find this more confusing, I can remove this check.

@@ -583,6 +583,10 @@ func (a *Adapter) BlockstoreType() string {
return block.BlockstoreTypeAzure
}

func (a *Adapter) BlockstoreMetadata(ctx context.Context) (block.BlockstoreMetadata, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #7868 .

region, _ := a.clients.GetBucketRegionDefault(ctx, "")
return block.BlockstoreMetadata{
Region: &region,
func (a *Adapter) BlockstoreMetadata(ctx context.Context) (block.BlockstoreMetadata, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

func (a *Adapter) BlockstoreMetadata(ctx context.Context) (block.BlockstoreMetadata, error) {
region, err := a.clients.GetBucketRegionDefault(ctx, "")
if err != nil {
return block.BlockstoreMetadata{}, err
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@itaigilo itaigilo requested a review from N-o-Z June 16, 2024 12:52
Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itaigilo thanks,
Only 2 comments we need to close

Copy link
Contributor Author

@itaigilo itaigilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-o-Z fixed, PTAL again.

package block_test

import (
"context"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's important - why not apply only imports validation on the test files?

Anyway, fixed.

@@ -6,6 +6,7 @@ import (
"crypto/md5" //nolint:gosec
"crypto/rand"
"fmt"
"github.com/treeverse/lakefs/pkg/testutil"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@itaigilo itaigilo requested a review from N-o-Z June 17, 2024 04:23
Copy link
Member

@N-o-Z N-o-Z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, thanks

@itaigilo itaigilo merged commit 33ee454 into master Jun 17, 2024
36 checks passed
@itaigilo itaigilo deleted the feature/block-inter-region-storage branch June 17, 2024 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/block-adapter AWS include-changelog PR description should be included in next release changelog new-feature Issues that introduce new feature or capability
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't allow repositories with storage region different than the block storage region
2 participants