Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open AWS storage to being used with non-AWS MySQL and S3 services #428

Merged
merged 6 commits into from
Dec 19, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions cmd/conformance/aws/main.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,9 @@ import (
"net/http"
"time"

aaws "github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/credentials"
"github.com/aws/aws-sdk-go-v2/service/s3"
tessera "github.com/transparency-dev/trillian-tessera"
"github.com/transparency-dev/trillian-tessera/storage/aws"
"golang.org/x/mod/sumdb/note"
Expand All @@ -42,6 +45,9 @@ var (
dbPassword = flag.String("db_password", "", "AuroraDB user")
dbMaxConns = flag.Int("db_max_conns", 0, "Maximum connections to the database, defaults to 0, i.e unlimited")
dbMaxIdle = flag.Int("db_max_idle_conns", 2, "Maximum idle database connections in the connection pool, defaults to 2")
s3Endpoint = flag.String("s3_endpoint", "", "Endpoint for custom non-AWS S3 service")
s3AccessKeyID = flag.String("s3_access_key", "", "Access key ID for custom non-AWS S3 service")
s3SecretAccessKey = flag.String("s3_secret", "", "Secret access key for custom non-AWS S3 service")
signer = flag.String("signer", "", "Note signer to use to sign checkpoints")
publishInterval = flag.Duration("publish_interval", 3*time.Second, "How frequently to publish updated checkpoints")
additionalSigners = []string{}
Expand Down Expand Up @@ -138,8 +144,27 @@ func storageConfigFromFlags() aws.Config {
*dbUser, *dbPassword, dbEndpoint, *dbName,
)

// Configure to use MinIO Server
var awsConfig *aaws.Config
var s3Opts func(o *s3.Options)
if *s3Endpoint != "" {
const defaultRegion = "us-east-1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me that this will always set the region to us-east-1 as soon as someone sets s3Endpoint. That will then set SDKConfig which means that LoadDefaultConfig will be skipped, thus skipping other values that would be in the default config. At the moment, we only use LoadDefaultConfig for S3 but that might not always be the case, so let's disambiguate this to avoid leaving something to trip on later? Note, that I don't know how LoadDefaultConfig works with MinIO, but it looks like this might be tricky if someone uses both AWS and the new flags. That should not be done in the first place, but if the code doesn't prevent it (should it?!), then let's make sure there's no abiguity?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's intentional - the idea here is that you only ever set --s3_endpoint if you're going "off-piste" and running outside of AWS, in which case the aws default config no longer applies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things I'm thinking of:

  • If someone uses a local MySQL instance, they're "off piste", but might not experience the same behavior depending on whether they specify the S3 endpoint using the flags, or LoadDefaultConfig, which is..... fine because they're off piste?
  • Right now LoadDefaultConfig is only called for S3, but my plan was to change this such that it can use IAM authentication for MySQL instead of user and password. We might have to reorganise things then, but seems fine to me with the current code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, yeah - I think this is only a problem if they want to use "half AWS, half off"(?) which is definitely a non-supported config, so, as you say, probably ok that it doesn't work.

I'm trying to avoid having the amazon SDK "inadvertently" populate the client with AWS credentials, endpoint URLs, or anything else it does "by default" if you're in off-piste mode.

s3Opts = func(o *s3.Options) {
o.BaseEndpoint = aaws.String(*s3Endpoint)
o.Credentials = credentials.NewStaticCredentialsProvider(*s3AccessKeyID, *s3SecretAccessKey, "")
o.Region = defaultRegion
o.UsePathStyle = true
}

awsConfig = &aaws.Config{
Region: defaultRegion,
}
}

return aws.Config{
Bucket: *bucket,
SDKConfig: awsConfig,
S3Options: s3Opts,
DSN: dsn,
MaxOpenConns: *dbMaxConns,
MaxIdleConns: *dbMaxIdle,
Expand Down
16 changes: 16 additions & 0 deletions storage/aws/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,22 @@ Two experimental implementations have been tested which uses either Aurora MySQL
or a local bbolt database to store the `<identity_hash>` --> `sequence` mapping.
They work well, but call for further stress testing and cost analysis.

## Compatibility

This storage implementation is intended to be used with AWS services.

However, given that it's based on services which are compatible with MySQL and
S3 protocols, it's possible that it will work with other non-AWS-based backends
which are compatible with these protocols.

Given the vast array of combinations of backend implementations and versions,
using this storage implementation outside of AWS isn't officially supported, although
there may be folks who can help with issues in the Transparency-Dev slack.

Similarly, PRs raised against it relating to its use outside of AWS are unlikely to
be accepted unless it's shown that they have no detremental effect to the implementation's
performance on AWS.

### Alternatives considered

Other transactional storage systems are available on AWS, e.g. Redshift, RDS or
Expand Down
37 changes: 33 additions & 4 deletions storage/aws/aws.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,8 +30,10 @@ package aws

import (
"bytes"
"compress/gzip"
"context"
"database/sql"
"encoding/base64"
"encoding/gob"
"errors"
"fmt"
Expand Down Expand Up @@ -110,6 +112,17 @@ type consumeFunc func(ctx context.Context, from uint64, entries []storage.Sequen

// Config holds AWS project and resource configuration for a storage instance.
type Config struct {
// SDKConfig is an optional AWS config to use when configuring service clients, e.g. to
// use non-AWS S3 or MySQL services.
//
// If nil, the value from config.LoadDefaultConfig() will be used - this is the only
// supported configuration.
SDKConfig *aws.Config
// S3Options is an optional function which can be used to configure the S3 library.
// This is primarily useful when configuring the use of non-AWS S3 or MySQL services.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that S3Options has no impact on MySQL, right? The comment says otherwise.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking you're right, but intention of the comment is to make it clear that if you're setting these options you're outside of AWS and the supported config; i.e. it's only really useful to use S3Options if you're configuring the binary to use non-AWS services.

//
// If nil, the default options will be used - this is the only supported configuration.
S3Options func(*s3.Options)
// Bucket is the name of the S3 bucket to use for storing log state.
Bucket string
// DSN is the DSN of the MySQL instance to use.
Expand All @@ -133,11 +146,16 @@ func New(ctx context.Context, cfg Config, opts ...func(*options.StorageOptions))
return nil, fmt.Errorf("requested CheckpointInterval (%v) is less than minimum permitted %v", opt.CheckpointInterval, minCheckpointInterval)
}

sdkConfig, err := config.LoadDefaultConfig(ctx)
if err != nil {
return nil, fmt.Errorf("failed to load default AWS configuration: %v", err)
if cfg.SDKConfig == nil {
sdkConfig, err := config.LoadDefaultConfig(ctx)
if err != nil {
return nil, fmt.Errorf("failed to load default AWS configuration: %v", err)
}
cfg.SDKConfig = &sdkConfig
} else {
printDragonsWarning()
}
c := s3.NewFromConfig(sdkConfig)
c := s3.NewFromConfig(*cfg.SDKConfig, cfg.S3Options)

seq, err := newMySQLSequencer(ctx, cfg.DSN, uint64(opt.PushbackMaxOutstanding), cfg.MaxOpenConns, cfg.MaxIdleConns)
if err != nil {
Expand Down Expand Up @@ -873,3 +891,14 @@ func (s *s3Storage) lastModified(ctx context.Context, obj string) (time.Time, er

return *r.LastModified, r.Body.Close()
}

func printDragonsWarning() {
d := `H4sIAFZYZGcAA01QMQ7EIAzbeYXV5UCqkq1bf2IFtpNuPalj334hFQdkwLGNAwBzyXnKitOiqTYj
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<3

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't pass up the option :)

B7ZGplWEwZhZqxZ1aKuswcD0AA4GXPUhI0MEpSd5Ow09vJ+m6rVtF6m0GDccYXDZEdp9N/g1H9Pf
Qu80vNj7tiOe0lkdc8hwZK9YxavT0+FTP++vU6DUKvpEOr1+VGTk3IBXKSX9AHz5xXRgAQAA`
g, _ := base64.StdEncoding.DecodeString(d)
r, _ := gzip.NewReader(bytes.NewReader(g))
t, _ := io.ReadAll(r)
klog.Infof("Running in non-AWS mode - see storage/aws/README.md for more details.")
klog.Infof("Here be dragons!\n%s", t)
}
Loading