Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS storage implementation on S3 + MySQL (via RDS) #314

Merged
merged 5 commits into from
Nov 15, 2024

Conversation

phbnf
Copy link
Contributor

@phbnf phbnf commented Nov 15, 2024

Towards #24

This PR migrates from GCS to S3, and from Spanner to MySQL, thus enabling the code to run on AWS with S3 + RDS.
The MySQL code is greatly inspired by the old Betty SQL code @AlCutter wrote for GCP and was then ported over to Betty Aurora.

This PR introduces two method signature changes:

  • It removes the generation number from getObject, because it's not actually used, both in the GCP implementation and in this one
  • It replaces preconditions with a boolean in setObject, because a. there's only one type precondition used in this code, b. the ways preconditions are passed is different anyways so this looks nicer

The code has been tested with the conformance tool, which a followup PR will introduce. This PR removes tests, because they haven't been migrated yet, I will re-introduce them later.

@@ -64,7 +66,7 @@ const (
DefaultIntegrationSizeLimit = 5 * 4096
)

// Storage is a GCP based storage implementation for Tessera.
// Storage is a AWS based storage implementation for Tessera.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "an AWS"

return d, err
}

// init ensures that the storage represents a log in a valid state.
func (s *Storage) init(ctx context.Context) error {
_, err := s.get(ctx, layout.CheckpointPath)
if err != nil {
if errors.Is(err, gcs.ErrObjectNotExist) {
var nske *types.NoSuchKey
if errors.As(err, &nske) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since nske isn't used below, could you use errors.Is here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought as well, I did this first, and then ran into unexpected errors. Sadly, I can't reproduce them.

The tl;dr is that types.NoSuchKey is a type, it's not an error returned with errors.New(), as opposed to gcs. ErrObjectNotExist. errors.As() does a type comparison, whereas errors.Is() does value comparison: see https://go.dev/blog/go1.13-errors.

I left a comment around to explain the rationale and make sure nobody undoes this.

@@ -213,7 +221,7 @@ func (s *Storage) updateCP(ctx context.Context, newSize uint64, newRoot []byte)
if err != nil {
return fmt.Errorf("newCP: %v", err)
}
if err := s.objStore.setObject(ctx, layout.CheckpointPath, cpRaw, nil, ckptContType); err != nil {
if err := s.objStore.setObject(ctx, layout.CheckpointPath, cpRaw, false, ckptContType); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, do you think it's worth adding named consts for this new bool param?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a few things, and we spoke about it. This is now split into two functions: setObject and setObjectIfNoneMatch, seems to be the simplest option.

if err != nil {
if errors.Is(err, gcs.ErrObjectNotExist) {
var nske *types.NoSuchKey
if errors.As(err, &nske) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same observation as above about nske - one more below too but I won't drop a comment there)

// a durable and thread/multi-process safe sequencer.
type spannerSequencer struct {
dbPool *spanner.Client
type mysqlSequencer struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: mySQL / newMySQLSequencer below?

`CREATE TABLE IF NOT EXISTS Seq(
id INT UNSIGNED NOT NULL,
seq BIGINT UNSIGNED NOT NULL,
v LONGBLOB,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting that LONGBLOB allows values up to 4GB here which may be overkill, MEDIUMBLOB would allow 2^24, and BLOB 2^16 bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4GB is big. I actually wonder what would happen with such a large bunch of entries and we should probably check.

2^23-1 is 16 MB which is also a bit big...

Having said that:
The default DefaultBatchMaxSize is 256, so that's 16384KB/256=64KB.
The vast majority of CT entries are smaller than this, but, we know of large certificates, with a lot of SAN like https://crt.sh/?id=10751627 which is 371KB. So 65KB for BLOB won't work. I've left a TODO for us to run tests and check what works well.

klog.V(1).Info("Found no rows to sequence")
return nil
}
seqsConsumed := []any{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[]any rather than []uint64 because of something to do with passing these back as SQL args later?
Might be worth a comment here about the slightly unusual type...

}

if len(seqsConsumed) > 0 {
q := "DELETE FROM Seq WHERE id=? AND seq IN ( " + placeholder(len(seqsConsumed)) + " )"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for now, but worth looking at seeing how seq IN (?, ?, ? ...) performs vs, say, seq BETWEEN ? AND ? - in theory the latter may be faster as it can do it in a single range scan vs. potentially having to do a sequential bunch of "pecks", but would need to be looked into once you have it all working.

@phbnf phbnf merged commit af45921 into transparency-dev:main Nov 15, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants