Implement support for Google Spanner #271
base: master
Conversation
LGTM!
Resolved review threads:
- client-base/src/main/kotlin/app/cash/backfila/client/RealBackfillModule.kt (outdated)
- ...-misk-spanner/src/main/kotlin/app/cash/backfila/client/misk/spanner/SpannerBackfillModule.kt (outdated)
- client-misk-spanner/src/main/kotlin/app/cash/backfila/client/misk/spanner/SpannerBackfill.kt (outdated)
- ...isk-spanner/src/main/kotlin/app/cash/backfila/client/misk/spanner/internal/SpannerBackend.kt (outdated)
- ...isk-spanner/src/main/kotlin/app/cash/backfila/client/misk/spanner/internal/SpannerBackend.kt
- ...er/src/main/kotlin/app/cash/backfila/client/misk/spanner/internal/SpannerBackfillOperator.kt
- ...nt-misk-spanner/src/test/kotlin/app/cash/backfila/client/misk/spanner/SpannerBackfillTest.kt
```kotlin
val partitions = listOf(
  PrepareBackfillResponse.Partition.Builder()
    .backfill_range(request.range)
```
Are you requiring the range to be passed in? In other implementations we compute the ranges if you don't pass it in.
The range is actually completely ignored. Spanner is unlike many other DBs: for optimal performance, primary keys really can't be anything like a monotonically increasing range. I don't know how to compute a range without doing a full table scan, which seems... suboptimal.
you can't ask for min/max primary key value?
Primary keys are often random values like UUIDs and unordered for optimal performance. Min/max aren't valid concepts, as far as I can tell. Source: https://cloud.google.com/spanner/docs/schema-design#primary-key-prevent-hotspots
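To make that concrete, here is a minimal sketch (not code from this PR; the table and columns are hypothetical) of the kind of insert the linked doc recommends, written in Kotlin against the google-cloud-spanner Java client:

```kotlin
import com.google.cloud.spanner.Mutation
import java.util.UUID

// Hypothetical "users" table keyed by a random UUID. Because ids are
// generated unordered, there is no meaningful min/max key to seed a
// scan range from.
val insert: Mutation = Mutation.newInsertBuilder("users")
  .set("id").to(UUID.randomUUID().toString())
  .set("name").to("alice")
  .build()
```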
Backfila requires ordered key values to operate. I'm curious how you would use it if that's not the case. I haven't used Spanner, but my understanding was that it's ordered; you just want to avoid sequential writes.
And to answer the original question - we don’t require a range to be passed in. That’s optional.
Yeah, I'm well aware of how primary key design works in Spanner, and you can have items added within the range. That's true even with auto-increment, technically. It doesn't matter, since the expectation is that you are inserting new items that don't need backfilling.
It sounds like you can just ask Spanner for records and it will return them in some order; that should be fine, I guess.
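A minimal sketch of that ordered-scan behavior (illustrative names, not this PR's code): Spanner's read API returns rows in primary-key order, so even random UUID keys come back in a stable, sorted sequence.

```kotlin
import com.google.cloud.spanner.DatabaseClient
import com.google.cloud.spanner.KeySet
import com.google.cloud.spanner.Options

// Read the first batchSize keys of a hypothetical "users" table. Rows come
// back sorted by primary key, even when the key values themselves (e.g.
// UUIDs) were generated in no particular order.
fun firstBatch(db: DatabaseClient, batchSize: Long) {
  db.singleUse()
    .read("users", KeySet.all(), listOf("id"), Options.limit(batchSize))
    .use { rows ->
      while (rows.next()) println(rows.getString("id"))
    }
}
```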
Wouldn't this work like dynamo backfills? Dynamo is somewhat different, but has a scan mechanism we use, and I believe we don't do ranges on it either? You could check that.
Does this mean that you will essentially run your backfill single threaded?
So there must be some distributed way to process the whole data set in bulk? In Dynamo it is this idea of segments.
Force-pushed 87ad6fd to aaaebc0, then aaaebc0 to 1a6ad72.
LGTM
Resolved review thread (outdated): ...er/src/main/kotlin/app/cash/backfila/client/misk/spanner/internal/SpannerBackfillOperator.kt
```kotlin
override fun getNextBatchRange(request: GetNextBatchRangeRequest): GetNextBatchRangeResponse {
  // Establish a range to scan - either we want to start at the first key,
  // or start from (and exclude) the last key that was scanned.
  val range = if (request.previous_end_key == null) {
```
I guess we're not using the backfill_range at all; that's what would be passed in by the user (or I missed it somewhere).
Yes. If I'm not mistaken, the DynamoDB backend also ignores it.
DynamoDb is pretty limited because of Dynamo itself; the Hibernate one is pretty good to copy from. Obviously, build whatever features you want, I won't be using it :P
You need some guarantees around the end key otherwise you may be missing items, no? This was tricky with DynamoDb as well. We figured out some optimizations but since they weren't really documented we didn't add those to the client. In Dynamo we split up by segment but then don't complete the "batch" until the range is completed. Maybe Google has some better guarantees?
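One way to get that guarantee, sketched under assumptions (a single STRING primary key; names are illustrative, not this PR's actual code): resume strictly after the previous end key, and rely on Spanner's key-ordered results so nothing between batches is skipped or re-processed.

```kotlin
import com.google.cloud.spanner.DatabaseClient
import com.google.cloud.spanner.Statement

// Fetch the next batch of keys, starting strictly after previousEndKey.
// Assumes a hypothetical "users" table with a STRING primary key "id".
fun nextBatchKeys(db: DatabaseClient, previousEndKey: String?, batchSize: Long): List<String> {
  val statement = Statement
    .newBuilder("SELECT id FROM users WHERE id > @after ORDER BY id LIMIT @n")
    .bind("after").to(previousEndKey ?: "") // empty string sorts before all non-empty keys
    .bind("n").to(batchSize)
    .build()
  val keys = mutableListOf<String>()
  db.singleUse().executeQuery(statement).use { rows ->
    while (rows.next()) keys += rows.getString("id")
  }
  return keys
}
```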
Co-authored-by: Mike Gershunovsky <[email protected]>
Since this isn't urgent, I'll just comment.
I think this is a good start, but we'd want to make sure as much of Backfila as possible works as expected. Having a single partition is okay, but could be somewhat challenging to scale. Upper and lower bounds for ranges are a reasonable tradeoff.
Overall looking very good. Let's avoid misk except in test.
I wonder if you can use this to be more parallel?
https://cloud.google.com/spanner/docs/reference/rpc/google.spanner.v1#google.spanner.v1.Spanner.PartitionRead
Can you share a session among different machines? Your backfill might die if the session dies though.
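A rough sketch of what that could look like with the Java client's BatchClient (hypothetical table and columns; not this PR's code). Partitions could be fanned out to other workers by sharing the transaction's BatchTransactionId:

```kotlin
import com.google.cloud.spanner.BatchClient
import com.google.cloud.spanner.KeySet
import com.google.cloud.spanner.PartitionOptions
import com.google.cloud.spanner.TimestampBound

// Split a full-table read into partitions that can be executed in parallel.
fun partitionedScan(batch: BatchClient) {
  batch.batchReadOnlyTransaction(TimestampBound.strong()).use { txn ->
    val partitions = txn.partitionRead(
      PartitionOptions.getDefaultInstance(),
      "users",
      KeySet.all(),
      listOf("id")
    )
    // Each partition is independent; another machine holding the same
    // BatchTransactionId (txn.batchTransactionId) could execute its share.
    for (partition in partitions) {
      txn.execute(partition).use { rows ->
        while (rows.next()) { /* process rows.getString("id") */ }
      }
    }
  }
}
```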
```kotlin
// We do not want to leak client-base implementation details to customers.
implementation(project(":client-base"))

implementation(Dependencies.misk)
```
Can we limit our use of misk at least in non-test? Do we really need it?
Looking through your code, I think these only need to be testImplementation dependencies. Let's move those dependencies to test, rename the module, and add a comment so they don't leak into the main implementation.
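A sketch of the suggested move (assuming the repo's existing Dependencies constants and Gradle Kotlin DSL; illustrative, not the final build file):

```kotlin
// client-misk-spanner/build.gradle.kts (hypothetical excerpt)
dependencies {
  // We do not want to leak client-base implementation details to customers.
  implementation(project(":client-base"))

  // misk stays out of the main source set so it can't leak to customers.
  testImplementation(Dependencies.misk)
}
```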
```kotlin
val partitions = listOf(
  PrepareBackfillResponse.Partition.Builder()
    .backfill_range(request.range)
    .partition_name("partition")
```
I'd prefer something like `single` or `only`. This is exposed to the customer.
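The rename, sketched against the quoted snippet (builder chain completed for illustration):

```kotlin
val partitions = listOf(
  PrepareBackfillResponse.Partition.Builder()
    .backfill_range(request.range)
    .partition_name("only") // customer-visible; "only" reads better than "partition"
    .build()
)
```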
These changes add a new backend for backfilling Spanner databases, integrated into Misk services.
I'm still adding unit tests to show that it all works, but I figured I would put it up for some early review and to discover CI issues.