design: Unified Compute Introspection #27548
Conversation
- Whenever a replica connection is established, the compute controller installs all defined introspection subscribes on the replica, as replica-targeted subscribes.
- Whenever a batch of updates arrives for an introspection subscribe, the compute controller (a) prepends the replica ID to all updates and (b) sends the enriched updates to the storage controller, tagged with the corresponding `IntrospectionType` (see the sketch below).
- The storage controller's `CollectionManager` appends the introspection updates to the collection associated with the given `IntrospectionType`.
- Whenever a replica disconnects, the compute controller cancels its introspection subscribes and instructs the storage controller to delete all introspection data previously emitted for this replica.
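For concreteness, here is a minimal sketch of steps (a) and (b) above. All types (`Row`, `Diff`, `IntrospectionType`, `StorageHandle`) are simplified stand-ins for illustration, not the actual controller types:

```rust
// Hypothetical, simplified sketch of the update-enrichment step.
type Diff = i64;

struct Row(Vec<String>);

enum IntrospectionType {
    ComputeErrorCounts,
    ComputeHydrationTimes,
}

struct StorageHandle;

impl StorageHandle {
    fn append_introspection_updates(&self, _ty: IntrospectionType, _updates: Vec<(Row, Diff)>) {
        // In the real system this hands the updates to the storage controller,
        // which routes them to the `CollectionManager` for the collection
        // associated with the given `IntrospectionType`.
    }
}

fn forward_subscribe_batch(
    storage: &StorageHandle,
    replica_id: &str,
    ty: IntrospectionType,
    batch: Vec<(Row, Diff)>,
) {
    // (a) prepend the replica ID to every row ...
    let enriched: Vec<(Row, Diff)> = batch
        .into_iter()
        .map(|(Row(mut cols), diff)| {
            cols.insert(0, replica_id.to_string());
            (Row(cols), diff)
        })
        .collect();
    // ... and (b) forward the enriched batch, tagged with its `IntrospectionType`.
    storage.append_introspection_updates(ty, enriched);
}
```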
Just to be clear, this means it would emit retractions for the data, so a `SUBSCRIBE AS OF` would still include data for the dropped replica, correct?
Yes! In this regard the unified introspection relations behave like any system relation that contains a `replica_id` column (`mz_cluster_replicas`, `mz_cluster_replica_statuses`, ...).
Looks good, some comments around how we define the dataflows to serve subscribes.
The `DataflowDescription`, since it is defined in the compute controller without access to the catalog or the optimizer, must be written using LIR constructs and can only reference introspection indexes, not builtin sources or views that may be defined in the catalog (see the illustrative sketch below).
These are limitations that seem reasonable for an MVP.
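To give a sense of what "written using LIR constructs" means here, a rough sketch with a simplified, made-up plan type; this is not Materialize's actual LIR or `DataflowDescription` API, and the index name and column layout are invented:

```rust
/// Simplified stand-in for an LIR-level plan, only to illustrate the shape of
/// what the compute controller has to spell out by hand.
enum LirPlan {
    /// Read from an existing (introspection) index.
    Get { index: String },
    /// Keep rows whose column `column` equals `value`.
    FilterEq { input: Box<LirPlan>, column: usize, value: String },
    /// Keep only the listed columns, in order.
    Project { input: Box<LirPlan>, columns: Vec<usize> },
}

/// Roughly "SELECT col0, col2 FROM <introspection index> WHERE col1 = '0'",
/// written out as a plan because no SQL frontend or optimizer is available in
/// the controller. Index name and column positions are made up.
fn example_introspection_plan() -> LirPlan {
    LirPlan::Project {
        columns: vec![0, 2],
        input: Box::new(LirPlan::FilterEq {
            column: 1,
            value: "0".into(),
            input: Box::new(LirPlan::Get {
                index: "mz_compute_hydration_times_per_worker".into(),
            }),
        }),
    }
}
```

Even this small example suggests why anything involving reductions or joins quickly becomes unwieldy to hand-write, which is the concern raised in the comments below.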
Agree that that's OK for an MVP, but it's unlikely to be what we want for the final implementation. Expressing complicated queries, for example ones with reductions, is increasingly hard to do manually.
A different take would be that all computations on introspection data should be done in the form of logging dataflows because they're running on all replicas and could have resource impact. If we restrict the plans to essentially be subscribes to existing arrangements, this would be the lowest cost possible because it allows us to hand-optimize the dataflows.
That's true, but I doubt that handwriting a dataflow is any easier than writing the LIR definition (which you can avoid by adding a `dbg!(dataflow)` in the right place) :)
If we do this as a cost optimization, we'll want to avoid creating new indexes, so we'd have to introduce the concept of a "logging subscribe", i.e. an implicit subscribe that's installed on the replica at `CreateInstance` time. Seems feasible, but not something we should consider if we don't observe a need. In contrast to the logging dataflows, the introspection subscribes proposed in this design will show up in introspection, so we'll be able to see when they use non-negligible resources.
We propose extending the `CollectionManager`'s API to also accept deletion requests (a sketch follows below).
A deletion request contains a collection ID, as well as a `filter` closure `Box<dyn Fn(&Row) -> bool>`.
The `CollectionManager` handles a deletion request by reading back the target collection at its latest readable time, applying the `filter` closure to each read `Row`, and appending retractions for each `Row` the `filter` returned `true` for.
Append and deletion requests must be applied by the `CollectionManager` in the same order they were issued by the client, to guarantee that deletions retract all previously sent appends.
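A possible shape for this API extension, sketched with hypothetical names and channel plumbing (this is not the actual `CollectionManager` interface):

```rust
use std::sync::mpsc;

// Stand-in types for illustration.
type Diff = i64;

#[derive(Clone)]
struct Row(Vec<String>);

#[derive(Clone, Copy)]
struct GlobalId(u64);

/// Requests handled by the collection-manager task. Funneling appends and
/// deletions through one queue preserves per-client ordering, so a deletion
/// retracts exactly the appends issued before it.
enum Request {
    Append {
        id: GlobalId,
        updates: Vec<(Row, Diff)>,
    },
    Delete {
        id: GlobalId,
        /// Rows for which the filter returns `true` are retracted.
        filter: Box<dyn Fn(&Row) -> bool + Send>,
    },
}

struct CollectionManagerClient {
    tx: mpsc::Sender<Request>,
}

impl CollectionManagerClient {
    fn append(&self, id: GlobalId, updates: Vec<(Row, Diff)>) {
        let _ = self.tx.send(Request::Append { id, updates });
    }

    fn delete(&self, id: GlobalId, filter: Box<dyn Fn(&Row) -> bool + Send>) {
        let _ = self.tx.send(Request::Delete { id, filter });
    }
}

/// Inside the manager task: given a snapshot of the collection read at its
/// latest readable time, produce retractions for all rows matching the filter.
fn retractions_for(
    snapshot: &[(Row, Diff)],
    filter: &dyn Fn(&Row) -> bool,
) -> Vec<(Row, Diff)> {
    snapshot
        .iter()
        .filter(|(row, _)| filter(row))
        .map(|(row, diff)| (row.clone(), -diff))
        .collect()
}
```

A replica-disconnect cleanup would then look roughly like `client.delete(collection_id, Box::new(move |row| row.0[0] == replica_id))`, retracting every row previously emitted for that replica.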
What are the memory requirements for providing such a feature?
This depends a lot on the implementation of the persist client.
With the naive implementation (build the retractions in memory, then append them):
- In the "best case" the persist client streams in one update at a time and we only need to keep those belonging to the deleted replica, so the memory usage would be equal to the size of existing updates of that replica.
- In the realistic case, the persist client probably introduces some overhead because it gives us the updates in batches, not one at a time. We can hope that this is constant overhead, but I'm not sure.
- In the worst case we'd need to keep the whole collection snapshot in memory.
Rather than collecting retractions in memory, we can use a `BatchBuilder`, which will flush out parts to S3 when they get too large. If we make sure the unified introspection collections stay small, that's probably not an optimization we need.
Also note that the current Pv2 plan includes making the `CollectionManager` self-correcting (#27496), similar to the `persist_sink`, in which case it will need to read the whole snapshot into memory anyway.
I was slow to get to this, but LGTM! Thanks very much for writing this up. Extremely helpful context for me.
In the interest of moving fast, we opt not to follow this approach for the MVP implementation.
If we find that the limitations of the controller-based approach are too great, we can revisit this decision.
Makes sense to me.
This would allow the compute controller to mint its own collection identifiers without needing to synchronize with the coordinator.

While this would be a desirable change, also for a possible `ALTER MATERIALIZED VIEW ... SET CLUSTER` feature, the required refactoring work would be significant.
Again in the interest of moving fast, we opt for the simpler approach of sharing the `transient_id_gen` instead.
(Already discussed and approved on Slack, but recording here for posterity my 👍🏽 for this decision.)
The storage controller delegates the writing of storage-managed collections to a background task called the `CollectionManager`.
The `CollectionManager` accepts `GlobalId`s identifying storage collections and corresponding `(Row, Diff)` updates to be appended to these collections, and appends them at the current system time.
It does not yet expose a mechanism to delete previously written updates from a collection by reading back the collection contents, determining the necessary retractions, and appending them.

To remove introspection data for dropped/disconnected replicas from the unified introspection relations, we require such a deletion mechanism.
Note that the compute controller is not itself able to determine the set of necessary retractions.
While it can be taught to read the contents of storage-managed collections, it has no way of knowing the time as of which all previously emitted introspection data has been fully written to its target collection.
The compute controller would risk reading back the target collection's contents too soon and retracting only part of the introspection data previously emitted for the disconnected replica.

We propose extending the `CollectionManager`'s API to also accept deletion requests.
A deletion request contains a collection ID, as well as a `filter` closure `Box<dyn Fn(&Row) -> bool>`.
The `CollectionManager` handles a deletion request by reading back the target collection at its latest readable time, applying the `filter` closure to each read `Row`, and appending retractions for each `Row` the `filter` returned `true` for.
Append and deletion requests must be applied by the `CollectionManager` in the same order they were issued by the client, to guarantee that deletions retract all previously sent appends.
#27496 makes this a bit easier because it makes the `CollectionManager` keep the whole snapshot state in memory, so we can build the retractions from there directly. Eventually we'll still want to move to the design described here, when the `CollectionManager` learns to only keep the `desired` - `persist` diff around (a small sketch of that correction follows below).
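To illustrate what that self-correcting mode computes, here is a small hypothetical sketch of deriving the `desired` - `persist` correction from in-memory state (the data structures are stand-ins, not the #27496 implementation):

```rust
use std::collections::BTreeMap;

/// Stand-ins: keyed rows with accumulated multiplicities.
type Row = Vec<String>;
type Diff = i64;

/// Compute the updates that must still be appended so that the persisted
/// contents match the desired contents: conceptually the `desired` - `persist`
/// diff mentioned above.
fn correction(desired: &BTreeMap<Row, Diff>, persisted: &BTreeMap<Row, Diff>) -> Vec<(Row, Diff)> {
    let mut updates = Vec::new();
    for (row, &want) in desired {
        let have = persisted.get(row).copied().unwrap_or(0);
        if want != have {
            updates.push((row.clone(), want - have));
        }
    }
    for (row, &have) in persisted {
        if !desired.contains_key(row) {
            // Row should no longer exist: retract it entirely.
            updates.push((row.clone(), -have));
        }
    }
    updates
}
```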
This PR adds a design for Unified Compute Introspection (https://github.com/MaterializeInc/database-issues/issues/7898). The design is for an MVP that can be quickly implemented by the cluster team/me, and many of the design decisions reflect that, e.g., by preferring the simpler approach over the more flexible one.
Motivation
Part of MaterializeInc/database-issues#7898.
Checklist
- If this PR evolves an existing `$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.