107939: roachtest: add changefeed workload benchmarks r=erikgrinaker a=erikgrinaker

This patch adds a set of benchmarks measuring the workload impact of a changefeed. The workload is single-row KV read-only and write-only, recording the throughput and latencies of the workload both without and with a changefeed running, graphed via roachperf. The watermark lag is also logged, but not recorded or asserted.

```
cdc/workload/kv0/nodes=5/cpu=16/ranges=100/control [cdc]
cdc/workload/kv0/nodes=5/cpu=16/ranges=100/server=processor/protocol=mux/format=json/sink=null [cdc]
cdc/workload/kv0/nodes=5/cpu=16/ranges=100/server=processor/protocol=rangefeed/format=json/sink=null [cdc]
cdc/workload/kv0/nodes=5/cpu=16/ranges=100000/control [cdc]
cdc/workload/kv0/nodes=5/cpu=16/ranges=100000/server=processor/protocol=mux/format=json/sink=null [cdc]
cdc/workload/kv0/nodes=5/cpu=16/ranges=100000/server=processor/protocol=rangefeed/format=json/sink=null [cdc]
cdc/workload/kv100/nodes=5/cpu=16/ranges=100/control [cdc]
cdc/workload/kv100/nodes=5/cpu=16/ranges=100/server=processor/protocol=mux/format=json/sink=null [cdc]
cdc/workload/kv100/nodes=5/cpu=16/ranges=100/server=processor/protocol=rangefeed/format=json/sink=null [cdc]
cdc/workload/kv100/nodes=5/cpu=16/ranges=100000/control [cdc]
cdc/workload/kv100/nodes=5/cpu=16/ranges=100000/server=processor/protocol=mux/format=json/sink=null [cdc]
cdc/workload/kv100/nodes=5/cpu=16/ranges=100000/server=processor/protocol=rangefeed/format=json/sink=null [cdc]
```

Resolves cockroachdb#107441.

Release note: None

108080: upgrades: avoid crdb_internal.system_jobs in upgrade manager r=adityamaru a=stevendanna

crdb_internal.system_jobs is a virtual table that joins information from the jobs table and the job_info table.

When given a job status predicate, it does this by running a query such as:

    WITH
        latestpayload AS (
            SELECT job_id, value
            FROM system.job_info AS payload
            WHERE info_key = 'legacy_payload'
            ORDER BY written DESC
        ),
        latestprogress AS (
            SELECT job_id, value
            FROM system.job_info AS progress
            WHERE info_key = 'legacy_progress'
            ORDER BY written DESC
        )
    SELECT
        distinct(id), status, created, payload.value AS payload, progress.value AS progress,
        created_by_type, created_by_id, claim_session_id, claim_instance_id, num_runs, last_run, job_type
    FROM system.jobs AS j
    INNER JOIN latestpayload AS payload ON j.id = payload.job_id
    LEFT JOIN latestprogress AS progress ON j.id = progress.job_id
    WHERE j.status = 'cancel-requested';

This uses 2 full scans of the job_info table:

```
• distinct
│ distinct on: id, value, value
│
└── • merge join
    │ equality: (job_id) = (id)
    │
    ├── • render
    │   │
    │   └── • filter
    │       │ estimated row count: 2,787
    │       │ filter: info_key = 'legacy_payload'
    │       │
    │       └── • scan
    │             estimated row count: 5,597 (100% of the table; stats collected 27 minutes ago; using stats forecast for 17 minutes ago)
    │             table: job_info@primary
    │             spans: FULL SCAN
    │
    └── • merge join (right outer)
        │ equality: (job_id) = (id)
        │ right cols are key
        │
        ├── • render
        │   │
        │   └── • filter
        │       │ estimated row count: 2,787
        │       │ filter: info_key = 'legacy_progress'
        │       │
        │       └── • scan
        │             estimated row count: 5,597 (100% of the table; stats collected 27 minutes ago; using stats forecast for 17 minutes ago)
        │             table: job_info@primary
        │             spans: FULL SCAN
        │
        └── • index join
            │ table: jobs@primary
            │
            └── • sort
                │ order: +id
                │
                └── • scan
                      missing stats
                      table: jobs@jobs_status_created_idx
                      spans: [/'cancel-requested' - /'cancel-requested']
```

Previously, the upgrade manager was using this virtual table as part of a larger query:

    SELECT id, status
    FROM (
        SELECT id,
               status,
               crdb_internal.pb_to_json(
                   'cockroach.sql.jobs.jobspb.Payload',
                   payload,
                   false
               ) AS pl
        FROM crdb_internal.system_jobs
        WHERE status IN ('running', 'pending', 'cancel-requested', 'pause-requested', 'reverting', 'paused')
    )
    WHERE pl->'migration'->'clusterVersion' = $1::JSONB;

I believe the use of the IN operator causes the virtual index's populate function to be called for each value. Perhaps the optimizer accounts for this in some way, so that it does not result in 2 * 6 full scans of the job_info table, but it is hard to confirm with the explain output.

In at least one recent escalation, we observed this query taking a substantial amount of time as it continually conflicted with other job system queries.

Here, we avoid using the virtual table. This allows us to avoid the full scans of the job_info table, since we don't need the progress (only the payload). It also allows us to use the full `IN` predicate directly, avoiding any uncertainty.

```
• root
│
├── • hash join
│   │ equality: (job_id) = (id)
│   │ right cols are key
│   │
│   ├── • render
│   │   │
│   │   └── • lookup join
│   │       │ table: job_info@primary
│   │       │ equality: (id, lookup_join_const_col_@16) = (job_id,info_key)
│   │       │
│   │       └── • render
│   │           │
│   │           └── • scan buffer
│   │                 label: buffer 1 (running_migration_jobs)
│   │
│   └── • scan buffer
│         label: buffer 1 (running_migration_jobs)
│
└── • subquery
    │ id: `@S1`
    │ original sql: SELECT id, status FROM system.jobs WHERE (status IN ('running', 'pending', 'cancel-requested', 'pause-requested', 'reverting', 'paused')) AND (job_type = 'MIGRATION')
    │ exec mode: all rows
    │
    └── • buffer
        │ label: buffer 1 (running_migration_jobs)
        │
        └── • filter
            │ filter: status IN ('cancel-requested', 'pause-requested', 'paused', 'pending', 'reverting', 'running')
            │
            └── • index join
                │ table: jobs@primary
                │
                └── • scan
                      missing stats
                      table: jobs@jobs_job_type_idx
                      spans: [/'MIGRATION' - /'MIGRATION']
```

In a local example, this is substantially faster:

```
root@localhost:26257/defaultdb> SELECT id, status
                             -> FROM (
                             ->     SELECT id,
                             ->            status,
                             ->            crdb_internal.pb_to_json(
                             ->                'cockroach.sql.jobs.jobspb.Payload',
                             ->                payload,
                             ->                false -- emit_defaults
                             ->            ) AS pl
                             ->     FROM crdb_internal.system_jobs
                             ->     WHERE status IN ('running', 'pending', 'cancel-requested', 'pause-requested', 'reverting', 'paused')
                             -> )
                             -> WHERE pl->'migration'->'clusterVersion' = '{"activeVersion": {"internal": 84, "majorVal": 22, "minorVal": 2}}'::JSONB;
  id | status
-----+---------
(0 rows)

Time: 384ms total (execution 384ms / network 0ms)

root@localhost:26257/defaultdb> WITH
                             ->     running_migration_jobs AS (
                             ->         SELECT id, status
                             ->         FROM system.jobs
                             ->         WHERE status IN ('running', 'pending', 'cancel-requested', 'pause-requested', 'reverting', 'paused')
                             ->         AND job_type = 'MIGRATION'
                             ->     ),
                             ->     payloads AS (
                             ->         SELECT job_id, value
                             ->         FROM system.job_info AS payload
                             ->         WHERE info_key = 'legacy_payload'
                             ->         AND job_id IN (SELECT id FROM running_migration_jobs)
                             ->         ORDER BY written DESC
                             ->     )
                             -> SELECT id, status FROM (
                             ->     SELECT id, status, crdb_internal.pb_to_json('cockroach.sql.jobs.jobspb.Payload', payloads.value, false) AS pl
                             ->     FROM running_migration_jobs AS j
                             ->     INNER JOIN payloads ON j.id = payloads.job_id
                             -> );
  id | status
-----+---------
(0 rows)

Time: 3ms total (execution 2ms / network 0ms)
```

Note that the new query will return 2 rows if we happen to have 2 legacy_payload keys for a given job. This will result in an assertion failure, but I think this is reasonable since we take care to only ever have 1 legacy payload row.

We should do more work to understand contention within the job system, but perhaps speeding up this query will help a bit.

Epic: None

Release note: None

Co-authored-by: Erik Grinaker <[email protected]>
Co-authored-by: Steven Danna <[email protected]>