-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize segmentwise recompression #7482
Conversation
Do we have benchmarks to confirm that this is faster? |
a1b8376
to
ae32773
Compare
* 4: frozen | ||
* 8: compressed_partial | ||
*/ | ||
if (!ts_chunk_is_compressed(uncompressed_chunk) && ts_chunk_is_partial(uncompressed_chunk)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition reads a little weird, how can chunk be uncompressed and partially compressed at the same time? Might be not a topic for this PR, but we might have to rename ts_chunk_is_compressed
to something like ts_chunk_is_fully_compressed
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that check seems to check for two bitfields which should always accompany each other (compressed and partially compressed). Having partially compressed without the compressed bit should be an internal error.
Maybe Ensure
is more suitable for this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do this in a follow-up PR since its the existing state which I've just moved to a separate file.
* 8: compressed_partial | ||
*/ | ||
if (!ts_chunk_is_compressed(uncompressed_chunk) && ts_chunk_is_partial(uncompressed_chunk)) | ||
elog(ERROR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this still can be triggered by a user calling it for an unsuitable chunk? Needs better errcode then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this shouldn't not happen in regular execution. That's why I think Ensure
would be better suited here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, so this is the sorted merge recompression algorithm we discussed a long time ago.
a560496
to
c0c2c14
Compare
tsl_recompress_chunk_segmentwise(PG_FUNCTION_ARGS) | ||
{ | ||
Oid uncompressed_chunk_id = PG_ARGISNULL(0) ? InvalidOid : PG_GETARG_OID(0); | ||
bool if_not_compressed = PG_ARGISNULL(1) ? true : PG_GETARG_BOOL(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default is true here while in the function comment it is stated as false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will address this in a follow-up PR since its current state and has very little to do with the change here.
c0c2c14
to
4b4d270
Compare
With the recent change to compression schema, we can optimize segmentwise recompression and target only batches which are affected to be recompressed or compress tuples which don't belong to any batch without any decompression. This means narrowing down the amount of that we need to decompress and recompress, ultimately speeding up the operation and generating less bloat in the compressed chunk.
4b4d270
to
14f528c
Compare
With the recent change to compression schema, we can optimize segmentwise recompression and target only batches which are affected to be recompressed or compress tuples which don't belong to any batch without any decompression. This means narrowing down the amount of that we need to decompress and recompress, ultimately speeding up the operation and generating less bloat in the compressed chunk.
Benchmark with improvements (2-3x in certain cases):
https://grafana.ops.savannah-dev.timescale.com/d/fasYic_4z/compare-akuzm?orgId=1&var-branch=All&var-run1=3959&var-run2=3960&var-threshold=0&var-use_historical_thresholds=true&var-threshold_expression=2.5%20%2A%20percentile_cont%280.90%29&var-exact_suite_version=false