Forklift: Make Compaction 2.0 the Default #1136

Open
4 tasks
JarredOlson opened this issue Jan 5, 2021 · 3 comments

Comments

@JarredOlson
Contributor

JarredOlson commented Jan 5, 2021

What

  • All datasets should use Compaction 2.0 by default.
  • The SPECIAL_COMPACTION_DATASETS environment variable should be removed.
  • Code used solely by Compaction 1.0 should be removed.
  • Table metrics should still be collected on compaction.
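
The first two items amount to dropping a per-dataset opt-in. A minimal sketch of that change follows, written in Python purely for illustration. It assumes, hypothetically, that SPECIAL_COMPACTION_DATASETS holds a comma-separated list of dataset IDs; this issue does not state the variable's actual format, and the function names are invented for the example rather than taken from Forklift.

```python
import os


def compaction_mode_before(dataset_id, env=None):
    """Current behavior (sketch): 2.0 only for datasets listed in the env var.

    Assumption: the variable is a comma-separated list of dataset IDs.
    """
    env = os.environ if env is None else env
    raw = env.get("SPECIAL_COMPACTION_DATASETS", "")
    opted_in = {item.strip() for item in raw.split(",") if item.strip()}
    return "2.0" if dataset_id in opted_in else "1.0"


def compaction_mode_after(dataset_id, env=None):
    """Proposed behavior (sketch): every dataset uses 2.0; the env var is ignored."""
    return "2.0"
```

Once every dataset takes the second path, the lookup and the 1.0 branch have no remaining callers, which is what allows the 1.0-only code to be deleted.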

Why

Forklift currently supports two compaction modes. Compaction 1.0, the original, rewrites the complete table every time data is compacted. It is slow, error-prone, and expensive in data transfer costs.

Compaction 2.0 is the newer mode and has been in use for certain configured datasets for nearly a year. It handles compaction through hourly bulk inserts and nightly compaction of single partitions. It is faster, more reliable, and less expensive than 1.0, but it requires datasets to have an OS_PARTITION field, which can be added by hand or automatically by the first compaction.
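
To make the cost difference concrete, here is a small in-memory sketch of the two approaches, in Python for illustration only. It is not Forklift's implementation: the `PartitionedTable` class, its method names, and the partition values are hypothetical, and only the OS_PARTITION field name comes from this issue. Each partition holds a list of "files", and the sketch counts how many rows a compaction has to rewrite.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy model: OS_PARTITION value -> list of files, each file a list of rows."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def bulk_insert(self, rows):
        """Hourly step (sketch): append one new file per partition touched."""
        batches = defaultdict(list)
        for row in rows:
            batches[row["OS_PARTITION"]].append(row)
        for key, batch in batches.items():
            self.partitions[key].append(batch)

    def compact_partition(self, key):
        """Nightly step (sketch): merge one partition's files into a single file.

        Returns the number of rows rewritten.
        """
        files = self.partitions.get(key, [])
        merged = [row for file in files for row in file]
        if merged:
            self.partitions[key] = [merged]
        return len(merged)

    def rewrite_all(self):
        """1.0-style step (sketch): rewrite every partition, i.e. the whole table."""
        return sum(self.compact_partition(key) for key in list(self.partitions))


table = PartitionedTable()
# Partition values such as "2021_01" are made up for the example.
table.bulk_insert([{"OS_PARTITION": "2021_01", "id": 1}, {"OS_PARTITION": "2021_01", "id": 2}])
table.bulk_insert([{"OS_PARTITION": "2021_01", "id": 3}, {"OS_PARTITION": "2020_12", "id": 4}])
rows_rewritten = table.compact_partition("2021_01")
```

In this model, compacting a single partition touches only that partition's rows, while `rewrite_all` touches every row in the table, which is the behavior the paragraph above describes as slow and expensive.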

@JulieMaterni

@LtChae can you fill in details on this card?

@JarredOlson
Contributor Author

I believe this was a placeholder for us to determine if we want to use partitioning for all datasets.

@dmiree added the "On Hold" label Jun 16, 2021
@LtChae changed the title from "Update Compaction to use Partitioning" to "Forklift: Make Compaction 2.0 the Default" Jun 18, 2021
@LtChae added the "enhancement" label Jun 18, 2021
@ksmith-accenture added the "dev" label and removed the "On Hold" label Jul 9, 2021
@jessicapfoster

This is an opportunity to reduce AWS costs.

@ksmith-accenture added the "tech debt" label and removed the "enhancement" label Aug 25, 2021