What
All datasets should use Compaction 2.0 by default.
The SPECIAL_COMPACTION_DATASETS environment variable should be removed.
Code used solely by Compaction 1.0 should be removed.
Table metrics should still be collected on compaction.
Why
Currently, Forklift supports two modes of compaction. Compaction 1.0 is the original mode and requires a complete table rewrite each time data is compacted. It is slow, error-prone, and expensive in data transfer costs.
Compaction 2.0 is the new mode and has been in use for certain configured datasets for nearly a year. It handles compaction via hourly bulk inserts and nightly compaction of single partitions. It is faster, more reliable, and less expensive than 1.0, but it requires datasets to have an OS_PARTITION field (this can be added by hand or automatically by the first compaction).
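To make the difference from a full table rewrite concrete, here is a minimal in-memory sketch of the 2.0 flow described above. It is not Forklift's implementation: the `PartitionedTable` class, its method names, the `id` key, and the idea that compaction keeps the last row seen per key are all assumptions made for illustration. Only the `OS_PARTITION` field name comes from this issue.

```python
from collections import defaultdict


class PartitionedTable:
    """Hypothetical stand-in for a table partitioned by OS_PARTITION."""

    def __init__(self):
        # Maps an OS_PARTITION value to the rows stored under it.
        self.partitions = defaultdict(list)

    def bulk_insert(self, rows):
        """Hourly step: append rows to their partitions; nothing is rewritten."""
        for row in rows:
            self.partitions[row["OS_PARTITION"]].append(row)

    def compact_partition(self, partition):
        """Nightly step: rewrite a single partition only.

        Assumption: compaction keeps the last row seen for each "id".
        Returns the number of rows left in the partition.
        """
        latest = {}
        for row in self.partitions.get(partition, []):
            latest[row["id"]] = row
        self.partitions[partition] = list(latest.values())
        return len(self.partitions[partition])
```

In this sketch, compacting one partition leaves every other partition untouched, which is the property that avoids the complete rewrite required by 1.0.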