There are too many small files in the delta-log directory in the collection #37438
-
There are too many small files in the delta-log directory in the collection. Currently, we have encountered a collection with 10 million small files. Is there any way to reduce these small files?
Replies: 3 comments 12 replies
-
I guess you called collection.flush() after each delete(). Each flush() generates a tiny delta file.
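As a rough illustration of why per-operation flushing multiplies files, here is a pure-Python toy model (this is not the pymilvus API; it only models the file-count arithmetic: each flush seals whatever is buffered into one new delta file):

```python
# Toy model: each flush() seals the pending delete buffer into ONE delta file,
# so calling flush() after every delete() yields one tiny file per delete.
class CollectionModel:
    def __init__(self):
        self.buffer = []       # deletes not yet persisted
        self.delta_files = []  # each entry represents one sealed delta file

    def delete(self, pk):
        self.buffer.append(pk)

    def flush(self):
        if self.buffer:
            self.delta_files.append(list(self.buffer))
            self.buffer.clear()

# Anti-pattern: flush after every delete -> 1000 tiny files.
bad = CollectionModel()
for pk in range(1000):
    bad.delete(pk)
    bad.flush()

# Better: batch the deletes and flush once -> 1 file.
good = CollectionModel()
for pk in range(1000):
    good.delete(pk)
good.flush()

print(len(bad.delta_files))   # 1000
print(len(good.delta_files))  # 1
```

In practice the same idea applies: batch your deletes and rely on Milvus's automatic flushing (or a single explicit flush) instead of flushing per operation.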
-
Are you doing frequent deletes? If so, you need to check why compaction is not being triggered. If you can provide logs from datacoord and datanode, we can help investigate why.
-
Please also keep an eye on this.
The initial suggestion I have is to minimize the creation of excessive segments. It's likely that frequent flush() calls or improperly configured import tasks have led to the generation of many small segments. You might consider modifying these operations to prevent the creation of additional small segments.

As for optimizing compaction, here is an adjustment you could consider:
datacoord.compaction.taskPrioritizer=mix
will prioritize mix compactions over L0 compactions. Please be aware that this is an advanced configuration option, and it will block L0 compactions until the mix compactions are completed.
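If you manage this setting through milvus.yaml rather than as a flat key, the nesting would look roughly like the following. Treat this as a sketch: the exact key casing and path can vary between Milvus versions, so verify against the reference configuration for your release.

```yaml
# Sketch of the equivalent milvus.yaml setting (verify the path for your version).
dataCoord:
  compaction:
    # Prioritize mix compactions over L0 compactions.
    # Advanced option: L0 compactions are blocked until mix compactions finish.
    taskPrioritizer: mix
```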