Some queries fail with a FileSystem closed exception when I run a load test with JMeter.
My Environment:
presto : 0.265.1
presto-hadoop-apache2: 2.7.4-9
hadoop (with RBF and Kerberos): 3.2.1
hive: 1.2.1
jmeter (running 100 threads): 5.3
Query Error Information:
I suspect this problem is caused by how the PrestoFileSystemCache class handles private credentials, so I added some logging to the fileSystemRefresh code block of PrestoFileSystemCache.getInternal(), like this:
I also added a log statement just before the filesystem is closed.
After running the JMeter script again, I got these logs:
As the logs above show, a total of 3 FileSystem instances were created within one second, including two FileSystemRefresh events. Two of those filesystems were then closed within the following second, which leads to the FileSystem closed exception: one second is too short, so the closed filesystems were still in use by running queries.
Multiple filesystem refreshes occur because, at the beginning, the newly acquired private credentials always contain more entries than those stored in the cached FileSystemHolder. Therefore, when deciding whether a filesystem refresh is needed, we should compare the credentials with containsAll() instead of equals().
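The difference between the two comparisons can be illustrated with a minimal, standalone sketch. The class and method names below are hypothetical and this is not the actual PrestoFileSystemCache code; it also assumes the intended check is "no refresh as long as the new credentials still include everything the cached holder has".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the refresh decision; not the real PrestoFileSystemCache logic.
class CredentialRefreshCheck {
    // Strict comparison: any extra credential in the new set triggers a refresh.
    static boolean needsRefreshWithEquals(Set<Object> cached, Set<Object> current) {
        return !cached.equals(current);
    }

    // Relaxed comparison (assumed direction): refresh only if the new set
    // no longer contains every credential the cached holder was created with.
    static boolean needsRefreshWithContainsAll(Set<Object> cached, Set<Object> current) {
        return !current.containsAll(cached);
    }

    public static void main(String[] args) {
        // Placeholder strings stand in for real credential objects.
        Set<Object> cached = new HashSet<>(List.of("token-a"));
        Set<Object> current = new HashSet<>(List.of("token-a", "token-b"));

        System.out.println("equals() says refresh: " + needsRefreshWithEquals(cached, current));
        System.out.println("containsAll() says refresh: " + needsRefreshWithContainsAll(cached, current));
    }
}
```

With a new credential set that is a strict superset of the cached one, the equals()-based check asks for a refresh while the containsAll()-based check does not, which matches the behavior described above.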
After replacing equals() with containsAll() and running the JMeter script again, only one filesystem is created and no FileSystem closed exception occurs. However, I don't think the problem is completely solved, because FinalizerService makes the time at which the filesystem is closed uncontrollable. Once the fileSystemHolder is removed from the cache map, a JVM GC may cause the FileSystem to be closed immediately, and the FileSystem closed exception will occur again.
So I think closing the FileSystem should be delayed by a configurable amount of time. I added a config, presto.hdfs.expired.fs.delay.close.time, to control how long to wait before closing a FileSystem after it is removed from PrestoFileSystemCache.map. The default value is 300000 ms (5 minutes). The logs show that 2 of the 3 FileSystem instances are closed after 5 minutes.
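The delayed-close idea can be sketched roughly as follows. This is only an illustration under assumptions: DelayedCloser and closeLater are hypothetical names, a plain ScheduledExecutorService stands in for whatever scheduling mechanism the patch actually uses, and a generic Closeable stands in for the Hadoop FileSystem.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: closes a resource only after a fixed delay has passed.
class DelayedCloser {
    private final ScheduledExecutorService executor;
    private final long delayMillis;

    DelayedCloser(ScheduledExecutorService executor, long delayMillis) {
        this.executor = executor;
        this.delayMillis = delayMillis;
    }

    // Schedule the close instead of closing right away, so callers that still
    // hold a reference have delayMillis to finish their work.
    ScheduledFuture<?> closeLater(Closeable resource) {
        return executor.schedule(() -> {
            try {
                resource.close();
            }
            catch (IOException e) {
                // A real implementation would log this; the sketch just reports it.
                System.err.println("close failed: " + e.getMessage());
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // A short delay for the demo; the PR's default is 300000 ms.
        DelayedCloser closer = new DelayedCloser(executor, 200);

        AtomicBoolean closed = new AtomicBoolean();
        ScheduledFuture<?> pending = closer.closeLater(() -> closed.set(true));

        pending.get(); // wait until the scheduled close has run
        System.out.println("closed after delay: " + closed.get());
        executor.shutdown();
    }
}
```

The design choice here is that the filesystem removed from the cache stays strongly referenced by the scheduled task until the delay elapses, so its close time no longer depends on when the garbage collector runs.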