Fix FileSystem closed after pr 23 #54

Open
wants to merge 1 commit into
base: master
@carrey-feng carrey-feng commented Dec 23, 2021

Some queries fail with a FileSystem closed exception when I run a load test with JMeter.
My environment:
presto: 0.265.1
presto-hadoop-apache2: 2.7.4-9
hadoop (with RBF and Kerberos): 3.2.1
hive: 1.2.1
jmeter (running 100 threads): 5.3

Query error information:
image


I suspect this problem is caused by how the PrestoFileSystemCache class handles private credentials, so I added some logging to the fileSystemRefresh code block of PrestoFileSystemCache.getInternal(), like this:

image


At the same time, I added a log statement just before the FileSystem is closed:

image

After running the JMeter script again, I got these logs:

image

As the logs above show, a total of 3 FileSystems were created within one second, two of them through a FileSystem refresh. Two of those FileSystems were then closed within the following second. This leads to the FileSystem closed exception: one second is so short that queries are still using a FileSystem when it is closed.

Multiple FileSystem refreshes occur because, at the beginning, the newly acquired private credentials always contain more entries than those stored in the cached FileSystemHolder. Therefore, when deciding whether a FileSystem refresh is needed, we should replace equals() with containsAll(), like this:

image
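To illustrate why the comparison matters, here is a minimal sketch. It is not the actual PrestoFileSystemCache code: the class and method names are hypothetical, credentials are modelled as plain strings, and the direction of the containsAll() call (the current credentials containing all of the cached ones) is an assumption based on the description above.

```java
import java.util.Set;

// Hypothetical illustration only; not the real PrestoFileSystemCache logic.
public class CredentialRefreshCheck {
    // Strict check: any difference between the two sets forces a refresh,
    // even when the current set merely has extra entries.
    public static boolean needsRefreshWithEquals(Set<String> cached, Set<String> current) {
        return !cached.equals(current);
    }

    // Relaxed check (assumed direction): refresh only when the current
    // credentials no longer include everything the cached holder was built with.
    public static boolean needsRefreshWithContainsAll(Set<String> cached, Set<String> current) {
        return !current.containsAll(cached);
    }
}
```

With a cached set of one credential and a current set holding that credential plus a new one, the strict check reports a refresh while the relaxed check does not, which matches the behavior described above.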

After replacing equals() with containsAll(), I ran the JMeter script again: only one FileSystem is created, and there are no FileSystem closed exceptions. But I think the problem has not been completely solved, because FinalizerService makes the time at which the FileSystem is closed uncontrollable. Once a FileSystemHolder is removed from the cache map, a JVM GC may cause the FileSystem to be closed immediately, and then the FileSystem closed exception will occur again.

So I think closing the FileSystem should be delayed by a configurable time. I added a config, presto.hdfs.expired.fs.delay.close.time, to control how long to wait before closing a FileSystem after it is removed from PrestoFileSystemCache.map. The default value is 300000 ms (5 minutes). The log shows that 2 of the 3 FileSystems are closed after 5 minutes.

image
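The delayed-close idea can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in this PR: the DelayedCloser class and closeLater method are hypothetical names, the resource is modelled as a generic java.io.Closeable rather than a Hadoop FileSystem, and the delay would be supplied from a setting such as presto.hdfs.expired.fs.delay.close.time.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: closes an evicted resource only after a fixed delay,
// giving in-flight queries time to finish using it.
public class DelayedCloser {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(runnable -> {
                Thread thread = new Thread(runnable, "delayed-fs-closer");
                thread.setDaemon(true); // do not keep the JVM alive for pending closes
                return thread;
            });
    private final long delayMillis;

    public DelayedCloser(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    // Schedules close() to run once, delayMillis after this call.
    public ScheduledFuture<?> closeLater(Closeable resource) {
        return scheduler.schedule(() -> {
            try {
                resource.close();
            }
            catch (IOException e) {
                // A real implementation would log this; the sketch ignores it.
            }
        }, delayMillis, TimeUnit.MILLISECONDS);
    }
}
```

Under this sketch, a cache would call closeLater(fileSystem) when it evicts a holder instead of closing the FileSystem right away, with delayMillis set to 300000 to match the default described above.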

@linux-foundation-easycla

CLA Not Signed
