Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(java-client): fix client renew TGT thread leak when request table doesnt exist #1970

Merged
merged 12 commits into from
Apr 17, 2024

Conversation

Samunroyu
Copy link
Collaborator

@Samunroyu Samunroyu commented Apr 9, 2024

In the Kerberos environment, java client will check TGT and relogin
kerberos if needed every 10 seconds by a thread. But when client
requests table which doesnt exist will cause threads leak.

The root cause this problem is when Pegasus client close the client
but don't close TGT thread pool.

This patch adds KerberosProtocol close function.

@shalk
Copy link
Collaborator

shalk commented Apr 9, 2024

Handle Exception is a good choice. Exception will make the schedule job not schedule again

But i am not quite understand the main cause about daemon
I don't kown the relatetionship between daemon thread with exception handler @Samunroyu

if you set daemon false, you should shutdown the threadpool when client jvm shutdown.

@Apache9
Copy link
Contributor

Apache9 commented Apr 9, 2024

Handle Exception is a good choice. Exception will make the schedule job not schedule again

But i am not quite understand the main cause about daemon I don't kown the relatetionship between daemon thread with exception handler @Samunroyu

if you set daemon false, you should shutdown the threadpool when client jvm shutdown.

I‘ve talked with @Samunroyu offline, the root cause here is we do not have a close method for KerberosProtocol class, so if we create the client many time due to kerberos authentication error, although we have already closed all the created client, the KerberosProtocol will not be closed, so the ExecutorService will be leaked and cause too many threads.

@shalk
Copy link
Collaborator

shalk commented Apr 10, 2024

Handle Exception is a good choice. Exception will make the schedule job not schedule again
But i am not quite understand the main cause about daemon I don't kown the relatetionship between daemon thread with exception handler @Samunroyu
if you set daemon false, you should shutdown the threadpool when client jvm shutdown.

I‘ve talked with @Samunroyu offline, the root cause here is we do not have a close method for KerberosProtocol class, so if we create the client many time due to kerberos authentication error, although we have already closed all the created client, the KerberosProtocol will not be closed, so the ExecutorService will be leaked and cause too many threads.

It make sense. I get the case about thread leak.

if the pr finish and ci pass, click request , i will review again @Samunroyu

@Samunroyu Samunroyu force-pushed the dev/yjw/client_thread_leak branch 2 times, most recently from b91b9f4 to 89d9712 Compare April 11, 2024 07:58
@Samunroyu Samunroyu requested review from shalk and Apache9 April 12, 2024 03:53
final int CHECK_TGT_INTEVAL_SECONDS = 10;
final ScheduledExecutorService service =
Executors.newSingleThreadScheduledExecutor(
new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r);
Thread t = new Thread(r, "TGT renew for pegasus");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better introduce a unique suffix or prefix, otherwise if there are multiple client you will see multiple threads with the same name.

@@ -96,11 +97,23 @@ public boolean isAuthRequest(final ReplicaSession.RequestEntry entry) {
}

private void scheduleCheckTGTAndRelogin() {
Runnable checkTGTAndReLogin =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could still use lambda here.

    service.scheduleAtFixedRate(
        () -> {
            try {
              checkTGTAndRelogin();
            } catch (Exception e) {
              logger.warn(
                  "check TGT and ReLogin kerberos failed , will retry after {} seconds.",
                  CHECK_TGT_INTEVAL_SECONDS,
                  e);
            }
        }
        CHECK_TGT_INTEVAL_SECONDS,
        CHECK_TGT_INTEVAL_SECONDS,
        TimeUnit.SECONDS);

@Samunroyu Samunroyu force-pushed the dev/yjw/client_thread_leak branch from 512a098 to 86aaba5 Compare April 16, 2024 09:54
@Samunroyu Samunroyu force-pushed the dev/yjw/client_thread_leak branch from 86aaba5 to ea19a6d Compare April 16, 2024 11:35
Copy link
Collaborator

@shalk shalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thread leak has fixed

final int CHECK_TGT_INTEVAL_SECONDS = 10;
final ScheduledExecutorService service =
Executors.newSingleThreadScheduledExecutor(
new ThreadFactory() {
@Override
public Thread newThread(Runnable r) {
Thread t = new Thread(r);
String timestamp = new SimpleDateFormat("HHmmss").format(new Date());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from JDK8 , advise migration to the java.time classes.
String timestamp = LocalTime.now().format(DateTimeFormatter.ofPattern("HHmmss"));

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from JDK8 , advise migration to the java.time classes. String timestamp = LocalTime.now().format(DateTimeFormatter.ofPattern("HHmmss"));

ok, i have changed to use java.time.

@acelyc111 acelyc111 merged commit a73cec2 into apache:master Apr 17, 2024
18 checks passed
@Samunroyu Samunroyu deleted the dev/yjw/client_thread_leak branch May 30, 2024 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants