Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexplained websocket disconnection precisely 5 minutes after the connection is established #609

Open
James-Stokes opened this issue Apr 26, 2024 · 7 comments

Comments

@James-Stokes
Copy link

Tested using Paho version 4.2.4 on Andorid 11 and Android 7
I have been connecting to a mosquito (v2) broker using the mqtt protocol on port 1883 and everything is fine, but exactly the same setup using web sockets on port 9001 causes a disconnection exactly 5 minutes after connection every time (the auto reconnect then does its thing and the connection is re-established again almost instantly). It seems very easy to reproduce, I am using the default configs for mosquitto and paho and it happens every time. I have tried setting the keepalive interval to both 10 and 100 seconds, I see the PINGREQ and PINGRESP messages in the broker logs and that seems to be working fine.
I think that this is an issue with the client and not the broker because the Paho python client does not have the same issue with the same configuration and broker (also using websockets). I have tried bot SSL and plaintext, and brokers hosted in a k8s cluster and in my local testing environment, always a disconnection after 5 minutes.

Client initialisation:
mqttClient = MqttAndroidClient(this, BROKER_URL, clientId).also {
    it.setCallback(DefaultCallback())
}


Client connection:
val options = MqttConnectOptions()
options.isCleanSession = false
options.isAutomaticReconnect = true
options.userName = mqttClient.clientId
options.password = mPassword.toCharArray()
options.maxInflight = MAX_IN_FLIGHT
options.setWill(getScopedTopic(TOPIC_ONLINE), "0".toByteArray(), QOS_LEVEL, false)
Log.i(LOG_TAG, "Connecting as ${options.userName}")
val connectionCallback = object : IMqttActionListener {
    override fun onSuccess(token: IMqttToken?) {
        Log.i(LOG_TAG, "Connection success")
        val disconnectedBufferOptions = DisconnectedBufferOptions()
        disconnectedBufferOptions.isBufferEnabled = true
        disconnectedBufferOptions.bufferSize = BUFFER_SIZE
        disconnectedBufferOptions.isPersistBuffer = true
        disconnectedBufferOptions.isDeleteOldestMessages = true
        mqttClient.setBufferOpts(disconnectedBufferOptions)
    }

    override fun onFailure(token: IMqttToken?, exception: Throwable?) {
        Log.e(LOG_TAG, "Connection failed", exception)
    }
}
mqttClient.connect(options, this, connectionCallback)


Disconnection callback logs in Paho:
Connection lost (32109) - java.io.EOFException 
at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:197)
at java.lang.Thread.run(Thread.java:761) Caused by: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:270)
at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(MqttInputStream.java:92)
at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:137)
at java.lang.Thread.run(Thread.java:761)

Mosquitto broker logs:
1714144958: Sending PUBREC to faba9c974541 (m30, rc0)
1714144959: Received PUBREL from faba9c974541 (Mid: 30)
1714144959: Sending PUBCOMP to faba9c974541 (m30)
1714145019: Received PINGREQ from faba9c974541 
1714145019: Sending PINGRESP to faba9c974541 
1714145041: Client faba9c974541 closed its connection.
@hannesa2
Copy link
Owner

This #631 tries to test it

@hannesa2
Copy link
Owner

hannesa2 commented Jun 3, 2024

I made an own tab to observer ping behavior #644

and when I try a reconnect of connection lost #646 I see that

  1. ping was first working,
  2. then a ping was canceled (Question 1: why ?)
  3. then a connection lost appears
  4. an automatic retry failed, because
  5. an UnknowHostException appears

I've no explanation. does someone understand what's going on here ?

(PingWorker.kt:19)           info.mqtt.android.extsample D  .doWork(): Sending Ping at: 2024-06-03 19:33:03Z
(AlarmPingSender.kt:46)      info.mqtt.android.extsample D  .schedule(): Schedule next alarm at 2024-06-03 19:36:23Z
(PingWorker.kt:24)           info.mqtt.android.extsample D  $doWork.onSuccess(): Success null
WM-WorkerWrapper             info.mqtt.android.extsample I  Work [ id=636a1c87-c85e-45d4-9129-e1efff1985f1, tags={ info.mqtt.android.service.ping.PingWorker } ] was cancelled (Ask Gemini)
                                                            java.util.concurrent.CancellationException: Task was cancelled.
                                                            	at androidx.work.impl.utils.futures.AbstractFuture.cancellationExceptionWithCause(AbstractFuture.java:1183)
                                                            	at androidx.work.impl.utils.futures.AbstractFuture.getDoneValue(AbstractFuture.java:513)
                                                            	at androidx.work.impl.utils.futures.AbstractFuture.get(AbstractFuture.java:474)
                                                            	at androidx.work.impl.WorkerWrapper$2.run(WorkerWrapper.java:316)
                                                            	at androidx.work.impl.utils.SerialExecutorImpl$Task.run(SerialExecutorImpl.java:96)
                                                            	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                                                            	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
                                                            	at java.lang.Thread.run(Thread.java:1012)
(AlarmPingSender.kt:46)      info.mqtt.android.extsample D  .schedule(): Schedule next alarm at 2024-06-03 19:33:55Z
(MqttAndroidClient.kt:884)   info.mqtt.android.extsample V  .onReceive(): Bundle[{.exceptionStack=Connection lost (32109) - java.net.SocketException: Software caused connection abort (Ask Gemini)
                                                            	at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:197)
                                                            	at java.lang.Thread.run(Thread.java:1012)
                                                            Caused by: java.net.SocketException: Software caused connection abort
                                                            	at java.net.SocketInputStream.socketRead0(Native Method)
                                                            	at java.net.SocketInputStream.socketRead(SocketInputStream.java:118)
                                                            	at java.net.SocketInputStream.read(SocketInputStream.java:173)
                                                            	at java.net.SocketInputStream.read(SocketInputStream.java:143)
                                                            	at java.net.SocketInputStream.read(SocketInputStream.java:205)
                                                            	at java.io.DataInputStream.readByte(DataInputStream.java:273)
                                                            	at org.eclipse.paho.client.mqttv3.internal.wire.MqttInputStream.readMqttWireMessage(MqttInputStream.java:92)
                                                            	at org.eclipse.paho.client.mqttv3.internal.CommsReceiver.run(CommsReceiver.java:137)
                                                            	at java.lang.Thread.run(Thread.java:1012) 
                                                            , .errorMessage=Connection lost, .exception=Connection lost (32109) - java.net.SocketException: Software caused connection abort, .callbackAction=onConnectionLost, .clientHandle=tcp://broker.hivemq.com:1883:ExtendedSampleClient:info.mqtt.android.extsample, .callbackStatus=OK}]
(MqttCallbackHandler.kt:26)  info.mqtt.android.extsample W  .connectionLost(): MqttException Connection lost
(MqttCallbackHandler.kt:37)  info.mqtt.android.extsample I  .connectionLost(): Try to reconnect
TrafficStats                 info.mqtt.android.extsample D  tagSocket(90) with statsTag=0xffffffff, statsUid=-1
(PingWorker.kt:19)           info.mqtt.android.extsample D  .doWork(): Sending Ping at: 2024-06-03 19:33:55Z
WM-WorkerWrapper             info.mqtt.android.extsample I  Worker result FAILURE for Work [ id=ce7563fc-fd26-4c0f-8dff-5d07b8f82832, tags={ info.mqtt.android.service.ping.PingWorker } ]
TrafficStats                 info.mqtt.android.extsample D  tagSocket(90) with statsTag=0xffffffff, statsUid=-1
(MqttAndroidClient.kt:884)   info.mqtt.android.extsample V  .onReceive(): Bundle[{.invocationContext=null, .errorMessage=MqttException, .exception=MqttException (0) - java.net.UnknownHostException: broker.hivemq.com, .activityToken=2, .callbackAction=connect, .clientHandle=tcp://broker.hivemq.com:1883:ExtendedSampleClient:info.mqtt.android.extsample, .callbackStatus=ERROR}]
(ActionListener.kt:100)      info.mqtt.android.extsample E  .onFailure(): CONNECT token=info.mqtt.android.service.MqttTokenAndroid@a2cdbbe (Ask Gemini)
                                                            MqttException (0) - java.net.UnknownHostException: broker.hivemq.com
                                                            	at org.eclipse.paho.client.mqttv3.internal.ExceptionHelper.createMqttException(ExceptionHelper.java:38)
                                                            	at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:738)
                                                            	at java.lang.Thread.run(Thread.java:1012)
                                                            Caused by: java.net.UnknownHostException: broker.hivemq.com
                                                            	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:208)
                                                            	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:436)
                                                            	at java.net.Socket.connect(Socket.java:646)
                                                            	at org.eclipse.paho.client.mqttv3.internal.TCPNetworkModule.start(TCPNetworkModule.java:74)
                                                            	at org.eclipse.paho.client.mqttv3.internal.ClientComms$ConnectBG.run(ClientComms.java:724)
                                                            	at java.lang.Thread.run(Thread.java:1012) 
(ActionListener.kt:157)      info.mqtt.android.extsample E  .connect(): Client failed to connect MqttException

@mistralaix
Copy link

We are also experiencing the same problem: we are connecting to a mqtt broker through websockets and it is disconnecting after exactly 5 minutes too.

Did someone figured out a solution or temporary fix?

@hannesa2
Copy link
Owner

hannesa2 commented Aug 2, 2024

Currently: no, it ends up with a java.net.UnknownHostException which is previous known
I've no clue why

@FJEagle
Copy link
Contributor

FJEagle commented Sep 11, 2024

We are also experiencing the same problem: we are connecting to a mqtt broker through websockets and it is disconnecting after exactly 5 minutes too.

Did someone figured out a solution or temporary fix?

Could it be telecom network heartbeat problem?
As I know, some telecom operator would disconnect the long-lived connection at certain duration.
To confirm it: try to connect to a server in LAN, would it disappear?

@hannesa2
Copy link
Owner

concerning this java.util.concurrent.CancellationException

These could be one of the reason, keep in mind it's not an error

  • Explicit Cancellation: You might have called WorkManager.cancelWorkById(workId) or WorkManager.cancelAllWorkByTag(tag) in your code.
  • Constraints Not Met: If your worker has constraints (e.g., network availability), and those constraints are no longer met, WorkManager might cancel it.
  • Replacing or Unique Work: If you enqueued a new work request that replaces or is unique to the existing PingWorker, WorkManager would cancel the previous one.
  • App Closure or System Events: In some cases, system events or app closure can lead to worker cancellation.

@hannesa2
Copy link
Owner

We are also experiencing the same problem: we are connecting to a mqtt broker through websockets and it is disconnecting after exactly 5 minutes too.
Did someone figured out a solution or temporary fix?

Could it be telecom network heartbeat problem? As I know, some telecom operator would disconnect the long-lived connection at certain duration. To confirm it: try to connect to a server in LAN, would it disappear?

@FJEagle
If a telecom operator kill my connection, in my world it should still know the host. Why even the DNS entry is lost... no clue.
Maybe a test with an IP address makes sense ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants