This repository has been archived by the owner on Oct 6, 2018. It is now read-only.
Kafka web console freezing/stopping or dying too frequently.
I don't think it's a problem at the OS level.
It seems to be a problem at the application level.
I've already raised the open file handle limit to 98000 for all users and set time_waits to 30s instead of the default 5 minutes.
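Since file descriptor limits are per process and inherited from the parent, a limit raised for all users does not necessarily reach a process that was started earlier or is launched by a process manager. The following is only a minimal sketch, assuming Linux with `/proc` available, for checking the limit the web console process actually runs with and how many descriptors it holds. The function names are hypothetical, and the PID has to be supplied by whoever runs it.

```python
import os


def _to_int(value):
    # /proc/<pid>/limits prints "unlimited" when no limit is set.
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: Max open files <soft> <hard> files
            fields = line.split()
            return (_to_int(fields[3]), _to_int(fields[4]))
    return None


def count_open_fds(pid):
    """Count the descriptors currently open in the given process (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def report(pid):
    """Print the effective limit and current usage for one process."""
    with open(f"/proc/{pid}/limits") as handle:
        limit = parse_open_files_limit(handle.read())
    print(f"pid {pid}: limit (soft, hard) = {limit}, open fds = {count_open_fds(pid)}")
```

Calling `report(<pid of the web console>)` a few times while the problem builds up would show whether the descriptor count climbs toward the soft limit, and whether that limit is really 98000 for this process.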
From what I can see from the logs, it starts with play:
[error] play - Cannot invoke the action, eventually got an error: java.lang.RuntimeException: Exception while executing statement : IO Exception: "java.io.IOException: Too many open files"; "/etc/kafka-web-console/play"; SQL statement:
delete from offsetPoints
where
(offsetPoints.offsetHistoryId = ?) [90031-172]
errorCode: 90031, sqlState: 90031
Caused by: java.lang.RuntimeException: Exception while executing statement : IO Exception: "java.io.IOException: Too many open files"; "/etc/kafka-web-console/play"; SQL statement:
delete from offsetPoints
where
(offsetPoints.offsetHistoryId = ?) [90031-172]
errorCode: 90031, sqlState: 90031
delete from offsetPoints
Then this seems to cause socket connection errors:
Caused by: java.io.IOException: Too many open files
at java.io.UnixFileSystem.createFileExclusively(Native Method) ~[na:1.7.0_75]
at java.io.File.createNewFile(File.java:1006) ~[na:1.7.0_75]
at org.h2.store.fs.FilePathDisk.createTempFile(FilePathDisk.java:367) ~[h2.jar:1.3.172]
at org.h2.store.fs.FileUtils.createTempFile(FileUtils.java:329) ~[h2.jar:1.3.172]
at org.h2.engine.Database.createTempFile(Database.java:1529) ~[h2.jar:1.3.172]
at org.h2.result.RowList.writeAllRows(RowList.java:90) ~[h2.jar:1.3.172]
[debug] application - Getting partition leaders for topic topic-exist-test
[debug] application - Getting partition leaders for topic topic-rep-3-test
[debug] application - Getting partition leaders for topic PofApiTest
[debug] application - Getting partition leaders for topic PofApiTest-2
[debug] application - Getting partition leaders for topic fileread
[debug] application - Getting partition leaders for topic pageview
[debug] application - Getting partition log sizes for topic topic-exist-test from partition leaders 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092, 10.100.71.42:9092
[warn] application - Could not connect to partition leader 10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader 10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader 10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader 10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader 10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader 10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader 10.100.71.42:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader 10.100.71.42:9092. Error message: Failed to open a socket.
[debug] application - Getting partition offsets for topic topic-exist-test
-jar:9092, exemplary-birds:9092, voluminous-mass:9092
[warn] application - Could not connect to partition leader voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader harmful-jar:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader harmful-jar:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader exemplary-birds:9092. Error message: Failed to open a socket.
[debug] application - Getting partition offsets for topic PofApiTest
[debug] application - Getting partition log sizes for topic topic-rep-3-test from partition leaders exemplary-birds:9092, voluminous-mass:9092, harmful-jar:9092, exemplary-birds:9092, voluminous-mass:9092, harmful-jar:9092, exemplary-birds:9092, voluminous-mass:9092
[debug] application - Getting partition log sizes for topic fileread from partition leaders voluminous-mass:9092, harmful-jar:9092, exemplary-birds:9092, voluminous-mass:9092, harmful-jar:9092, exemplary-birds:9092, voluminous-mass:9092, harmful-jar:9092
[warn] application - Could not connect to partition leader exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader harmful-jar:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader harmful-jar:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader voluminous-mass:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader exemplary-birds:9092. Error message: Failed to open a socket.
[warn] application - Could not connect to partition leader harmful-jar:9092. Error message: Failed to open a socket.
[debug] application - Getting partition offsets for topic PofApiTest-2
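To see how the socket failures in a log like the one above are spread across brokers, the warnings can be tallied per partition leader. This is a rough sketch rather than anything the web console provides: the function name is hypothetical, and the pattern assumes the exact warning wording shown in this log, with or without the terminal colour codes.

```python
import re
from collections import Counter

# Terminal colour sequences such as ESC[33m ... ESC[0m around the log level.
ANSI_CODES = re.compile(r"\x1b\[[0-9;]*m")
# Wording taken from the warnings in the log above.
FAILED_LEADER = re.compile(r"Could not connect to partition leader (\S+)\. Error message:")


def failed_leaders(log_text):
    """Count 'Could not connect' warnings per partition leader address."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED_LEADER.search(ANSI_CODES.sub("", line))
        if match:
            counts[match.group(1)] += 1
    return counts
```

If every leader fails at roughly the same rate, that would point at the monitoring process running out of descriptors rather than at any single broker.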
This then leads to connections in TIME_WAIT from the monitoring box to the production server:
1 tcp6 0 0 10.100.68.48:35050 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35051 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35055 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35057 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35064 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35065 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35066 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35073 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35074 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35075 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35085 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35088 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35100 10.100.98.100:9092 TIME_WAIT
1 tcp6 0 0 10.100.68.48:35103 10.100.98.100:9092 TIME_WAIT
But that only lasts for about 30 seconds to 1 minute. Then supervisord seems to restart the web console, once these time_waits go away, or the sockets and files are properly closed, or they get flushed by either Play/the web console or Kafka.
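To track how many connections sit in TIME_WAIT per broker over time, netstat-style output like the listing above can be summarised. This is a sketch under assumptions: the function name is hypothetical, and it expects each line to end with local address, foreign address, and state, which is the layout shown above (netstat run without the program-name column).

```python
from collections import Counter


def count_by_state(netstat_text, state="TIME_WAIT"):
    """Count connections in the given TCP state, grouped by foreign address."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect at least: local address, foreign address, state.
        if len(fields) < 3 or fields[-1] != state:
            continue
        counts[fields[-2]] += 1
    return counts
```

Running this on output captured every few seconds would show whether the count toward port 9092 grows steadily before each restart or stays flat.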