segmentation fault from rdkafka_topic.c #4907

Open
3 of 7 tasks
ojktx opened this issue Nov 19, 2024 · 2 comments

ojktx commented Nov 19, 2024

Read the FAQ first: https://github.com/confluentinc/librdkafka/wiki/FAQ

Do NOT create issues for questions, use the discussion forum: https://github.com/confluentinc/librdkafka/discussions

Description

Hello!

I am reporting a segmentation fault.
I will keep this brief because the symptoms and the code are clear.
In the rd_kafka_topic_metadata_update() function of rdkafka_topic.c, a NULL pointer dereference occurs on the variable rktp.

In v2.6.0, rktp is dereferenced at line 1390.

After a brief look at the code, I found that a NULL check is missing.

In simple terms, a guard like the following is absent:
if (unlikely(!rktp)) {
        rd_kafka_dbg(~~~);
        return;
}
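To make the proposed guard concrete, here is a minimal, self-contained sketch of the pattern. It is not librdkafka's source: `toy_topic_t`, `toy_partition_t`, `toy_partition_get()` and `toy_metadata_update()` are hypothetical stand-ins, and the only assumption is that a partition lookup can return NULL while a metadata update is being applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration only; not librdkafka types. */
typedef struct toy_partition {
        int id;
        int leader_id;
} toy_partition_t;

typedef struct toy_topic {
        toy_partition_t **parts; /* individual slots may be NULL */
        int part_cnt;
} toy_topic_t;

/* Look up a partition by index.
 * Returns NULL if the index is out of range or the slot is empty. */
toy_partition_t *toy_partition_get(const toy_topic_t *t, int id) {
        if (id < 0 || id >= t->part_cnt)
                return NULL;
        return t->parts[id];
}

/* Apply new leader ids from a metadata response.
 * Partitions that cannot be found are logged and skipped instead of
 * being dereferenced. Returns the number of partitions updated. */
int toy_metadata_update(toy_topic_t *t, const int *leaders, int cnt) {
        int updated = 0;
        int i;

        assert(t != NULL && leaders != NULL);

        for (i = 0; i < cnt; i++) {
                toy_partition_t *p = toy_partition_get(t, i);

                if (!p) {
                        /* The guard: without it, p->leader_id below
                         * would dereference a NULL pointer. */
                        fprintf(stderr, "partition %d not found, skipping\n", i);
                        continue;
                }

                p->leader_id = leaders[i];
                updated++;
        }

        return updated;
}
```

Whether the real function should skip the partition, as this sketch does, or return early, as in the snippet above, depends on the surrounding logic in rdkafka_topic.c and is left to the maintainers.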

I found this problem while testing what happens when the broker is restarted repeatedly.

It does not happen every time, but about once in several dozen restarts the rktp pointer is NULL.
Since I cannot file issues from work, I am writing this from home, without a call stack or screenshots.

My company uses librdkafka on at least a thousand servers, so I need this fixed before we can upgrade.

I hope for a quick fix. Thanks.

How to reproduce

With a librdkafka producer running,

repeatedly stop and start the brokers.

The issue occurred with v2.6.0.

Checklist

Please provide the following information:

  • librdkafka version (release number or git tag): v2.6.0
  • Apache Kafka version: 3.6.0
  • librdkafka client configuration: <REPLACE with e.g., message.timeout.ms=123, auto.reset.offset=earliest, ..>
  • Operating system: Windows 10, Ubuntu 22.04
  • Provide logs (with debug=.. as necessary) from librdkafka
  • Provide broker log excerpts
  • Critical issue

thmic commented Nov 29, 2024

I encountered the same issue. When I stop the Kafka service, delete all Kafka logs, and then restart the service, many applications using Kafka generate core dumps. When debugging the core files, I found that rktp is 0x0.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000004a6efb in rd_kafka_topic_metadata_update (rkt=rkt@entry=0x14e474004650, mdt=mdt@entry=0x14e4740041e8, mdit=mdit@entry=0x14e474004208,
ts_age=) at rdkafka_topic.c:1384
1384 rdkafka_topic.c: No such file or directory.
[Current thread is 1 (Thread 0x14e47dd88640 (LWP 50313))]
(gdb) bt
#0 0x00000000004a6efb in rd_kafka_topic_metadata_update (rkt=rkt@entry=0x14e474004650, mdt=mdt@entry=0x14e4740041e8, mdit=mdit@entry=0x14e474004208,
ts_age=) at rdkafka_topic.c:1384
#1 0x00000000004a7ddd in rd_kafka_topic_metadata_update2 (rkb=rkb@entry=0x24272a0, mdt=mdt@entry=0x14e4740041e8, mdit=mdit@entry=0x14e474004208)
at rdkafka_topic.c:1471
#2 0x0000000000539bb4 in rd_kafka_parse_Metadata_update_topic (mdit=, mdt=0x14e4740041e8, rkb=0x24272a0) at rdkafka_metadata.c:379
#3 rd_kafka_parse_Metadata0 (rkb=rkb@entry=0x24272a0, request=request@entry=0x14e47400b140, rkbuf=rkbuf@entry=0x14e4680048a0, mdip=mdip@entry=0x14e47dd83ed8,
request_topics=request_topics@entry=0x0, reason=) at rdkafka_metadata.c:839
#4 0x000000000053efef in rd_kafka_parse_Metadata (rkb=rkb@entry=0x24272a0, request=request@entry=0x14e47400b140, rkbuf=rkbuf@entry=0x14e4680048a0,
mdip=mdip@entry=0x14e47dd83ed8) at rdkafka_metadata.c:1111
#5 0x00000000004cd9b7 in rd_kafka_handle_Metadata (rk=, rkb=0x24272a0, err=, rkbuf=0x14e4680048a0, request=0x14e47400b140,
opaque=0x0) at rdkafka_request.c:2490
#6 0x00000000004b937c in rd_kafka_buf_callback (rk=0x241b280, rkb=0x24272a0, err=RD_KAFKA_RESP_ERR_NO_ERROR, response=0x14e4680048a0, request=0x14e47400b140)
at rdkafka_buf.c:509
#7 0x00000000004c96e3 in rd_kafka_op_handle_std (rk=, rkq=, rko=, cb_type=) at rdkafka_op.c:875
#8 0x00000000004c9778 in rd_kafka_op_handle (rk=0x241b280, rkq=0x14e47dd84130, rko=0x14e4680045a0, cb_type=RD_KAFKA_Q_CB_CALLBACK, opaque=0x241b280,
callback=0x484840 <rd_kafka_poll_cb>) at rdkafka_op.c:915
#9 0x00000000004be324 in rd_kafka_q_serve (rkq=0x241c4c0, timeout_ms=19, max_cnt=max_cnt@entry=0, cb_type=cb_type@entry=RD_KAFKA_Q_CB_CALLBACK,
callback=callback@entry=0x0, opaque=opaque@entry=0x0) at rdkafka_queue.c:578
#10 0x000000000048d08b in rd_kafka_thread_main (arg=0x241b280) at rdkafka.c:2136
#11 0x000014e4a6ac80f1 in ?? () from /usr/lib64/libc.so.6
#12 0x000014e4a6b4acf0 in ?? () from /usr/lib64/libc.so.6
(gdb) p * rktp
Cannot access memory at address 0x0

@marioluzi

Hello,
we are seeing a similar issue, with the stack dump below (it happens when Kafka nodes go down):

Core was generated by `/opt/anritsu/mclaw/installed/eoxdr-probehandler/lib/eoxdr-probehandler'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f242664a4fb in rd_kafka_topic_metadata_update (rkt=rkt@entry=0x7f23c8004a50, mdt=mdt@entry=0x7f234c007038, mdit=mdit@entry=0x7f234c007058, ts_age=) at rdkafka_topic.c:1390
1390 rdkafka_topic.c: No such file or directory.
#0 0x00007f242664a4fb in rd_kafka_topic_metadata_update (rkt=rkt@entry=0x7f23c8004a50, mdt=mdt@entry=0x7f234c007038, mdit=mdit@entry=0x7f234c007058, ts_age=) at rdkafka_topic.c:1390
#1 0x00007f242664b46d in rd_kafka_topic_metadata_update2 (rkb=rkb@entry=0x7f24106cbe80, mdt=mdt@entry=0x7f234c007038, mdit=mdit@entry=0x7f234c007058) at rdtime.h:106
#2 0x00007f24266df209 in rd_kafka_parse_Metadata_update_topic (mdit=0x7f234c007058, mdt=0x7f234c007038, rkb=0x7f24106cbe80) at rdkafka_metadata.c:379
#3 rd_kafka_parse_Metadata0 (rkb=rkb@entry=0x7f24106cbe80, request=request@entry=0x7f2334002080, rkbuf=rkbuf@entry=0x7f2334001710, mdip=mdip@entry=0x7f2243fda798, request_topics=request_topics@entry=0x0, reason=0x7f2334001ba0 "connected") at rdkafka_metadata.c:839
#4 0x00007f24266e351f in rd_kafka_parse_Metadata (rkb=rkb@entry=0x7f24106cbe80, request=request@entry=0x7f2334002080, rkbuf=rkbuf@entry=0x7f2334001710, mdip=mdip@entry=0x7f2243fda798) at rdkafka_metadata.c:1111
#5 0x00007f2426662dc0 in rd_kafka_handle_Metadata (rk=, rkb=0x7f24106cbe80, err=, rkbuf=0x7f2334001710, request=0x7f2334002080, opaque=0x0) at rdkafka_request.c:2546
#6 0x00007f2426659904 in rd_kafka_buf_callback (rk=0x7f241069e710, rkb=0x7f24106cbe80, err=RD_KAFKA_RESP_ERR_NO_ERROR, response=0x7f2334001710, request=0x7f2334002080) at rdkafka_buf.c:509
#7 0x00007f24266600d0 in rd_kafka_op_handle_std (rk=, rkq=, rko=, cb_type=) at rdkafka_op.c:905
#8 0x00007f2426660178 in rd_kafka_op_handle (rk=0x7f241069e710, rkq=0x7f2243fda9d0, rko=0x7f2334001350, cb_type=RD_KAFKA_Q_CB_CALLBACK, opaque=0x7f241069e710, callback=0x7f2426628b20 <rd_kafka_poll_cb>) at rdkafka_op.c:945
#9 0x00007f242665cc31 in rd_kafka_q_serve (rkq=0x7f241065a460, timeout_ms=, max_cnt=max_cnt@entry=0, cb_type=cb_type@entry=RD_KAFKA_Q_CB_CALLBACK, callback=callback@entry=0x0, opaque=opaque@entry=0x0) at rdkafka_queue.c:578
#10 0x00007f242662bd4d in rd_kafka_thread_main (arg=arg@entry=0x7f241069e710) at rdkafka.c:2143
#11 0x00007f24266d5eb7 in _thrd_wrapper_function (aArg=) at tinycthread.c:576
#12 0x00007f2426cd5ea5 in start_thread () from /lib64/libpthread.so.0
#13 0x00007f24220cbb0d in clone () from /lib64/libc.so.6

Is there any workaround we can apply, or do we simply have to wait for the fix?

We also tried the latest version, 2.6.1, and the behavior is the same.

Please let us know, as this has a huge impact on customer environments.

Thank you.
