db lock up (after db corruption?) #83

Open
ygrek opened this issue Jan 4, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@ygrek
Member

ygrek commented Jan 4, 2021

db log :

2020-07-08 16:28:16 Unmarshalling: Keys: 100 keys
2020-07-08 16:28:16 Marshalling: Ack: 0
2020-07-08 16:28:17 Keydb.key_to_merge_updates: error in key 009AF5196EE84E5F0A7F6E53038B1610: Stack overflow
2020-07-08 16:28:17 Applying 121 changes
2020-07-08 16:28:17 Fatal database error: Bdb.DBError("BDB0620 DB_DBT_READONLY should not be set on data DBT.")
2020-07-08 16:28:17 Key addition failed: Stdlib.Sys.Break
2020-07-08 16:28:18 Unmarshalling: LogQuery: (5000,1592987790.081733)
2020-07-08 16:28:18 Marshalling: LogResp: 0 events
2020-07-08 16:28:18 Unmarshalling: Keys: 100 keys
2020-07-08 16:28:18 Marshalling: Ack: 0
2020-07-08 16:28:18 Applying 118 changes
2020-07-08 16:28:18 Fatal database error: Bdb.DBError("BDB0620 DB_DBT_READONLY should not be set on data DBT.")
2020-07-08 16:28:18 Key addition failed: Stdlib.Sys.Break
2020-07-08 16:28:18 Unmarshalling: LogQuery: (5000,1592987790.081733)
2020-07-08 16:28:18 Marshalling: LogResp: 0 events
2020-07-08 16:28:19 Unmarshalling: Keys: 100 keys
2020-07-08 16:28:19 Marshalling: Ack: 0
2020-07-08 16:28:19 Fatal database error: Bdb.DBError("BDB0110 Log sequence error: page LSN 1097 4989775; previous LSN 1097 2332942")
2020-07-08 16:28:19 add_keys_merge failed: Stdlib.Sys.Break
2020-07-08 16:28:19 Key addition failed: Stdlib.Sys.Break
2020-07-08 16:28:20 Unmarshalling: LogQuery: (5000,1592987790.081733)
2020-07-08 16:28:20 Marshalling: LogResp: 0 events

db backtrace :

(gdb) bt
#0  0x00007f6115cca0df in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f61162fa6b3 in __db_pthread_mutex_lock () from /lib/x86_64-linux-gnu/libdb-5.3.so
#2  0x00007f6116415ca3 in __memp_dirty () from /lib/x86_64-linux-gnu/libdb-5.3.so
#3  0x00007f61163dc389 in __db_pg_alloc_recover () from /lib/x86_64-linux-gnu/libdb-5.3.so
#4  0x00007f61163c43ef in __db_dispatch () from /lib/x86_64-linux-gnu/libdb-5.3.so
#5  0x00007f611642b78b in ?? () from /lib/x86_64-linux-gnu/libdb-5.3.so
#6  0x00007f611642d0dc in __txn_abort () from /lib/x86_64-linux-gnu/libdb-5.3.so
#7  0x00007f611642d238 in ?? () from /lib/x86_64-linux-gnu/libdb-5.3.so
#8  0x000055fba402841a in sweep_slice (work=952283, work@entry=17753214) at major_gc.c:567
#9  0x000055fba4029161 in caml_major_collection_slice (howmuch=howmuch@entry=-1) at major_gc.c:802
#10 0x000055fba402a66a in caml_gc_dispatch () at minor_gc.c:471
#11 0x000055fba402a707 in caml_check_urgent_gc (extra_root=<optimized out>) at minor_gc.c:490
#12 0x000055fba402bb85 in caml_alloc_string (len=19809) at alloc.c:105
#13 0x000055fba3f77993 in camlChannel__fun_1803 () at channel.ml:53
#14 0x000055fba3f45d59 in camlDbMessages__unmarshal_key_760 () at dbMessages.ml:111
#15 0x000055fba3fc0b85 in camlStdlib__array__init_101 () at array.ml:52
#16 0x000055fba3f463d7 in camlDbMessages__unmarshal_msg_832 () at cMarshal.ml:63
#17 0x000055fba3f63e09 in camlMsgContainer__unmarshal_472 () at msgContainer.ml:56
#18 0x000055fba3f2a3e1 in camlDbserver__command_handler_1493 ()
#19 0x000055fba3f2c02a in camlDbserver__fun_3204 ()
#20 0x000055fba3f78c53 in camlCommon__protect_965 () at /tmp/ocamlppaf03f7:171
#21 0x000055fba3f78c53 in camlCommon__protect_965 () at /tmp/ocamlppaf03f7:171
#22 0x000055fba3f74d6b in camlEventloop__do_timed_callback_600 () at eventloop.ml:107
#23 0x000055fba3f75743 in camlEventloop__do_current_events_715 () at eventloop.ml:218
#24 0x000055fba3f7594d in camlEventloop__do_next_event_728 () at eventloop.ml:240
#25 0x000055fba3f75a48 in camlEventloop__evloop_778 () at eventloop.ml:255
#26 0x000055fba3f78c53 in camlCommon__protect_965 () at /tmp/ocamlppaf03f7:171
#27 0x000055fba3f20f6d in camlSks__entry ()
#28 0x000055fba3f1d8a9 in caml_program ()
#29 0x000055fba4041564 in caml_start_program ()
#30 0x000055fba404191d in caml_startup_common (argv=0x7ffcd2abb758, pooling=<optimized out>, pooling@entry=0) at startup_nat.c:160
#31 0x000055fba404196b in caml_startup_exn (argv=<optimized out>) at startup_nat.c:170
#32 caml_startup (argv=<optimized out>) at startup_nat.c:170
#33 0x000055fba3f1cf4c in main (argc=<optimized out>, argv=<optimized out>) at main.c:44

ygrek added the bug label on Jan 4, 2021
ygrek self-assigned this on Jan 4, 2021

@kmkaplan

kmkaplan commented Feb 4, 2021

Isn't this a duplicate of issue #82?

@ygrek
Member Author

ygrek commented Feb 4, 2021

I think the DB_DBT_READONLY error in the log above is not responsible for the lockup; it must be a problem in the bindings to the C code.

@kmkaplan

kmkaplan commented Feb 4, 2021

I think it is: when Bdb.DBError is raised, the running transaction is neither rolled back nor committed. Other transactions then have to wait until the stale transaction that raised the error is resolved. And that will never happen.
I suggest rolling back the DB_DBT_READONLY commit until this gets resolved.
The next Debian release risks being plagued by this bug.
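
To illustrate the failure mode described above, here is a minimal sketch of a wrapper that always resolves a transaction before an exception propagates. It does not use the real Bdb bindings: the `txn` type, `Db_error`, `txn_begin`, `txn_commit`, `txn_abort` and `with_txn` are all hypothetical stand-ins invented for this example, and the actual fix in the code base may look quite different.

```ocaml
(* Hypothetical in-memory stand-in for a transactional store.
   None of these names come from the real Bdb bindings. *)
type state = Active | Committed | Aborted
type txn = { id : int; mutable state : state }

(* Hypothetical error, playing the role of Bdb.DBError in the log. *)
exception Db_error of string

let next_id = ref 0

let txn_begin () =
  incr next_id;
  { id = !next_id; state = Active }

let txn_commit t = t.state <- Committed
let txn_abort t = t.state <- Aborted

(* Run [f] inside a transaction: commit on success, abort on any
   exception, then re-raise so the caller still sees the original error.
   The point is that the transaction is never left in the Active state. *)
let with_txn f =
  let t = txn_begin () in
  match f t with
  | result ->
    txn_commit t;
    result
  | exception e ->
    let bt = Printexc.get_raw_backtrace () in
    (* Resolve the transaction first; ignore a failure of the abort itself
       so that the original exception is the one reported. *)
    (try txn_abort t with _ -> ());
    Printexc.raise_with_backtrace e bt
```

With this shape, a call such as `with_txn (fun _ -> raise (Db_error "simulated failure"))` still raises `Db_error`, but the transaction ends up `Aborted` rather than staying `Active` and blocking later transactions.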

@ygrek
Member Author

ygrek commented Feb 4, 2021

OK, I removed the DB_DBT_READONLY usage and am keeping this issue open as a reminder about txn error handling.

@kmkaplan

kmkaplan commented Feb 4, 2021

Please have a look at my last comment in issue #82. I think I have a solution for the bug.
