-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix error when changing force write mode #8146
base: v5.0-release
Are you sure you want to change the base?
Fix error when changing force write mode #8146
Conversation
…thread won't close it while we're using it. For Classic Server you can change forced write mode only with no connections.
Why ? |
@@ -968,3 +968,4 @@ FB_IMPL_MSG(JRD, 965, ods_upgrade_err, -901, "HY", "000", "ODS upgrade failed wh | |||
FB_IMPL_MSG(JRD, 966, bad_par_workers, -924, "HY", "000", "Wrong parallel workers value @1, valid range are from 1 to @2") | |||
FB_IMPL_MSG(JRD, 967, idx_expr_not_found, -902, "42", "000", "Definition of index expression is not found for index @1") | |||
FB_IMPL_MSG(JRD, 968, idx_cond_not_found, -902, "42", "000", "Definition of index condition is not found for index @1") | |||
FB_IMPL_MSG(JRD, 969, forced_write_change_err, -902, "42", "000", "Cannot change forced write mode for classic server while it has connections") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error code is already used in master
@@ -751,13 +764,13 @@ bool PIO_read(thread_db* tdbb, jrd_file* file, BufferDesc* bdb, Ods::pag* page, | |||
SINT64 bytes; | |||
FB_UINT64 offset; | |||
|
|||
EngineCheckout cout(tdbb, FB_FUNCTION, EngineCheckout::UNNECESSARY); | |||
ReadLockGuard readGuard(file->fil_desc_lock, FB_FUNCTION); | |||
|
|||
if (file->fil_desc == -1) | |||
return unix_error("read", file, isc_io_read_err, status_vector); | |||
|
|||
Database* const dbb = tdbb->getDatabase(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be before EngineCheckout ? At least from logical POV
@@ -803,13 +816,13 @@ bool PIO_write(thread_db* tdbb, jrd_file* file, BufferDesc* bdb, Ods::pag* page, | |||
SINT64 bytes; | |||
FB_UINT64 offset; | |||
|
|||
EngineCheckout cout(tdbb, FB_FUNCTION, EngineCheckout::UNNECESSARY); | |||
ReadLockGuard readGuard(file->fil_desc_lock, FB_FUNCTION); | |||
|
|||
if (file->fil_desc == -1) | |||
return unix_error("write", file, isc_io_write_err, status_vector); | |||
|
|||
Database* const dbb = tdbb->getDatabase(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as in PIO_read above
@@ -858,6 +871,8 @@ static jrd_file* seek_file(jrd_file* file, BufferDesc* bdb, FB_UINT64* offset, | |||
break; | |||
} | |||
|
|||
ReadLockGuard readGuard(file->fil_desc_lock, FB_FUNCTION); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks as not necessary when seek_file()
is called from PIO_read()
and PIO_write()
- the most often cases.
The exception is when caller is PIO_init_data()
, so there should be some smart solution.
I tried to implement this via AST, looking at LCK_monitor as an example, but I couldn't figure it out where and when I should relock my lock after LCK_downgrade in AST. So I stop on restriction for classic server. |
My question was - why different CS processes must run with the same FW mode ?
|
I think because it's more expected, if someone changes FW mode, they will expect all connections to change FW mode too. This works with Super, but not with Classic.
Yes, these errors can only be reproduced on Super, I should have mentioned that in the PR description, added it now. |
The more I think about it the more I consider that solution is not fully correct and excessive, sorry. There is no (direct) harm when different CS processes use different FW mode. It is not perfect, I agree, but we lived with it many years with no complains. As for SS, where the real problem happens, I think it is not good to re-open database file with a (lot of) user connections - it could lead to a significant performance loss for a visible time, as OS flushes the file buffers when file is closing and it could take a time. Note, all user connections the require disk IO will wait for this process to finish. So, instead, I offer to save new FW flag value to the header page (all architectures) and re-open file only if there is no other connections (required for SS only, but could be discussed). This is similar to how change of page buffers number works, btw. |
I would expect that while file open mode changes are delayed in this case for SS, changes in page cache flush strategy would be immediate. Perhaps file re-opening also may be immediate if the change is done in sole connection. |
Could you show what you means by 'flush strategy', how it works now and how you would expect it to work ?
This is what I said above. |
I expect that when I set FW OFF, dirty pages won't be flushed immediately on commit for all connections. Because that's why I do it. |
My last comment is for SS. CS can do whatever it wants. |
When engine works with FW=ON ( When engine works with FW=OFF ( Now, imagine database with FW=ON and admin changed FW to OFF. File mode is not changed and OS still writes every page to disk immediately. You offer to clear In opposite case, when FW is changed from OFF to ON, engine will never call |
Which by default are -1, i.e. "never". |
Not on Windows. |
I guess the main point was when someone changes the FW mode, he usually wants to get the effect immediately. And there are 2 ways to do this: either force all user connections to close or reopen the file with relatively less time and performance penalty. The second one looks better from this point. Also, if we allow the actual FW mode change to be delayed then we cannot trust its status showed by gstat.
Similarity is good but changing the FW mode is more critical, I believe. |
What I don't like in the "delayed option" is it being unpredictable. DBA turns FW on and expects that crashes will not lead to corruptions after that. The same for CS - if some connections will still be using FW=OFF after that and crash and corrupt the database this is not what DBA expects after changing FW to ON. But they have no idea when the change will be actually applied. So to ensure that they need also to shutdown the database to force everybody to reconnect. This is hardly pleasant under user load. This is the difference between BUFFERS and FW -- one is just about performance but the other is about reliability which is more important, IMO. |
Same for SC.
Document it. Issue warning.
Explain, please - why reasonable DBA could think about changing FW under real load ever ? PS this is again flame about nothing. The original intent was to fix errors on SS when FW is changing under load. |
So, we stop at the realization proposed by @hvlad? |
Why to change FW at all? Loading, indexing, or even just because someone wants. At least untill we don't require EX lock on DB for this. |
When we try to change force write mode on database that has connections, we can get various types of errors or asserts from PIO_ functions (Super Server mode only). These errors can be reproduced using TPCC load and changing force write mode.
This patch fixes cases where someone tries to change force write mode when database has connections. This is done via RWLock for file descriptor, so we can be sure that some other thread won't close this descriptor while we're using it.
For Classic Server we can change force write mode only with no connections.