Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mailbox locking refcount #4731

Merged
merged 38 commits into from
Feb 12, 2024
Merged

Mailbox locking refcount #4731

merged 38 commits into from
Feb 12, 2024

Conversation

brong
Copy link
Member

@brong brong commented Nov 8, 2023

This brings back the mboxlist_changesub modseq bump but also rewrites mailbox refcounting so it works, and ALSO replaces the JMAP mailbox cache as part of that change.

There are probably more places we can just open and close rather than having to track the index mailbox too.

@brong brong requested a review from rsto November 8, 2023 13:25
Copy link
Member

@rsto rsto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just have a few nits that might be nice to address, but no hard requirement. I'll wait for CI to finish to approve.

imap/jmap_api.c Outdated Show resolved Hide resolved
imap/jmap_api.c Outdated Show resolved Hide resolved
imap/mboxlist.c Show resolved Hide resolved
@brong brong force-pushed the mailbox-locking-refcount branch from ad71897 to fb57b4e Compare November 8, 2023 15:30
@brong brong requested a review from rsto November 8, 2023 15:49
@brong
Copy link
Member Author

brong commented Nov 8, 2023

Damn tests pass on my box :(

@rsto
Copy link
Member

rsto commented Nov 9, 2023

These tests are failing on my local build

 [FAILED] Cyrus::ImapTest.notify
 [FAILED] Cyrus::ImapTest.urlauth2
 [FAILED] Cyrus::ImapTest.urlauth-binary
 [FAILED] Cyrus::Info.conf
 [FAILED] Cyrus::JMAPCalendars.calendarsharenotification_changes
 [FAILED] Cyrus::JMAPCalendars.calendarsharenotification_query
 [FAILED] Cyrus::Rename.rename_inbox
 [FAILED] Cyrus::Rename.rename_inbox_unselected
 [FAILED] Cyrus::Rename.rename_bigconversation
 [FAILED] Cyrus::Rename.rename_user_bigconversation

I only glimpsed at JMAPCalendars.calendarsharenotification_changes and it seems like where creating a share notification and deleting one each bumped the modseq on the notifications mailbox, the create now silently does not bump the modseq. Haven't dug deeper.

@elliefm
Copy link
Contributor

elliefm commented Nov 9, 2023

@brong If the tests pass for you locally, they might be dependent on some local config you have but no-one else does. What's in the [config] section of your cassandane.ini? The only thing that must be in there these days is zoneinfo_dir. If you have altnamespace or unixhierarchysep entries left over from the old days, remove them. If you have other stuff in there that the tests are relying on, they should probably be configured in the test itself, not in your cassandane.ini

@elliefm
Copy link
Contributor

elliefm commented Nov 14, 2023

This feels like a pretty high-risk PR, both due to the extent of the changes and the mysteries exposed in testing so far.

Even once this passes CI, it seems likely that other edge cases will arise that we don't yet have adequate test coverage for. So I think this should be "DO NOT MERGE" and "include in fastmail" for a while before it lands on master, to give us time to use it for real and shake them out before we start calling it release quality. Especially with it being only about 6-7 weeks until we fork the new 3.10 stable branch!

So I'm tagging this DO NOT MERGE for now, and if it's ready for our builds this year we can tag it "include in fastmail". It can land on master next year, after the 3.10 branch has forked.

@brong brong force-pushed the mailbox-locking-refcount branch from 168f990 to 81fe9ed Compare November 15, 2023 00:37
@brong
Copy link
Member Author

brong commented Nov 15, 2023

Thanks @elliefm, that seems reasonable! I'm hoping this time it will actually pass. It's shaken out some bugs in assumptions in other code :) But yeah, complex changes for sure.

Copy link
Contributor

@elliefm elliefm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nit; also something odd going on in reconstruct, looks like the mailbox is neither committed nor closed after mailbox_setversion() succeeds?

imap/mailbox.c Outdated Show resolved Hide resolved
imap/reconstruct.c Show resolved Hide resolved
@brong brong force-pushed the mailbox-locking-refcount branch from 81fe9ed to eec344d Compare November 17, 2023 09:26
@brong
Copy link
Member Author

brong commented Nov 17, 2023

@ellie fixed the setversion issue and also added a test which exercises it!

@brong
Copy link
Member Author

brong commented Nov 17, 2023

@rsto did you want to update your review on this?

@elliefm
Copy link
Contributor

elliefm commented Nov 19, 2023

We gotta be more careful about pinging "elliefm" and "ksmurchison", rather than "ellie" and "Murch" who are nothing to do with this project ;)

This is still failing in CI, on a few JMAPContacts tests. The failures are interesting... one was like this:

1) JMAPContacts.contact_blobid
 expected 'BEGIN:VCARD', got '<!DOCTYPE H' at Cassandane/Cyrus/JMAPContacts.pm line 3498.

Another failure expected to receive one content-type but got a completely different one (sorry, I forgot which by the time I got to typing here)

And the other got back a 500 error response, when it was expecting a document that matched a pattern like "this should not crash the server"

It kinda smells like these tests are getting back completely different documents from what they expected. I wonder if there's something up with the changes to the jmap mailbox cache handling?

I've told the CI job to re-run, on the off-chance that it was some transient weirdness, though that seems unlikely... and tbh if it were a transient issue I think I'd be even more concerned. I'll pull the branch and run it locally too, once my current cass run finishes, and see if that sheds any new light.

cassandane/Cassandane/Cyrus/Reconstruct.pm Outdated Show resolved Hide resolved
imap/mailbox.c Outdated Show resolved Hide resolved
@elliefm
Copy link
Contributor

elliefm commented Nov 19, 2023

Here's the local failures:

1) JMAPContacts.contact_blobid
 expected 'BEGIN:VCARD', got '<!DOCTYPE H' at Cassandane/Cyrus/JMAPContacts.pm line 3498.
2) JMAPContacts.contact_apple_label_handling
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <title>500 Internal Server Error</title>
  </head>
  <body>
    <h1>Internal Server Error</h1>
    <p>Mailbox is locked</p>
    <hr>
    <address>Cyrus-HTTP/3.9.0-alpha0-1141-gcd87f6735 Cyrus-SASL/2.1.27 Lib/XML2.9.10 Jansson/2.13.1 OpenSSL/1.1.1n Nghttp2/1.43.0 Wslay/1.1.1 Zlib/1.2.11 Brotli/1.0.9 Zstd/1.4.8 Xapian/1.5.0 LibiCal/3.1 ICU4C/72.1 SQLite/3.34.1 Server at 127.0.0.1 Port 9112</address>
  </body>
</html>
 didn't match /(?^:X-ABLabel:this-should-not-crash-cyrus)/ at Cassandane/Cyrus/JMAPContacts.pm line 3740.
3) JMAPContacts.contact_copy
 <undef> unexpected at Cassandane/Cyrus/JMAPContacts.pm line 3292.
4) JMAPContacts.contact_set_avatar_from_deleted_contact
 expected 'image/jpeg', got 'text/html; charset=utf-8' at Cassandane/Cyrus/JMAPContacts.pm line 2471.

The restarted CI job also failed again, with the same four failures: https://github.com/cyrusimap/cyrus-imapd/actions/runs/6902252975/job/18830872467?pr=4731

@brong brong force-pushed the mailbox-locking-refcount branch from cd87f67 to f783196 Compare November 20, 2023 20:43
Copy link
Contributor

@elliefm elliefm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still one bad xsyslog call. But it passes CI now! :)

imap/http_dav_sharing.c Outdated Show resolved Hide resolved
@brong brong force-pushed the mailbox-locking-refcount branch from 085684b to 583ba6e Compare November 26, 2023 23:46
Copy link
Contributor

@elliefm elliefm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop/global changes from the latest two commits look good. But you haven't yet fixed the xsyslog typo (in http_dav_sharing) from my last review. It'll produce a gibberish log message if the error condition it's logging about ever occurs.

@brong brong force-pushed the mailbox-locking-refcount branch from 583ba6e to 52efd18 Compare November 28, 2023 03:49
@brong brong requested a review from elliefm November 28, 2023 03:57
@brong
Copy link
Member Author

brong commented Nov 28, 2023

There's a ton more changes in here now! I added namelock ordering checks and found many things. There's still issues with:

Search.esearch
JMAPCalendars.calendarevent_set_itip_preserve_partstat
and a couple more that also call the deliver_local codepath

(update to this comment for clarity) - these are tests which fail namelock ordering tests on the user locks - so they COULD cause a deadlock. In practice this is going to be super rare, so I left the code for user namelock ordering checks in, but commented it out, so all tests pass because they aren't checking for this condition.

The complex things to fix still are:

  • esearch across users (all subscribed mailboxes)
  • itip deliver_local (opens destination user while holding the source user's lock - and there may be multiple destination users in arbitrary order. The fix is almost certainly going to have to be asynchronicity of some sort and closing the source lock)

elliefm
elliefm previously requested changes Dec 6, 2023
imap/http_dav_sharing.c Outdated Show resolved Hide resolved
imap/mailbox.c Outdated Show resolved Hide resolved
brong and others added 26 commits February 12, 2024 09:51
This makes jmap_openmbox useless, we'll remove that next
This just serialises and unserialises, which is pretty inefficient but easy to write
This means they sort alphabetically the same as elsewhere
This requires opening conversations late as well, but
means our lock ordering is always valid
Co-authored-by: elliefm <[email protected]>
@brong brong force-pushed the mailbox-locking-refcount branch from 9a570fc to 9255220 Compare February 11, 2024 22:51
@brong brong merged commit 67f1d69 into master Feb 12, 2024
2 checks passed
@brong brong deleted the mailbox-locking-refcount branch February 12, 2024 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants