IMAPProtocol: Problem with searching emails when the subject contains umlauts #104

vitaliiavdiienko · 2023-07-31T14:46:10Z

Some Background
We search an IMAP server for specific emails by subject. In one of our tests we noticed that searching for a subject containing umlauts (ü or ä) performs far worse than it should: it takes 30 minutes to find 1 email in an inbox with more than 6000 emails.

We investigated the problem and found that in this case the IMAP server throws an exception, and the library falls back to the default implementation, which loads all emails.

Details
We debugged the code and found the root cause of the error.

The search method in the IMAPProtocol class https://github.com/eclipse-ee4j/angus-mail/blob/master/providers/imap/src/main/java/org/eclipse/angus/mail/imap/protocol/IMAPProtocol.java#L2494 contains the following code:

// Check if the search "text" terms contain only ASCII chars,
// or if utf8 support has been enabled (in which case CHARSET
// is not allowed; see RFC 6855, section 3, last paragraph)
if (supportsUtf8() || SearchSequence.isAscii(term)) {
    try {
        return issueSearch(msgSequence, term, null);
    } catch (IOException ioex) { /* will not happen */ }
}

Our IMAP server supports UTF-8, and the code correctly calls issueSearch with no charset. So far so good.
The problem occurs in issueSearch itself, on line 2552 https://github.com/eclipse-ee4j/angus-mail/blob/master/providers/imap/src/main/java/org/eclipse/angus/mail/imap/protocol/IMAPProtocol.java#L2552

Here all SearchTerms are converted to an Argument:

// Generate a search-sequence with the given charset
Argument args = getSearchSequence().generateSequence(term,
        charset == null ? null :
                MimeUtility.javaCharset(charset)
);

In our case the charset is null, so the subject from the SearchTerm is converted as follows:

public Argument writeString(String s, String charset) throws UnsupportedEncodingException {
    if (charset == null) {
        this.writeString(s);
    } else {
        this.items.add(new AString(s.getBytes(charset)));
    }

    return this;
}

In the end ASCIIUtility.getBytes(s) is called, which uses the default OS charset (on Windows this is not UTF-8). At that point every umlaut has the wrong representation in the byte array that is sent to the IMAP server.

We strongly believe there should be a way to specify the encoding used to convert SearchTerms independently. Or perhaps you can find a more elegant solution.
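To illustrate why the byte representation matters, here is a minimal, self-contained sketch that does not use Angus Mail at all. It encodes the same subject once as UTF-8 and once with a single-byte charset. ISO-8859-1 is only an assumed stand-in for a non-UTF-8 platform default, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Angus Mail.
class SubjectEncodingDemo {

    /** Encodes s with the given charset and returns the bytes as space-separated hex. */
    static String encode(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String subject = "\u00fcber"; // "über"

        // What a UTF-8 enabled server expects: the umlaut becomes two bytes.
        System.out.println(encode(subject, StandardCharsets.UTF_8));      // C3 BC 62 65 72

        // Assumed stand-in for a non-UTF-8 default: the umlaut is one byte.
        System.out.println(encode(subject, StandardCharsets.ISO_8859_1)); // FC 62 65 72
    }
}
```

A server that has UTF-8 enabled would treat the lone 0xFC byte as malformed input, which is consistent with the search being rejected.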

Thanks in advance.

jmehrens · 2023-08-01T15:43:04Z

I wonder if the bug is here:

// Generate a search-sequence with the given charset
Argument args = getSearchSequence().generateSequence(term,
        charset == null ? null :
                MimeUtility.javaCharset(charset)
);

When the charset is null, instead of unconditionally passing null it should pass "UTF-8" when supportsUtf8() is true, and null otherwise.
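A rough sketch of that selection logic, written as a standalone helper so it can be run in isolation. The class and method names are hypothetical and this is not the actual patch; in IMAPProtocol a non-null charset would additionally go through MimeUtility.javaCharset, which is left out here.

```java
// Hypothetical sketch of the suggested charset selection; not Angus Mail code.
class SearchCharsetSketch {

    /**
     * Returns the charset to use when encoding search strings.
     * An explicit charset wins; otherwise "UTF-8" is used if the server
     * has UTF-8 support enabled, and null (the current behaviour) if not.
     */
    static String effectiveCharset(String charset, boolean supportsUtf8) {
        if (charset != null) {
            return charset;
        }
        return supportsUtf8 ? "UTF-8" : null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset(null, true));         // UTF-8
        System.out.println(effectiveCharset(null, false));        // null
        System.out.println(effectiveCharset("ISO-8859-1", true)); // ISO-8859-1
    }
}
```

Note that this only affects how the search strings are turned into bytes. Whether a CHARSET argument is sent in the SEARCH command is a separate decision, and per the RFC 6855 comment quoted earlier it must not be sent once UTF-8 support is enabled.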

jmehrens · 2023-08-23T02:32:33Z

Looks like my suggestion is the same as: jakartaee/mail-api#474

jmehrens added the bug label Jan 22, 2024

jmehrens linked a pull request Feb 15, 2024 that will close this issue

Wrong charset used in creating search terms when server supports UTF8 #131

Draft

jmehrens linked a pull request Feb 16, 2024 that will close this issue

Wrong charset used in creating search terms when server supports UTF8 #131

Draft

jmehrens mentioned this issue Apr 12, 2024

office365 search fails with NO [BADCHARSET (US-ASCII)] #145

Open
