IMAPProtocol: Problem with searching emails when the subject contains umlauts #104

Open
vitaliiavdiienko opened this issue Jul 31, 2023 · 2 comments · May be fixed by #131
Labels
bug Something isn't working

Comments

@vitaliiavdiienko

Some Background
We search the IMAP server for specific emails based on their subject. In one of our tests we noticed that a search with umlauts (ü or ä) in the subject does not perform as it should: it takes 30 minutes to find 1 email in an inbox with more than 6000 emails.

We investigated the problem and found that in this case the IMAP server throws an exception, and the library falls back to the default implementation and loads all emails.

Details
We debugged the code and found the root cause of the error.

The search method in the IMAPProtocol class (https://github.com/eclipse-ee4j/angus-mail/blob/master/providers/imap/src/main/java/org/eclipse/angus/mail/imap/protocol/IMAPProtocol.java#L2494) has the following code:

// Check if the search "text" terms contain only ASCII chars,
// or if utf8 support has been enabled (in which case CHARSET
// is not allowed; see RFC 6855, section 3, last paragraph)
if (supportsUtf8() || SearchSequence.isAscii(term)) {
    try {
        return issueSearch(msgSequence, term, null);
    } catch (IOException ioex) { /* will not happen */ }
}

Our IMAP server supports UTF-8, and the code correctly calls issueSearch with no charset. So far, so good.
The problem occurs in issueSearch itself, on line 2552: https://github.com/eclipse-ee4j/angus-mail/blob/master/providers/imap/src/main/java/org/eclipse/angus/mail/imap/protocol/IMAPProtocol.java#L2552

Here, all SearchTerms are converted to an Argument:

// Generate a search-sequence with the given charset
Argument args = getSearchSequence().generateSequence(term,
        charset == null ? null :
                MimeUtility.javaCharset(charset));

In our case the charset is null, so the subject from the SearchTerm is converted as follows:

public Argument writeString(String s, String charset) throws UnsupportedEncodingException {
    if (charset == null) {
        this.writeString(s);
    } else {
        this.items.add(new AString(s.getBytes(charset)));
    }

    return this;
}

In the end, ASCIIUtility.getBytes(s) is called, and it uses the default OS charset (on Windows this is not UTF-8). At that point all umlauts have the wrong representation in the byte array that is sent to the IMAP server.
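
To illustrate why the choice of encoding matters here, the following minimal sketch prints the bytes of a subject containing an umlaut under two charsets. It uses only the JDK; the class name UmlautBytes and the helper hex are hypothetical, and ISO-8859-1 merely stands in for a non-UTF-8 encoding. It does not reproduce the Angus Mail code path.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Angus Mail.
class UmlautBytes {

    // Encode the string with the given charset and return the bytes as upper-case hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String subject = "\u00fcber"; // "über", written with an escape to avoid source-encoding issues

        // UTF-8 encodes the umlaut as two bytes.
        System.out.println("UTF-8:      " + hex(subject, StandardCharsets.UTF_8));
        // ISO-8859-1 (stand-in for a non-UTF-8 encoding) encodes it as one byte.
        System.out.println("ISO-8859-1: " + hex(subject, StandardCharsets.ISO_8859_1));
    }
}
```

A server that expects UTF-8 will not match a subject whose umlaut arrives as the single byte FC instead of the two bytes C3 BC.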

We strongly believe there should be a way to specify the encoding used for converting SearchTerms independently. Or maybe you can find a more elegant solution.

Thanks in advance.

@jmehrens
Contributor

jmehrens commented Aug 1, 2023

I wonder if the bug is here:

// Generate a search-sequence with the given charset
Argument args = getSearchSequence().generateSequence(term,
        charset == null ? null :
                MimeUtility.javaCharset(charset));

When the charset is null, instead of unconditionally passing null, it should pass "UTF-8" when supportsUtf8() is true, and null otherwise.
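
A rough sketch of that selection logic, isolated from the library so it can be run on its own. The class SearchCharsetChoice and the method effectiveCharset are hypothetical names, and the boolean parameter stands in for the result of supportsUtf8(). The MimeUtility.javaCharset mapping from the original snippet is not modeled, and this is not the actual patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Angus Mail.
class SearchCharsetChoice {

    // Pick the charset used to encode search strings:
    // - an explicitly requested charset is passed through unchanged,
    // - otherwise "UTF-8" if the server has UTF-8 support enabled,
    // - otherwise null, which keeps the existing ASCII-only behaviour.
    static String effectiveCharset(String charset, boolean supportsUtf8) {
        if (charset != null) {
            return charset;
        }
        return supportsUtf8 ? StandardCharsets.UTF_8.name() : null;
    }

    public static void main(String[] args) {
        System.out.println(effectiveCharset(null, true));
        System.out.println(effectiveCharset(null, false));
        System.out.println(effectiveCharset("ISO-8859-1", true));
    }
}
```

With a helper like this, the CHARSET argument would still be omitted from the SEARCH command when UTF-8 is enabled; only the encoding of the string bytes would change.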

@jmehrens
Contributor

Looks like my suggestion is the same as: jakartaee/mail-api#474
