Wrong charset used in creating search terms when server supports UTF8 #131
#474 Signed-off-by: jmehrens [email protected]
@lukasj @jbescos I'm trying to take a look at these high-value tickets and get them fixed, but I need input on this one before I proceed. I updated the test to print the command string followed by the hex value of each char, as some validation of this patch. The test code is in the PR. If I run the old code with the new test, I see the following output:
Looks like a problem where UTF-8 was not used to encode/decode the bytes. OK, fair enough. With the full proposed patch, which is very similar to jakartaee/mail-api#607 and #104, the output is:
Looks much better in that we are not seeing garbage characters. Great! However, RFC6855 states:
RFC3501 Section 4.3, String, states that a literal is:
[snip]
If I understand the RFC correctly and match it against the output above, the current patch generates 8-bit characters in a quoted string, which is incorrect. It will therefore simply produce an issue similar to jakartaee/mail-api#526. Yep, that is 3 tickets I have found so far, not including this one, surrounding the same core issue. If I'm on track, then I think what I need to do is (fix my branch name) have the SearchSequence class query the this.protocol.supportsUtf8() method and generate literals instead of quoted strings when needed. Otherwise, use the old behavior. Is my analysis on track here or am I off? Does this look like a reasonable approach?
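To make the quoted-string versus literal distinction concrete, here is a minimal sketch, not the code from this PR. The class `SearchTermEncoder` and its methods are hypothetical names; the capability check (`supportsUtf8()`) and the continuation handshake that a synchronizing literal requires on the wire are deliberately left out. It assumes a term is safe to quote only when every char is 7-bit and is not NUL, CR, or LF, and otherwise emits an RFC 3501 style literal (`{byte-count}` CRLF, then the raw UTF-8 bytes).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Jakarta Mail.
class SearchTermEncoder {

    // A term can be sent as a quoted string only if it is 7-bit
    // and contains no NUL, CR, or LF.
    static boolean isQuotable(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == 0 || c >= 0x80 || c == '\r' || c == '\n') {
                return false;
            }
        }
        return true;
    }

    // Returns the bytes that would follow e.g. "SEARCH SUBJECT ".
    static byte[] encode(String term) {
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (isQuotable(term)) {
            out.write('"');
            for (byte b : utf8) {
                if (b == '"' || b == '\\') {
                    out.write('\\'); // escape quoted-specials
                }
                out.write(b);
            }
            out.write('"');
        } else {
            // Literal: the count is the number of bytes, not chars.
            byte[] header = ("{" + utf8.length + "}\r\n")
                    .getBytes(StandardCharsets.US_ASCII);
            out.write(header, 0, header.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "\u043f\u0440\u0438\u0432\u0435\u0442" is a six-letter Cyrillic word.
        String[] terms = {"hello", "\u043f\u0440\u0438\u0432\u0435\u0442"};
        for (String t : terms) {
            byte[] wire = encode(t);
            StringBuilder hex = new StringBuilder();
            for (byte b : wire) {
                hex.append(String.format("%02x ", b & 0xff));
            }
            System.out.println(hex.toString().trim());
        }
    }
}
```

Printing the hex of the wire bytes, as `main` does, mirrors the validation approach described above: any byte of `0x80` or higher appearing between double quotes would indicate the problem under discussion.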
I understand it the same way you do, but it is probably a good idea to test it against an IMAP server to make sure it works. In case you need a local IMAP server, you can use this one.
Awesome! Thanks! This patch will take some time to get working correctly with regard to coding and testing.
@synim503 I pushed my latest changes, which are still an incomplete set. Here is where this stands so you are aware of what you'll run into:
@jmehrens, I don't have deep programming knowledge, but according to a preliminary test this method works well for Russian. It finds messages by both subject and body. The only thing I have noticed so far is that search problems occur with the
@synim503 Thank you for trying out that build. When you tested with
Testing with
With I did another test where I've set
As shown in the output JakartaMail converts There is a possible enhancement: we could test whether a term is mostly ASCII, run the IMAP search on an ASCII prefix/suffix, and then filter client-side on the full UTF-8 text.
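The prefix/suffix idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the PR: the class `AsciiPrefilter` and its methods are hypothetical, the server-side search is represented by a plain list of candidate strings that the server would have returned for the ASCII part, and the client-side match is a simple case-insensitive substring test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Jakarta Mail.
class AsciiPrefilter {

    // Longest leading run of 7-bit chars; usable as a server-side term.
    static String asciiPrefix(String term) {
        int i = 0;
        while (i < term.length() && term.charAt(i) < 0x80) {
            i++;
        }
        return term.substring(0, i);
    }

    // Longest trailing run of 7-bit chars.
    static String asciiSuffix(String term) {
        int i = term.length();
        while (i > 0 && term.charAt(i - 1) < 0x80) {
            i--;
        }
        return term.substring(i);
    }

    // Client-side pass: keep only candidates containing the full term.
    static List<String> filter(List<String> candidates, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String c : candidates) {
            if (c.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(c);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "Re: " followed by a six-letter Cyrillic word.
        String term = "Re: \u043f\u0440\u0438\u0432\u0435\u0442";
        // Step 1: the ASCII part that could be sent in a plain IMAP search.
        System.out.println("server-side term: [" + asciiPrefix(term) + "]");
        // Step 2: pretend these subjects came back from that search.
        List<String> candidates = Arrays.asList(
                "Re: \u041f\u0440\u0438\u0432\u0435\u0442 \u0432\u0441\u0435\u043c",
                "Re: hello");
        System.out.println("matches: " + filter(candidates, term).size());
    }
}
```

The trade-off is that a short or empty ASCII part gives the server little to narrow down, so the client may end up fetching and filtering many messages; a real implementation would presumably fall back to another strategy in that case.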