-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Library throws "URI malformed" error when creating patch with emojis #22
Comments
@laurent22 I'm running into the same problem, did you manage to get around this? |
We simply don't use patch_toText anymore because it's pretty much broken. Instead we use patch_make and serialise to JSON, which for our purpose is good enough (because we don't need the patch in proper diff format). https://github.com/laurent22/joplin/blob/dev/packages/lib/models/Revision.ts |
@laurent22 Thank you so much for such quick reply! What do you mean by it's pretty much broken? Do you mean just this issue, or other issues, too? Unfortunately we don't use |
I mean that in our particular case this function is unusable because a lot of random strings are thrown at it, so many users sooner or later will hit this escapeUri bug. But patch_make as a replacement is fine, it doesn't crash. If you figure out a fix please let us know! I looked at the code at some point but couldn't fix it. |
I got somewhere. Take it with a grain of salt though, as I might have easily missed something :) The problem appears when multi-character unicode emojis get broken up into separate diffs inside a patch: 'π'.length // 2
'π'.charAt(0) // \ud83d
'π'.charAt(1) // \udc9b
encodeURI("\ud83d") // malformed uri error
encodeURI("\udc9b") // malformed uri error
encodeURI("\uD83D\udc9b") // '%F0%9F%92%9B' This mostly happens with internally generated prefixes/suffixes, for example inside It seems to me that the problem can be solved by replacing Before I tried this, I also tinkered with the source code for quite a while, and I did seemingly manage to prevent such emoji splitting - in |
@laurent22 Ok, I've come a long way since my last post. I found out much work was devoted to this problem in the official Google's repo (this repo is just an independently maintained npm package for it, which I didn't know). Particulary these three issues: #59 #69 #80.
This does break some other use-cases after all, as noticed in this issue for a related library which tried this exact fix (not sure if this can happen outside of Chineese or similar languages, though).
My previous approach was way off. But I adapted dmsnell's approach from issue #80 to the |
Thanks for looking into it @michal-kurz. It seems unlikely that any change will be merged to the official repository (the PR is from 2019) - do you know if there's a good fork being maintained somewhere where that kind of fix could be applied? |
@michal-kurz I've recently added my fix to
the problem here is when we're generating the context for the patch objects, we might go ahead and split surrogates again. I would be very interested in simply using my branch instead of What do you think of this? by the way I'm planning on cleaning up my branch now too, particularly with the tests, and merge it into my fork's either way, if we could fix this here, it would be helpful to |
cc: @JackuB ^^^ |
The following code:
Will throw an error "URI Malformed" at this line. That's often the problem when using encodeURI on arbitrary data (the md5 package has the same problem) but in that case as far as I can see the inputs are valid UTF-8.
I think either
patch_make
orpatch_apply
generates invalid text.But also I'm wondering why is encodeURI needed in this lib? Wouldn't a simple escape/unescape of specific reserved characters be enough?
The text was updated successfully, but these errors were encountered: