-
-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
getVocabulary: Call scrub_html on individual items #288
Conversation
@reinhardt thanks for creating this Pull Request and helping to improve Plone! TL;DR: Finish pushing changes, pass all other checks, then paste a comment:
To ensure that these changes do not break other parts of Plone, the Plone test suite matrix needs to pass, but it takes 30-60 min. Other CI checks are usually much faster and the Plone Jenkins resources are limited, so when done pushing changes and all other checks pass either start all Jenkins PR jobs yourself, or simply add the comment above in this PR to start all the jobs automatically. Happy hacking! |
…ipt/html first Fixes JSONDecodeError when terms contain incomplete HTML
1401934
to
2524e28
Compare
@jenkins-plone-org please run jobs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty good practical idea for improving the performance here.
Would you consider submitting this change to the SafeHtml transform in PortalTransforms instead? (https://github.com/plone/Products.PortalTransforms/blob/master/Products/PortalTransforms/transforms/safe_html.py) Then it would help any code which uses this transform, and not only the getVocabulary view. Also that would let us run the tests of the transform to make sure this optimization doesn't break any of the intended XSS protections.
There is now a similar check inside scrub_html, see plone/Products.PortalTransforms#66
Thanks for the suggestion! I've removed the condition here, so this PR now mainly reverts #287, plus adds a regression test for valid json. @jenkins-plone-org please run jobs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small suggestion, but this looks good.
] | ||
) | ||
with mock.patch.object(view, "get_vocabulary", return_value=vocab): | ||
json.loads(view()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a better test if it made an assertion on the result to confirm that the html was scrubbed as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are other tests that do that. The point of this test is to check that the json can be parsed. A test like this was missing before. I would keep it simple and not do more checks in this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it should be called testJson
to make this clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this and for improving PortalTransforms.
In hindsight it was not a good idea to run scrubHtml
on the whole result, as this basically tries to create an lxml tree from <div>{some json}</div>
, which can't be good. What is surprising is that no tests were failing, and that my manual testing also did not show any problems.
I added two inline suggestions.
Co-authored-by: Maurits van Rees <[email protected]>
Co-authored-by: Maurits van Rees <[email protected]>
@jenkins-plone-org please run jobs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, thanks.
I have released I have updated https://dist.plone.org/release/6.0-dev/ and https://dist.plone.org/release/6.1-dev/ |
… but check for script/html firstFixes JSONDecodeError when terms contain incomplete HTML
Unfortunately I did find a way to break #287, not in terms of security, but of invalid JSON. As the test demonstrates, terms with incomplete HTML result in broken JSON like this:
This PR reverts the approach of calling
scrub_html
on the end result,but adds a check for script or html on the individual items, which saves the time needed to invoke lxml if there's nothing to scrub out.EDIT: The check was moved to plone/Products.PortalTransforms#66