Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getVocabulary: Call scrub_html on individual items #288

Merged
merged 4 commits into from
Aug 8, 2024

Conversation

reinhardt
Copy link
Contributor

@reinhardt reinhardt commented Aug 2, 2024

… but check for script/html first

Fixes JSONDecodeError when terms contain incomplete HTML

Unfortunately I did find a way to break #287, not in terms of security, but of invalid JSON. As the test demonstrates, terms with incomplete HTML result in broken JSON like this:

{"results": [{"id": "term 0 <b>", "text": "term 0 <b>"}, {"id": "term 1 <b>", "text": "term 1 <b>"}, {"id": "term 2 <b>", "text": "term 2 <b>"}], "total": 3}</b></b></b></b></b></b>

This PR reverts the approach of calling scrub_html on the end result, but adds a check for script or html on the individual items, which saves the time needed to invoke lxml if there's nothing to scrub out.

EDIT: The check was moved to plone/Products.PortalTransforms#66

@mister-roboto
Copy link

@reinhardt thanks for creating this Pull Request and helping to improve Plone!

TL;DR: Finish pushing changes, pass all other checks, then paste a comment:

@jenkins-plone-org please run jobs

To ensure that these changes do not break other parts of Plone, the Plone test suite matrix needs to pass, but it takes 30-60 min. Other CI checks are usually much faster and the Plone Jenkins resources are limited, so when done pushing changes and all other checks pass either start all Jenkins PR jobs yourself, or simply add the comment above in this PR to start all the jobs automatically.

Happy hacking!

…ipt/html first

Fixes JSONDecodeError when terms contain incomplete HTML
@reinhardt reinhardt force-pushed the getVocabulary-incomplete-html branch from 1401934 to 2524e28 Compare August 2, 2024 14:20
@reinhardt
Copy link
Contributor Author

@jenkins-plone-org please run jobs

Copy link
Member

@davisagli davisagli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty good practical idea for improving the performance here.

Would you consider submitting this change to the SafeHtml transform in PortalTransforms instead? (https://github.com/plone/Products.PortalTransforms/blob/master/Products/PortalTransforms/transforms/safe_html.py) Then it would help any code which uses this transform, and not only the getVocabulary view. Also that would let us run the tests of the transform to make sure this optimization doesn't break any of the intended XSS protections.

plone/app/content/browser/vocabulary.py Outdated Show resolved Hide resolved
There is now a similar check inside scrub_html, see
plone/Products.PortalTransforms#66
@reinhardt
Copy link
Contributor Author

Thanks for the suggestion! I've removed the condition here, so this PR now mainly reverts #287, plus adds a regression test for valid json.

@jenkins-plone-org please run jobs

Copy link
Member

@davisagli davisagli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small suggestion, but this looks good.

]
)
with mock.patch.object(view, "get_vocabulary", return_value=vocab):
json.loads(view())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a better test if it made an assertion on the result to confirm that the html was scrubbed as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are other tests that do that. The point of this test is to check that the json can be parsed. A test like this was missing before. I would keep it simple and not do more checks in this test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be called testJson to make this clearer?

Copy link
Member

@mauritsvanrees mauritsvanrees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this and for improving PortalTransforms.

In hindsight it was not a good idea to run scrubHtml on the whole result, as this basically tries to create an lxml tree from <div>{some json}</div>, which can't be good. What is surprising is that no tests were failing, and that my manual testing also did not show any problems.

I added two inline suggestions.

plone/app/content/tests/test_widgets.py Outdated Show resolved Hide resolved
plone/app/content/tests/test_widgets.py Outdated Show resolved Hide resolved
reinhardt and others added 2 commits August 7, 2024 13:11
Co-authored-by: Maurits van Rees <[email protected]>
Co-authored-by: Maurits van Rees <[email protected]>
@reinhardt
Copy link
Contributor Author

@jenkins-plone-org please run jobs

Copy link
Member

@mauritsvanrees mauritsvanrees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thanks.

@mauritsvanrees mauritsvanrees merged commit 0cebad3 into master Aug 8, 2024
13 checks passed
@mauritsvanrees mauritsvanrees deleted the getVocabulary-incomplete-html branch August 8, 2024 10:17
@mauritsvanrees
Copy link
Member

I have released plone.app.content 4.1.6 and Products.PortalTransforms = 4.1.0.

I have updated https://dist.plone.org/release/6.0-dev/ and https://dist.plone.org/release/6.1-dev/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants