Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new Wikimedia mobile end-point support/render #1903

Merged
merged 59 commits into from
Oct 17, 2023

Conversation

VadimKovalenkoSNF
Copy link
Collaborator

@VadimKovalenkoSNF VadimKovalenkoSNF commented Sep 11, 2023

Fixes #1664
Fixes: #1916
Blocked by #1925

@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Attention: 22 lines in your changes are missing coverage. Please review.

Comparison is base (6b51e74) 74.60% compared to head (d412ee4) 76.14%.
Report is 18 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1903      +/-   ##
==========================================
+ Coverage   74.60%   76.14%   +1.53%     
==========================================
  Files          33       37       +4     
  Lines        2812     2985     +173     
  Branches      626      653      +27     
==========================================
+ Hits         2098     2273     +175     
+ Misses        618      617       -1     
+ Partials       96       95       -1     
Files Coverage Δ
src/MediaWiki.ts 86.56% <100.00%> (+0.70%) ⬆️
src/Templates.ts 100.00% <100.00%> (ø)
src/config.ts 100.00% <ø> (ø)
src/mwoffliner.lib.ts 74.81% <100.00%> (ø)
src/renderers/abstract.renderer.ts 86.17% <100.00%> (-1.10%) ⬇️
src/renderers/visual-editor.renderer.ts 56.75% <100.00%> (+1.20%) ⬆️
src/util/builders/url/desktop.director.ts 100.00% <ø> (ø)
src/util/builders/url/mobile.director.ts 100.00% <100.00%> (ø)
src/util/const.ts 100.00% <100.00%> (ø)
src/util/dump.ts 74.19% <100.00%> (+1.77%) ⬆️
... and 9 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@VadimKovalenkoSNF
Copy link
Collaborator Author

This patch is not ready for the review yet. I found that article treatments are not working for mobile as I expected first. I pushed preliminary commit b7f6e55 to pass output from page/mobile-html directly to zim creator, but this could not be considered as a final solution - Kiwix app doesn't render it correctly though in local Kiwix server I can see exact same response as I get from page/mobile-html endpoint. There is another approach which might be better - instead of page/mobile-html, use page/html and process sections using scripts from deprecated MCS but inside mwoffliner. This will allow to preserve functionality that has been working before but without mobile-sections endpoint. On the other hand, it might require additional effort to apply it inside mwoffliner.

@VadimKovalenkoSNF VadimKovalenkoSNF self-assigned this Sep 15, 2023
@VadimKovalenkoSNF
Copy link
Collaborator Author

Current implementation of mobile-html render has additional treatments:

  1. Connect mobile js and css that are required for mobile-html.
  2. Remove Edit icon near each section.
  3. Restore lazy load images. Current mobile-html output returns span with a lazy-load placeholder that is not transforming into the image on scroll in Kiwix app.
  4. Restore sections' appearance.
  5. Disable event listeners for sup elements. This can be achieved only on the client side, so I added an additional script tag with this functionality to the output.
  6. Added additional styles to override redundant or invalid CSS rules to make final output more robust.

Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecture looks good but a few details to fix to keep things lean.
Important now, is to get all the tests passing so we can make the final review.

src/Downloader.ts Outdated Show resolved Hide resolved
src/MediaWiki.ts Outdated Show resolved Hide resolved
src/renderers/renderer.builder.ts Outdated Show resolved Hide resolved
src/util/saveArticles.ts Outdated Show resolved Hide resolved
@VadimKovalenkoSNF VadimKovalenkoSNF force-pushed the 1664-mobile-html-renderer branch 2 times, most recently from 6e2d33c to 251b6bb Compare September 22, 2023 05:56
@VadimKovalenkoSNF VadimKovalenkoSNF marked this pull request as ready for review September 22, 2023 08:44
@VadimKovalenkoSNF
Copy link
Collaborator Author

Set this to the review. To secure tests I explicitly set forceRender: 'WikimediaDesktop' where it was necessary. Once #1898 is approved, we can start to test mobile treatments along with other functionality without duplicating test files, etc.

@kelson42
Copy link
Collaborator

@VadimKovalenkoSNF Great, do we have anything know not working fine specificily with this new end-point and which would need further work (and an other ticket)?

@VadimKovalenkoSNF
Copy link
Collaborator Author

@kelson42 collapsible sections not working (don't mismatch this with collapsible infoboxes) + I want you to look at the HTML output and give me feedback about it ( mostly form a user perspective ). Note that page/mobile-html is different than page/html but I've tried to apply transformations to elements that are not valid for the output.

@VadimKovalenkoSNF
Copy link
Collaborator Author

I noticed that images in gallery box don't render and nopic parameter doesn't work as expected.

@kelson42
Copy link
Collaborator

This is essential that both bugs are fixed

@VadimKovalenkoSNF
Copy link
Collaborator Author

@kelson42 , I created a separate test file test/e2e/mobileRenderIntegrity.test.ts to check zim integrity for mobile renderer. It looks like I still got error about external pcs script there. Steps to reproduce:

Download this patch and run:
npm run test:pattern 'test/e2e/mobileRenderIntegrity.test.ts'

Output that I received:

[ERROR] Invalid external links found:
https://meta.wikimedia.org/api/rest_v1/data/javascript/mobile/pcs is an external dependence in article A/Canada

And the error about invalid internal links:

[ERROR] Invalid internal links found:
The following links:
- ../-/File:"O_Canada",_performed_by_the_United_States_Third_Marine_Aircraft_Wing_Band.oga-bg.vtt
(-/File:"O_Canada",_performed_by_the_United_States_Third_Marine_Aircraft_Wing_Band.oga-bg.vtt) were not found in article A/Canada

I would appreciate your input about how to solve this. Is there any effort required for zimcheck itself to fix this. Or is this only a mwoffliner issue? Link rels seems to work as expected. List of external resources that needed for mobile renderer can be found here - https://en.wikipedia.org/api/rest_v1/page/mobile-html-offline-resources/Canada *
(* title 'Canada' or any other title is required for this API)

@kelson42
Copy link
Collaborator

kelson42 commented Sep 26, 2023

@VadimKovalenkoSNF All internal links pointing to articles which are not mirrored should be removed. Wonder why this one is not!.

External links are OK but external resources obviously not. They ahve to be locally mirrored.

@VadimKovalenkoSNF
Copy link
Collaborator Author

VadimKovalenkoSNF commented Sep 27, 2023

[ERROR] Invalid external links found:

This one has been fixed, I mirrored mobile modules on the right way now.

[ERROR] Invalid internal links found

And this one doesn't seem to be related to the mobile renderer. It works depending on the article title, I left a comment here, maybe this should be highlighted in a separate ticket.

@VadimKovalenkoSNF VadimKovalenkoSNF marked this pull request as ready for review September 27, 2023 18:09
Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have finally run with thie renderer and we still have serious problems:

  • Many editing artifact which should be removed (to add description under the title, to edit wikidata entries, ....) See
    image
  • None of the internal links work because somehow the onclick event is still caught

@VadimKovalenkoSNF
Copy link
Collaborator Author

Many editing artifact which should be removed (to add description under the title, to edit wikidata entries, ....) See

I removed them. If there is anything else for removal, I prepared a separate res/pcs/pcs_override_style.css where we can apply CSS overrides

None of the internal links work because somehow the onclick event is still caught

I fixed this, pls check.

@VadimKovalenkoSNF
Copy link
Collaborator Author

Tested mobile renderer output for bm.wikipedia.org:
--format=nopic - works as expected, 2 Mb
--format=nodet - works OK, 60 Mb
--format=nodet, nopic - works OK, 1.6 Mb
bm.wikipedia.org as is - works OK, but the size is 101,5 Mb, twice bigger than WikimediaDesktop render, this needs to be fixed. I suspect the problem is with image optimization. Here is a screenshot with comparison between same images that come from desktop and mobile renders.

image-comparison

Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the ZIM and it looks good in general, but the code needs IMO to be factored properly and partly simplified.

src/renderers/abstract.renderer.ts Outdated Show resolved Hide resolved
src/mwoffliner.lib.ts Outdated Show resolved Hide resolved
@@ -401,6 +404,10 @@ async function execute(argv: any) {
logger.info('Copying Static Resource Files')
await saveStaticFiles(config, zimCreator)

// TODO: refactor sequence, this only needed for mobile renderer
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be indeed part of the mobilerenderer

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static files handle only in src/mwoffliner.lib.ts, not in renders.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always the same kind of problem (like the CSS): code has to be factored properly. You have one place to save static files and only one, why not here. But the place to specify what are the static files to save that should be in the renderer. In addition it seems you save these static mobile files... all the time, even if they are not needed. which is not clean.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced modules handling to renderers themself. Now abstract desktop and abstract mobile have the ability to filter out necessary modules that come from Downloader. In the execution process now should be the exact modules we need.

src/util/builders/url/base.director.ts Outdated Show resolved Hide resolved
test/e2e/mobileRenderIntegrity.test.ts Outdated Show resolved Hide resolved
src/util/saveArticles.ts Outdated Show resolved Hide resolved
src/util/saveArticles.ts Outdated Show resolved Hide resolved
src/util/saveArticles.ts Outdated Show resolved Hide resolved
src/util/misc.ts Outdated Show resolved Hide resolved
@kelson42
Copy link
Collaborator

kelson42 commented Oct 4, 2023

@VadimKovalenkoSNF Can you please rebase?

Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amost good, but IMO really problematic that the modules are not handled properly in the renderers. Same for "static" files!


__ARTICLE_CONFIGVARS_LIST__
__ARTICLE_JS_LIST__
__PCS_JS_OVERRIDE__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need to override Mediawiki JS, then make a generic solution/naming to do that, but don't make it specific in particular with naming which is very Mediawiki specific (not MWoffliner

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I replaced it with MW_MOBILE naming.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not what I meant. There should be only one place to override CSS and not a placeholder for each usage. A reall good name should be something like __CSS_OVERRIDE__ or __JS_OVERRIDE__ ... and inside you put whatever you want... here, for this end-point, the PCS related stuff.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done by creating a separate template for desktop and mobile.

src/Downloader.ts Show resolved Hide resolved
src/renderers/abstractMobile.render.ts Outdated Show resolved Hide resolved
@kelson42
Copy link
Collaborator

kelson42 commented Oct 5, 2023

@VadimKovalenkoSNF Don't forget to add WikimediaMobile in RENDER_LIST

@kelson42
Copy link
Collaborator

kelson42 commented Oct 5, 2023

@VadimKovalenkoSNF As far as I can see, there is also a size problem. With this rendere, the ZIM file is almost and still 10x bigger than it should. Please compare the maxi versions (basically --format=novid) in https://download.kiwix.org/zim/wikipedia/

@VadimKovalenkoSNF
Copy link
Collaborator Author

VadimKovalenkoSNF commented Oct 6, 2023

@kelson42 , I've noticed that ab wiki doesn't work as expected with mobile render. Though it has the support of page/mobile-html (prooflink), there is an example of this article (desktop) that returns empty string in page/mobile-html (link). That causes an error and break entire scrapping process. I'll make a shallow error handling without exiting the runtime process. The side effect of this will be some missing articles.

@kelson42
Copy link
Collaborator

kelson42 commented Oct 6, 2023

@VadimKovalenkoSNF either this a bug by us and it has to be fixed or its a bug on wikimedia side and it has to be reported. If the backend is not able to deliver the content of an article, the whole scraper should stop with proper error. No tolerance to backend rendering errors if they concern a full article.

@VadimKovalenkoSNF
Copy link
Collaborator Author

VadimKovalenkoSNF commented Oct 6, 2023

I noticed that page/html is also not working for the article mentioned above, but mwoffliner doesn't exit the scrape process and returns the page with an empty body.

abkhazian-wiki-article

Upd: I created dedicated Phab ticket - https://phabricator.wikimedia.org/T348321

@VadimKovalenkoSNF
Copy link
Collaborator Author

Please compare the maxi versions (basically --format=novid)

I'm not sure that the problem is with novid format, I secured it in test/e2e/mobileRenderFormatParams.test.ts

Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VadimKovalenkoSNF If I got that right, this PR still needs to be fixed to have a better static content handling.

@VadimKovalenkoSNF
Copy link
Collaborator Author

@VadimKovalenkoSNF If I got that right, this PR still needs to be fixed to have a better static content handling.

Yes, static file saving is in progress.

@kelson42
Copy link
Collaborator

LGTM but the js/css handling really needs to be revamped, should be handled in the renders.

@kelson42 kelson42 merged commit 8ebd7d0 into main Oct 17, 2023
6 checks passed
@kelson42 kelson42 deleted the 1664-mobile-html-renderer branch October 17, 2023 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants