Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render mobile images to standard size #2043

Merged
merged 2 commits into from
Jun 23, 2024
Merged

Render mobile images to standard size #2043

merged 2 commits into from
Jun 23, 2024

Conversation

audiodude
Copy link
Member

@audiodude audiodude commented Jun 21, 2024

Fixes #1925

In https://phabricator.wikimedia.org/T349972 a solution was given to img data-src attributes hardcoding images that are too big. This was the provision of an attribute data-data-file-original-src which has the original src size of the image. By hacking this URL, we are able to provide images that are any size we want, in this case 140px is chosen arbitrarily.

Note the use of the WikimediaMobileRenderer.INTERNAL object in this PR. This is to provide test access to methods that are, and should remain private. Here, we only want to test the image transformations of the renderer with a fake document. The production implementation uses this.INTERNAL.convertLazyLoadToImages to reference the private method, and we do the same in the test. This is the pattern that we have agreed upon in our Typescript codebase at YouTube and I think it works fairly well, though it is a bit ugly. The benefits are clear though from looking at the unit test provided with this PR. It is short and concise and fast, tests only the method being changed, and doesn't require a network connection or setting up a long chain of mwoffliner configs and objects.

Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@audiodude The image is retrieved from the right place but I don't understand why you change the resolution. It's not our role to change the resolution.

It's seems you think the size given is wrong, but I'm not sure about the standardsyou refer too. I also don't understand hoe 140px width could be the right width!

@audiodude
Copy link
Member Author

Sorry, I guess I was confused about the requirements.

If you look at the images retrieved from the sample endpoint, which I included in the test, you see that the first has data-width="320" and:

data-src="//upload.wikimedia.org/wikipedia/commons/thumb/8/8f/Bamako_et_fleuve_Niger.jpg/320px-Bamako_et_fleuve_Niger.jpg"

So the data-src already has the right size image. Same for the other image:

data-width="640"
data-src="//upload.wikimedia.org/wikipedia/commons/thumb/2/20/Bamako_bridge2.jpg/640px-Bamako_bridge2.jpg"

Since I read in the reports that 640px was "too big/wide", I thought I had to resize the images. 140px was just an arbitrary value I chose.

@kelson42
Copy link
Collaborator

kelson42 commented Jun 22, 2024

@audiodude Key approach (to see if we are doing good is) IMHO is to compare to what we get currently with the mobile-section end-point in production on the Zimfarm.

That said, I have still a concern that the size given in data-data-file-original-src is the size given in the wiki... and that this is not adapted to the mobile HTML given (and/or not coherent with the resolution given by html-section API end-point). We need to looks at this.

@audiodude
Copy link
Member Author

audiodude commented Jun 22, 2024

Yes looking at the HTML of https://bm.wikipedia.org/wiki/Bamak%C9%94 I see:

<img src="//upload.wikimedia.org/wikipedia/commons/thumb/8/8f/Bamako_et_fleuve_Niger.jpg/250px-Bamako_et_fleuve_Niger.jpg" decoding="async" width="250" height="188" class="mw-file-element" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/8/8f/Bamako_et_fleuve_Niger.jpg/375px-Bamako_et_fleuve_Niger.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/8/8f/Bamako_et_fleuve_Niger.jpg/500px-Bamako_et_fleuve_Niger.jpg 2x" data-file-width="600" data-file-height="450">

Which matches the wikicode:

[[Fichier:Bamako_et_fleuve_Niger.jpg|thumb|250px|Bamako]]

Which has a width of 250, and that's what we see in data-data-file-original-src.

So maybe the solution is to just set the <img>.src to this value and not have to transform any URLs?

@kelson42
Copy link
Collaborator

So maybe the solution is to just set the <img>.src to this value and not have to transform any URLs?

If you read the issue (here and on Phabricator) this has been clearly what I requested... and what I expected this PR to do...

But it we do only this we will:

  • Increase significantly the size of our ZIM files
  • Might have ugly visual effect
  • ... and an other side effect will be to recompute the whole cache of pictures....

I wonder how bad would these 3 points above be... and how important/urgent is this to fix.

My (bad) feeling is that it is urgent and we need to either

  • Request the WMF to deliver "mobile friendly" width and height to recompute the right size URL (I don't like this as I said previously)
  • Request the WMF to deliver the "mobile friendly" resolution URL (I don't like because this is not what I requested first)
  • As what the WMF to deliver the logic used to compute the "mobile friendly" resolution to resize on the the fly our pictures (hate as well as sounds fragile)

What do you think?

Copy link

codecov bot commented Jun 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.73%. Comparing base (7f61953) to head (4af9396).

Current head 4af9396 differs from pull request most recent head fdccec9

Please upload reports for the commit fdccec9 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2043      +/-   ##
==========================================
+ Coverage   74.72%   74.73%   +0.01%     
==========================================
  Files          41       41              
  Lines        3141     3143       +2     
  Branches      688      689       +1     
==========================================
+ Hits         2347     2349       +2     
  Misses        675      675              
  Partials      119      119              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@audiodude
Copy link
Member Author

audiodude commented Jun 22, 2024

I think there is a confusion about what "mobile-friendly" means. It seems the Mediawiki software is using 640px and 320px images as "mobile-friendly" because they will take up the full width of the phone, which arguably looks better when you're scrolling an article in vertical format. I don't think there is any concept here, in the original software, that "mobile-friendly" means "saving bandwidth".

I don't think using the data-data-file-original-src will increase the size of the ZIMs. The Mediawiki code you linked to here will never upscale an image past it's "original file size", but it will make it 640px or 320px if that file size is greater than those options. As pointed out, this is potentially greater than the size that the image was on the page (in the example above, 250px).

So I think using the original page image will:

  1. Actually decrease ZIM size compared to using data-src
  2. More closely match what the article looks like on the website (no impact to "visual effect")

As far as recomputing the cache of pictures, that is definitely a side effect, but seems necessary.

I just don't think WMF or the Mediawiki maintainers have any more idea what a "mobile-friendly" size is than we do, or at least they have a very different definition of it. That's why I just guessed 140px.

@kelson42
Copy link
Collaborator

kelson42 commented Jun 23, 2024

I think there is a confusion about what "mobile-friendly" means. It seems the Mediawiki software is using 640px and 320px images as "mobile-friendly" because they will take up the full width of the phone, which arguably looks better when you're scrolling an article in vertical format. I don't think there is any concept here, in the original software, that "mobile-friendly" means "saving bandwidth".

Yes, the "mobile-html" new API we use is conceived for The Wikipedia mobile app. I believe the mobile app does on the fly image resising (probably depending of the device screen resolution).

I don't think using the data-data-file-original-src will increase the size of the ZIMs. The Mediawiki code you linked to here will never upscale an image past it's "original file size", but it will make it 640px or 320px if that file size is greater than those options. As pointed out, this is potentially greater than the size that the image was on the page (in the example above, 250px).

You might be right, but reference should be current generated ZIM files, which use an other API. That should be actually really easy to do.

So I think using the original page image will:

  1. Actually decrease ZIM size compared to using data-src
  2. More closely match what the article looks like on the website (no impact to "visual effect")

As far as recomputing the cache of pictures, that is definitely a side effect, but seems necessary.

I just don't think WMF or the Mediawiki maintainers have any more idea what a "mobile-friendly" size is than we do, or at least they have a very different definition of it. That's why I just guessed 140px.

Yes, at least their definition of "mobile-friendly" has significantly evolved between the "mobile-section" API and the "mobile-html". To me "mobile-friendly" means "smaller than desktop", somlike it is currently with the "mobile section" API.

@audiodude
Copy link
Member Author

audiodude commented Jun 23, 2024

You might be right, but reference should be current generated ZIM files, which use an other API. That should be actually really easy to do.

So do I just compare https://download.kiwix.org/zim/wikipedia/wikipedia_bm_all_maxi_2024-06.zim with a new ZIM that I make from scratch using the updated code?

Is there a way to see the params that were used to make that ZIM? I don't see a recipe on https://farm.zimit.kiwix.org/recipes probably because I don't have permssions?

@kelson42
Copy link
Collaborator

You might be right, but reference should be current generated ZIM files, which use an other API. That should be actually really easy to do.
So do I just compare https://download.kiwix.org/zim/wikipedia/wikipedia_bm_all_maxi_2024-06.zim with a new ZIM that I make from scratch using the updated code?

Yes

Is there a way to see the params that were used to make that ZIM? I don't see a recipe on https://farm.zimit.kiwix.org/recipes probably because I don't have permssions?

https://farm.openzim.org/recipes/wikipedia_bm_all

Important to consider is the --webp option.

@audiodude
Copy link
Member Author

Looks like the files are around the same, even with my changes to the image src:

Created with my modified version of mwoffliner:

$ du output/wikipedia_bm_all_maxi_2024-06.zim 
41884	output/wikipedia_bm_all_maxi_2024-06.zim

File from https://download.kiwix.org/zim/wikipedia/wikipedia_bm_all_maxi_2024-06.zim

$ du ~/Downloads/wikipedia_bm_all_maxi_2024-06.zim 
41956	/home/tmoney/Downloads/wikipedia_bm_all_maxi_2024-06.zim

@audiodude
Copy link
Member Author

Wait, do I have to modify a setting to force the mobile renderer?

@kelson42
Copy link
Collaborator

kelson42 commented Jun 23, 2024

@audiodude --forceRender might have to be used (actually this is something we need to check as well if the priorisations of API is correct).. but you can not use the mobile-section API anymore anyway with 1.14 (dev) version of MWoffliner.

@audiodude
Copy link
Member Author

Here is the command I used:

$ node ./lib/cli.js --webp --mwUrl="https://bm.wikipedia.org" --format="nopic:nopic" --format="novid:maxi" --osTmpDir="/dev/shm" --adminEmail="[email protected]" --outputDirectory="output" --customZimFavicon="https://drive.farm.openzim.org/wikipedia_all/favicon-48x48.png" --customZimDescription="offline version of Wikipedia in Bambara" --publisher="openZIM" --forceRender=WikimediaMobile

And here is the resulting file size:

$ du output/wikipedia_bm_all_maxi_2024-06.zim 
41700	output/wikipedia_bm_all_maxi_2024-06.zim

So no significant impact on file size using the Mobile Renderer with getting the image src from data-data-file-original-src

@audiodude
Copy link
Member Author

I've updated the code, PTAL

@kelson42
Copy link
Collaborator

@audiodude Code LGTM, but CI is red. I will rebase.

@audiodude
Copy link
Member Author

@audiodude Code LGTM, but CI is red. I will rebase.

It looks like the tests are flaky, one of them is timing out sometimes:

FAIL test/unit/mwApiCapabilities.test.ts (10.2 s)
  ● Checking Mediawiki capabilities › test capabilities of pokemon.fandom.com with RestApi receipt

    thrown: "Exceeded timeout of 5000 ms for a test.
    Add a timeout value to this test to increase the timeout, if this is a long-running test. See https://jestjs.io/docs/api#testname-fn-timeout."

      67 |   })
      68 |
    > 69 |   test('test capabilities of pokemon.fandom.com with RestApi receipt', async () => {
         |   ^
      70 |     MediaWiki.base = 'https://pokemon.fandom.com/'
      71 |     MediaWiki.wikiPath = '/'
      72 |     MediaWiki.restApiPath = 'rest.php'

      at test/unit/mwApiCapabilities.test.ts:69:3
      at test/unit/mwApiCapabilities.test.ts:3:1

@audiodude
Copy link
Member Author

@audiodude Code LGTM, but CI is red. I will rebase.

It looks like the tests are flaky, one of them is timing out sometimes:

FAIL test/unit/mwApiCapabilities.test.ts (10.2 s)
  ● Checking Mediawiki capabilities › test capabilities of pokemon.fandom.com with RestApi receipt

    thrown: "Exceeded timeout of 5000 ms for a test.
    Add a timeout value to this test to increase the timeout, if this is a long-running test. See https://jestjs.io/docs/api#testname-fn-timeout."

      67 |   })
      68 |
    > 69 |   test('test capabilities of pokemon.fandom.com with RestApi receipt', async () => {
         |   ^
      70 |     MediaWiki.base = 'https://pokemon.fandom.com/'
      71 |     MediaWiki.wikiPath = '/'
      72 |     MediaWiki.restApiPath = 'rest.php'

      at test/unit/mwApiCapabilities.test.ts:69:3
      at test/unit/mwApiCapabilities.test.ts:3:1

#2045

@audiodude
Copy link
Member Author

audiodude commented Jun 23, 2024

This failure might be an actual bug because it's failing zimcheck for a MobileRenderer scrape.

EDIT: Just checked the generated ZIM and the images do appear to be broken

@audiodude
Copy link
Member Author

Looks like several of the images in the BMW article don't have data-data-original-file-src:

<span
                            class="mw-file-element gallery-img pcs-widen-image-override pcs-lazy-load-placeholder pcs-lazy-load-placeholder-pending"
                            style="width: 150px"
                            data-class="mw-file-element gallery-img pcs-widen-image-override"
                            data-src="//upload.wikimedia.org/wikipedia/commons/thumb/4/44/BMW.svg/150px-BMW.svg.png"
                            data-srcset="//upload.wikimedia.org/wikipedia/commons/thumb/4/44/BMW.svg/225px-BMW.svg.png 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/4/44/BMW.svg/300px-BMW.svg.png 2x"
                            data-width="150"
                            data-height="150"
                            data-alt="Logo used in vehicles since 1997"
                            data-data-file-width="1015"
                            data-data-file-height="1015"
                            ><span style="padding-top: 100%">
...

<span
                            class="mw-file-element gallery-img pcs-widen-image-override pcs-lazy-load-placeholder pcs-lazy-load-placeholder-pending"
                            style="width: 150px"
                            data-class="mw-file-element gallery-img pcs-widen-image-override"
                            data-src="//upload.wikimedia.org/wikipedia/commons/thumb/b/b2/Squircle_%28120290377%29.jpg/150px-Squircle_%28120290377%29.jpg"
                            data-srcset="//upload.wikimedia.org/wikipedia/commons/thumb/b/b2/Squircle_%28120290377%29.jpg/225px-Squircle_%28120290377%29.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/b/b2/Squircle_%28120290377%29.jpg/300px-Squircle_%28120290377%29.jpg 2x"
                            data-width="150"
                            data-height="147"
                            data-alt="The logo on a BMW car"
                            data-data-file-width="1642"
                            data-data-file-height="1614"
                            >

I guess we have to fall back to src in this case?

@audiodude
Copy link
Member Author

It looks like in the mobile API source code, that if the image didn't get resized, it doesn't have a data-data-file-original-src attribute. So in that case it should be fine to use the .src.

https://github.com/wikimedia/mediawiki-services-mobileapps/blob/master/lib/thumbnail.js#L96

@kelson42
Copy link
Collaborator

kelson42 commented Jun 23, 2024

@audiodude

It looks like in the mobile API source code, that if the image didn't get resized, it doesn't have a data-data-file-original-src attribute. So in that case it should be fine to use the .src.

https://github.com/wikimedia/mediawiki-services-mobileapps/blob/master/lib/thumbnail.js#L96

I guess this PR should be patched then!?

@audiodude
Copy link
Member Author

Yes just updated. Tests pass locally.

@audiodude
Copy link
Member Author

Force pushed to consolidate commits, PTAL

@kelson42 kelson42 merged commit ecc3e7e into main Jun 23, 2024
4 checks passed
@kelson42 kelson42 deleted the img-size branch June 23, 2024 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Images from page/mobile-html endpoint are too big
2 participants