Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update setBaseUrl() to handle URL directors #1929

Merged
merged 10 commits into from
Nov 15, 2023

Conversation

VadimKovalenkoSNF
Copy link
Collaborator

@VadimKovalenkoSNF VadimKovalenkoSNF commented Oct 17, 2023

Fixes: #1927
Depends on #1939
Depends on #1944

@codecov
Copy link

codecov bot commented Oct 17, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (cb4f7a2) 74.17% compared to head (db23e5c) 73.89%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1929      +/-   ##
==========================================
- Coverage   74.17%   73.89%   -0.29%     
==========================================
  Files          39       39              
  Lines        3106     3095      -11     
  Branches      686      679       -7     
==========================================
- Hits         2304     2287      -17     
- Misses        687      689       +2     
- Partials      115      119       +4     
Files Coverage Δ
src/Downloader.ts 71.42% <100.00%> (-0.73%) ⬇️
src/config.ts 100.00% <ø> (ø)
src/renderers/abstractMobile.render.ts 95.45% <100.00%> (ø)
src/util/builders/url/api.director.ts 87.50% <100.00%> (-1.98%) ⬇️
src/util/builders/url/visual-editor.director.ts 100.00% <100.00%> (ø)
src/mwoffliner.lib.ts 75.45% <0.00%> (+0.64%) ⬆️
src/util/saveArticles.ts 75.72% <66.66%> (-0.12%) ⬇️
src/MediaWiki.ts 87.11% <94.59%> (+0.35%) ⬆️

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@VadimKovalenkoSNF
Copy link
Collaborator Author

The idea behind this patch is to remove downloader.setBaseUrl as this functionality is redundant and duplicates URL builder methods which can be used directly in saveArticles() loop to create the correct article Url using <render constructor name>.buildArticleURL(articleId)

@VadimKovalenkoSNF VadimKovalenkoSNF marked this pull request as ready for review October 17, 2023 14:13
Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the proper architecture:

  • CPU inefficient, why testing millions of times something which can be decided once for all?
  • The overall concept of having classes sharing the same API and they are interchangeable is always underused, although this is a base principle of the new architecture
  • Director in a technicality which should be in the Downloader... and ideally nowhere else.
  • To me it looks like a serious regression has been introduced by not handling the case with anymore when forceRender is null.

Comment on lines 169 to 181
public getArticleUrl(renderer, articleId: string): string {
let articleUrl: string
// Renders and URL builders are independent so there should be a switch/case scenario here
switch (renderer.constructor.name) {
case 'WikimediaDesktopRenderer':
articleUrl = MediaWiki.wikimediaDesktopUrlDirector.buildArticleURL(articleId)
break
case 'VisualEditorRenderer':
articleUrl = MediaWiki.visualEditorURLDirector.buildArticleURL(articleId)
break
case 'WikimediaMobileRenderer':
articleUrl = MediaWiki.wikimediaMobileUrlDirector.buildArticleURL(articleId)
break
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not going do go through this millions of times, if this is can be done once per scrape. Decide which Director to use once for all.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this has been revamped in favor of using URL builder's .buildArticleURL(articleId) method.

Comment on lines -172 to -188
public async setBaseUrls(forceRender = null) {
if (!forceRender) {
//* Objects order in array matters!
this.baseUrl = basicURLDirector.buildDownloaderBaseUrl([
{ condition: await MediaWiki.hasWikimediaMobileApi(), value: MediaWiki.WikimediaMobileApiUrl.href },
{ condition: await MediaWiki.hasWikimediaDesktopApi(), value: MediaWiki.WikimediaDesktopApiUrl.href },
{ condition: await MediaWiki.hasVisualEditorApi(), value: MediaWiki.visualEditorApiUrl.href },
])

//* Objects order in array matters!
this.baseUrlForMainPage = basicURLDirector.buildDownloaderBaseUrl([
{ condition: await MediaWiki.hasWikimediaDesktopApi(), value: MediaWiki.WikimediaDesktopApiUrl.href },
{ condition: await MediaWiki.hasVisualEditorApi(), value: MediaWiki.visualEditorApiUrl.href },
{ condition: await MediaWiki.hasWikimediaMobileApi(), value: MediaWiki.WikimediaMobileApiUrl.href },
])
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where goes all this logic to decide automatically which API/render to use? Init the Director object at construction time. You can not get ride of a some kind of init code for this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for choosing render is in saveArticles(). Here you can find a condition where forceRender is null. When choosing render, src/renderers/renderer.builder.ts automatically selects the necessary API based on the predefined renderType option.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Upd: I brought back this logic + calculate the necessary URL builder here.

default:
throw new Error('Unable to find specific API end-point to retrieve article HTML')
}
public getArticleUrl(renderer, articleId: string): string {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should have only one argument: articleId

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of renderer I passed dump to apply isMainPage condition, pls check.

src/MediaWiki.ts Show resolved Hide resolved
src/MediaWiki.ts Outdated
@@ -152,7 +152,7 @@ class MediaWiki {

public async hasVisualEditorApi(): Promise<boolean> {
if (this.#hasVisualEditorApi === null) {
this.#hasVisualEditorApi = await checkApiAvailability(this.VisualEditorURLDirector.buildArticleURL(this.apiCheckArticleId))
this.#hasVisualEditorApi = await checkApiAvailability(this.visualEditorURLDirector.buildArticleURL(this.apiCheckArticleId))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instanciate the needed director here temporary if you need it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/MediaWiki.ts Outdated
Comment on lines 190 to 192
this.wikimediaDesktopUrlDirector = new WikimediaDesktopURLDirector(this.WikimediaDesktopApiUrl.href)
this.wikimediaMobileUrlDirector = new WikimediaMobileURLDirector(this.WikimediaMobileApiUrl.href)
this.VisualEditorURLDirector = new VisualEditorURLDirector(this.visualEditorApiUrl.href)
this.visualEditorURLDirector = new VisualEditorURLDirector(this.visualEditorApiUrl.href)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should vanish

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, now instantiates in related Mediawiki hasCapabilies lazy methods.

src/util/builders/url/api.director.ts Show resolved Hide resolved
@VadimKovalenkoSNF VadimKovalenkoSNF changed the title Deprecate setBaseUrl() and getArticleUrl() in favor of downloader.getArticleUrl() Update setBaseUrl() to handle URL directors Oct 23, 2023
@kelson42
Copy link
Collaborator

@VadimKovalenkoSNF Unfortunately didn't have the time to look at this in detail. But please rebase and fix conflict and I will within 24 hours.

@VadimKovalenkoSNF VadimKovalenkoSNF force-pushed the 1927-fix-article-url-calculation branch 2 times, most recently from df90842 to 4e244a4 Compare October 26, 2023 12:24
Copy link
Collaborator

@kelson42 kelson42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not there yet unfortunately. See my comments.

{ condition: await MediaWiki.hasVisualEditorApi(), value: MediaWiki.VisualEditorApiUrl.href, Director: VisualEditorURLDirector },
{ condition: await MediaWiki.hasWikimediaMobileApi(), value: MediaWiki.WikimediaMobileApiUrl.href, Director: WikimediaMobileURLDirector },
]
this.baseUrlForMainPage = basicURLDirector.buildDownloaderBaseUrl(mainPageCapabilitiesList)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we stick with a baseUrl here and don't have a director like for other articles? Should be the case IMO. We should not have anything about baseUrl here in this class and probably nowhere else probably.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls check recent changes, the property Downloader.baseUrl is no longer used.

Comment on lines 191 to 198
const articlesCapabilitiesList = [
{ condition: await MediaWiki.hasWikimediaMobileApi(), value: MediaWiki.WikimediaMobileApiUrl.href, Director: WikimediaMobileURLDirector },
{ condition: await MediaWiki.hasWikimediaDesktopApi(), value: MediaWiki.WikimediaDesktopApiUrl.href, Director: WikimediaDesktopURLDirector },
{ condition: await MediaWiki.hasVisualEditorApi(), value: MediaWiki.VisualEditorApiUrl.href, Director: VisualEditorURLDirector },
]

this.baseUrl = basicURLDirector.buildDownloaderBaseUrl(articlesCapabilitiesList)
this.articleUrlDirector = this.getUrlDirector(articlesCapabilitiesList)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the condition does not returns directly the director.... like it was returning directly the baseURL in the past? Concept of capabilites (and related functions) seem useless.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this to downloader.setUrlsDirectors(mainPageRenderer, articlesRenderer)

src/MediaWiki.ts Outdated
@@ -181,15 +184,12 @@ class MediaWiki {
const baseUrlDirector = new BaseURLDirector(this.baseUrl.href)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole function/method initMWApis() should disappear:

  • baseUrlDirector should be make class public variable a this.urlDirector
  • BaseURLDirector should take all the necessary arguments to be able to deliver all the necessary paths:buildWikimediaDesktopApiUrl, buildVisualEditorURL ...
  • BaseURLDirector method buildWikimediaDesktopApiUrl, buildVisualEditorURL ... should be made lazy and renamed in getWikimediaDekstopApiUrl(), ....
  • Mediawiki method urlDirector should be used in place of all these variables.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this part of the code, pls check set base(value: string) { ... }. Once we set this.baseUrl in Mediawiki we are eligible to calculate other endpoints.


if (!this.baseUrl || !this.baseUrlForMainPage) throw new Error('Unable to find appropriate API end-point to retrieve article HTML')
public getArticleUrl(dump: Dump, articleId: string): string {
Copy link
Collaborator

@kelson42 kelson42 Oct 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't take dump as parameter, only the articleId. Create in addition a getMainPageUrl(articleId)... the function interface naming is not great I know....

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created downloader.getMainPageUrl()

@VadimKovalenkoSNF VadimKovalenkoSNF force-pushed the 1927-fix-article-url-calculation branch 2 times, most recently from 9f313bd to aa14e7a Compare November 9, 2023 12:47
@kelson42
Copy link
Collaborator

@VadimKovalenkoSNF This PR should be rebased and hopefuly is ready to review

@kelson42 kelson42 merged commit 31709d3 into main Nov 15, 2023
6 checks passed
@kelson42 kelson42 deleted the 1927-fix-article-url-calculation branch November 15, 2023 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Full article URL calculates two times in different parts of mwoffliner using different funtions
2 participants