Add support for images with only srcset and no src #62

Merged (4 commits) on Nov 5, 2024

benoit74 (Contributor) commented Nov 5, 2024

Fix #59

Changes:

  • See the list of commits: two are cleanup and code reordering only, and one adds some missing tests.

benoit74 self-assigned this Nov 5, 2024

codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 85.12397% with 18 lines in your changes missing coverage. Please review.

Project coverage is 46.49%. Comparing base (5d68d9a) to head (3a7a734).
Report is 5 commits behind head on main.

Files with missing lines | Patch % | Lines
scraper/src/mindtouch2zim/html_rewriting.py | 81.81% | 9 Missing and 9 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #62      +/-   ##
==========================================
+ Coverage   37.79%   46.49%   +8.70%     
==========================================
  Files          10       12       +2     
  Lines         688      727      +39     
  Branches       84       94      +10     
==========================================
+ Hits          260      338      +78     
+ Misses        426      376      -50     
- Partials        2       13      +11     


rgaudin (Member) commented Nov 5, 2024

Why still draft?

benoit74 (Contributor, Author) commented Nov 5, 2024

Because, while reviewing this PR on my own before submitting it for review, I realized it was a good opportunity to add some missing tests. That is now done, so you can review.

benoit74 marked this pull request as ready for review November 5, 2024 08:43
benoit74 requested a review from rgaudin November 5, 2024 08:43
new_descriptor = new_descriptor.strip()
if current_best_descriptor[-1:] != new_descriptor[-1:]:
    return False
return int(new_descriptor[:-1]) > int(current_best_descriptor[:-1])

rgaudin (Member):

Can an invalid descriptor be passed here, for example one that would fail to convert to int?

benoit74 (Contributor, Author):

Yes, it can happen, and it will then make the scraper fail. This is intended for now: it should not happen, so I prefer to have the scraper fail, learn about it, diagnose it, and decide what to do based on a real use case (potentially just a parsing issue that needs handling), rather than make bad decisions, end up with bad images, and never notice. Thank you for the remark anyway.
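
For readers following the thread, the fail-fast behavior discussed above can be sketched as follows. This is only an illustration built around the four lines under review: the function name `is_better_descriptor` and its signature are assumptions, not the actual code in `html_rewriting.py`. It shows that descriptors with different units are never considered better, and that a descriptor whose numeric part is not an integer (for instance a fractional pixel density such as `1.5x`) makes `int()` raise `ValueError`, which is the failure the author chooses to let surface.

```python
def is_better_descriptor(current_best_descriptor: str, new_descriptor: str) -> bool:
    """Hypothetical wrapper around the excerpt under review.

    Returns True only when both descriptors share the same unit suffix
    ("w" or "x") and the new value is strictly greater.
    """
    new_descriptor = new_descriptor.strip()
    # Descriptors with different units (e.g. "480w" vs "2x") are not comparable.
    if current_best_descriptor[-1:] != new_descriptor[-1:]:
        return False
    # int() raises ValueError on anything that is not an integer literal,
    # so an unexpected descriptor stops processing instead of being ignored.
    return int(new_descriptor[:-1]) > int(current_best_descriptor[:-1])


if __name__ == "__main__":
    print(is_better_descriptor("480w", "800w"))  # same unit, larger value
    print(is_better_descriptor("480w", "2x"))    # different units
    try:
        is_better_descriptor("1x", "1.5x")       # fractional density
    except ValueError as exc:
        print(f"scraper would stop here: {exc}")
```

The design choice in this sketch mirrors the reply above: no `try`/`except` inside the comparison itself, so an unparsable descriptor is reported rather than silently producing a wrong image choice.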

benoit74 merged commit b403f74 into main Nov 5, 2024 (10 checks passed)
benoit74 deleted the img_srcsets branch November 5, 2024 12:31
Successfully merging this pull request may close these issues.

Some images have only srcset and no src
2 participants