Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle libretexts.org glossary rewriting #66

Merged
merged 3 commits into from
Nov 13, 2024
Merged

Handle libretexts.org glossary rewriting #66

merged 3 commits into from
Nov 13, 2024

Conversation

benoit74
Copy link
Contributor

Fix #53

image

While a bit basic (as mentioned in the issue, we do not handle all properties and we do not handle nested glossary definitions), this is sufficient to provide the sheer added value of glossary pages.

@benoit74 benoit74 self-assigned this Nov 11, 2024
Copy link

codecov bot commented Nov 11, 2024

Codecov Report

Attention: Patch coverage is 22.22222% with 21 lines in your changes missing coverage. Please review.

Project coverage is 44.43%. Comparing base (f0e6cfe) to head (477304f).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
scraper/src/mindtouch2zim/libretexts/glossary.py 23.80% 16 Missing ⚠️
scraper/src/mindtouch2zim/processor.py 16.66% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #66      +/-   ##
==========================================
- Coverage   45.08%   44.43%   -0.66%     
==========================================
  Files          13       14       +1     
  Lines         845      871      +26     
  Branches      111      116       +5     
==========================================
+ Hits          381      387       +6     
- Misses        451      471      +20     
  Partials       13       13              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benoit74 benoit74 marked this pull request as ready for review November 11, 2024 16:20
Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I dont see in the ticket nor in the PR why this is limited to English. Should probably be mentioned in the README and should it be a current shortcut, probably have a ticket to support other languages later

scraper/src/mindtouch2zim/processor.py Outdated Show resolved Hide resolved
scraper/src/mindtouch2zim/libretexts/glossary.py Outdated Show resolved Hide resolved
Base automatically changed from collapse_subpages to main November 13, 2024 08:22
@benoit74
Copy link
Contributor Author

benoit74 commented Nov 13, 2024

You are right, the fact this is limited to English is a shortcut I took without even really noticing it. I checked the list of ZIMs we have to build and there is some international libraries, so we should support it.

I changed the way it works by:

  • detecting we are on a glossary based on path pattern instead of title : /zz:_*/20:_* (seems to be stable across all libraries even in foreign languages)
  • ignoring all tables on the page but the last one (seems to be stable as well)
  • rewriting with regular logic when there is no table (happens a lot in the wild in fact)

Tested on:

@rgaudin
Copy link
Member

rgaudin commented Nov 13, 2024

Great ; please re-request when ready

@benoit74
Copy link
Contributor Author

Ready, first commit is unchanged (just rebased), please look only at second commit for last changes

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 please add suggested comment

scraper/src/mindtouch2zim/processor.py Show resolved Hide resolved
@benoit74 benoit74 merged commit 89e0b16 into main Nov 13, 2024
10 checks passed
@benoit74 benoit74 deleted the glossary branch November 13, 2024 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Properly render Glossary pages of libretexts.org
2 participants