There are situations in which it would be useful for html2text to understand at least a small amount of CSS.
An occasional annoyance I find with some web pages is that they use different classes of <span> (or <div>, depending on preference) for all their formatting, including both paragraph separation and inline style changes such as emphasis. Then they rely on CSS to make some of those span classes behave like <p>, some like <em>, some like <code> and so on.
html2text can't render a document of that kind sensibly unless it speaks enough CSS to at least know which classes of <span> it should treat like which normal tags. Without that, you end up with a huge megaparagraph, or alternatively no end of spurious newlines (depending on whether the author went all-spans or all-divs).
I don't have a real-world example handy, but here's one I mocked up manually:
There is now what could be described as "minimal CSS support"; it includes display: none but not font-style or display: block. So some progress has been made...
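To make the request concrete, here is a minimal sketch of the kind of class-to-role mapping discussed above. It is not html2text's implementation or API: the names `Role`, `parse_class_roles` and `render` are hypothetical, the CSS handling only covers rules of the form `.class { prop: value; ... }`, and a flat list of (class, text) pairs stands in for a real DOM, so nesting is ignored.

```rust
use std::collections::HashMap;

/// How a `<span>`/`<div>` with a given class should be treated.
/// Variants are declared in priority order (lowest first), so the
/// derived `Ord` lets `max` pick the most significant declaration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Role {
    Inline,    // nothing relevant found
    Code,      // font-family: monospace
    Emphasis,  // font-style: italic
    Paragraph, // display: block
    Hidden,    // display: none
}

/// Parse a tiny, assumed subset of CSS: single-class selectors only.
/// Combined selectors, at-rules and comments are NOT handled.
fn parse_class_roles(css: &str) -> HashMap<String, Role> {
    let mut roles = HashMap::new();
    for rule in css.split('}') {
        let Some((selector, body)) = rule.split_once('{') else { continue };
        let Some(class) = selector.trim().strip_prefix('.') else { continue };
        let mut role = Role::Inline;
        for decl in body.split(';') {
            let Some((prop, value)) = decl.split_once(':') else { continue };
            let found = match (prop.trim(), value.trim()) {
                ("display", "none") => Role::Hidden,
                ("display", "block") => Role::Paragraph,
                ("font-style", "italic") => Role::Emphasis,
                ("font-family", "monospace") => Role::Code,
                _ => Role::Inline,
            };
            // Keep the highest-priority role seen in this rule.
            role = role.max(found);
        }
        roles.insert(class.to_string(), role);
    }
    roles
}

/// Render a flat list of (class, text) pairs as plain text.
/// Unknown classes fall back to plain inline text.
fn render(spans: &[(&str, &str)], roles: &HashMap<String, Role>) -> String {
    let mut out = String::new();
    for (class, text) in spans {
        match roles.get(*class).copied().unwrap_or(Role::Inline) {
            Role::Hidden => {} // dropped entirely
            Role::Paragraph => {
                // Start a new paragraph, separated by a blank line.
                if !out.is_empty() {
                    out.push_str("\n\n");
                }
                out.push_str(text);
            }
            Role::Emphasis => {
                out.push('*');
                out.push_str(text);
                out.push('*');
            }
            Role::Code => {
                out.push('`');
                out.push_str(text);
                out.push('`');
            }
            Role::Inline => out.push_str(text),
        }
    }
    out
}
```

With a stylesheet such as `.para { display: block } .em { font-style: italic; } .hide { display: none }`, spans classed `para` start new paragraphs, `em` spans are wrapped in asterisks, and `hide` spans are omitted, which is roughly the behaviour this issue asks for.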
jugglerchris changed the title from "Minimal CSS support" to "CSS support for formatting styles" on Jan 21, 2024
@jugglerchris mentioned that another use case is pages that use display: none.