CSS support for formatting styles #20

Open
sgtatham opened this issue Feb 3, 2020 · 1 comment

@sgtatham
Contributor

sgtatham commented Feb 3, 2020

There are situations in which it would be useful for html2text to understand at least a small amount of CSS.

An occasional annoyance I find with some web pages is that they use different classes of <span> (or <div>, depending on preference) for all their formatting, including both paragraph separation and inline style changes such as emphasis. Then they rely on CSS to make some of those span classes behave like <p>, some like <em>, some like <code> and so on.

html2text can't render a document of that kind sensibly unless it speaks enough CSS to know which classes of <span> to treat like which normal tags. Without that, you end up with either a huge megaparagraph or no end of spurious newlines (depending on whether the author went all-spans or all-divs).

I don't have a real-world example handy, but here's one I mocked up manually:

<html>
<head>
<title>Demo of the 'spans-everywhere' school of HTML</title>
<style type="text/css">
.p { display: block; margin-bottom: 1em; }
.em { font-style: italic; }
.code { font-family: monospace; }
</style>
</head>
<body>
<span class="p">Paragraph one, containing <span class="em">emphasis</span>.</span><span class="p">Paragraph two, containing <span class="code">code</span>.</span>
</body>
</html>
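
To make the request concrete, here is a minimal sketch of the kind of CSS awareness described above. It is written in Python for brevity and is not html2text's implementation. The names `DECLARATION_TO_TAG`, `class_roles`, `SpanRenderer` and `render` are hypothetical, and the sketch assumes simple single-class selectors, well-nested markup, and only the three declarations used in the mock-up. Whitespace normalisation is left out.

```python
import re
from html.parser import HTMLParser

# Assumed mapping from one CSS declaration to the HTML tag it imitates.
# Only the three declarations used in the mock-up are covered.
DECLARATION_TO_TAG = {
    ("display", "block"): "p",
    ("font-style", "italic"): "em",
    ("font-family", "monospace"): "code",
}

# Matches simple single-class rules such as `.p { display: block; }`.
RULE_RE = re.compile(r"\.([\w-]+)\s*\{([^}]*)\}")


def class_roles(css):
    """Map each simple `.class` selector to the tags its declarations imitate."""
    roles = {}
    for name, body in RULE_RE.findall(css):
        for decl in body.split(";"):
            prop, sep, value = decl.partition(":")
            if not sep:
                continue  # empty or malformed declaration
            tag = DECLARATION_TO_TAG.get((prop.strip().lower(), value.strip().lower()))
            if tag:
                roles.setdefault(name, set()).add(tag)
    return roles


class SpanRenderer(HTMLParser):
    """Render span/div soup as text, using class roles learned from <style>."""

    def __init__(self):
        super().__init__()
        self.roles = {}
        self.css = []      # text collected inside <style>
        self.skip = None   # name of the non-rendered element we are inside
        self.closers = []  # closing markup for each open span/div
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "title", "script"):
            self.skip = tag
        elif tag in ("span", "div"):
            tags = set()
            for cls in (dict(attrs).get("class") or "").split():
                tags |= self.roles.get(cls, set())
            opening, closing = "", ""
            if "em" in tags:
                opening, closing = opening + "*", "*" + closing
            if "code" in tags:
                opening, closing = opening + "`", "`" + closing
            if "p" in tags:
                closing += "\n\n"  # paragraph break after a block-like span
            self.out.append(opening)
            self.closers.append(closing)

    def handle_endtag(self, tag):
        if tag == self.skip:
            if tag == "style":
                self.roles = class_roles("".join(self.css))
            self.skip = None
        elif tag in ("span", "div") and self.closers:
            self.out.append(self.closers.pop())

    def handle_data(self, data):
        if self.skip == "style":
            self.css.append(data)
        elif self.skip is None:
            self.out.append(data)


def render(html):
    renderer = SpanRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.out).strip()


# The mock-up, condensed to its stylesheet and body.
DEMO = (
    "<style>.p { display: block; margin-bottom: 1em; }"
    " .em { font-style: italic; } .code { font-family: monospace; }</style>"
    '<span class="p">Paragraph one, containing <span class="em">emphasis</span>.</span>'
    '<span class="p">Paragraph two, containing <span class="code">code</span>.</span>'
)
print(render(DEMO))
```

With the stylesheet present, the two spans come out as separate paragraphs with `*emphasis*` and `` `code` `` marked up. Without it, the same body collapses into the single megaparagraph described above.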

@jugglerchris mentioned that another use case is pages that use display: none.

@jugglerchris
Owner

There is now what could be described as "minimal CSS support": it handles display: none, but not font-style or display: block. So some progress has been made...
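
For readers unfamiliar with the display: none case, the idea can be sketched as follows. This is an illustrative Python sketch, not the code html2text uses. `hidden_classes`, `HiddenFilter` and `visible_text` are hypothetical names, the stylesheet is passed in separately to keep the example short, and the markup is assumed to be well nested.

```python
import re
from html.parser import HTMLParser

# Matches simple single-class rules whose body contains `display: none`.
HIDDEN_RE = re.compile(
    r"\.([\w-]+)\s*\{[^}]*\bdisplay\s*:\s*none\b[^}]*\}", re.IGNORECASE
)

# Void elements never get an end tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


def hidden_classes(css):
    """Return the class names that the stylesheet hides with display: none."""
    return set(HIDDEN_RE.findall(css))


class HiddenFilter(HTMLParser):
    """Collect text, dropping everything inside an element with a hidden class."""

    def __init__(self, hidden):
        super().__init__()
        self.hidden = hidden
        self.depth = 0  # > 0 while inside a hidden element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth > 0 or classes & self.hidden:
            self.depth += 1  # count nesting so we know when the hidden element ends

    def handle_endtag(self, tag):
        if tag not in VOID and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)


def visible_text(html, css):
    parser = HiddenFilter(hidden_classes(css))
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Called as `visible_text(page, stylesheet)`, this keeps only the text outside hidden elements. A real implementation would also need to handle the cascade, other selector types, and inline `style` attributes.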

jugglerchris changed the title from "Minimal CSS support" to "CSS support for formatting styles" on Jan 21, 2024