volume breaking: prefer splitting at lower level headings #119

Closed
josteinaj opened this issue Oct 10, 2017 · 16 comments

@josteinaj
Member

josteinaj commented Oct 10, 2017

Test was set to pending due to #63 but was never reenabled.

<x:scenario label="prefer splitting at lower level headings" pending="https://github.com/nlbdev/pipeline/issues/119">

@bertfrees
Collaborator

@josteinaj The test doesn't really make sense anymore now that we treat the first levelN the same as the parent level{N-1} (see #88 (comment)), because in the test every level has at most one child level. I guess the decision to treat the first child the same as the parent was based on the assumption that the content at the start of the parent level would be small compared to the first child level, but that is not the case in the test. The solution could be to let the CSS style of the first child depend on how much content precedes it.

Another consequence of the decision to treat the first child the same as the parent is that if a level has more than one child level, there is a risk that the first child will contain a break when it would be preferable to break after the first child or before the parent. We have to try it, and if it turns out to be a problem, a solution could be to wrap the first child level and the content before it inside a div and give it the same priority as the other child levels.

@josteinaj
Member Author

josteinaj commented Jan 2, 2018

Yes, I've updated the -obfl-keep rules in 5b5d418 so that the first level should recursively have the same -obfl-keep value (it turned out to be much simpler than I had expected). I have a local stash where I started an attempt at updating this test to work with the updated rules, but it's not done. The test will have to be updated so that a volume break would normally occur in the middle of the second sibling levelN for each N that it tests.

@bertfrees
Collaborator

I'm not sure I understand the rationale behind this requirement. (Maybe I did at some point but forgot, in which case you have to explain it to me again.) Why is splitting at lower level headings better? Isn't it better to split at higher level headings so that a new volume doesn't start in the middle of a chapter or subchapter? Sorry if we talked about this already.

@josteinaj
Member Author

Hmm. Maybe it's about preferring lower-level headings when it's not possible to split at higher-level headings.

@KariRudjord what do you think?

@KariRudjord

Yes, if you only split on higher-level headings, some books will get very uneven volumes. For fiction, structure varies a lot from book to book, and we want to fill up volumes as well as possible to avoid too many volumes.

Many volumes are impractical for the reader and more expensive to print at the publisher.

@bertfrees
Collaborator

OK, thanks for the explanation. I understand the idea behind it now. But still, the solution just seems wrong. In order to fill up volumes as well as possible, what you essentially want to do is define as many break opportunities as possible. That is, break opportunities with equal weight, or possibly with a slight preference for breaks at higher level headings, but certainly not the other way around. Giving preference to breaks at lower level headings has two counteracting effects. On the one hand, it will indeed create more break opportunities because there are more lower levels than higher levels. But on the other hand, you increase the chance of getting uneven volumes because not all break points are equal.

Let's take a step back and look at the requirements. Take the following example:

h1
...
  ---
  h2
  ...
    ---
    h3
    ...
    ---   *
    h3
    ...
  ---     *
  h2
  ...
    ---
    h3
    ...
    ---   *
    h3
    ...
---       *
h1
...
  ---
  h2
  ...

We want to have as many break opportunities as possible, i.e. one before every heading. This is marked with ---. You might want to give the higher level headings a slight preference but let's forget about that for a moment. You also might want to give the ones before the first h2 in a h1, or the first h3 inside a h2 (the ones without *) a lower preference, but for simplicity let's also forget about that for now.
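The marking in the example above can be sketched in a few lines of Python. This is only an illustration of the rule "one break opportunity before every heading, starred when the level has a preceding sibling level"; the function name `break_opportunities` and the tiny input document are made up for the example, and none of this is how the pipeline itself computes breaks.

```python
import re
import xml.etree.ElementTree as ET

# Matches DTBook-style level elements and captures the depth digit.
LEVEL = re.compile(r"level([1-6])$")

def break_opportunities(book):
    """List (heading text, starred) for every break opportunity, in document order."""
    found = []

    def visit(parent):
        has_preceding_sibling = False
        for child in parent:
            match = LEVEL.match(child.tag)
            if match is None:
                continue  # ordinary content, not a level
            heading = child.findtext("h" + match.group(1))
            found.append((heading, has_preceding_sibling))
            has_preceding_sibling = True
            visit(child)

    visit(book)
    return found[1:]  # there is no break before the very first heading

# Hypothetical miniature document, only for illustration.
book = ET.fromstring(
    "<book><level1><h1>A</h1><level2><h2>B</h2>"
    "<level3><h3>C</h3></level3><level3><h3>D</h3></level3></level2>"
    "<level2><h2>E</h2></level2></level1>"
    "<level1><h1>F</h1></level1></book>"
)
for heading, starred in break_opportunities(book):
    print(heading, "*" if starred else "")
```

The first child level of each parent comes out unstarred, every later sibling comes out starred, which is the same pattern as in the diagram.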

The correct approach to solve this problem is to make sure every chunk (part between two ---) maps to a single element and assign the same "keep-priority" to all chunks. In practice this means that for all levels that have child levels, the content before the first child level needs to be wrapped inside a new element.
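A minimal sketch of that wrapping step, assuming a DTBook-like tree where a level's own content comes before its child levels. The wrapper element (`div` with class `leading-content`) and the function name are hypothetical choices for this example, not something the style sheet or the pipeline defines.

```python
import re
import xml.etree.ElementTree as ET

LEVEL = re.compile(r"level[1-6]$")

def wrap_leading_content(level):
    """Wrap the content before the first child level in a single new element.

    Levels without child levels are left alone: they already are one chunk.
    """
    children = list(level)
    first_child_level = next(
        (i for i, child in enumerate(children) if LEVEL.match(child.tag)), None
    )
    # Handle nested levels first so the whole subtree is processed.
    for child in children:
        if LEVEL.match(child.tag):
            wrap_leading_content(child)
    if first_child_level is None or first_child_level == 0:
        return  # no child levels, or nothing precedes the first one
    wrapper = ET.Element("div", {"class": "leading-content"})  # hypothetical wrapper
    for child in children[:first_child_level]:
        level.remove(child)
        wrapper.append(child)
    level.insert(0, wrapper)

doc = ET.fromstring(
    "<level1><h1>A</h1><p>a</p>"
    "<level2><h2>B</h2><p>b</p></level2>"
    "<level2><h2>C</h2></level2></level1>"
)
wrap_leading_content(doc)
print(ET.tostring(doc, encoding="unicode"))
```

After the call, every child of `level1` is exactly one chunk (the wrapper, then each `level2`), so the same keep-priority can be assigned to all of them.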

Taking it one step further: if we want to give a preference to certain break opportunities, we need nested elements with different "keep-priority" values. This is where it starts to get more tricky because you need to come up with a nesting structure that matches how you want to break, and this does not necessarily match the nesting structure of the DTBook/EPUB.

Let's say for example we want to give the break points marked with * a higher preference. A possible solution could be the following.

h1
...
  ---
  h2
  ...
    ---
    h3
    ...
-------   *
    h3
    ...
-------   *
  h2
  ...
    ---
    h3
    ...
-------   *
    h3
    ...
-------   *
h1
...
  ---
  h2
  ...

All the chunks still map to elements with the same high keep-priority. The bigger parts separated with ------- that contain multiple chunks need to be wrapped in elements with a low "keep-priority". Exactly how much preference should be given to the * breaks can be controlled with the exact keep-priority values.
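The grouping described here can be illustrated on a flat list of chunks. In this sketch each chunk is a `(name, starred)` pair, where `starred` means the break before the chunk is one of the `*` breaks; the names and the function `group_chunks` are invented for the example. Each resulting group corresponds to one element that would get the low keep-priority, while the chunks inside it keep the high one.

```python
def group_chunks(chunks):
    """Start a new group at every starred break; other chunks join the current group."""
    groups = []
    for name, starred in chunks:
        if starred or not groups:
            groups.append([])
        groups[-1].append(name)
    return groups

# The chunks of the example above, in document order (labels are illustrative).
chunks = [
    ("h1 a", False), ("h2 a", False), ("h3 a", False),
    ("h3 b", True),
    ("h2 b", True), ("h3 c", False),
    ("h3 d", True),
    ("h1 b", True), ("h2 c", False),
]
for group in group_chunks(chunks):
    print(group)
```

Note that the groups cut across the original level boundaries (the first group holds an h1, an h2 and an h3 chunk but not the second h3), which is why the DTBook nesting cannot be reused as is.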

It is clear that the DTBook structure is not suitable for the above example. Levels have to be split up for each new descendant level with a preceding sibling level. As a general rule, by giving preference to breaks at higher levels you can keep more of the original structure of the document.

@josteinaj
Member Author

So, what if we chunk the content during pre-processing? Would that make things easier?

input:

<book>
    <level1>
        <h1>Headline 1 at level 1</h1>
        <p>content</p>
        <level2>
            <h2>Headline 2 at level 2</h2>
            <p>content</p>
            <level3>
                <h3>Headline 3 at level 3</h3>
                <p>content</p>
            </level3>
            <level3>
                <h3>Headline 4 at level 3</h3>
                <p>content</p>
            </level3>
        </level2>
        <level2>
            <h2>Headline 5 at level 2</h2>
            <p>content</p>
            <level3>
                <h3>Headline 6 at level 3</h3>
                <p>content</p>
            </level3>
            <level3>
                <h3>Headline 7 at level 3</h3>
                <p>content</p>
            </level3>
        </level2>
    </level1>
    <level1>
        <h1>Headline 8 at level 1</h1>
        <p>content</p>
        <level2>
            <h2>Headline 9 at level 2</h2>
            <p>content</p>
        </level2>
    </level1>
</book>

output something like this:

<book>
    <chunk>
        <level1>
            <h1>Headline 1 at level 1</h1>
            <p>content</p>
        </level1>
    </chunk>
    <chunk>
        <level1>
            <level2>
                <h2>Headline 2 at level 2</h2>
                <p>content</p>
            </level2>
        </level1>
    </chunk>
    <chunk>
        <level1>
            <level2>
                <level3>
                    <h3>Headline 3 at level 3</h3>
                    <p>content</p>
                </level3>
            </level2>
        </level1>
    </chunk>
    <chunk>
        <level1>
            <level2>
                <level3>
                    <h3>Headline 4 at level 3</h3>
                    <p>content</p>
                </level3>
            </level2>
        </level1>
    </chunk>
    <chunk>
        <level1>
            <level2>
                <h2>Headline 5 at level 2</h2>
                <p>content</p>
            </level2>
        </level1>
    </chunk>
    <chunk>
        <level1>
            <level2>
                <level3>
                    <h3>Headline 6 at level 3</h3>
                    <p>content</p>
                </level3>
            </level2>
        </level1>
    </chunk>
    <chunk>
        <level1>
            <level2>
                <level3>
                    <h3>Headline 7 at level 3</h3>
                    <p>content</p>
                </level3>
            </level2>
        </level1>
    </chunk>
    <chunk>
        <level1>
            <h1>Headline 8 at level 1</h1>
            <p>content</p>
        </level1>
    </chunk>
    <chunk>
        <level1>
            <level2>
                <h2>Headline 9 at level 2</h2>
                <p>content</p>
            </level2>
        </level1>
    </chunk>
</book>

We could add classes to the <chunk>s for setting the -obfl-keep as well if that helps with writing volume breaking rules.
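As a rough sketch of how such a pre-processing step might look, here is the transformation in Python. It assumes that a level's own content always precedes its child levels (content that follows a child level would end up in the parent's chunk, i.e. be reordered), and the function name `chunk_book` is made up for this example. A real implementation would more likely be an XSLT step.

```python
import copy
import re
import xml.etree.ElementTree as ET

LEVEL = re.compile(r"level[1-6]$")

def chunk_book(book):
    """Return a new tree with one <chunk> per level, each keeping its ancestor levels."""
    out = ET.Element(book.tag)

    def visit(level, ancestors):
        # Copy of this level holding only its own (non-level) content.
        own = ET.Element(level.tag, level.attrib)
        own.extend([copy.deepcopy(c) for c in level if not LEVEL.match(c.tag)])
        # Re-create the ancestor levels as empty shells, innermost first.
        node = own
        for ancestor in reversed(ancestors):
            shell = ET.Element(ancestor.tag, ancestor.attrib)
            shell.append(node)
            node = shell
        ET.SubElement(out, "chunk").append(node)
        for child in level:
            if LEVEL.match(child.tag):
                visit(child, ancestors + [level])

    for top in book:
        if LEVEL.match(top.tag):
            visit(top, [])
    return out

src = ET.fromstring(
    "<book><level1><h1>A</h1><level2><h2>B</h2>"
    "<level3><h3>C</h3></level3></level2></level1></book>"
)
chunked = chunk_book(src)
for chunk in chunked:
    print(ET.tostring(chunk, encoding="unicode"))
```

Each printed chunk has the same shape as the chunks in the output above: the heading and content of one level, wrapped in empty copies of its ancestor levels.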

@bertfrees
Collaborator

bertfrees commented Apr 18, 2018

That could help, although I think I would skip the chunk wrappers. And note that how exactly you need to split depends on how you want the volume breaking to be. Also you need to make sure this new structure is still compatible with the other CSS rules, and you should be aware of the chunking when developing the CSS style sheet.

I was thinking more in the direction of creating the new structure in the CSS to OBFL step, based on the volume-break-before and volume-break-after properties, including support for avoid and prefer (and there could possibly be gradations of prefer). I don't know if this would take us too far. Anyway, this has been the intention all along. The volume-break-inside: -obfl-keep(...) was just meant as a temporary solution. It was the easiest way to implement it because it maps directly to an OBFL feature.

@josteinaj
Member Author

josteinaj commented Apr 18, 2018

I discussed a bit with @KariRudjord.

@bertfrees what do you think about the following set of volume breaking rules:

  • always start a new volume at class="part" / epub:type="part", except for the first one (like we do already)
  • don't end a volume with a headline (orphaned headline)
  • use <levelN> / <section> as break opportunities, with equal weight regardless of depth so that volumes are filled up as much as possible
  • one last less important rule:
    • try keeping the first subheadline in the same volume as its main headline (first h2 in same volume as its h1, first h3 in the same volume as its h2 etc. This probably means that <levelN>s / <section>s without a preceding sibling <levelN> / <section> should not be used as a break opportunity)
    • if the first subheadline in a section can't be put in the same volume as its main headline; try to avoid splitting volumes right before the headline (i.e. -obfl-keep-with-previous-sheets: 1 ? This might not be necessary though if the section is not a suggested break point in the first place)

@bertfrees
Collaborator

Yes makes sense. This is the same as my last example, right?

But I don't understand your last point. Where would then be an appropriate split point?

@josteinaj
Member Author

josteinaj commented Apr 18, 2018

Yes, it's pretty much like your last example (maybe exactly the same, not quite sure).

The idea behind the last point is that the switch from a chapter to a subchapter should not happen at a volume break. Breaking between two chapters (or two subchapters) is fine though. This is to avoid confusion as to which level depth you're reading.

@bertfrees
Collaborator

But first you say that "the first subheadline in a section can't be put in the same volume as its main headline". If you don't want to split right before the subheadline, then where would it be better to split?

@josteinaj
Member Author

h1
content
content
--- avoid breaking here (first sub-section)
h2
--- avoid breaking here (orphaned headline)
content
content
--- suggested break point (sub-section, not first)
h2
content
content

Anywhere between the h1 and the last h2, except the two "avoid"-places, would be appropriate split points if it's necessary to split between there.
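The labelling in the listing above can be written down as a small classifier over a flat sequence of items. This is only a sketch of the rules as proposed in this comment; the item encoding (`"h1"`, `"h2"`, `"content"`) and the function names are assumptions made for the example.

```python
def heading_level(item):
    """Return N for an item like 'hN', or None for anything else (e.g. 'content')."""
    if len(item) == 2 and item[0] == "h" and item[1].isdigit():
        return int(item[1])
    return None

def classify_gaps(items):
    """Label the gap between each pair of adjacent items."""
    labels = []
    current = None  # level of the most recent heading seen so far
    for before, after in zip(items, items[1:]):
        if heading_level(before) is not None:
            current = heading_level(before)
            labels.append("avoid (orphaned headline)")
            continue
        upcoming = heading_level(after)
        if upcoming is None:
            labels.append("neutral")
        elif current is not None and upcoming > current:
            labels.append("avoid (first sub-section)")
        else:
            labels.append("suggested")
    return labels

items = ["h1", "content", "content", "h2", "content", "content", "h2", "content", "content"]
for (before, after), label in zip(zip(items, items[1:]), classify_gaps(items)):
    print(f"{before} | {after}: {label}")
```

Gaps labelled "neutral" are the ones between two pieces of content: acceptable if a split is necessary, but not suggested break points.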

@bertfrees
Collaborator

OK. We can do that, but it would indeed have to be done via keep-with-previous-sheets.

However, is this really what you want? I think as a reader I would rather get this:

h1
content
content
content
content
---
h2

(break exactly before h2) than this:

h1
content
content
content
---
content
h2

(the last sheet before h2 missing). We might have discussed this before but I don't remember what the conclusion was.

@josteinaj
Member Author

My concern was that it would be unclear that the new volume started on a deeper level. But after discussing a bit we agree with you. Breaking right before h2 would be ok.

bertfrees added a commit that referenced this issue Jun 1, 2018
Note that this fixes the test but does not make EPUB behave exactly
like DTBook. It is difficult to achieve exactly the same behavior
because the document structure of EPUB is different from DTBook. As I
explained in #119 (comment)
the document structure is important for volume breaking.
@KariRudjord

OK.
