Move as much code as possible from book-to-pef to pipeline-modules #225
Comments
This is probably too complicated. However, what would be doable is to filter script options from a "base script XML" at build time (like in the ...).
Obviously you'll still need your own custom script, but it will require less maintenance because we can put a lot more options and style sheet modules in the Pipeline, available to everyone. Your custom script will only include some inputs, outputs and standard options (not associated with a style sheet, or specific to NLB), one or more parameter ports, and the logic to invoke the actual conversion (load, convert, store). The other big advantage of this is that I can remove options from the generic scripts that currently say "Not implemented".
I would also make the "default.scss" style sheet, which is currently always included, public, so that custom scripts are free to include it or not. It would be a good idea to remove the "stylesheet" option from custom scripts, because the available options will be determined at build time and are therefore tied to a fixed style sheet. So all formatting options that you want to present to the user should be in this fixed style sheet.
I think I would wrap the content of style sheet modules inside a big conditional, so that they can be switched on and off like this:
$chose-between-module-x-and-y: x !default;
$enable-module-x: $chose-between-module-x-and-y == x;
@import "http://www.daisy.org/pipeline/modules/braille/html-to-pef/css/module-x.scss";
$enable-module-y: $chose-between-module-x-and-y == y;
@import "http://www.daisy.org/pipeline/modules/braille/html-to-pef/css/module-y.scss";
To save you from having to override option documentation, I would remove the parts that say "includes the following rule by default ...", because you're likely to override some of these default (too simple) rules in your custom style sheet, and so the documentation would be wrong. Obviously the option documentation should also be internationalized.
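A minimal sketch of what such a wrapped module might look like, assuming the wrapper is a Sass `@if` on the `$enable-module-x` flag that is set just before the import. The file contents and the placeholder rule are hypothetical and are not taken from the actual Pipeline modules:

```scss
// module-x.scss (hypothetical contents, for illustration only)

// The importing style sheet is expected to set $enable-module-x before
// importing this file. Fall back to "disabled" if it did not.
$enable-module-x: false !default;

// All rules of the module live inside one conditional, so importing the
// file has no effect unless the module was switched on.
@if $enable-module-x {
  // Placeholder rule standing in for the module's real rules.
  h1 {
    text-align: center;
  }
}
```

Because the selector variable is declared with `!default`, a custom style sheet that assigns `$chose-between-module-x-and-y: y;` before importing the default style sheet would switch module y on and module x off.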
Sounds good. I think it would be nice to still have a stylesheet option though, in case we want to try out some new rules, or override some existing rules (for testing purposes or as a one-off production).
OK sure, that's still possible, as long as the user is aware that the options belong to the default style sheet.
…eter port For steps that "px:extends" another step, the "px:options" attribute copies all the options in the specified namespace, and connects the options with the parameter port. See nlbdev/pipeline#225 (comment)
CC @kalaspuffar
For automated production (which we do for most of our books), I think it's only hyphenation that is remaining. For manual production, we might need translations and script options, but we'll see. We can do some testing when hyphenation is done.
@josteinaj @kalaspuffar I've updated the list.
Hi @bertfrees and @kalaspuffar. Just checking in on the status here. Any progress? Any blocking issues? I think we agreed in our last meeting that the only thing we need initially to be able to start testing is that Norwegian hyphenation is moved/migrated/implemented in the main PIP version.
I am waiting for a Norwegian hyphenation table to become available somewhere, preferably on Maven (because it is permanent and versioned), so that I can import it in Pipeline. That is how I see my responsibility in this. If there is no progress, I'm also willing to manually build and copy the table from https://github.com/nlbdev/spell-no into Pipeline, so that we at least have an initial version that you can test. But I think this is not a good solution for the long run because it does not allow easy updates to newer versions (or at least I don't want to be the one to do the updates). I'm also willing to take on the job of writing a script to build and deploy the hyphenation table (based on https://github.com/nlbdev/spell-no). But I'd still need your help to set it up (e.g. the Maven groupId, assuming we'll use Maven). It only really makes sense to go down this path though if somebody is actually going to maintain the hyphenation table. If it is not going to be updated, it does not make sense to create a dedicated project for it. And I'm kind of worried that no one might take up the responsibility to maintain it (given the time it takes to just publish an initial version).
Ok, thanks. If it is easy for non-technical people to maintain the table, then I think we could do that part of it. As for the technical setup with Maven and other tooling, I don't think we have the time to do much of that work internally at NLB. But I suppose that once it's set up, it's not much work to maintain? I'll try contacting Språkbanken ("the language bank") and Nasjonalbiblioteket (the national library) and see if there's someone interested in helping maintain this.
@bertfrees I think it would be useful if you could build and copy the table from spell-no into Pipeline, yes. And for future updates of the hyphenation table, we would need to do it using a separate project. The current version of the hyphenation table is good enough for us to use in production (after all, we already use that version today).
I can set up the Maven repository when needed. I would like to do it maybe solely in GitHub using releases, to keep things simpler, but either way, I can set it up when needed.
As you like. As long as it is easy to fetch updates. Like, change some version number in some pom file. GitHub Packages might be another option?
Good idea.
GitHub Packages makes sense. So:
seems right?
Yes, seems right.
@josteinaj I created a PR: nlbdev/spell-no#1
Thanks! I will have a look as soon as possible.
My contribution to solving this issue is now more or less done. Most things that could be done on the Pipeline side are done. The only thing that is left is to add support for internationalization. But that is a big change so we might want to prioritize other things over it. Regarding the other boxes that have not been checked off yet:
I may also port some more CSS code in the future but that has lower priority.
Great, thanks 👍. I have the hyphenation issue high on my list, but haven't gotten to it yet, sorry. I'll leave this issue open until we have started testing hyphenation in the main pipeline branch.
Hyphenation
Produce the hyphenation files in a separate project, deploy it somewhere (e.g. in Maven Central in a ZIP) and import it in libhyphen-utils.
Some useful work has been done in this repository: https://github.com/nlbdev/spell-no (the "norsk/patterns" subdirectory). It includes a build script for the patterns file and a UI for viewing the input words (that are used to build the patterns file) and hyphenations.
In progress: see spell-no#1 (Pack hyphenation table in ZIP and publish to Maven).
Script options
(What would be really nice, but probably out of scope, is if we were able to analyze a provided style sheet, dynamically compute the relevant options from it, and present only these to the user.)
Translator features
Recognizing the CSS property text-transform: uncontracted: could be made into a generic feature. Done in daisy/pipeline-modules#9 (Support "text-transform:uncontracted" in the generic translator).
Selecting sub-translators for different languages: should be moved to liblouis-utils (see daisy/pipeline-mod-braille#196, Braille translator should handle multi-language documents).
Overriding how undefined characters should be handled: could be made into a translator option. Either just via an argument in the constructor of LiblouisTranslatorImpl, or it could be configurable via a "query feature". See daisy/pipeline-mod-braille#206 ("dots-for-undefined-char" feature for Liblouis translator).
Handling of @text-transform strong { system: -nlb-indicators; open: '⠠⠄'; close: '⠠⠄'; } etc.: can possibly be generalized, although I would much prefer to do this via Liblouis. For this, we need to be able to control the order of the begin/end marks (see #217 (comment), Web UI handling of em and strong, and liblouis/liblouis#922, How to control order of emphasis indicators for multiple emphasis?).
Marking of URLs and e-mail addresses. This should be done by Liblouis.
Unfortunately Liblouis doesn't have any opcodes to insert indicators to announce and close a computer braille string, but I think it should be possible to accomplish.
The default Liblouis table for Norwegian uses begcomp and endcomp (which are undocumented opcodes) to insert indicators to announce and close a computer braille string, but the NLB version of the table for some reason has disabled this and it has been implemented in Java.
This is "on hold" because Dawn and Lars are figuring out whether the requirement needs to be updated.
px:nfc should be moved to liblouis-utils, but not as a separate XProc step. Instead Liblouis tables could contain metadata about which Unicode variants are supported (NFC and/or NFD) (liblouis/liblouis#923, Table metadata to indicate supported Unicode normal forms), and liblouis-utils should then do the conversion if needed (daisy/pipeline-mod-braille#197, Liblouis based translator: do Unicode normalization step when needed). Ideally this should be handled within Liblouis itself (liblouis/liblouis#98, glyph vs diaeresis). Other related issues: #4 (Full support for combining marks), #21 (Basic support for combining marks).
Style sheets
The items below are not essential because style sheets are considered just another input for Pipeline. They don't need to be included inside the pipeline-mod-nlb module. With some small modifications, the existing style sheets should still work with the latest version. The benefit of porting CSS code is that it becomes available for everyone.
dt element with following dd elements (this is not something that can be done with plain CSS)