epub 2 txt conversion issues. #47

Open
AJolly opened this issue Dec 13, 2024 · 3 comments

Comments

@AJolly

AJolly commented Dec 13, 2024

I've run into issues a few times converting epubs to txt where the conversion silently fails for parts of the book (and I don't realize until later, when I'm confused because the book isn't making sense).

Look at Chapter 33 (which gets labeled Part 34).
Half of the chapter is missing in the txt file.

Ghost in the City try2.zip

@aedocw
Owner

aedocw commented Dec 31, 2024

I'm not really sure what's up with this other than just something odd in the epub formatting. My suggestion would be to see if Calibre is able to cleanly export this to text, as a next test. It is unfortunate it's silently missing sections of a book though.

I wonder if this is related to a relatively recent merge that checks for both p and div? Hmm, no, I took another look at that and it would not have been behind this. I think it's got to be something with the formatting of the epub, as it works fine with books made by big publishers.

@aedocw
Owner

aedocw commented Dec 31, 2024

In the test epub you supplied, I did get output indicating some of the chapters had problems:

Could not find any paragraph tags <p> in "None". Trying with <div>.
Could not find any paragraph tags <p> in "None". Trying with <div>.
Could not find any paragraph tags <p> in "None". Trying with <div>.
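
For anyone unfamiliar with those messages, a fallback of the kind they describe might look roughly like the sketch below. This is not the project's actual code: it uses only the Python standard library, and `TagTextCollector` and `extract_paragraphs` are hypothetical names made up for illustration. It also shows one way such a fallback could, in principle, drop text silently: if a chapter has at least one `<p>`, the `<div>` pass never runs, so any text that lives only in `<div>` elements is skipped. Whether that is what happens with this epub is an open question.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # > 0 while we are inside the target tag
        self.buffer = []  # text pieces of the block currently open
        self.blocks = []  # finished, whitespace-normalized blocks

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                text = " ".join("".join(self.buffer).split())
                if text:
                    self.blocks.append(text)
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_paragraphs(html):
    """Hypothetical helper: try <p> first, fall back to <div> only if no <p> text exists."""
    for tag in ("p", "div"):
        collector = TagTextCollector(tag)
        collector.feed(html)
        collector.close()
        if collector.blocks:
            return collector.blocks
    return []


# Mixed markup: the <div> text is never collected because a <p> was found first.
mixed = "<body><p>First half.</p><div>Second half.</div></body>"
print(extract_paragraphs(mixed))
```

If the epub's chapters mix the two tags like `mixed` does, comparing the result against the full chapter text would show the gap.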

I also tried exporting to text with epub2tts, but comparing the output of the two didn't show me anything obviously missing. Without knowing specific phrases to search for (ones that were missing), I'm not really sure I can do anything here, sorry.
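
If it helps with narrowing this down, a comparison like that can be automated. The sketch below uses Python's standard `difflib` to list lines that appear in one text export but not in the other. `missing_lines` is a hypothetical helper, and the two sample strings stand in for the contents of the two exported files. Lines that merely differ slightly (for example in punctuation) are also reported, so the result is a list of candidates to inspect, not a definitive list of dropped text.

```python
import difflib


def missing_lines(reference_text, candidate_text):
    """Return non-blank lines of reference_text that have no match in candidate_text."""
    ref = [line.strip() for line in reference_text.splitlines() if line.strip()]
    cand = [line.strip() for line in candidate_text.splitlines() if line.strip()]
    matcher = difflib.SequenceMatcher(a=ref, b=cand, autojunk=False)
    missing = []
    for op, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete": lines only in the reference; "replace": lines that differ.
        if op in ("delete", "replace"):
            missing.extend(ref[i1:i2])
    return missing


# Stand-ins for the two exports; in practice these would be read from files.
reference = "Line one.\n\nLine two.\nLine three.\nLine four."
candidate = "Line one.\nLine four."
print(missing_lines(reference, candidate))
```

Running it with the more complete export as the reference gives concrete phrases to search for in the other file.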

@AJolly
Author

AJolly commented Jan 9, 2025

I'm using Calibre's FanFicFare plugin to download from Royal Road. I'm not quite sure what those error messages you listed mean.

In this particular example, look at line 10485 in the text file. In the txt:


She was back on the roof, hiding in TriggerFingers little shack. He lit it up, not stopping the automatic fire until his mag was empty and he quickly reloaded and started firing again.



# Part 35
By the time we hit the gate we both realized what happened.

In the epub

She was back on the roof, hiding in TriggerFingers little shack. He lit it up, not stopping the automatic fire until his mag was empty and he quickly reloaded and started firing again.



# Part 35
By the time we hit the gate we both realized what happened.
