Exported mbox files should use LF instead of CRLF as EOL by default #607

Open
ziqin opened this issue Jul 21, 2024 · 16 comments

Comments

@ziqin

ziqin commented Jul 21, 2024

The "default" mbox Database Format specified in RFC 4155, Appendix A requires that

the canonical mbox database MUST use a single Line-Feed character (0x0A)
as the end-of-line sequence, and MUST NOT use a Carriage-Return/Line-Feed
pair (NB: ... ). This usage represents the most common historical
representation of the mbox database format, and allows for the least
amount of conversion.

Although RFC 4155 is not a formal standard for mbox, I believe IETNG should nevertheless follow the "default" format it defines, to improve interoperability, especially since IETNG has already decided to follow its requirement for the From_ separator (#455).

Current Status

The current IETNG implementation exports an mbox file that mixes LF and CRLF: the first 3 lines (the From_ separator line and the X-Mozilla-Status and X-Mozilla-Status2 header lines) terminate with LF, while the remaining lines terminate with CRLF.

An mbox database using CRLF as EOL causes problems for MUAs such as mutt, which displays an unwanted ^M at the end of the Date and Subject headers.

Expected Behavior

Every line in the exported mbox database ends with LF instead of CRLF.
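
One way to check an exported file against this expectation is to count each kind of line ending in its text. The following is only a minimal sketch: `countLineEndings` and `hasNoCrlf` are hypothetical helper names (not IETNG functions), and the sample text is made up to mimic the mixed LF/CRLF pattern described above.

```javascript
// Hypothetical helpers for illustration; not part of IETNG.

// Count CRLF pairs, bare LFs (not preceded by CR) and bare CRs (not followed by LF).
function countLineEndings(text) {
  const crlf = (text.match(/\r\n/g) || []).length;
  const lf = (text.match(/(?<!\r)\n/g) || []).length;
  const cr = (text.match(/\r(?!\n)/g) || []).length;
  return { crlf, lf, cr };
}

// True when the text contains no CRLF pair at all.
function hasNoCrlf(text) {
  return !text.includes("\r\n");
}

// Made-up sample: the first two lines end with LF, the rest with CRLF.
const sample =
  "From - Sun Jul 21 00:00:00 2024\n" +
  "X-Mozilla-Status: 0001\n" +
  "Subject: test\r\n" +
  "\r\n" +
  "body\r\n";

console.log(countLineEndings(sample)); // { crlf: 3, lf: 2, cr: 0 }
console.log(hasNoCrlf(sample)); // false
```

A file that meets the expected behavior would report `crlf: 0` and `cr: 0`.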

@cleidigh
Collaborator

@ziqin
I will look at this.
I agree we should follow the RFC.
@cleidigh

@cleidigh
Collaborator

@ziqin
Sorry, been on a long hiatus.
I have this fixed in beta b6.
If you can check with mutt that would be great.
Grab here:

@cleidigh

@ziqin
Author

ziqin commented Nov 16, 2024

The fix looks straightforward, but the exported mbox file still mixes LF and CRLF. See the highlighted 0d0a in the following screenshot.

[Screenshot: Export mbox: Mixing LF and CRLF]

I only tested with the 14.1.3-b6 version downloaded from https://github.com/thunderbird/import-export-tools-ng/blob/v14.1.3/xpi/beta/import-export-tools-ng-14.1.3-b6-tb.xpi. I haven't tried to build from the source. I'm not sure if the problem is related to packaging.

@cleidigh
Collaborator

@ziqin
With multiple exports and checking with a good hex editor, I can't find any instances of the CR character.
Two things: did you try restarting Thunderbird, just to make sure there are no cache issues?
If you still have the issue, could you create a folder with just two messages that contain CR characters on export? Then send the original mbox file from Thunderbird, not the exported mbox. I will see if I get the same results.
Send here:
[email protected]
@cleidigh

@ziqin
Author

ziqin commented Nov 18, 2024

I've tried to restart Thunderbird and reinstall the plugin, but the problem still exists.

FYI: every message exported with IETNG from my inbox has mixed LF and CRLF.

I've sent you an email with an original mbox file from Thunderbird. Please also check the Trash folder if you cannot find it in inbox.

FYI: I'm not familiar with Thunderbird's mbox format, but it seems that there are 3 messages in the original file although I only moved 2 messages to the folder one by one. I have resynced from the IMAP server and sent you a new email with a new original file.

@cleidigh
Collaborator

@ziqin
Thanks for your help and patience on this. I received your mbox file and will experiment with it today, thanks.
@cleidigh

@cleidigh
Collaborator

@ziqin
I exported your file on my Windows system. All CRLF sequences were converted and the file had no CR characters. This is rather difficult to explain. I am going to try on my Linux Mint VM.
@cleidigh

@cleidigh
Collaborator

@ziqin
The Linux export worked fine as well. I'm a bit stumped.
I made a b7 with some debug output showing the matchAll counts for
\r\n
\n
\r
Then I do a final replaceAll on \r if the \r count is not zero.
I also changed the original replace to replaceAll. This should be redundant given the global flag, but I wanted to check.
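
For readers following along, the debug steps described above could look roughly like this. This is a sketch under assumptions, not the b7 code: `countMatches` and `normalizeToLf` are hypothetical names, and removing leftover CRs (rather than converting them to LF) is an assumption, since the comment does not say what the final replacement is.

```javascript
// Hypothetical sketch of the described debug steps; not the actual b7 source.

// Count matches with matchAll (the regex must carry the global flag).
function countMatches(text, regex) {
  return [...text.matchAll(regex)].length;
}

function normalizeToLf(text) {
  // Counts taken before conversion, for debug output.
  const before = {
    crlf: countMatches(text, /\r\n/g),
    lf: countMatches(text, /\n/g),
    cr: countMatches(text, /\r/g),
  };

  // Main conversion: every CRLF pair becomes a single LF.
  let out = text.replaceAll("\r\n", "\n");

  // Final pass only if a CR survived (assumption: leftover CRs are dropped).
  const remainingCr = countMatches(out, /\r/g);
  if (remainingCr !== 0) {
    out = out.replaceAll("\r", "");
  }
  return { out, before, remainingCr };
}

const result = normalizeToLf("a\r\nb\nc\rd");
console.log(result.before); // { crlf: 1, lf: 2, cr: 2 }
console.log(JSON.stringify(result.out)); // "a\nb\ncd"

// replace with a global regex and replaceAll with a string give the same result.
const input = "x\r\ny\r\n";
console.log(input.replace(/\r\n/g, "\n") === input.replaceAll("\r\n", "\n")); // true
```

The last line illustrates why switching from replace to replaceAll is expected to be redundant when the regex already has the global flag.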

Clear your debug console then do the export. Capture and send me the output to my test email.

https://github.com/thunderbird/import-export-tools-ng/blob/v14.1.3/xpi/beta/import-export-tools-ng-14.1.3-b7-tb.xpi

@cleidigh

@ziqin
Author

ziqin commented Nov 19, 2024 via email

@cleidigh
Collaborator

@ziqin
No problem.
@cleidigh

@cleidigh
Collaborator

cleidigh commented Dec 4, 2024

@ziqin
Have you had a chance to check b7 or b8?
I am getting close to release.

@cleidigh

@cleidigh
Collaborator

cleidigh commented Dec 6, 2024

@ziqin
This is working for me, all LF endings.
Can you please check and verify?
@cleidigh

@ziqin
Author

ziqin commented Dec 7, 2024

Sorry for the late response.

Today I realized that we were probably exporting mbox files in different ways when testing the CR problem. With b7/b8/b9, when I export a single message or several selected messages from the context menu of the message list, the exported mbox file still contains CR characters, but when I export a whole folder, the exported mbox file has no CR characters.

[Screenshot: IETNG export a single message as mbox]

@cleidigh, could you please try to export in the first way? Let's see if you could reproduce the problem.


BTW, I cannot find any related output in the debug console. I'm unfamiliar with Thunderbird extension development, so maybe I was using the Developer Tools the wrong way.

@cleidigh
Collaborator

cleidigh commented Dec 7, 2024

@ziqin
OK, now I understand why we had different results. I was totally focused on the folder export, which is my new code. The selected-message export to mbox is still old code. I added the conversion there as well in b10.
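
As an illustration of why two export paths can diverge, and one way to keep them in step, the sketch below routes both through a single conversion helper. All names here (`toLf`, `exportFolderAsMbox`, `exportSelectedMessagesAsMbox`) are hypothetical and the functions are greatly simplified; this is not how IETNG is actually structured.

```javascript
// Hypothetical, simplified sketch; none of these names come from IETNG.

// Single place where CRLF is converted to LF.
function toLf(text) {
  return text.replace(/\r\n/g, "\n");
}

// Folder export: takes the raw mbox text of a whole folder.
function exportFolderAsMbox(rawFolderText) {
  return toLf(rawFolderText);
}

// Selected-message export: takes raw messages, each assumed to already
// include its From_ line and trailing blank line, and concatenates them.
function exportSelectedMessagesAsMbox(rawMessages) {
  return toLf(rawMessages.join(""));
}

const message = "From - Sun Dec 08 00:00:00 2024\r\nSubject: hi\r\n\r\nbody\r\n\r\n";
console.log(exportFolderAsMbox(message).includes("\r")); // false
console.log(exportSelectedMessagesAsMbox([message, message]).includes("\r")); // false
```

With a shared helper, a fix applied to one export path cannot be missing from the other.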

@cleidigh

@ziqin
Author

ziqin commented Dec 8, 2024

The selected message export works well on my machine with b10.

Thank you very much!

@cleidigh
Collaborator

cleidigh commented Dec 8, 2024 via email
