Message formatting is lost when splitting large messages #25

Open
zzzoom opened this issue Jun 1, 2023 · 1 comment

zzzoom commented Jun 1, 2023

When importing a large formatted message such as a code block, the message splitter doesn't close the formatting at the end of each resulting message and reopen it at the start of the next, so the formatting is lost. For example, something like:

long code block 1/2

long code block 2/2

Gets converted into:

```
long code block 1/2


long code block 2/2
```

pR0Ps (Owner) commented Apr 25, 2024

I think handling this specific case is not too hard (use a regex to match ``` blocks, if it intersects with a message boundary then upload the block as an attachment or something), but the more general problem of "how to split text while preserving formatting" is a bit more complex.
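
A rough sketch of that regex check, in case it helps the discussion. The names `FENCE_RE` and `fences_crossing` are made up for illustration and are not part of this project, and `boundaries` is assumed to be a list of character offsets where the splitter intends to cut. The non-greedy pattern will not cope with unclosed or nested fences.

```python
import re

# Matches a fenced block: opening ```, any text (non-greedy), closing ```.
FENCE_RE = re.compile(r"```.*?```", re.DOTALL)


def fences_crossing(text, boundaries):
    """Return (start, end) spans of fenced blocks that a split boundary would cut.

    Hypothetical helper: `boundaries` holds the character offsets at which
    the text is about to be split.
    """
    crossing = []
    for match in FENCE_RE.finditer(text):
        # A boundary strictly inside the span would cut the block in two;
        # a boundary exactly at either edge leaves the block whole.
        if any(match.start() < b < match.end() for b in boundaries):
            crossing.append(match.span())
    return crossing
```

Any span this returns could then be uploaded as an attachment, or the boundary could be moved to just before the block.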

For example, if the text `**this is some *strong* text**` is split in the middle of the word "strong", it's the same kind of issue (it needs to be split like `**this is some *str***` | `***ong* text**`). Another example is links - they can't be split in the middle. Like parsing HTML, this isn't something that can be done generically with a regex. Properly handling these cases will depend on understanding the actual syntax of the text and which types of entities can be split. Slack does provide structured information in the export using its concept of "blocks", so the proper way to implement this will most likely involve parsing all that. It could then be used to intelligently pick a split point that avoids breaking non-splittable entities, inserting closing syntax markers before the split point and opening syntax markers after it.
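
To illustrate the "closing markers before the split, opening markers after it" idea for the code block case only, here is a minimal line-based sketch. It is not the project's splitter: `split_preserving_fences` is a hypothetical name, and it assumes fences sit on their own lines and that no single line is longer than the limit. It does not use Slack's block data and does not carry a language tag over to the reopened fence.

```python
FENCE = "```"


def split_preserving_fences(text, limit):
    """Split text into chunks of at most `limit` characters on line
    boundaries, closing an open ``` fence at the end of a chunk and
    reopening it at the start of the next one.

    Assumes fences are on their own lines and every line fits in `limit`.
    """
    chunks = []
    current = []      # lines in the chunk being built
    size = 0          # length of "\n".join(current)
    in_fence = False  # are we inside a fence before the current line?

    for line in text.split("\n"):
        toggles = line.lstrip().startswith(FENCE)
        fence_after = in_fence != toggles
        added = len(line) + (1 if current else 0)
        # While inside a fence, keep room for a closing fence line.
        reserve = len(FENCE) + 1 if fence_after else 0

        if current and size + added + reserve > limit:
            if in_fence:
                current.append(FENCE)  # close the fence in this chunk
            chunks.append("\n".join(current))
            # Reopen the fence at the top of the next chunk if needed.
            current = [FENCE] if in_fence else []
            size = len(FENCE) if in_fence else 0
            added = len(line) + (1 if current else 0)

        current.append(line)
        size += added
        in_fence = fence_after

    if current:
        chunks.append("\n".join(current))
    return chunks
```

With a limit of 24, the text `"intro\n```\nline one\nline two\nline three\n```\noutro"` comes out as three chunks, each with balanced fences. Handling bold, italics and links the same way would need the real syntax information mentioned above.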
