Strange linebreaks in the speech #50

Open
MansMeg opened this issue Oct 22, 2024 · 5 comments
Labels
error Errors identified in the data

Comments

@MansMeg
Contributor

MansMeg commented Oct 22, 2024

I was looking at this speech. In the SweDeb interface, it shows really strange line breaks in the middle of the speech.

{"id":"prot-1909--ak--024_026","gender":"Man","party":"S","year":1909,"speaker":"Karl Starbäck"}

Start of speech:

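Herr Starbäck:
Herr talman! Jag hade verkligen icke tänkt yttra mig i den här frågan, ty jag hade trott, att det skulle bli ett meningsutbyte mellan juristerna här i kammaren om hvilken form för ändring af lagen eller hvilken form för.

(English: "Mr. Starbäck: Mr. Speaker! I had truly not intended to speak on this question, since I had believed that there would be an exchange of views among the lawyers here in the chamber about which form of amendment of the law or which form of.")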

We should probably try to find and fix similar issues across multiple documents.

MansMeg added the error label (Errors identified in the data) on Oct 22, 2024
@BobBorges
Contributor

When reporting an error (in general, but especially one found through SweDeb), we should include the ID attribute and/or the line number of the XML element in question. That makes it much easier to locate the problematic example.
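A minimal sketch of the lookup this suggestion enables, assuming the reported ID is stored as an `xml:id` attribute in the source XML (an assumption; the ID in the report above may not map one-to-one to an attribute). `find_element_line` is a hypothetical helper: it uses Python's standard-library `xml.parsers.expat`, which exposes the current line number while parsing, to return the tag name and line of the matching start tag.

```python
import xml.parsers.expat


def find_element_line(xml_text: str, element_id: str):
    """Return (tag, line_number) of the first element whose xml:id equals
    element_id, or None if no element carries that ID.

    Hypothetical helper for bug reports; assumes IDs live in xml:id attributes.
    """
    # Without namespace processing, expat reports the attribute as "xml:id"
    # and unprefixed tags (e.g. TEI's <u>) keep their plain names.
    parser = xml.parsers.expat.ParserCreate()
    found = []

    def on_start(tag, attrs):
        if not found and attrs.get("xml:id") == element_id:
            # During a callback, CurrentLineNumber (1-based) is the line of this start tag.
            found.append((tag, parser.CurrentLineNumber))

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)  # True = this is the final chunk of input
    return found[0] if found else None
```

Usage: read the protocol file into a string and pass it together with the ID from the report; a `None` result means no element in that file carries the ID.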

@MansMeg
Contributor Author

MansMeg commented Oct 24, 2024

I agree! Ping @fredrik1984

@BobBorges
Contributor

Do we know how SweDeb handles line breaks? It may be caused by this:

[screenshot]

It's not a problem in terms of TEI, but it does make the text difficult to render nicely. I think @ninpnin has talked about how to join these segments in a reasonable way (right?).
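One way such joining could work, sketched under stated assumptions: each speech is a TEI `<u>` element whose text is split into `<seg>` children, and a segment break that does not follow sentence-ending punctuation is treated as spurious. This is an illustrative heuristic, not the approach @ninpnin described or what SweDeb does; `join_segments`, `local_name` and `SENTENCE_END` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Characters treated as ending a sentence (a heuristic, not a rule from the corpus).
SENTENCE_END = (".", "!", "?", ":")


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so TEI-namespaced and plain tags compare equal."""
    return tag.rsplit("}", 1)[-1]


def join_segments(u_xml: str) -> str:
    """Join the <seg> texts of one utterance into display paragraphs.

    Whitespace inside each segment (including newlines and indentation from
    the XML file) is collapsed to single spaces. A segment is glued onto the
    previous paragraph when that paragraph does not end with sentence
    punctuation, on the assumption that the break is a segmentation error.
    """
    root = ET.fromstring(u_xml)
    paragraphs = []
    for el in root.iter():
        if local_name(el.tag) != "seg":
            continue
        text = " ".join("".join(el.itertext()).split())
        if not text:
            continue
        if paragraphs and not paragraphs[-1].endswith(SENTENCE_END):
            paragraphs[-1] += " " + text  # likely a spurious break: continue the sentence
        else:
            paragraphs.append(text)
    return "\n\n".join(paragraphs)
```

The punctuation rule will sometimes be wrong (for example, a genuine paragraph that ends without punctuation), so it is better suited to rendering than to rewriting the source files.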

@MansMeg
Contributor Author

MansMeg commented Oct 25, 2024

Yes, I think that is the reason, i.e. it's an example of a segmentation error.
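To look for similar segmentation errors across many documents, a rough detector could flag adjacent segments that look like one sentence split in two. This is a sketch assuming the same hypothetical TEI layout (`<seg>` children inside `<u>`, with `xml:id` attributes); the heuristic (no sentence-ending punctuation, next segment starting in lower case) and the name `suspicious_breaks` are assumptions, and it only compares segments within a single `<u>`.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
SENTENCE_END = (".", "!", "?", ":")  # heuristic sentence terminators


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def seg_text(seg: ET.Element) -> str:
    """All text inside a segment, with whitespace collapsed to single spaces."""
    return " ".join("".join(seg.itertext()).split())


def suspicious_breaks(xml_text: str) -> list:
    """Return (id, id) pairs of adjacent <seg> elements inside the same <u>
    where the first does not end a sentence and the second starts in lower case."""
    root = ET.fromstring(xml_text)
    flagged = []
    for u in root.iter():
        if local_name(u.tag) != "u":
            continue
        segs = [child for child in u if local_name(child.tag) == "seg"]
        for prev, nxt in zip(segs, segs[1:]):
            a, b = seg_text(prev), seg_text(nxt)
            if a and b and not a.endswith(SENTENCE_END) and b[0].islower():
                flagged.append((prev.get(XML_ID), nxt.get(XML_ID)))
    return flagged
```

Running it over each protocol file yields element IDs, which is also the kind of information requested above for error reports; the results are candidates to review, not confirmed errors.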

@fredrik1984
Contributor

Yes. Once we in the Swerik project have implemented an ID for each speech, SweDeb should use that ID as well. I have put this issue with the other segmentation-oriented issues in the backlog.
