Better table header handling #14

Open
creisle opened this issue Mar 6, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@creisle
Collaborator

creisle commented Mar 6, 2023

I've been using the linearized tables, but one thing I've noticed is that when we have something complex like a multi-level header, just linearizing means the number of cells doesn't always match up. So something like this:

[image: screenshot of a table with a multi-level header]

example article used: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2873663/

Currently gets turned into

p53 MUTATION FUNCTIONALa STATUS IARC DATABASEb FEATURESc SOMATIC GERMLINE FAMILIES TOTAL BREAST

We lose a lot of meaning, and it becomes impossible to properly match the header cells to the cell text from the body of the table (see below).

p53 MUTATION FUNCTIONALa STATUS IARC DATABASEb FEATURESc SOMATIC GERMLINE FAMILIES TOTAL BREAST
T125R ALTERED 2 1 0

So I'd like to try something more complex, where we simplify the header into a single row before we linearize. It would require making the text differ slightly from the original by repeating some words, which I am not sure about. The end result would look like this:

p53 MUTATION FUNCTIONALa STATUS IARC DATABASEb SOMATIC TOTAL IARC DATABASEb SOMATIC BREAST IARC DATABASEb GERMLINE FAMILIES FEATURESc
T125R ALTERED 2 1 0
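
A minimal sketch of the header simplification described above, not the code from any eventual PR. It assumes each header cell is available as a `(text, rowspan, colspan)` tuple. The `flatten_header` name, that tuple layout, and the spans assigned to the example header are all assumptions inferred from the flattened text, not taken from the article's markup.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a header cell: (text, rowspan, colspan)
Cell = Tuple[str, int, int]


def flatten_header(rows: List[List[Cell]]) -> List[str]:
    """Collapse a multi-row header into one label per column."""
    n_rows = len(rows)
    # The first row has no cells hanging down from above, so its colspans
    # add up to the full column count.
    n_cols = sum(colspan for _, _, colspan in rows[0])
    # grid[r][c] holds the text of whichever cell covers that position
    grid: List[List[Optional[str]]] = [[None] * n_cols for _ in range(n_rows)]

    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a rowspan from an earlier row
            while c < n_cols and grid[r][c] is not None:
                c += 1
            for dr in range(min(rowspan, n_rows - r)):
                for dc in range(colspan):
                    grid[r + dr][c + dc] = text
            c += colspan

    labels = []
    for c in range(n_cols):
        parts: List[str] = []
        for r in range(n_rows):
            text = grid[r][c]
            # A rowspan repeats the same text down a column; keep it once
            if text and (not parts or parts[-1] != text):
                parts.append(text)
        labels.append(" ".join(parts))
    return labels


# Assumed span layout for the example header (a guess from the flattened text)
HEADER_ROWS = [
    [("p53 MUTATION", 3, 1), ("FUNCTIONALa STATUS", 3, 1),
     ("IARC DATABASEb", 1, 3), ("FEATURESc", 3, 1)],
    [("SOMATIC", 1, 2), ("GERMLINE FAMILIES", 2, 1)],
    [("TOTAL", 1, 1), ("BREAST", 1, 1)],
]

labels = flatten_header(HEADER_ROWS)
print(" ".join(labels))
```

With those assumed spans, the six labels join into the single header row proposed above. One trade-off of this approach: collapsing adjacent duplicates within a column would also merge a parent and child cell that genuinely share the same text.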

@jakelever what do you think? I've already been implementing this for my own purposes, but I'd be happy to put up a PR if you like the idea.

creisle changed the title from "Better table handling" to "Better table header handling" on Mar 6, 2023
@jakelever
Owner

Sure. Go for it. I don't really use the table information that much, but this does remind me that I need to make sure that CIViCmine is properly aware of the tables. Remind me, there's some metadata on these passages that indicate that they are tables, right?

creisle added the enhancement (New feature or request) label on Mar 8, 2023
creisle added a commit that referenced this issue Mar 8, 2023
@creisle
Collaborator Author

creisle commented Mar 8, 2023

> Sure. Go for it. I don't really use the table information that much, but this does remind me that I need to make sure that CIViCmine is properly aware of the tables. Remind me, there's some metadata on these passages that indicate that they are tables, right?

yup, the xml_path infon can be used for that
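
As a rough illustration of that check, here is a sketch that tests a passage's infons for a table-related `xml_path`. The path format (slash-separated element names, optionally with `[n]` indices) and the `is_table_passage` helper are assumptions for illustration; only the `xml_path` infon name comes from this thread. The element names are the standard JATS table tags.

```python
import re
from typing import Mapping

# JATS element names that indicate tabular content
TABLE_TAGS = {"table-wrap", "table", "thead", "tbody"}


def is_table_passage(infons: Mapping[str, str]) -> bool:
    """Return True if the passage's xml_path points inside a table."""
    xml_path = infons.get("xml_path", "")
    # Assumed format: "/article/body/sec[2]/table-wrap/table/thead"
    # Drop any trailing positional index such as "[2]" from each segment
    segments = [re.sub(r"\[\d+\]$", "", s) for s in xml_path.split("/") if s]
    return any(s in TABLE_TAGS for s in segments)
```

Matching on whole path segments rather than a substring avoids false positives from unrelated element names that merely contain the word "table".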

creisle added a commit that referenced this issue Mar 8, 2023
@jakelever
Owner

Thanks
