This repository has been archived by the owner on Jun 15, 2024. It is now read-only.

Sigma-Aldrich: Group paragraph elements or subsections together into a single HierarchyNode #5

Open
GreenCappuccino opened this issue Oct 19, 2022 · 1 comment
Assignees: GreenCappuccino
Labels: enhancement, supplier/sigma

Comments

@GreenCappuccino
Member

Currently, because of the minimum line spacing setting, some real paragraphs are split up, with each physical line becoming its own subsection. For example:

" ... "
        {
          "title": "OTHER_OTHER",
          "items": [],
          "raw_title": "The branding on the header and/or footer of this document may temporarily not visually \n"
        },
        {
          "title": "OTHER_OTHER",
          "items": [],
          "raw_title": "match the product purchased as we transition our branding. However, all of the \n"
        },
        {
          "title": "OTHER_OTHER",
          "items": [],
          "raw_title": "information in the document regarding the product remains unchanged and matches the \n"
        },
        {
          "title": "OTHER_OTHER",
          "items": [],
          "raw_title": "product ordered. For further information please contact [email protected]. \n"
        },
" ... "

These should be merged into one subsection with a single TEXT item inside:

" ... "
        {
          "title": "OTHER_OTHER",
          "items": [
            {
              "type": "TEXT",
              "data": "The branding on the header and/or footer of this document may temporarily not visually match the product purchased as we transition our branding. However, all of the information in the document regarding the product remains unchanged and matches the product ordered. For further information please contact [email protected]."
            }
          ]
        },
" ... "
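
A minimal sketch of the merge described above, assuming the parsed output is available as a list of dicts shaped like the JSON in this issue. The function name `merge_split_paragraphs` is hypothetical and not part of the existing code. The heuristic is deliberately naive: every run of consecutive nodes with the same `title` and an empty `items` list is treated as one paragraph, so two genuinely separate paragraphs that sit next to each other would also be merged.

```python
from typing import Any


def merge_split_paragraphs(nodes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Collapse runs of item-less nodes into one node holding a TEXT item.

    Assumption: a node with an empty "items" list and a "raw_title" is a
    fragment of a paragraph that was split by the line spacing threshold.
    """
    merged: list[dict[str, Any]] = []
    pending: list[str] = []  # stripped raw_title fragments of the current run
    pending_title = None     # "title" shared by the current run

    def flush() -> None:
        nonlocal pending, pending_title
        if pending:
            merged.append({
                "title": pending_title,
                "items": [{"type": "TEXT", "data": " ".join(pending)}],
            })
        pending, pending_title = [], None

    for node in nodes:
        is_fragment = not node.get("items") and "raw_title" in node
        if not is_fragment:
            # A node that already has items ends the current run untouched.
            flush()
            merged.append(node)
            continue
        if pending and node["title"] != pending_title:
            # A different title starts a new run.
            flush()
        pending.append(node["raw_title"].strip())
        pending_title = node["title"]

    flush()
    return merged
```

Passing the four `OTHER_OTHER` nodes from the first snippet through this function would yield a single node whose TEXT item contains the rejoined paragraph.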
@GreenCappuccino GreenCappuccino added the enhancement New feature or request label Oct 19, 2022
@GreenCappuccino GreenCappuccino self-assigned this Oct 19, 2022
@GreenCappuccino GreenCappuccino changed the title Group paragraph elements or subsections together into a single HierarchyNode Sigma-Aldrich: Group paragraph elements or subsections together into a single HierarchyNode Oct 19, 2022
@GreenCappuccino
Member Author

I'm thinking of solving this problem by avoiding it altogether. There's somewhat of a pattern in which subsections are marked with boldface and which are not.
If we can detect the starts and ends of subsections from that pattern, we could treat all lines within a group as a single paragraph. That would require some modification of the initial hierarchy generator though, so I'm likely going to look at solving other problems for now.
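
A rough sketch of the boldface idea, under the assumption that each extracted line carries a flag saying whether it is rendered in bold. The `Line` record and `group_by_bold_headings` are hypothetical names for illustration, not the hierarchy generator's actual types. A bold line opens a new subsection, and every non-bold line that follows is appended to that subsection's single paragraph.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Line:
    text: str   # raw text of one extracted line
    bold: bool  # assumed to be provided by the PDF extraction step


def group_by_bold_headings(lines: list[Line]) -> list[dict[str, Any]]:
    """Start a subsection at each bold line; join the rest into one paragraph."""
    sections: list[dict[str, Any]] = []
    current: Optional[dict[str, Any]] = None

    for line in lines:
        text = line.text.strip()
        if not text:
            continue  # ignore blank lines entirely
        if line.bold:
            current = {"title": text, "body": []}
            sections.append(current)
        else:
            if current is None:
                # Text before any bold heading goes into an untitled group.
                current = {"title": None, "body": []}
                sections.append(current)
            current["body"].append(text)

    return [
        {
            "title": s["title"],
            "items": (
                [{"type": "TEXT", "data": " ".join(s["body"])}] if s["body"] else []
            ),
        }
        for s in sections
    ]
```

This sidesteps line spacing as a paragraph signal, at the cost of relying on the bold pattern being consistent across documents, which the comment above only describes as "somewhat of a pattern".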
