Add heuristics for finding incorrect dates in the protocols. #51

Open
4 tasks
BobBorges opened this issue Oct 24, 2024 · 4 comments

Comments

@BobBorges
Contributor

BobBorges commented Oct 24, 2024

The library wants exact dates for all the protocols. We know that some dates in the protocols are incorrect, due to:

  • cover sheets on which the dates of several protocols are written
  • running headers of protocol compilations
  • protocols that start/end on the same physical page
  • dates mentioned in the text that were scraped by accident but aren't a document date

We need to work on eliminating them by adding heuristic tests to the workflow:

  • Dates shouldn't overlap from one protocol to the next.
  • Until 1874, the date is part of the filename.
  • Date scraping should ignore dates associated with a sjukanmälan (notice of sick leave) or a läkarintyg / läkarbetyg / sjukbetyg (medical certificate), along with the signature of the läkare (physician) who signed off on the sickness.
  • The date span of a single protocol should be less than one week.
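
A minimal sketch of how the overlap, filename, and span checks could look as standalone tests. The `Protocol` record, its field names, the ISO-style `YYYY-MM-DD` filename pattern, and the choice to treat 1874 as inclusive and a shared date as an overlap are all assumptions made for illustration, not the project's actual data model:

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class Protocol:
    # Hypothetical record; field names are assumptions, not the project's schema.
    filename: str
    dates: List[date]  # all dates scraped for this protocol


# Assumed filename convention: an ISO date somewhere in the name.
FILENAME_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def span_too_long(p: Protocol, max_days: int = 7) -> bool:
    """Flag a protocol whose scraped dates span a week or more."""
    if not p.dates:
        return False
    return max(p.dates) - min(p.dates) >= timedelta(days=max_days)


def overlaps_next(current: Protocol, following: Protocol) -> bool:
    """Flag consecutive protocols whose date ranges overlap.

    A date shared by both protocols counts as an overlap (assumption).
    """
    if not current.dates or not following.dates:
        return False
    return max(current.dates) >= min(following.dates)


def filename_date(p: Protocol) -> Optional[date]:
    """Parse a date from the filename, or return None if none is found."""
    m = FILENAME_DATE.search(p.filename)
    if m is None:
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 from an unrelated number sequence
        return None


def disagrees_with_filename(p: Protocol, cutoff_year: int = 1874) -> bool:
    """Flag a protocol up to the cutoff year whose filename date was not scraped."""
    fd = filename_date(p)
    if fd is None or fd.year > cutoff_year:
        return False
    return fd not in p.dates
```

Each function only flags a candidate for manual review; none of them decides which date is the correct one.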

What else?


@BobBorges
Contributor Author

BobBorges commented Nov 5, 2024

Example of a sjukbetyg (medical certificate):
[image]
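
One way to keep dates like these out of the results is to drop any scraped date that sits close to a certificate keyword. The sketch below assumes the scraper can report each date with its character offset in the page text. The function name, the keyword list, and the 300-character window are hypothetical and would need tuning against real OCR output:

```python
import re
from typing import List, Tuple

# Keywords from the issue in their standard spellings. Matching ignores case
# and tolerates a space in "sjuk betyg". OCR variants are not covered.
SICK_NOTE = re.compile(
    r"sjukanmälan|läkarintyg|läkarbetyg|sjuk\s?betyg", re.IGNORECASE
)


def filter_sick_note_dates(
    text: str,
    date_spans: List[Tuple[int, str]],
    window: int = 300,
) -> List[Tuple[int, str]]:
    """Keep only dates that are farther than `window` characters from a keyword.

    `date_spans` holds (start_offset, date_string) pairs produced by the scraper.
    """
    keyword_positions = [m.start() for m in SICK_NOTE.finditer(text)]
    kept = []
    for start, value in date_spans:
        near_keyword = any(abs(start - k) <= window for k in keyword_positions)
        if not near_keyword:
            kept.append((start, value))
    return kept
```

A character window is a blunt instrument: on a page where the certificate is printed right next to the protocol heading it would also discard the real document date, so the discarded dates are worth logging rather than silently dropping.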
