
Files failing ingest due to incomplete headers #144

Open
fergusL opened this issue May 14, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@fergusL
Contributor

fergusL commented May 14, 2021

At the moment I have the raw screener/ingestor set up to only attempt to reprocess files that have failed screening fewer than three times. After this they stop getting re-queued.

However, some files never even make it past ingestion, as they have a missing header key (usually it is the "FIELD" key), which trips up the header parsing utility.

I think we could either pass these files off to a junk table or set up the header parsing to assign a default value to certain keys, e.g. a file missing a "FIELD" key would just get assigned hdr["FIELD"] = "UNKNOWN".

At the moment these files will just perpetually cycle through the screener.
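The default-value option could look something like the sketch below. The key names, defaults, and the `fill_missing_keys` helper are illustrative assumptions, not the actual header parsing utility's API.

```python
# Illustrative sketch of the "assign a default value" approach; the
# required keys and their defaults are assumptions, not the real schema.
DEFAULTS = {"FIELD": "UNKNOWN"}

def fill_missing_keys(hdr, defaults=DEFAULTS):
    """Return a copy of the header dict with defaults for any missing keys."""
    filled = dict(hdr)
    for key, value in defaults.items():
        # Only fill in a default if the key is genuinely absent.
        filled.setdefault(key, value)
    return filled
```

A file missing the "FIELD" key would then pass parsing with `hdr["FIELD"] == "UNKNOWN"` instead of raising.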

@danjampro
Contributor

  • Why three times? I think if they fail once that should be enough

  • Files with a missing FIELD key are basically useless to us. I agree that an ingest_failed collection may be a good approach. Maybe we could also store the error string, or even the traceback, in this table.
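The ingest_failed idea could be sketched roughly as below. The `ingest_failed` store and `ingest_with_failure_log` wrapper are hypothetical names for illustration (a real implementation would write to a database collection rather than a list).

```python
import traceback

# Stand-in for an "ingest_failed" database collection (illustrative only).
ingest_failed = []

def ingest_with_failure_log(filename, ingest):
    """Attempt ingestion once; on failure, record the filename, error
    string, and traceback instead of re-queuing the file."""
    try:
        ingest(filename)
        return True
    except Exception as err:
        ingest_failed.append({
            "filename": filename,
            "error": str(err),
            "traceback": traceback.format_exc(),
        })
        return False
```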

@fergusL
Contributor Author

fergusL commented May 14, 2021

  • 3 times mostly for the WCS solve: if the solve times out, it should store in the header which index files it has already been compared against, and the next time it tries to solve for WCS it should pick up where the previous run left off. We could just set a really long timeout, but I think this way it will quickly try to solve each file and, if it times out, come back to those files once it has processed all the new un-ingested files.

  • A lot of the files missing the "FIELD" key are actually just darks, so we don't actually need the FIELD key, but the header parser raises an error because we have set it as a required column. We could just delete these old darks, as I assume they were taken "manually" rather than via POCS. However, at the moment the screener will just perpetually attempt to ingest these files, so we will be perpetually wasting a few CPU cores on them. There are also some that fail because of an "invalid datetime format"; I think they also just fail ingestion and end up in ingestion purgatory. For now I will just punt these files into a junk table rather than fiddle with the header parsing.
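The "punt to a junk table" behaviour described above might look like this minimal sketch. The `junk_table`, `try_ingest`, and `parse_header` names are assumptions for illustration; the real screener would write to a database table.

```python
# Hypothetical sketch: files whose headers can never be fixed by retrying
# (missing required key, invalid datetime) go straight to a junk table so
# they stop cycling through the screener and wasting CPU cores.
junk_table = []  # stand-in for a database table

UNRECOVERABLE = (KeyError, ValueError)  # e.g. missing "FIELD", bad datetime

def try_ingest(filename, parse_header):
    """Parse the header once; on an unrecoverable error, junk the file."""
    try:
        return parse_header(filename)
    except UNRECOVERABLE as err:
        junk_table.append({"filename": filename, "reason": repr(err)})
        return None
```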

@danjampro danjampro added the enhancement New feature or request label Jun 4, 2021
@danjampro
Contributor

The current implementation of the FileIngestor keeps track of which files did not succeed and avoids re-queuing them for processing. However, it does not currently remember these files when the service is restarted.
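One way to make the failed-file set survive restarts is to persist it to disk, as in the sketch below. The cache filename and JSON format are assumptions, not the FileIngestor's actual behaviour.

```python
import json
from pathlib import Path

# Hypothetical cache location for the failed-file set (illustrative only).
FAILED_CACHE = Path("failed_files.json")

def load_failed(path=FAILED_CACHE):
    """Load the set of previously-failed filenames, if the cache exists."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def save_failed(failed, path=FAILED_CACHE):
    """Write the failed-filename set so it survives a service restart."""
    path.write_text(json.dumps(sorted(failed)))
```

On startup the service would call `load_failed()` and skip those files; each new failure would be added to the set and saved.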
