
Files failing ingest due to incomplete headers #144

Open
fergusL opened this issue May 14, 2021 · 3 comments
Labels
enhancement New feature or request

Comments

@fergusL
Contributor

fergusL commented May 14, 2021

At the moment I have the raw screener/ingestor set up to only attempt to reprocess files that have failed screening fewer than three times. After this they stop getting re-queued.

However, some files never even make it past ingestion, as they have a missing header key (usually it is the "FIELD" key), which trips up the header parsing utility.

I think we could either pass these files off to a junk table or set up the header parsing to assign a default value to certain keys, e.g. a file missing a "FIELD" key would just get assigned hdr["FIELD"] = "UNKNOWN".

At the moment these files will just perpetually cycle through the screener.
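The default-value option could look something like the sketch below. The key names, defaults, and the `fill_missing_keys` helper are illustrative assumptions, not the actual header parsing utility's API.

```python
# Illustrative sketch of the "assign a default value" approach; the
# required keys and their defaults are assumptions, not the real schema.
DEFAULTS = {"FIELD": "UNKNOWN"}

def fill_missing_keys(hdr, defaults=DEFAULTS):
    """Return a copy of the header dict with defaults for any missing keys."""
    filled = dict(hdr)
    for key, value in defaults.items():
        # Only fill in a default if the key is genuinely absent.
        filled.setdefault(key, value)
    return filled
```

A file missing the "FIELD" key would then pass parsing with `hdr["FIELD"] == "UNKNOWN"` instead of raising.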

@danjampro
Contributor

  • Why three times? I think if they fail once that should be enough

  • Files with a missing FIELD key are basically useless to us. I agree that an ingest_failed collection may be a good approach. Maybe we could also store the error string, or even the traceback, in this table.
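The ingest_failed idea could be sketched roughly as below. The `ingest_failed` store and `ingest_with_failure_log` wrapper are hypothetical names for illustration (a real implementation would write to a database collection rather than a list).

```python
import traceback

# Stand-in for an "ingest_failed" database collection (illustrative only).
ingest_failed = []

def ingest_with_failure_log(filename, ingest):
    """Attempt ingestion once; on failure, record the filename, error
    string, and traceback instead of re-queuing the file."""
    try:
        ingest(filename)
        return True
    except Exception as err:
        ingest_failed.append({
            "filename": filename,
            "error": str(err),
            "traceback": traceback.format_exc(),
        })
        return False
```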

@fergusL
Contributor Author

fergusL commented May 14, 2021

  • 3 times mostly for the WCS solve: if the solve times out, it should store in the header which index files it has already been compared against, and the next time it tries to solve for WCS it should pick up where the previous run left off. We could just set a really long timeout, but I think this way it will quickly try to solve each file and, if it times out, come back to those files once it has processed all the new un-ingested files.

  • A lot of the files missing the "FIELD" key are actually just darks, so we don't actually need the FIELD key, but the header parser raises an error because we have set it as a required column. We could just delete these old darks, as I assume they were taken "manually" rather than via POCS. However, at the moment the screener will just perpetually attempt to ingest these files, so we will be perpetually wasting a few CPU cores on them. There are also some that fail because of an "invalid datetime format"; I think they also just fail ingestion and end up in ingestion purgatory. For now I will just punt these files into a junk table rather than fiddle with the header parsing.
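The "punt to a junk table" behaviour described above might look like this minimal sketch. The `junk_table`, `try_ingest`, and `parse_header` names are assumptions for illustration; the real screener would write to a database table.

```python
# Hypothetical sketch: files whose headers can never be fixed by retrying
# (missing required key, invalid datetime) go straight to a junk table so
# they stop cycling through the screener and wasting CPU cores.
junk_table = []  # stand-in for a database table

UNRECOVERABLE = (KeyError, ValueError)  # e.g. missing "FIELD", bad datetime

def try_ingest(filename, parse_header):
    """Parse the header once; on an unrecoverable error, junk the file."""
    try:
        return parse_header(filename)
    except UNRECOVERABLE as err:
        junk_table.append({"filename": filename, "reason": repr(err)})
        return None
```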

@danjampro danjampro added the enhancement New feature or request label Jun 4, 2021
@danjampro
Contributor

The current implementation of the FileIngestor keeps track of which files did not succeed and avoids re-queuing them for processing. However, it does not currently remember these files when the service is restarted.
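One way to make the failed-file set survive restarts is to persist it to disk, as in the sketch below. The cache filename and JSON format are assumptions, not the FileIngestor's actual behaviour.

```python
import json
from pathlib import Path

# Hypothetical cache location for the failed-file set (illustrative only).
FAILED_CACHE = Path("failed_files.json")

def load_failed(path=FAILED_CACHE):
    """Load the set of previously-failed filenames, if the cache exists."""
    if path.exists():
        return set(json.loads(path.read_text()))
    return set()

def save_failed(failed, path=FAILED_CACHE):
    """Write the failed-filename set so it survives a service restart."""
    path.write_text(json.dumps(sorted(failed)))
```

On startup the service would call `load_failed()` and skip those files; each new failure would be added to the set and saved.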
