-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-44945 Implement code to fill in missing columns #44
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you expect to only need to use fixup this once? (so no unit tests?)
Can you note on how you use/used this in TTS/summit to fix the missing data, and how you know it works/didn't work?
approved with these questions answered
# Bail out because we don't have enough info. | ||
return dict() | ||
|
||
if exposure_rec["s_ra"] == 0.0 and exposure_rec["s_dec"] == 0.0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did I see conversation about this somewhere on why we have these? could that info be added to the comment? Maybe I hallucinated it though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall mentioning that there were zeroed out RA/Dec present in the database, but not how they got there. Really these values ought to be replaced with NULLs IMO.
# Use K&Y 1989 model to compute airmass from altitude | ||
airmass = None | ||
if altitude >= 0 and altitude <= 90: | ||
airmass = 1 / (sin(radians(altitude)) + 0.50572 * (altitude + 6.07995) ** -1.6364) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put these magic numbers in some constants, if there are names for them?
"zenith_distance_end": zenith_distance_end, | ||
"airmass": airmass, | ||
} | ||
for k, v in calculations.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're editing, can you expand these to useful names, or at least the full words?
|
||
if exposure_rec["obs_start_mjd"] is not None and exposure_rec["obs_end_mjd"] is not None: | ||
self.logger.debug("Inferring column: dark_time") | ||
return {"dark_time": 86400 * (exposure_rec["obs_end_mjd"] - exposure_rec["obs_start_mjd"])} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you define this 86400?
dictionary in one go. Call this to apply the fixup methods. To add | ||
more fixup methods, just add them to this class, make sure the | ||
method name starts with "fix_", and make sure the signature matches: | ||
def fix_whatever(self, exposure_rec: dict[str, Any]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be clearer to add the -> dict[str, Any]:
to the end of this?
|
||
|
||
class Fixer: | ||
"""A collection of functions that try to patch up missing data columns.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about the extent of the coding here. If this information is available from the ObservationInfo produced by the metadata translator but we previously used a header value, we should switch to the ObservationInfo. If the value is in the ObservationInfo but is incorrect, we should fix the translator. If the information is not available in the ObservationInfo, but is available elsewhere in obs_lsst
(e.g. filter to band mappings), we should try to use that. We should only compute something here if it's not computed anywhere else (which could be because we remove the other place it's computed).
Otherwise, keeping all of these in sync will be a maintenance nightmare.
# Can't infer band from physical_filter | ||
return dict() | ||
|
||
for band in "ugrizy": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems better to match r"SDSS([ugrizy])"
or r"([ugrizy])_\d+"
and then extract the group. Then no loop is needed.
This may have been my fault. I like having these when people notify me (either in Jira comments or Slack messages or emails) that they're requesting a review from me so that I can quickly judge the scope of the change. But I agree that they don't belong in PR titles. |
Co-authored-by: Kian-Tat Lim <[email protected]>
Co-authored-by: Kian-Tat Lim <[email protected]>
This code relies on existing data already being loaded into the database (especially s_ra, s_dec, *_mjd, and physical_filter) to fill in missing data.