Datapoint ID requirements #109
Comments
@jmdkastro @keflavich Note that some observers/simulators do simply label their objects numerically. But as long as the publication information is available for every uploaded data set, it will be possible to distinguish between data sets in the database even when they come from the same author but a different paper. Where it is currently difficult to distinguish between them is in the plotting. For example, I am planning to upload two datasets from the same Heyer et al 09 paper: the cores and clouds samples. In the database these will be given IDs of coreN and cloudN. But once these are in the database, there will be two MarkHeyer entries in the query plot. There are two courses of action here.
I just had a quick think about how 2. would work in practice, and I think it would be a pain to implement. I therefore think it makes sense to go for 1. In that case, we need a check on the upload page that will not allow multiple entries from the same paper. If someone does want to upload something new from a paper already in the database, this will need to be done manually. I suspect this won't happen very often, so it will not be too much of a burden on the database administrators. Do you agree with this approach? If so, I will add a single entry for the Heyer et al 09 paper containing both clouds and cores.
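A rough sketch of what such an upload-page check might look like, assuming every dataset carries a paper identifier (ADS ID or DOI) stored as a string. The function name `is_new_paper` and the list of existing identifiers are hypothetical, not taken from the current code base.

```python
def is_new_paper(paper_id, existing_paper_ids):
    """Return True if no dataset with this paper identifier exists yet.

    Both arguments are hypothetical: `paper_id` is the ADS ID or DOI
    entered on the upload form, `existing_paper_ids` is whatever the
    database already holds. Only surrounding whitespace is ignored.
    """
    known = {p.strip() for p in existing_paper_ids}
    return paper_id.strip() not in known
```

The upload form could reject the submission (or flag it for manual handling by an administrator) whenever this returns False.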
On a related point... Regarding the plotting, rather than having the legend label be "FirstnameLastname", wouldn't it be better to have "Lastname + journal paper ID/DOI"? That way you can distinguish between multiple papers from the same first author.
Agreed that FirstnameLastnameADSID or FirstnameLastnameDOI would be better.
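For illustration only, a label of that form could be built as below. The function name and the placeholder paper identifiers are made up for this sketch; the real plotting code may assemble its labels differently.

```python
def legend_label(first_name, last_name, paper_id):
    """Build a legend label of the form FirstnameLastname<paper id>.

    `paper_id` stands in for an ADS ID or DOI; the values used in the
    example below are placeholders, not real identifiers.
    """
    return f"{first_name}{last_name}{paper_id}"


# Two datasets by the same first author now get distinct labels.
labels = [legend_label("Mark", "Heyer", pid) for pid in ("paperA", "paperB")]
```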
Even then, there may be more than one galaxy in a single paper. I don't think observers continue the numbering from one host galaxy to the next; they start over for each. Making sure IDs include the host galaxy seems like a natural solution. Why is more than one upload per paper a problem? Harder to check for duplicates? From a user perspective, I think this should be possible. When you already know multiple tables from one paper will be used, adding them at the same time makes sense of course. But this won't always be the case. So we'll indeed need to think about how to handle this. Thanks for flagging. As for the legend, I agree (it'll take up a lot more space though... perhaps in a small font below the author name?).
Re-reading through this: in order for an object to end up in a publication, it needs some sort of unique identifier. "Galaxy 1" should never show up in an ID list: galaxies have names. Simulated galaxies might have IDs like "Author: Publication: Galaxy 1", but again these should be unique (modulo timestep). So, I agree with @jmdkastro's original post. @snlongmore, are there some examples of numerical identifiers for real astronomical objects?
@snlongmore I think multiple uploads per paper are needed because the underlying catalog - the thing that goes to
@snlongmore @keflavich
Quickly checking the latest uploads on really crappy wifi, I have a brief suggestion.
We should include a check/requirement upon submission that the IDs are useful -- the uploaded MarkSwinbank, AlbertoBolatto, and LisaWei datasets have IDs that are simple numbers running from 1 to N. This is asking for trouble (and we should change this!), as any new datasets by one of these authors are bound to cause conflict. Another reason why we shouldn't do this is that it'd be good to be able to find out which galaxy these clouds are from without having to know the particular paper.
Good examples of how to do this right for extragalactic clouds are the ErikRosolowsky and TonyWong datasets, which include the host galaxy tags. For Galactic clouds, AdamGinsburg and DanielWalker provide a good example as they simply list the unique phone numbers.
I don't know how hard it is to check for this, but perhaps we should include an instruction in the upload form.
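As a starting point, the simplest version of such a check might just flag IDs that are bare integers. This is only a sketch: the function name is hypothetical, and it does not verify that an ID actually contains a host galaxy tag or coordinates, only that it is not a plain running number.

```python
import re

# Matches IDs made up of digits only, e.g. "1" or "42".
_BARE_NUMBER = re.compile(r"\d+")


def find_uninformative_ids(ids):
    """Return the IDs (as strings) that are nothing but a running number."""
    flagged = []
    for raw in ids:
        text = str(raw).strip()
        if _BARE_NUMBER.fullmatch(text):
            flagged.append(text)
    return flagged
```

If the returned list is non-empty, the upload form could ask the submitter to add a host galaxy or survey tag before accepting the table.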