Datapoint ID requirements #109

jmdkastro · 2015-06-09T10:14:01Z

@snlongmore @keflavich
Quickly checking the latest uploads on really crappy wifi, I have a brief suggestion.

We should include a check/requirement upon submission that the IDs are useful -- the uploaded MarkSwinbank, AlbertoBolatto, and LisaWei datasets have IDs that are simple numbers running from 1 to N. This is asking for trouble (and we should change this!), as any new datasets by one of these authors are bound to cause conflict. Another reason why we shouldn't do this is that it'd be good to be able to find out which galaxy these clouds are from without having to know the particular paper.

Good examples of how to do this right for extragalactic clouds are the ErikRosolowsky and TonyWong datasets, which include the host galaxy tags. For Galactic clouds, AdamGinsburg and DanielWalker provide a good example, as they simply list the unique phone numbers.

I don't know how hard it is to check for this, but perhaps we should include an instruction in the upload form.
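For what it's worth, the check itself could be fairly light. Below is a minimal sketch, assuming the IDs of an upload are available as a plain list; the function name `looks_like_running_numbers` and the exact rule (all IDs are bare integers covering 1 to N) are assumptions for illustration, not anything in the existing upload code.

```python
def looks_like_running_numbers(ids):
    """Return True if every ID is a bare integer and together they run 1..N.

    Hypothetical helper: the name and the rule are assumptions,
    not part of the existing upload code.
    """
    cleaned = [str(i).strip() for i in ids]
    # An empty list, or any ID that is not purely decimal digits, is not flagged.
    if not cleaned or not all(c.isdecimal() for c in cleaned):
        return False
    numbers = sorted(int(c) for c in cleaned)
    # Flag only the exact pattern 1, 2, ..., N (in any order).
    return numbers == list(range(1, len(numbers) + 1))
```

The upload form could warn (or refuse) when this returns True and point the uploader to the ID instructions.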

snlongmore · 2015-06-10T09:18:24Z

@jmdkastro @keflavich
Good point. I agree, it would be good to have a check/requirement and also add some text on the upload page explaining this and showing some good/bad examples. I’m afraid at the moment I don’t have time to go through the Bolatto, Swinbank and Wei papers and add unique IDs then re-upload.

Note that some observers/simulators do simply label their objects numerically. But as long as the publication information is available for every uploaded data set, data from the same author but a different paper can still be distinguished in the database.

The place where it will currently be difficult to distinguish between them is in the plotting. For example, I am planning to upload two datasets from the same Heyer et al 09 paper: the cores and clouds samples. In the database these will be given IDs of coreN and cloudN. But once these are in the database, there will be two MarkHeyer entries in the query plot. There are two courses of action here.

1. Only allow a single upload per paper.
2. Edit the database to allow multiple uploads per paper.

I just had a quick think about how option 2 would work in practice, and I think it would be a pain to implement. I therefore think it makes sense to go for option 1. In that case, we need a check on the upload page that will not allow multiple entries from the same paper. If someone does want to upload something new from a paper already in the database, this will need to be done manually. I suspect this won’t happen very often, so it will not be too much of a burden on the database administrators.

Do you agree with this approach? If so, I will add a single entry for the Heyer et al 09 paper containing both clouds and cores.
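To make the single-upload check concrete, here is a rough sketch, assuming each upload carries some paper identifier (ADS ID or DOI) and that the identifiers already in the database can be listed. The name `check_new_upload` and the placeholder identifiers are hypothetical; how the database actually stores this is not shown here.

```python
def check_new_upload(paper_id, existing_paper_ids):
    """Raise ValueError if the paper already has an entry in the database.

    Hypothetical helper: `existing_paper_ids` stands in for whatever
    query returns the paper identifiers already stored.
    """
    # Compare on whitespace-stripped identifiers only; no other normalisation.
    known = {p.strip() for p in existing_paper_ids}
    if paper_id.strip() in known:
        raise ValueError(
            f"Paper {paper_id.strip()!r} is already in the database; "
            "additional tables need to be added manually."
        )
```

A new paper passes silently, while a repeat upload from the same paper is rejected with a message pointing to the manual route.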

snlongmore · 2015-06-10T09:19:16Z

On a related point...

Regarding the plotting, rather than having the legend label be "FirstnameLastname", wouldn't it be better to have "Lastname + journal paper ID/DOI"? That way you can distinguish between multiple papers from the same first author.

keflavich · 2015-06-10T11:48:01Z

Agreed that FirstnameLastnameADSID or FirstnameLastnameDOI would be better.
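As a sketch of what building such a label might look like, assuming the author name and the paper identifiers are available as strings when plotting: the function `legend_label`, its parameters, and the choice of a space as separator are all assumptions, not the current plotting code.

```python
def legend_label(author, ads_id=None, doi=None, separator=" "):
    """Build a legend label from the author name plus a paper identifier.

    Hypothetical helper: prefers the ADS ID, falls back to the DOI,
    and falls back to the bare author name if neither is available.
    """
    paper_id = ads_id or doi
    if not paper_id:
        return author
    return f"{author}{separator}{paper_id}"
```

With this, two uploads by the same author get different legend entries as long as their paper identifiers differ.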

jmdkastro · 2015-06-15T13:08:47Z

Even then, there may be more than one galaxy in a single paper. I don't think observers continue the numbering from one host galaxy to the next; they start over for each. Making sure the IDs include the host galaxy seems like a natural solution.

Why is more than one upload per paper a problem? Harder to check for duplicates? From a user perspective, I think this should be possible. When you already know multiple tables from one paper will be used, adding them at the same time makes sense of course. But this won't always be the case. So we'll indeed need to think about how to handle this. Thanks for flagging.

As for the legend, I agree (it'll take up a lot more space though... perhaps in a small font below the author name?).

keflavich · 2015-06-16T07:10:45Z

Re-reading through this: in order for an object to end up in a publication, it needs some sort of unique identifier. "Galaxy 1" should never show up in an ID list: galaxies have names. Simulated galaxies might have IDs like "Author: Publication: Galaxy 1", but again these should be unique (modulo timestep). So, I agree with @jmdkastro's original post.

@snlongmore, are there some examples of numerical identifiers for real astronomical objects?

keflavich · 2015-06-16T07:11:59Z

@snlongmore I think multiple uploads per paper are needed because the underlying catalog - the thing that goes to uploads/ - may not be uniform.

keflavich added this to the First draft of paper milestone Jun 16, 2015

keflavich added the Metadata label Jun 16, 2015
