Datapoint ID requirements #109

Open
jmdkastro opened this issue Jun 9, 2015 · 6 comments

Comments

@jmdkastro
Contributor

@snlongmore @keflavich
Quickly checking the latest uploads on really crappy wifi, I have a brief suggestion.

We should include a check/requirement upon submission that the IDs are useful -- the uploaded MarkSwinbank, AlbertoBolatto, and LisaWei datasets have IDs that are simple numbers running from 1 to N. This is asking for trouble (and we should change this!), as any new datasets by one of these authors are bound to cause conflict. Another reason why we shouldn't do this is that it'd be good to be able to find out which galaxy these clouds are from without having to know the particular paper.

Good examples of how to do this right for extragalactic clouds are the ErikRosolowsky and TonyWong datasets, which include the host galaxy tags. For Galactic clouds, AdamGinsburg and DanielWalker provide a good example, as they simply list the unique phone numbers.

I don't know how hard it is to check for this, but perhaps we should include an instruction in the upload form.
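A minimal sketch of what such a check could look like, assuming the upload code can hand over the ID column as a list. The function name `looks_like_running_index` and the sample IDs are hypothetical and not part of the existing code base; the check only flags the specific case described above, where the IDs are bare integers running from 1 to N.

```python
def looks_like_running_index(ids):
    """Return True if every ID is a bare integer and together they form 1..N.

    Hypothetical helper: flags ID columns that are just a running row number.
    """
    try:
        numbers = sorted(int(str(i).strip()) for i in ids)
    except ValueError:
        # At least one ID contains non-numeric text, so it is not a bare index.
        return False
    if not numbers:
        # An empty ID column is a separate problem; do not flag it here.
        return False
    return numbers == list(range(1, len(numbers) + 1))


# Made-up ID columns for illustration only.
bare_numbers = ["1", "2", "3", "4"]
tagged_ids = ["NGC300-1", "NGC300-2", "NGC300-3"]

if looks_like_running_index(bare_numbers):
    print("IDs are a plain 1..N index; please add a galaxy or object tag.")
```

The upload form could show a message like the one printed here and point to the good/bad examples, rather than rejecting the file outright.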

@snlongmore

@jmdkastro @keflavich
Good point. I agree, it would be good to have a check/requirement and also add some text on the upload page explaining this and showing some good/bad examples. I’m afraid at the moment I don’t have time to go through the Bolatto, Swinbank and Wei papers and add unique IDs then re-upload.

Note that some observers/simulators do simply label their objects numerically. But as long as the publication information is available for every uploaded data set, if the data are from the same author but a different paper, it will be possible to distinguish between them in the database.

The place where it will currently be difficult to distinguish between them is in the plotting. For example, I am planning to upload two datasets from the same Heyer et al 09 paper: the cores and clouds samples. In the database these will be given IDs of coreN and cloudN. But once these are in the database, there will be two MarkHeyer entries in the query plot. There are two courses of action here.

  1. Only allow a single upload per paper.
  2. Edit the database to allow multiple uploads per paper.

I just had a quick think of how 2. would work in practice, and I think it would be a pain to implement. I therefore think it makes sense to go for 1. In which case, we need to have a check on the upload page that will not allow multiple entries from the same paper. If someone does want to upload something new from a paper already in the database, this will need to be done manually. I suspect this won’t happen very often so will not be too much of a burden on the database administrators.

Do you agree with this approach? If so, I will add a single entry for the Heyer et al 09 paper containing both clouds and cores.
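For reference, a rough sketch of how the check for option 1 might work, assuming each upload carries some paper identifier (an ADS ID or DOI) and that the identifiers already in the database can be listed. The function name `reject_duplicate_paper` and the placeholder identifiers are hypothetical.

```python
def reject_duplicate_paper(paper_id, existing_paper_ids):
    """Raise ValueError if paper_id is already in the database.

    Hypothetical helper: identifiers are compared after stripping
    surrounding whitespace only; no other normalization is attempted.
    """
    candidate = paper_id.strip()
    existing = {p.strip() for p in existing_paper_ids}
    if candidate in existing:
        raise ValueError(
            f"A dataset from paper {candidate!r} is already in the database; "
            "additional tables from this paper must be added manually."
        )
    return candidate


# Placeholder identifiers, not real ADS IDs or DOIs.
already_uploaded = ["paper-A", "paper-B"]

reject_duplicate_paper("paper-C", already_uploaded)  # passes silently
try:
    reject_duplicate_paper(" paper-A ", already_uploaded)
except ValueError as err:
    print(err)
```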

@snlongmore

On a related point...

Regarding the plotting, rather than having the legend label be "FirstnameLastname", wouldn't it be better to have "Lastname + journal paper ID/DOI"? That way you can distinguish between multiple papers from the same first author.

@keflavich
Contributor

Agreed that FirstnameLastnameADSID or FirstnameLastnameDOI would be better.
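As an illustration only, the label could be built along these lines. This assumes the author name and a paper identifier are both available at plot time; `legend_label` and the placeholder identifiers are hypothetical, not existing code.

```python
def legend_label(first_name, last_name, paper_id):
    """Build a FirstnameLastname<paper ID> legend label (hypothetical helper)."""
    return f"{first_name}{last_name}{paper_id}"


# Two uploads by the same first author, with placeholder paper identifiers.
labels = [
    legend_label("Mark", "Heyer", "paperA"),
    legend_label("Mark", "Heyer", "paperB"),
]
print(labels)
```

With the paper identifier appended, two papers by the same first author no longer share a legend entry.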

@jmdkastro
Contributor Author

Even then, there may be more than one galaxy in a single paper. I don't think observers continue the numbering across different host galaxies; they start over for each one. Making sure IDs include the host galaxy seems like a natural solution.
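A small sketch of what including the host galaxy in the ID could look like, assuming the host name is known for each table at upload time. The helper `tag_ids_with_host`, the separator, and the per-galaxy cloud numbers are hypothetical.

```python
def tag_ids_with_host(host_galaxy, local_ids, sep="-"):
    """Prefix each per-galaxy ID with the host galaxy name (hypothetical helper)."""
    return [f"{host_galaxy}{sep}{local_id}" for local_id in local_ids]


# Made-up example: numbering restarts at 1 for each galaxy in the same paper.
ngc300_ids = tag_ids_with_host("NGC300", [1, 2, 3])
m33_ids = tag_ids_with_host("M33", [1, 2])

all_ids = ngc300_ids + m33_ids
print(all_ids)
# The combined list has no repeats even though both galaxies start at 1.
print(len(all_ids) == len(set(all_ids)))
```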

Why is more than one upload per paper a problem? Harder to check for duplicates? From a user perspective, I think this should be possible. When you already know multiple tables from one paper will be used, adding them at the same time makes sense of course. But this won't always be the case. So we'll indeed need to think about how to handle this. Thanks for flagging.

As for the legend, I agree (it'll take up a lot more space though... perhaps in a small font below the author name?).

@keflavich
Contributor

Re-reading through this: in order for an object to end up in a publication, it needs some sort of unique identifier. "Galaxy 1" should never show up in an ID list: galaxies have names. Simulated galaxies might have IDs like "Author: Publication: Galaxy 1", but again these should be unique (modulo timestep). So, I agree with @jmdkastro's original post.

@snlongmore, are there some examples of numerical identifiers for real astronomical objects?

@keflavich
Contributor

@snlongmore I think multiple uploads per paper are needed because the underlying catalog - the thing that goes to uploads/ - may not be uniform.

@keflavich keflavich added this to the First draft of paper milestone Jun 16, 2015
3 participants