Y24-300 - Addition of anonymous ids for public data sharing - existing data #1896
Comments
I know we discussed that the WSI prefix would be used for the anon ID; could it be the RVI one instead? It's still anonymous enough not to reveal location. |
GSU agree that the RVI-prefixed Sanger Sample ID can be used. |
Team Discussion 13Nov2024 Assumption:
How do we identify the samples/Study that need to be updated? |
@KatyTaylor There's an attribute in |
Outstanding questions:
|
Quick clarifications.
|
If the sample has already been accessioned, whatever identifier is added to the 'name' and then set as the public name could (if data is pushed to the ENA) be exposed as the 'new id', if the public name is displayed as the title of the accessioned sample. |
@neilsycamore The issue they are trying to address is that they don't have an anonymous ID they can use for publishing data. Their sample supplier uses IDs that are identifiable, and these were automatically transferred over to SciOps / Sequencescape. The accessioning logic, I believe, uses the public name if present and then falls back to the supplier ID. So, by setting an anonymous ID in the Public Name field, they can safely publish. We thought about allocating a new ID with Baracoda, but it seems that in this case the Sequencescape name is sufficiently anonymous. Does that help explain things? Are there any concerns with this approach? |
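A minimal sketch of the fallback described above, assuming the accessioning logic prefers the public name and otherwise uses the supplier name. This is not Sequencescape's actual code: the `Sample` class, its field names, and the example values are hypothetical, and leaving an existing public name untouched is an assumption rather than an agreed rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    # Hypothetical stand-in for a sample record; field names are assumptions.
    sanger_sample_id: str
    supplier_name: str
    public_name: Optional[str] = None


def name_for_publication(sample: Sample) -> str:
    """Return the public name if set, otherwise fall back to the supplier name."""
    return sample.public_name or sample.supplier_name


def apply_anonymous_public_name(sample: Sample) -> Sample:
    """Copy the Sanger sample ID into the public name when none is set.

    Assumption: an existing public name is left as it is.
    """
    if not sample.public_name:
        sample.public_name = sample.sanger_sample_id
    return sample
```

Under these assumptions, a sample with no public name would publish under its identifiable supplier name until the Sanger sample ID is copied across, after which the anonymous ID is used instead.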
@TWJW-SANGER, @KatyTaylor I've been looking into this, and I noticed that some samples belonging to the RVI Program - Bait Capture study have their sanger_sample_id prefixed with "RandD_". I'm unsure how to handle the public names for these samples. Should we remove the "RandD_" prefix to align with the acceptance criteria, or should we leave them as they are for now? Out of 5055 records, only 23 have their sanger_sample_id prefixed with "RandD_", which I found a bit confusing. Additionally, there doesn't seem to be much consistency in the use of the RVI prefix: some samples are prefixed with "RVI", while others use "RVI_". |
Good spot. My thoughts below:
I suggest that we continue as planned but I will send an email to Adrianne and Ya-Lin to cover these points and CC you both in. |
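For reference, the prefix inconsistency raised above could be quantified with something like the following. It is only a sketch: the ID formats are guessed from the prefixes mentioned in this thread ("RVI", "RVI_", "RandD_"), the function names are hypothetical, and the IDs would need to be exported from the study first.

```python
import re
from collections import Counter


def prefix_of(sample_id: str) -> str:
    """Return the leading run of letters, plus one trailing underscore if present."""
    match = re.match(r"[A-Za-z]+_?", sample_id)
    return match.group(0) if match else ""


def tally_prefixes(sample_ids):
    """Count how many IDs start with each distinct prefix."""
    return Counter(prefix_of(sample_id) for sample_id in sample_ids)
```

Calling `tally_prefixes` on the exported sanger_sample_id values would give a count per prefix, which could go into the email to confirm which variants are expected.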
User story
As a member of GSU, I would like RVI samples already imported into Sequencescape to have an anonymous id added to them, so that when they are shared in public databases, the id does not give any unnecessary information.
Who are the primary contacts for this story
Anna G, Adrianne L
Who is the nominated tester for UAT
Anna G, Ya-Lin H
Acceptance criteria
To be considered successful the solution must allow:
References
This story has a non-blocking relationship with:
Additional context
The data for these already imported samples has not yet been released.
Original request from Anna: "GSU would like to apply additional level of security in relation to sample IDs. This will need to be applied to future and all retrospective samples within RVI." (was split into 3 stories).
N.B. Is it worth checking before doing this story if the Sanger Sample Id would be sufficient? Is this used for existing samples in other studies? We discussed using the accession number but think this is not appropriate because it is specific to EBI and this will potentially be released to other databases as well.