Y24-300 - Addition of anonymous ids for public data sharing - existing data #1896

KatyTaylor · 2024-09-05T08:26:33Z

User story
As a member of GSU, I would like RVI samples already imported into Sequencescape to have an anonymous id added to them, so that when they are shared in public databases, the id does not give any unnecessary information.

Who are the primary contacts for this story
Anna G, Adrianne L

Who is the nominated tester for UAT
Anna G, Ya-Lin H

Acceptance criteria
To be considered successful the solution must allow:

Samples previously imported into Sequencescape for RVI should have an id created and inserted into the public name (?) field
Id is created using Baracoda (RVI prefix)

References
This story has a non-blocking relationship with:

Additional context
The data for these already imported samples has not yet been released.

Original request from Anna: "GSU would like to apply additional level of security in relation to sample IDs. This will need to be applied to future and all retrospective samples within RVI." (was split into 3 stories).

N.B. Is it worth checking before doing this story if the Sanger Sample Id would be sufficient? Is this used for existing samples in other studies? We discussed using the accession number but think this is not appropriate because it is specific to EBI and this will potentially be released to other databases as well.

TWJW-SANGER · 2024-09-09T09:00:24Z

I know we discussed using the WSI prefix for the anonymous ID. Could it be the RVI one instead? It's still anonymous enough not to reveal location.

TWJW-SANGER · 2024-10-01T14:35:49Z

GSU agree that the RVI-prefixed Sanger Sample ID can be used.

SujitDey2022 · 2024-11-13T13:19:12Z

Team Discussion 13Nov2024

Assumption:

Run a script to update the IDs
Applies only to RVI samples, not Heron samples
The public name field is used only for RVI, not for Heron

How do we identify the samples/Study that need to be updated?

dasunpubudumal · 2024-11-22T14:57:54Z

@KatyTaylor There's an attribute in the sample_metadata table called sample_public_name, which is where the manifest's PUBLIC NAME value is stored. Is this the target for the new ID?

BenTopping · 2024-12-09T13:22:39Z

Outstanding questions:

How do we determine which samples are RVI samples? Are they all under particular studies?
If we update the sanger_sample_ids, that will break all RVI manifests (they will have the old ids). Does this matter?
Should this be done after the manifest story so that we don't get intermediate data created after this data patch and before the manifest patch?

TWJW-SANGER · 2024-12-10T11:03:30Z

Quick clarifications.

How do we determine which samples are RVI samples? Are they all under particular studies?
Yes, they will be identified by belonging to the "RVI Program - Bait Capture" study.
If we update the sanger_sample_ids, that will break all RVI manifests (they will have the old ids). Does this matter?
We are not changing the sanger_sample_id values.
We would like to set the sample's "Public Name" field to be identical to the sample's "Sanger Sample ID" field for these samples.
Should this be done after the manifest story so that we don't get intermediate data created after this data patch and before the manifest patch?
Good point. Yes.
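
The rule agreed above (samples in the "RVI Program - Bait Capture" study get their public name set to their Sanger Sample ID, with sanger_sample_id left untouched) can be sketched roughly as follows. This is only an illustration in Python, not the actual Sequencescape data patch: the `Sample` class and the `backfill_public_names` function are hypothetical stand-ins, and only the field names `sanger_sample_id` and `sample_public_name` and the study name come from this thread.

```python
from dataclasses import dataclass
from typing import Optional

RVI_STUDY_NAME = "RVI Program - Bait Capture"  # study name quoted in the thread


@dataclass
class Sample:
    # Hypothetical in-memory stand-in for a sample and its metadata row.
    sanger_sample_id: str
    study_name: str
    sample_public_name: Optional[str] = None


def backfill_public_names(samples: list[Sample]) -> int:
    """Copy sanger_sample_id into sample_public_name for RVI study samples.

    sanger_sample_id itself is never modified. Returns the number of
    samples whose public name was changed.
    """
    updated = 0
    for sample in samples:
        if sample.study_name != RVI_STUDY_NAME:
            continue  # Heron and any other studies are left alone
        if sample.sample_public_name == sample.sanger_sample_id:
            continue  # already patched, so re-running changes nothing
        sample.sample_public_name = sample.sanger_sample_id
        updated += 1
    return updated
```

Skipping samples that already match makes the sketch safe to run more than once, which may matter if the patch has to be re-run after the manifest story lands.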

neilsycamore · 2024-12-10T11:32:44Z

If the sample has been accessioned, then whatever identifier is added to the 'name' and then added as the public name could (if data is pushed to the ENA) expose the 'new id' as the public name, if it is displayed as the title of the accessioned sample.

TWJW-SANGER · 2024-12-10T16:01:33Z

@neilsycamore The issue they are trying to address is that they don't have an anonymous ID they can use for publishing data. Their sample supplier uses ids that are identifiable, and these were automatically transferred over to SciOps / Sequencescape.

The accessioning logic, I believe, uses the public name if present and then falls back to the supplier id?

So, by setting an anonymous id in the Public Name field they can safely publish. We thought about allocating a new id with Baracoda, but it seems that in this case the Sequencescape name is sufficiently anonymous.

Does that help explain things? Are there any concerns with this approach?
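
The fallback described in this comment (public name if present, otherwise the supplier id) could look something like the sketch below. It only illustrates the behaviour as stated here, which the comment itself flags as a belief rather than a confirmed fact. The function name `accession_display_name` and its parameters are hypothetical and are not taken from the Sequencescape code.

```python
from typing import Optional


def accession_display_name(
    public_name: Optional[str], supplier_id: Optional[str]
) -> Optional[str]:
    """Prefer the public name; fall back to the supplier id when it is blank."""
    if public_name and public_name.strip():
        return public_name
    return supplier_id
```

Under this assumption, a sample with an empty Public Name would expose the supplier's identifiable id, which is why populating the field for every RVI sample matters.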

sabrine33 · 2024-12-16T12:45:06Z

@TWJW-SANGER , @KatyTaylor I’ve been looking into this, and I noticed that some samples belonging to the RVI Program - Bait Capture study have their sanger_sample_id prefixed with "RandD_" (e.g., RandD_RVIxxxxxx).

I’m unsure how to handle the public names for these samples. Should we remove the "RandD_" prefix to align with the acceptance criteria, or should we leave them as they are for now?

Out of 5055 records, only 23 have their sanger_sample_id prefixed with "RandD_," which I found a bit confusing. Additionally, there doesn’t seem to be much consistency in the use of the RVI prefix—some samples are prefixed with "RVI," while others use "RVI_."
I’m not sure if this is relevant, but I thought it was worth mentioning. I also looked at the date column to try and make sense of it, but that didn’t provide any clear insights.
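
For anyone wanting to reproduce this kind of breakdown, a prefix tally along the lines below is one way to do it. This is a minimal sketch, not the query actually used for the figures above: the function names are hypothetical, and the ids would need to be exported from the study first.

```python
from collections import Counter
from typing import Iterable

# Longest and most specific prefixes first, so that "RVI_" is not
# counted as plain "RVI" and "RandD_RVI..." is counted as "RandD_".
PREFIXES = ("RandD_", "RVI_", "RVI")


def classify_prefix(sanger_sample_id: str) -> str:
    """Return the first matching prefix, or "other" if none match."""
    for prefix in PREFIXES:
        if sanger_sample_id.startswith(prefix):
            return prefix
    return "other"


def prefix_counts(sanger_sample_ids: Iterable[str]) -> Counter:
    """Count how many ids fall under each prefix."""
    return Counter(classify_prefix(s) for s in sanger_sample_ids)
```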

TWJW-SANGER · 2024-12-17T10:05:28Z

Good spot.

My thoughts below:

I don't think it would be a good idea to change the sanger_sample_id, as this is likely linked to in various ways in different systems.
I suspect that the R&D samples will not be part of the data release, so it won't matter what their public name is, as it will never be exported.
"RVI" vs "RVI_" is irritating, but it still satisfies the requirement to be anonymous, and that inconsistency will already be present in other systems that feed into Sequencescape.

I suggest that we continue as planned but I will send an email to Adrianne and Ya-Lin to cover these points and CC you both in.

psd-issuer bot changed the title ~~Addition of anonymous ids for public data sharing - existing data~~ Y24-300 - Addition of anonymous ids for public data sharing - existing data Sep 5, 2024

KatyTaylor added On Hold On hold RVI RVI project Enhancement New feature or request labels Sep 5, 2024

KatyTaylor mentioned this issue Sep 9, 2024

Y24-250 - Addition of anonymous ids for public data sharing - Plate Genie #1851

Closed

2 tasks

KatyTaylor mentioned this issue Oct 17, 2024

Y24-299 - Addition of anonymous ids for public data sharing - sample manifest #1895

Closed

2 tasks

TWJW-SANGER removed the On Hold On hold label Oct 17, 2024

SujitDey2022 added the Size: M Medium - medium effort & risk label Nov 13, 2024

dasunpubudumal self-assigned this Nov 22, 2024

dasunpubudumal removed their assignment Dec 9, 2024

sabrine33 self-assigned this Dec 16, 2024

sabrine33 added the Value: 4 Value to the insitute is high label Dec 16, 2024

sabrine33 linked a pull request Dec 17, 2024 that will close this issue

Populate 'sample_public_name' column values with 'sanger_sample_id' column values for RVI-prefixed samples sanger/sequencescape#4565

Closed

sabrine33 removed a link to a pull request Jan 8, 2025

Populate 'sample_public_name' column values with 'sanger_sample_id' column values for RVI-prefixed samples sanger/sequencescape#4565

Closed

sabrine33 linked a pull request Jan 8, 2025 that will close this issue

Update sample_public_name for RVI Program - Bait Capture study sanger/sequencescape#4586

Closed

sabrine33 removed a link to a pull request Jan 13, 2025

Update sample_public_name for RVI Program - Bait Capture study sanger/sequencescape#4586

Closed

sabrine33 linked a pull request Jan 13, 2025 that will close this issue

Update sample_public_name for RVI Program - Bait Capture study sanger/sequencescape#4594

Open
