Issue with duplicated Target names when running t-test box plot code #70

Closed
1 of 3 tasks
barnacj opened this issue Jan 5, 2024 · 4 comments
Assignees
Labels
question (Further information is requested), tech-support (Forwarded on to SomaLogic technical support)

Comments

barnacj commented Jan 5, 2024

Description

When I ran a t-test on my SomaScan data comparing sex and tried to visualise the results using the code below, I got an error message (see picture):

[Picture 1: screenshot of the error message, not reproduced here]

Steps

I used the make.unique function on my_data$Target, which created Leptin and Leptin.1. This resolved the issue.
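
For readers landing here, the workaround can be sketched as follows. This is a minimal illustration in base R, not the reporter's actual code: the `my_data` data frame, its `SeqId` values, and the `Albumin` row are made-up stand-ins.

```r
# Hypothetical stand-in for the annotation table; identifiers are placeholders.
my_data <- data.frame(
  SeqId  = c("id-1", "id-2", "id-3"),
  Target = c("Leptin", "Leptin", "Albumin"),
  stringsAsFactors = FALSE
)

# make.unique() keeps the first occurrence as-is and appends .1, .2, ...
# to later repeats, so every Target name becomes distinct.
my_data$Target <- make.unique(my_data$Target)
my_data$Target
#> [1] "Leptin"   "Leptin.1" "Albumin"
```

Note that the suffix only reflects row order, so it is worth keeping an identifier column alongside the renamed Target to know which measurement is which.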

Priority Level

  • High
  • Medium
  • Low

Thanks for reporting 🥳!

@barnacj barnacj added the bug Something isn't working label Jan 5, 2024
stufield (Contributor) commented Jan 5, 2024

Would you mind providing some runnable code that I can use to reproduce the behavior? Preferably a minimal example that still generates the error you see, even without actual SomaScan data and without the extra overhead of the SomaScan ecosystem.

From your example it looks like there are duplicated target names, though I'm not seeing that on my end.
Which version of SomaScan are you working with?
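
One way to check whether a set of Target names contains duplicates, sketched in base R. The `targets` vector is made-up example data; in practice it would be the Target column of whatever annotation table is in use.

```r
# Made-up example values standing in for a Target column.
targets <- c("Leptin", "Albumin", "Leptin", "Insulin")

# duplicated() flags the second and later occurrences of each value,
# so this returns each repeated name once.
dup_names <- unique(targets[duplicated(targets)])
dup_names

# table() counts occurrences per name; keep the names seen more than once.
counts <- table(targets)
counts[counts > 1]
```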

In addition, I have notified @wschwarzmann in our Support Bioinformatics Team regarding duplicate Target fields in annotations of SomaScan.

Thanks!

@stufield stufield self-assigned this Jan 5, 2024
@stufield stufield added tech-support Forwarded on to SomaLogic technical support question Further information is requested and removed bug Something isn't working labels Jan 5, 2024
@wschwarzmann

SomaScan v4.0 and v4.1 (and the upcoming v5.0) contain some duplicates in the Target name column. The Target names tend to follow the UniProt ID. Because a SOMAmer is specific to an epitope, we can distinguish between epitopes unique to specific proteoforms, or even between distinct epitopes on the same protein, in cases where UniProt does not. The Target Full Name column often contains more specific information, such as whether we are measuring different isoforms or the N- or C-terminus of the protein. In cases where there is still duplication, the measurements can be further distinguished by the amino acid range the SOMAmers were selected against. This can be found in the annotated menu downloaded from https://menu.somalogic.com. In cases where even the amino acid range isn't distinct, please reach out to [email protected] with your specific protein in mind, and the team will get back to you with any additional information we can provide.
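
Since the duplicated names refer to genuinely different measurements, plot labels can also be disambiguated by appending an identifier rather than a running number. The following is a rough base R sketch under assumptions: the `annots` table, its column names, and the identifier values are placeholders, and `Label` is a hypothetical new column.

```r
# Hypothetical annotation table; the SeqId values are placeholders, not real IDs.
annots <- data.frame(
  SeqId  = c("id-1", "id-2", "id-3"),
  Target = c("Leptin", "Leptin", "Albumin"),
  stringsAsFactors = FALSE
)

# TRUE for every member of a duplicated group, including the first occurrence.
is_dup <- annots$Target %in% annots$Target[duplicated(annots$Target)]

# Append the identifier only where the Target name alone is ambiguous.
annots$Label <- ifelse(
  is_dup,
  paste0(annots$Target, " (", annots$SeqId, ")"),
  annots$Target
)
annots$Label
#> [1] "Leptin (id-1)" "Leptin (id-2)" "Albumin"
```

Compared with a numeric suffix, this keeps each label traceable to a specific reagent; the same pattern would work with a Target Full Name column if that is more informative for the proteins in question.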

barnacj (Author) commented Jan 8, 2024 via email

stufield (Contributor) commented Jan 9, 2024

A few closing comments:

  • Removing the bug label, since this isn't actually a bug in the code per se. The annotations can and do have multiple SeqIds mapping to the same Target (sometimes intentionally).
  • The suggested workaround (make.unique()) seems perfectly reasonable and appropriate.
  • Pinning (and closing) this issue for visibility, to make it easier for others to find in the future.

@stufield stufield pinned this issue Jan 9, 2024
@stufield stufield closed this as completed Jan 9, 2024