Missing subject page for shRNA #396
Comments
Here is a link to an observation about GTGAAGAATGTGACAAAGTTT: https://ctd2-dashboard.nci.nih.gov/dashboard/#observation/20130429-dfci-ataris-analysis-818
Where does the subject data come from, when available?
The problem occurred when the target transcript of the shRNA was empty. The code has been fixed to handle that case.
If the transcript is known to the Dashboard, the subject page for the transcript displays a link to the Target Transcript and to the subject page of the Target Gene. As above, the working example is
With the code change made above, if the transcript is not known, the subject page now loads successfully and shows the associated observations. However, it does not have entries for the transcript or gene.
We have now investigated the actual data file, ../subject_data/shrna/trc_public.05Apr11.txt. The shRNA mentioned above, which does not show a transcript or gene symbol, is in the data file. The example is #rna/gtgaagaatgtgacaaagttt
Row 16528; the nmId column is "NM_024924".
Short explanation of the missing transcript: the transcript ID (NM_024924 in the above example) is used to find a matching transcript record in the database. If there is no match, the transcript will be missing. The transcript information comes from the protein background data file.
Details: there are 420 cases of a missing transcript in total. Some IDs, e.g. n/a and noHits, should probably be explicitly excluded when looking for a match; others, like 'REPLACED BY ....', may need to be handled differently in the loading code. Here is the list (the contents of the brackets are the two relevant fields: transcript ID and alternative transcript ID):
1: CCTCGATACAGCATTGGGTTA [NM_001203][NM_001203.2]
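The pre-filtering suggested above could be sketched roughly as follows. This is only an illustration in Python, not the Dashboard's loading code: the function name `classify_transcript_id`, the `PLACEHOLDER_IDS` set, and the returned labels are all hypothetical, and the only inputs taken from this thread are the example values n/a, noHits, and the 'REPLACED BY' prefix.

```python
# Hypothetical sketch: decide what to do with a raw nmId value before
# attempting a transcript lookup. Names and labels are assumptions.

# Placeholder values named in this thread as examples; the real data file
# may contain others.
PLACEHOLDER_IDS = {"n/a", "nohits"}

REPLACED_PREFIX = "REPLACED BY"


def classify_transcript_id(raw_id):
    """Return a (kind, value) pair for a raw transcript ID field.

    kind is one of:
      "skip"     - empty or placeholder value, do not try to match
      "replaced" - value starts with 'REPLACED BY'; value holds the remainder
      "lookup"   - an ordinary ID that should be matched against the database
    """
    value = (raw_id or "").strip()
    if not value or value.lower() in PLACEHOLDER_IDS:
        return ("skip", None)
    if value.upper().startswith(REPLACED_PREFIX):
        # Keep whatever follows the prefix so the loader can decide how to
        # follow it up; the format of that remainder is not known here.
        return ("replaced", value[len(REPLACED_PREFIX):].strip())
    return ("lookup", value)
```

With a classification like this, the loader could count skipped and replaced IDs separately from genuine lookup failures, which would make a list like the one above easier to interpret.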
In the above list of transcripts not found, the first entry is CCTCGATACAGCATTGGGTTA [NM_001203][NM_001203.2]. It would be interesting to know why the connection is breaking down, since searching on NM_001203 does find the shRNA results.
Upon further investigation, I found that the entries in the list of failed matches I posted above did not all fail because there is no match. Many of them failed because there are multiple matches. NM_001203 is one such case. On the other hand, the original case that started this issue, NM_024924, is indeed a case of no match. The reason for the multiple matches is surprising: a match is decided not by an exact match on refseqId but by matching the beginning of refseqId. For example, NM_001203 matches NM_001203247, NM_001203249, NM_001203248, etc., for a total of 13 matches. I don't know why it is done this way, but it is clearly intentional in the implementation.
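To make the difference concrete, here is a small Python sketch contrasting prefix matching with exact matching. It is not the Dashboard's code: the function names are hypothetical, the `known` list is an illustrative subset built from the IDs quoted in this thread, and stripping a trailing `.version` suffix before comparing is an assumption about how the stored IDs might look.

```python
# Hypothetical sketch of the two matching strategies discussed above.

def prefix_matches(query, known_ids):
    """Match any stored ID that begins with the query (the observed behavior)."""
    return [k for k in known_ids if k.startswith(query)]


def exact_matches(query, known_ids):
    """Match only IDs whose accession, ignoring a '.version' suffix, equals the query's."""
    base = query.split(".", 1)[0]
    return [k for k in known_ids if k.split(".", 1)[0] == base]


# Illustrative subset only; the thread reports 13 prefix matches in the real data.
known = ["NM_001203.2", "NM_001203247", "NM_001203248", "NM_001203249"]

ambiguous = prefix_matches("NM_001203", known)   # every entry in `known`
unique = exact_matches("NM_001203", known)       # only "NM_001203.2"
missing = exact_matches("NM_024924", known)      # empty: a genuine no-match
```

Under these assumptions, exact matching would turn the NM_001203 case into a single match, while NM_024924 would remain a genuine no-match.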
This issue has 'evolved' away from the original reported problem. The original title was accurate but now has little to do with the discussion in the comments. We should re-organize what we want to change here, preferably as new issues or a new proposal.
We should close this issue and create new ones that have more specific goals.
A search for the shRNA sequence GTGAAGAATGTGACAAAGTTT finds two observations, but from the search page it is not possible to go to the shRNA page https://ctd2-dashboard.nci.nih.gov/dashboard/#rna/gtgaagaatgtgacaaagttt . However, the shRNA page sometimes works: a search for CAGTTGAGACCTTCTAATTGG finds another shRNA which does have its own page, https://ctd2-dashboard.nci.nih.gov/dashboard/#rna/cagttgagaccttctaattgg .