Missing subject page for shRNA #396

Open
vdancik opened this issue Feb 27, 2019 · 11 comments

@vdancik
Contributor

vdancik commented Feb 27, 2019

Searching for the shRNA sequence GTGAAGAATGTGACAAAGTTT finds two observations, but from the search page it is not possible to reach the shRNA page https://ctd2-dashboard.nci.nih.gov/dashboard/#rna/gtgaagaatgtgacaaagttt . However, shRNA pages sometimes work: searching for CAGTTGAGACCTTCTAATTGG finds another shRNA, which does have its own page, https://ctd2-dashboard.nci.nih.gov/dashboard/#rna/cagttgagaccttctaattgg .

@vdancik
Contributor Author

vdancik commented Feb 27, 2019

Here is a link to an observation about GTGAAGAATGTGACAAAGTTT: https://ctd2-dashboard.nci.nih.gov/dashboard/#observation/20130429-dfci-ataris-analysis-818 .
Clicking on the shRNA link produces a JavaScript error: TypeError: transcript is null

@kcs3
Collaborator

kcs3 commented Jun 25, 2019

Where does the subject data come from, when available?

@kcs3 kcs3 assigned zhouji2013 and unassigned kcs3 Oct 17, 2019
@zhouji2013 zhouji2013 added this to the Ongoing improvements milestone Oct 28, 2019
zhouji2013 added a commit that referenced this issue Oct 31, 2019
@zhouji2013
Collaborator

The problem occurred when the target transcript of the RNA was empty. The code has been fixed to handle that case.

@kcs3
Collaborator

kcs3 commented Nov 4, 2019

If the transcript is known to the Dashboard, the subject page for the transcript displays a link to the Target Transcript and to the subject page of the Target Gene. As above, the working example is
https://ctd2-dashboard.nci.nih.gov/dashboard/#rna/CAGTTGAGACCTTCTAATTGG

With the code change made above, if the transcript is not known, the subject page now successfully loads and shows the associated observations. However, it does not have entries for the transcript or gene.
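The behavior described above, where the page loads with or without a known transcript, can be illustrated with a minimal sketch. This is not the Dashboard's actual code (the reported error came from the front-end JavaScript); the function name and every field name below are hypothetical and only show the shape of a guard against a missing transcript.

```python
def shrna_page_fields(shrna):
    """Collect the fields to display on an shRNA subject page.

    `shrna` is a plain dict; all keys used here are hypothetical.
    The transcript may be None when no matching record was found.
    """
    fields = {
        "sequence": shrna["sequence"],
        "observations": shrna.get("observations", []),
    }
    transcript = shrna.get("transcript")
    if transcript is not None:
        # Only add transcript and gene entries when a transcript exists,
        # instead of dereferencing a missing value.
        fields["target_transcript"] = transcript["refseq_id"]
        gene = transcript.get("gene")
        if gene is not None:
            fields["target_gene"] = gene
    return fields
```

With a missing transcript the result still carries the sequence and observations, which corresponds to the page loading without transcript or gene entries.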

@kcs3
Collaborator

kcs3 commented Nov 4, 2019

We have now investigated the actual data file, ../subject_data/shrna/trc_public.05Apr11.txt. The shRNA mentioned above which is not showing transcript or gene symbol is in the data file. The example is #rna/gtgaagaatgtgacaaagttt
which on the local instance is
http://156.145.29.93:9998/dashboard/#rna/gtgaagaatgtgacaaagttt

@zhouji2013
Collaborator

> We have now investigated the actual data file, ../subject_data/shrna/trc_public.05Apr11.txt. The shRNA mentioned above which is not showing transcript or gene symbol is in the data file. The example is #rna/gtgaagaatgtgacaaagttt
> which on the local instance is
> http://156.145.29.93:9998/dashboard/#rna/gtgaagaatgtgacaaagttt

In row 16528, the nmId column is "NM_024924".

@zhouji2013
Collaborator

Short explanation of the missing transcript: the transcript ID (NM_024924 in the example above) is used to find a matching transcript record in the database. If there is no match, the transcript will be missing. The transcript information comes from the protein background data file.

Details: there are 420 cases of missing transcripts in total. Some IDs, e.g. n/a and noHits, should probably be explicitly excluded when looking for a match; others, like 'REPLACED BY ....', may need to be handled differently in the loading code.
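One way the special values mentioned above could be separated from real IDs before any lookup is sketched below. This is an assumption about how the loading code might be changed, not a description of what it does now; the function name and the three category labels are hypothetical.

```python
REPLACED_PREFIX = "REPLACED BY"
PLACEHOLDERS = {"n/a", "nohits"}


def classify_transcript_field(value):
    """Classify one transcript-ID field from the shRNA data file.

    Returns a (kind, payload) tuple:
      ("skip", None)        for empty, n/a, or noHits values
      ("replaced", clone)   for 'REPLACED BY <clone id>' values
      ("lookup", [ids...])  for one or more comma-separated IDs
    """
    text = value.strip()
    if not text or text.lower() in PLACEHOLDERS:
        return ("skip", None)
    if text.upper().startswith(REPLACED_PREFIX):
        return ("replaced", text[len(REPLACED_PREFIX):].strip())
    ids = [part.strip() for part in text.split(",") if part.strip()]
    return ("lookup", ids)
```

Only values classified as "lookup" would then be passed to the transcript match, so placeholders never count as failed matches.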

Here is the list (the contents in the brackets are the two relevant fields: transcript ID and alternative transcript ID):

1: CCTCGATACAGCATTGGGTTA [NM_001203][NM_001203.2]
2: TCAGGAGGTATAGTGGAAGAA [NM_001203][NM_001203.2]
3: CAATCCAATGTCTACTGCTAT [NM_001204][NM_001204.6]
4: CCTCTGGCATATAATCAAGTT [NM_001260][NM_001260.1]
5: GCACAGTTTGGTCCGTTAGAA [NM_001261][NM_001261.3]
6: AGGGACATGAAGGCTGCTAAT [NM_001261][NM_001261.3]
7: CCGCTGCAAGGGTAGTATATA [NM_001261][NM_001261.3]
8: TGATTGAGATTTGTCGAACCA [NM_001261][NM_001261.3]
9: CCATGAGGCAAGAAACTATAT [NM_001315][NM_001315.2]
10: CAACCCACGAATCAAGCTCAT [NM_001348][NM_001348.1]
11: CATCGCACACTTTGACCTGAA [NM_001348][NM_001348.1]
12: GAAGGAGTACACCATCAAGTC [NM_001348][NM_001348.1]
13: GTGATGTGGATATAATGGATT [NM_005758][NR_002726.2]
14: TGACTTATTCTTGTGTTACAG [NM_005758][NR_002726.2]
15: GAGATGTGAAGATGGAGAATA [NM_144610][NM_001174103.1]
16: TAATTGCTGTGGATACTGTAA [NM_144610][NM_001174103.1]
17: AGTTTCCCATTAGGCCCATTA [NM_144610][NM_001174103.1]
18: GCAGAGATAACCCAACACAGT [NM_032833][NM_032833.3]
19: GCAACATATCCCACTCAGAAA [NM_032833][NM_032833.3]
20: GAGGGCCGAATAAGTGTAGTT [NM_032833][NM_032833.3]
21: ACCTCGTAGATGTGGAATTAA [NM_001093][NM_001093.3]
22: GCTGCGGCCAACATCTTCAAA [NM_001166][NM_001166.3]
23: CGCTACATCCTTACCAACCGT [NM_001320][NM_001320.5]
24: CCGTTCGGCATCTGGCTTGAT [NM_152452][noHits]
25: GATTACAGCATACACAGTGAT [NM_152452][noHits]
26: TGACTGCACAGCCAATGGTTT [NM_152452][noHits]
27: GCTGTAGTTCAGAAGAGGTTT [NM_001005][NM_001005.3]
28: GCATCTTCAAAGCTGAACTGA [NM_001005][NM_001005.3]
29: GTGGAACCCAAAGATGAGATA [NM_001005][NM_001005.3]
30: CCCTCTGAGTAGGCCTATAAT [NM_001293][NM_001293.2]
31: GCCTAGTGATAAATCAGCGTT [NM_001293][NM_001293.2]
32: GCCACACTGGAGAGATTAGAA [NM_001293][NM_001293.2]
33: CCAACAGTTGCTGGACAGTTT [NM_001293][NM_001293.2]
34: TGACTGATACTATGGTGCCTT [XM_292099][noHits]
35: GCACAGACACACGCATTGTAA [NM_001347][NM_001347.2]
36: CGGGAAGCAAACGCTGAAGAT [NM_001347][NM_001347.2]
37: CGGAAGCTACTGAACCCTCAT [NM_001347][NM_001347.2]
38: AGTCACATCTACTCCTCCCAA [NM_001347][NM_001347.2]
39: CCCAGACATTTGGATTTCCAT [NM_002760][NR_028062.1]
40: GCTGCTGTTCTAACCTCAGTA [NM_001109][noHits]
41: AGGGAGTCACACTGACCACCT [NM_001322][NM_001322.2]
42: GCCTTCCATGAACAGCCAGAA [NM_001322][NM_001322.2]
43: GCTCCTGATCTTCCTGAAGAT [NM_001017][NM_001017.2]
44: GATGCTAAATTCCGTCTGATT [NM_001017][NM_001017.2]
45: CCACTTGGTTGAAGTTGACAT [NM_001017][NM_001017.2]
46: CCTGAAGATCTCTACCATTTA [NM_001017][NM_001017.2]
47: CAGCACTATCAGCATTGTGAA [NM_020061][NM_020061.4]
48: GCGAACTCATACTGGAGAGAA [NM_003438][NR_023311.1]
49: CCTGGTATTGAAGAGGTGAAT [NM_001208][NR_026983.1]
50: CTTGAAGATAAATCTGCCTAA [NM_000957][NR_028292.1,NR_028294.1,NR_028293.1]
51: GCTGCTTATCATCCTCTCCTA [NM_033519][NR_002140.1]
52: CCCGGACATTGCTGGCTCAAT [NM_182611][NM_001161808.1]
53: CAGGTGAATCAAGGAACCCTT [NM_182611][NM_001161808.1]
54: TGGGCTATATCTGCGGTGAAA [NM_005305][noHits]
55: CCTGCTGCTGTTCCTGCCTTT [NM_005305][noHits]
56: GCTGGAATAGCCAAGCTCTTT [XM_116384][noHits]
57: GATGTCATGGACCTCACAGAA [XM_116384][noHits]
58: GAAAGGCAAGAAGGAGAGCAA [XM_116384][noHits]
59: GCCATTGCCATGGCTGGAATA [XM_116384][noHits]
60: GCTCAGGAGTGAGGATGTCAT [XM_116384][noHits]
61: GCCAACTGGATGAGAACCAAA [NM_052996][NM_052996.2]
62: GTCCAGATTTGGTTTCAGAAT [XM_208028][noHits]
63: GATTCAGATCTGGTTTCAGAA [XM_208028][noHits]
64: GCCCTGCTCCTCCGAGCCTTT [XM_208028][noHits]
65: CAAGCTCTTTGTTGGAGAGGT [XM_210613][noHits]
66: TGCGGGCTATCACTGGCAGTT [XM_210613][noHits]
67: AGGAGGCTCAGAGGATGACAA [XM_210613][noHits]
68: TGAGGACCTAGACGGGAACTT [XM_210613][noHits]
69: CCTGCTGTCTGCCATGTCTGA [XM_210613][noHits]
70: CTAGATGGGAACTTGGAAGAA [XM_210642][noHits]
71: GCAAAGCTAACCTATCATCAA [NM_024498][noHits]
72: CGACCCTTACTACACATAATA [NM_024498][noHits]
73: TGACCCTAAGAAGATATAGAA [NM_024498][noHits]
74: CGCTTACTAAACATAAGGTAA [NM_024498][noHits]
75: CCGCACAATGAGTCAGAAGAT [XM_290345][noHits]
76: CGCATGAGCATCAAAGCCTAT [XM_290345][noHits]
77: CCCTGAGTACAGTGTTGCAAT [XM_290345][noHits]
78: GACATGGAATTTGCTAAGAAT [XM_290345][noHits]
79: GCAATTCATACTGGAGAGAAA [NM_024924][NR_003578.1]
80: GTGAAGAATGTGACAAAGTTT [NM_024924][NR_003578.1]
81: CCAGGAGATCCCACAGGAGAT [XM_291857][noHits]
82: GCAAGTCCTGAGGACAGGCAA [XM_291857][noHits]
83: GCTCACCACTCTGCCCACGAA [XM_291857][noHits]
84: CCAGGAGATCCCACAGGTGAT [XM_291857][noHits]
85: GCTGGAATAGCCAAGGTCTTT [XM_373077][XM_936303.1,XM_001715032.1,XM_373077.2]
86: GAAACGCAAGAAGGAGAGGAA [XM_373077][XM_936303.1,XM_373077.2]
87: GCTTTCCCAAGAGCACGCATT [XM_373077][XM_936303.1,XM_001715032.1,XM_373077.2]
88: TGCGGGTCTGATGCGGTCTAT [XM_373077][XM_936303.1,XM_001715032.1,XM_373077.2]
89: AGCACTTGTTGCACGTCTGAT [XM_373078][XM_936313.1,XM_001715028.1,XM_373078.1]
90: CTGAAGGGCTGCAACGAGGAT [XM_373078][XM_936313.1,XM_001715028.1,XM_373078.1]
91: GTCCGACTCCAAGTCCGGGAA [XM_373009][XM_937659.5,XM_926341.4]
92: TGGAGAACAAGTTCAAGGCCA [XM_373009][XM_937659.5,XM_926341.4]
93: GAACCGCCGAACCAACCCGCT [XM_373009][noHits]
94: GCTGTCGCTCAGCCTCACCGA [XM_373009][XM_937659.5,XM_926341.4]
95: GCGCTACCTGTCGGTGTGCGA [XM_373009][XM_937659.5,XM_926341.4]
96: GTGAGAGTGATCGCGGTCTTA [XM_373255][noHits]
97: GCTCGCTTGAGAGCGCCCTTA [XM_373255][noHits]
98: GCAAGTCAGTTCTCATTTCTT [XM_376622][noHits]
99: CCTTGAGGTTACCAGGTAGAA [XM_376622][noHits]
100: GCTCCCAAGGAAGAAGTAGAT [XM_376622][noHits]
101: CCTACTGATAGGGACTCCATA [XM_376622][noHits]
102: CGAGACAGTCTTTCTCATCTT [XM_376622][noHits]
103: CCTCACAGAAGGTGACAGTGA [XM_377875][noHits]
104: CGGTCAGCGTTCCCGAGAGCA [XM_377875][noHits]
105: GTCTGCCATGTCTGAGGAGCA [XM_377877][noHits]
106: GCTACGAAGTGTGTCGCCGGT [XM_377877][noHits]
107: CTAGACGGGAACTTGGAAGCA [XM_377877][noHits]
108: TCAGAGGATGACAACCCTGCT [XM_377877][noHits]
109: CATGGCTGGAATAGCCAAGCT [XM_377878][noHits]
110: GATGGAATCCCTGAGGACCTA [XM_377878][noHits]
111: CTGTCCCGCTACGAAGTGTGT [XM_377878][noHits]
112: GTGAGGATGTCATGGACCTCA [XM_377878][noHits]
113: CCTCAGCTCCTCCTGCAGCCA [XM_377878][noHits]
114: GCTCATCGACTCGGTCACCAA [XM_376763][noHits]
115: CATGGACCTCACAGAAGGTGA [XM_377879][noHits]
116: CCATGTCTGAGGAGCAGCTGT [XM_377879][noHits]
117: CGGGTCTGATGCGGGCTATCA [XM_377880][noHits]
118: CGAAAGGCAAGAAGGAGAGCA [XM_377880][noHits]
119: CTGAGGACCTAGACGGGAACT [XM_377880][noHits]
120: GAGGAGGCTCAGAGGATGACA [XM_377880][noHits]
121: GAAGCACCCAGGGATCAGGAA [XM_377880][noHits]
122: TCACATTAACAGCCCACAGTT [XM_071173][noHits]
123: CCAGTAGAAATCACACTGGAA [XM_066752][noHits]
124: GCTTGAAGACATTCACAACTT [XM_066752][noHits]
125: GCTGCTCTTCAACACAAGATA [XM_377946][noHits]
126: GTAAGCTACAAGGAGGAGCTT [XM_377946][noHits]
127: GCCCTGATACATCGAATGATA [XM_031553][REPLACED BY TRCN0000221601]
128: GCAGTGGTAGACGAGTGAAAT [XM_031553][REPLACED BY TRCN0000221604]
129: CGTACAATTCAAGGCCATTTA [XM_031553][REPLACED BY TRCN0000221605]
130: CGCCCTGCACACTAGCACCAT [NM_003926][NM_003926.5]
131: CGGCCTGAACGCCTTCGACAT [NM_003926][NM_003926.5]
132: GCCCTGCAGAATACTAATAAT [XM_379792][noHits]
133: GCTGAACCAGACATGGATGAT [XM_379855][noHits]
134: GAGATGTATGAGGTTCGTATT [XM_379855][noHits]
135: CCTTGATTTCCTAGTTGACAT [XM_379855][noHits]
136: CGGATCCCAAACCGCCCTGCT [NM_012148][NM_012148.2]
137: GAAACCTTTCTTTGAGAAGTT [NM_173643][noHits]
138: CGAGTGGCTTTGCCCTCCCGA [NM_033178][noHits]
139: CGCGGTTCACAGACCGCACAT [NM_033178][noHits]
140: GCTCTCCTTGCCAGGTTCCAA [NM_033178][noHits]
141: CGTGGAAATGAACGAGAGCCA [NM_033178][noHits]
142: CAAAGATGAAGACTTGTGGAT [NM_145237][noHits]
143: TGAAGACTTGTGGATATGGAT [NM_145237][noHits]
144: GCGGTTCACTTCGTATCAGAA [XM_376537][REPLACED BY TRCN0000220214]
145: CCTTCTCAGAATAGTCCAATT [XM_376537][REPLACED BY TRCN0000220215]
146: CGTAGTAGAGATCGTATGTAT [XM_376537][REPLACED BY TRCN0000220216]
147: CCTGAGCAGGTAAAGTCTGAA [XM_376537][REPLACED BY TRCN0000220218]
148: CCTCTCTTTGAACCGTTACTT [XM_166527][noHits]
149: CCCGAATTGAGTCGTTTCTAT [XM_166527][noHits]
150: GCCCTGAAGAACAGTAATGAT [NM_032031][NR_002182.1]
151: GCTGTCTCAAACATTCAAGAA [NM_032031][NR_002182.1]
152: GACCTCACAGAAGGTGACAAT [XM_373056][noHits]
153: TCGCCGGTCAGCTTTCCCAAA [XM_373056][noHits]
154: ATGGCTGGAATAGCCAAGCTT [XM_373057][noHits]
155: GCGGCCATTGCCATGGCTGGA [XM_373057][noHits]
156: GAGCAGCTGTCCCGCTACGAA [XM_373057][noHits]
157: CCCTGAGGACCTAGACGGGAA [XM_373058][noHits]
158: GACCTCACAGAAGGTGACAGT [XM_373061][noHits]
159: GCAAGAAGGAGAGCAAGCCCA [XM_373061][noHits]
160: GAAAGACGTTAAATTACGGAT [XM_373076][noHits]
161: CACACCTGTAATCCCAGCATT [XM_370946][noHits]
162: CGTCATGTTGATAATCCAAAT [NM_022050][NR_004859.1]
163: CCAGCAACGAGAACGCCACAT [NM_017876][noHits]
164: CCCAAGGAACATTAGGGTGAA [NM_198083][NM_001193636.1,NM_001193637.1,NM_001193635.1,NM_198083.3]
165: ACCAGAGGTCTTGCTGAGGAT [XM_007651][REPLACED BY TRCN0000221400]
166: CCGACGAATCACATTCTTGAT [n/a][NM_001093.3]
167: CCCGAGAACCTCAAGAAATTA [n/a][NM_001093.3]
168: CGAAACTACCTTCAACTCCAT [NM_001101][NM_001101.3]
169: CAGAAGGTGACAGTGAGGCTT [XM_373059][noHits]
170: CAGGAAGGTGAGCTCAGGAGT [XM_373059][noHits]
171: ACGGGAACTTGGAAGCACCCA [XM_373059][noHits]
172: TGGCAGTTCGGTGTCGGAGAA [XM_373075][noHits]
173: GAGGATGTCATGGACCTCACA [XM_373075][noHits]
174: TGGCTGGAATAGCCAAGCTCT [XM_373075][noHits]
175: ACACATACGAAAGGCAAGAAG [XM_373075][noHits]
176: CCCGGGAGCATCTGGGACTTT [NM_023076][noHits]
177: CCTGTCAGCACCACATCCTCT [NM_023076][noHits]
178: CCTGCCGAAGCTGCACTCGCT [NM_023076][noHits]
179: CCTCATCCTCAACATCCTCAA [XM_290331][NR_003267.1]
180: GCCTGTCTTGTGTGAGGTGTT [XM_290331][NR_003267.1]
181: AGCACCATCAACCTCTACTTT [XM_290331][NR_003267.1]
182: GACTCAATAGATGTAGGGAAA [NM_001303][NM_001303.3]
183: GCCAAAGGAGATGATGCTTTA [XM_371677][noHits]
184: CGTCTGTGTGATAACAGGCAA [XM_291054][noHits]
185: AGCTGGTGAAACATGAAGAAA [XM_291054][noHits]
186: GCTAAACCAGTTCCGGAAGAA [XM_209597][noHits]
187: CCTTCCTTCTCTCGTCTGTAT [XM_376573][noHits]
188: ACTGACTCTTGATGGACACAA [XM_376573][noHits]
189: CCAGTAAATCATCTGCTATTA [NM_001076][NM_001076.2]
190: CGATAGATGGACATATAGTAT [NM_001077][NM_001077.3]
191: TGTTCGATAGATGGACATATA [NM_001077][NM_001077.3]
192: CCCAAGTTTGTGATGGACATA [XM_373373][noHits]
193: CCTCAACTACATGGTCTACAT [XM_373373][noHits]
194: CCCAAAGGAACTGGAAGACTT [XM_374855][noHits]
195: ACAAATAAGGTGGCCCTGGTA [XM_375067][noHits]
196: CAGCAGAATGTGGACCAGGCA [XM_375067][noHits]
197: GCAGGAAAGTGTCGCAAAGAT [XM_375958][noHits]
198: CCATGTACCTACCACCATCAT [XM_377597][noHits]
199: CCCTGTTGTTCTAAAGCTAAA [NM_032267][noHits]
200: CCGATTACCTTTCTTCTGTAA [NM_032267][noHits]
201: CCCTCTGAACATGAGCATCAA [NM_032267][noHits]
202: GCCAGAAATACCTTGTAACTT [NM_032267][noHits]
203: CCTGCACCATTTGGACATCAT [XM_376950][noHits]
204: CCTCGGATCGAATAACGATAA [XM_376950][noHits]
205: GCTCGCATAAATGTGAGTCTT [XM_293656][noHits]
206: CCACAGCAAATGTGATTGATA [XM_293656][noHits]
207: AGGCATGGAGATGAATGACTT [XM_373214][noHits]
208: GCCAAGCTTGCCCTGGCCTAT [XM_374694][noHits]
209: TGAATGACTTGGTGGTGAGCT [XM_374694][noHits]
210: CCACGGCATTTCAGACACTTT [NM_212553][NR_027279.1]
211: CCACCAAATATTTGGAGGCTA [NM_212553][NR_027279.1]
212: GAGTGGCAGTTCAACCACTTT [NM_212553][NR_027279.1]
213: CGAAGAACTCAATGGAGAGAA [NM_212553][NR_027279.1]
214: CAAACGCTCTAAGTTTAAGAA [XM_379892][noHits]
215: TGGTTTCAGAATGAGAGGTCA [XM_374852][noHits]
216: CCCTCCCGACACCTTCGGACA [XM_374852][noHits]
217: CCCACAACTTTCTAGCTGTTT [NM_001033][NM_001033.3]
218: GCTGAGCCTAACTATGGCAAA [NM_001033][NM_001033.3]
219: CCTGCTCAGATCACCATGAAA [NM_001033][NM_001033.3]
220: CCAATCCAGTTCACTCTAAAT [NM_001033][NM_001033.3]
221: CTGTGGTTGTATCTGTTCAAT [NM_001184][NM_001184.3]
222: GCCAAAGTATTTCTAGCCTAT [NM_001184][NM_001184.3]
223: GCCCTTAAATAAAGAAGGTAA [NM_001010][NM_001010.2]
224: CGCAAACTTCGTACTTTCTAT [NM_001010][NM_001010.2]
225: CCGCCAGTATGTTGTAAGAAA [NM_001010][NM_001010.2]
226: GCTGCAGAATATGCTAAACTT [NM_001010][NM_001010.2]
227: CGTGTCTGAGATCATGATGTA [NM_001037][NM_001037.4]
228: CCTGAAGAACTAAAGGACTTA [NM_001289][NM_001289.4]
229: GCGTCTGGATGACTACTTAAA [NM_001289][NM_001289.4]
230: CCCTGACGACAGAAGAATCAT [NM_001071][NM_001071.2]
231: GATAGCTGATGCCCTCCTTCA [NM_183003][noHits]
232: GAATCCACTTCCAACTGGCTA [XM_208356][noHits]
233: GCCTTTAATCAAGCCTGGCAT [NM_201252][NM_001145289.1,NM_201252.3]
234: CGGTGGATGTACCACCACTCA [NM_201252][NM_001145289.1,NM_201252.3]
235: GAGGGCAAGTTCGTGGAGCTT [NM_201252][NM_001145289.1,NM_201252.3]
236: CCTTGATGTCACAAAGAAGAA [XM_371837][noHits]
237: GCACCCACAGTTCTACATCAT [NM_003293][noHits]
238: CATCCAGACTGGAGCGGATAT [NM_003293][noHits]
239: CCTGCAGCAAGCGGGTATCGT [NM_003293][noHits]
240: CAGCCAGAGGGACTCCTGCAA [NM_003293][noHits]
241: GTGCTTGATGAGAATTACAAT [NM_001308][NM_001308.2]
242: CCAGGTATCTACACTGTTAGT [NM_001308][NM_001308.2]
243: CCTGAAGGAAGGTGTTGATTA [NM_001176][NM_001176.3]
244: CCTCAGTTCTGCACACAGCTA [XM_380013][noHits]
245: CTGTGCAATTTCAACATCATA [XM_208443][noHits]
246: CGCTTCCTGAATGCTGAGAAT [XM_372200][noHits]
247: CCTCAGTTTGAGCCAATAGTT [XM_372200][noHits]
248: CCTTGGAGATACCTCATCATA [XM_374801][noHits]
249: CCGTTCCAAATATGAGGAGAA [XM_495823][noHits]
250: GTGTATTGAATGCTCAGGTAT [XM_495823][noHits]
251: AGGAAATCACAAATTCAGCTA [XM_495830][noHits]
252: GTGTATTGAATGCTCAGGAAT [XM_495830][noHits]
253: GAACTCTCAAACAGATGCTTT [XM_495830][noHits]
254: CCAGCCAGCATTATCTTACAA [XM_495830][noHits]
255: GCCAAGGAGTCAAAGAACATA [XM_499367][noHits]
256: CCCTCACTGGATTCATGAGAT [XM_499367][noHits]
257: GCCCATGAAGCGCCACATCTT [XM_495884][noHits]
258: CCTGAAGGAAACGAAAGACAT [XM_496026][noHits]
259: CACAGAATTATTCCAGGGTTT [XM_292596][noHits]
260: CAACACAAATGGTTCCCAGTT [XM_292596][noHits]
261: ACCAGCAAGAAGATCACCATT [XM_371409][noHits]
262: GATGGCAAGCATGTGGTGTTT [XM_371409][noHits]
263: TGGTGACTTCACGCACCATAA [XM_379998][noHits]
264: CCCTTGGACCACGTCTCCTTT [XM_379998][noHits]
265: GCTCGCAGTATCCTAGAATCT [XM_495800][noHits]
266: GCAAAGTGAAAGAAGGCATGA [XM_495800][noHits]
267: AGTGAAAGAAGGCATGAATAT [XM_495800][noHits]
268: CAAATGCTGGACCCAACACAA [XM_495896][noHits]
269: GCCAAGACTGAGTGGTTGGAT [XM_495896][noHits]
270: GTGTGTCTCCTTTGAGCCTTT [XM_170597][XM_001717840.3,XM_001717979.3,XM_170597.8]
271: GCAAGATATGTATGTGGCTAT [XM_497732][noHits]
272: CGGAATTACCAGAATAGAGAA [XM_497732][noHits]
273: GCTTGAATTACTGTGGGCATA [XM_293886][noHits]
274: CGTGTGAATCCTCTGGGTCCT [NM_004142][noHits]
275: GCACCCTAGCCCATGCCTTCT [NM_004142][noHits]
276: GCAGGAGGAATTTGATGTATT [XM_293293][noHits]
277: CGCCATGTTCTCAGATAAGAA [NM_199345][NR_003700.1]
278: GCGGGAGTTTGATTTCTTTAA [NM_199345][NR_003700.1]
279: GCGTGAAGACATAAGCATCAT [NM_199345][NR_003700.1]
280: GCTAGCCCTGACCAGTCCTGT [NM_080789][noHits]
281: CCAGGGCTAGCCCTGACCAGT [NM_080789][noHits]
282: CAATTGTCTGAACAGCGCACT [XM_086287][NR_002930.2]
283: TCGTTGCAGGTTCGAGGCCGA [XM_372626][NR_033866.1]
284: GCCGTGGACCTGTACGAGTAT [XM_496155][noHits]
285: CGTCCCGTCCTGGGTGGGTTT [XM_496155][noHits]
286: CCAGTCAGAAACAGTTTGCTA [XM_496170][noHits]
287: CTGGTATTGAAGAGGTGAATA [XM_293984][noHits]
288: CCTGCTGTGTACCTGTGATAA [XM_070277][noHits]
289: CCCGGAGGAATTTGAGTCTTA [XM_070277][noHits]
290: GCCTTGAAGATGACATTCGCT [NM_001153][NM_001153.3]
291: CCCAACGAGTACATCCATTAT [NM_001098][NM_001098.2]
292: CCGGCTGACTACAACAAGATT [NM_001098][NM_001098.2]
293: CCTGCTAGAGAAGAACATTAA [NM_001098][NM_001098.2]
294: GCCCAAGGTCAACAGAACATT [NM_014513][NM_014513.2]
295: CTCCTCTTCTTTCTCCTTCAT [NM_014513][NM_014513.2]
296: GCGAACTTCATTGCTCCCAAA [NM_014733][NM_014733.3,NM_001105251.1]
297: CCTGAGAGAATACGTGGATAT [NM_014733][NM_014733.3,NM_001105251.1]
298: GCCAGCCATGTGGATTACTAA [NM_014733][NM_014733.3,NM_001105251.1]
299: CCGGAGATTCTTCTTTAATTT [NM_001200][NM_001200.2]
300: CCACTGGAACTGTTCCCAAAT [NM_001201][NM_001201.2]
301: CCCAAGTCCTTTGATGCCTAT [NM_001201][NM_001201.2]
302: CCAGAGCCTTATATCTTGGTA [NM_001201][NM_001201.2]
303: CTTGTTATAAAGAGGCACATA [NM_138726][NR_003088.1,NR_003087.1]
304: CCTGGTTTAGCAGAGTAATTA [NM_138726][NR_003088.1,NR_003087.1]
305: CCGGCCTTCATCGCAGTACAT [NM_031211][NR_002593.1]
306: CGCCTTCCTCAAGCTCTGGAT [NM_031211][NR_002593.1]
307: CCTTGGTGAGACATACTAGAA [NM_181429][NM_181429.1]
308: GCATGGAATTACACAAGCAAA [NM_001316][NM_001316.2]
309: CGCTGACAAGTATCTGTGAAA [NM_001316][NM_001316.2]
310: CCGTCATGAATTTAAGTCAAA [NM_001316][NM_001316.2]
311: CCGTCTTCCTATATGGCCTTA [NM_001316][NM_001316.2]
312: TCAAGTTGGAAGTGTGTCTTT [NM_006780][noHits]
313: GCTGGTATATTTGATGCCTAT [NM_032351][NM_032351.3]
314: CCTCTAATTCTGTAGGACTTT [NM_006021][noHits]
315: CAAGGGTGGATAATTACTGTA [NM_006021][noHits]
316: CCACGGGATTTCAGACACTTT [NM_001001324][noHits]
317: CCCAACGCACTTGTGATTCAT [NM_001001324][noHits]
318: CAGAAGAGAATATCGCTTCTA [NM_152302][noHits]
319: CCGGAAGAAGATGATGGAAAT [NM_001006][NM_001006.3]
320: GCCCAAGTTTGAATTGGGAAA [NM_001006][NM_001006.3]
321: GCCAAGTACAAGTTGTGCAAA [NM_001007][NM_001007.4]
322: CCACAAGTTGAGAGAGTGTCT [NM_001007][NM_001007.4]
323: CCACTCGACTTTCCAACATTT [NM_001007][NM_001007.4]
324: GCTCAGAGTGTTGTACTCGTA [NM_001012][NM_001012.1]
325: CCGTGCCCTGAGGTTGGACGT [NM_001012][NM_001012.1]
326: CTCCTGAGGAAGAAGAGATTT [NM_001012][NM_001012.1]
327: GCTGAAGCTGATCGGCGAGTA [NM_001013][NM_001013.3]
328: GCAAGATGAAGCTGGATTACA [NM_001013][NM_001013.3]
329: CCTTCATTGTCCGCCTGGATT [NM_001013][NM_001013.3]
330: GCCTGAAGATAGAGGATTTCT [NM_001013][NM_001013.3]
331: CCTGCGGGACATGATCATCCT [NM_001018][NM_001018.3]
332: CGAGCAGCTGATGCAGCTGTA [NM_001018][NM_001018.3]
333: ACATGATCATCCTACCCGAGA [NM_001018][NM_001018.3]
334: CGGTTTCATTAAGTTGGACTA [NM_001032][NM_001032.3]
335: GCACCTACATTGACAAGAAAT [NM_001015][NM_001015.3]
336: CCGAGACTATCTGCACTACAT [NM_001015][NM_001015.3]
337: GCACTACATCCGCAAGTACAA [NM_001015][NM_001015.3]
338: GATGCAGAGGACCATTGTCAT [NM_001015][NM_001015.3]
339: GCGGTAATGAAATATGGGAAA [NM_001253][NM_001253.2]
340: GCCAAGACCATCAGAAGTAAA [NM_001253][NM_001253.2]
341: GCGAGTGAAATTGCACGTCAA [NM_001253][NM_001253.2]
342: GAAGGTAACAAACCTCAACGT [NM_006249][NM_006249.4]
343: CCTCTGTTTGCACTGGACATA [NM_199283][noHits]
344: CGGCAACATTATGCTGGACAA [NM_199283][noHits]
345: CATCGACCTCTTCAAGAACAT [NM_199283][noHits]
346: GTTTCCTACTCAAGGAGAGAA [NM_199283][noHits]
347: CCCTGAAAGAATCCACAGTAA [XM_371497][noHits]
348: ACAAACCAACAAGCAGTCGAT [XM_371497][noHits]
349: CCATGGATATTCAGAGCCTCA [XM_496630][noHits]
350: GATATACCACTATGGCCACAT [XM_496630][noHits]
351: CCCTTCTGCTCATGCAGCATT [XM_496630][noHits]
352: GAGGATGAATTAAAGCCTTAT [XM_377958][noHits]
353: GCCCAATTCGAGGCTATCATT [XM_290923][noHits]
354: CGACACAATATCCCTGGACAT [XM_290923][noHits]
355: GAGAGACCTGAACCTGGAAAT [XM_497910][noHits]
356: CCTTGGTTCAAACCACAGATT [XM_372233][noHits]
357: GTGCAAGAATATGGCGACCAA [XM_372705][noHits]
358: CCCAGCGTCGTCATCGTGTTT [XM_372705][noHits]
359: CTCTCAGATGTGCATTGGAGA [XM_378155][noHits]
360: GTGCTTTGGAGACTCTGAGAT [XM_378155][noHits]
361: GTGCATTGGAGACTCTGAGAT [XM_378155][noHits]
362: GATGTGCTTTGGAGACTCTGA [XM_378155][noHits]
363: CACTGCCATCATCTAACCATT [XM_497414][noHits]
364: CTCTAGACTAACGCCACTGAT [XM_497414][noHits]
365: CTATGCTGTGAGGATGAATTA [XM_380022][noHits]
366: GCCGCCTTCTCACAACCACAA [XM_497433][noHits]
367: CGCCTCTTCAACGCGCACGCT [XM_372274][noHits]
368: CCATCGTTACAATGGCCTCTT [XM_499301][noHits]
369: CCCAACTCATATTTGGACTTT [XM_499301][noHits]
370: CCTGAAGTTCTTGTTTCTGTT [XM_375150][noHits]
371: TGCTGGAGTTTAGGAGTTATT [XM_375150][noHits]
372: GCATTGACTAATCAAAGGATT [XM_497790][noHits]
373: GAGACAATGAATTAAGGGAAA [XM_497790][noHits]
374: GCCAGAGGTTTGGCCTGCTTT [XM_497790][noHits]
375: CCTCAACCTTTACTACACATA [NM_001009883][noHits]
376: CCTCTAACCTTACTACACATA [NM_001009883][noHits]
377: CCCTGCAATATGAAGAGACAT [NM_033548][noHits]
378: CAATATGAAGAGACATGCGAT [NM_033548][noHits]
379: TGTCTCTAAGCCAGACCTGAT [NM_033548][noHits]
380: CGTGCCATCTTTAATGTTAAA [NM_199358][noHits]
381: TCTTTCAGCATTGAGAGTATT [NM_199358][noHits]
382: CCACAGATAAGATAACTCATA [NM_004876][noHits]
383: GCCTTCAGGTACATGAAGTAA [NM_001004314][NR_027049.1]
384: GCCTTCGAATACATGGACTAA [NM_001004314][NR_027049.1]
385: TCCGCACTTCTCAGAGACTTT [XM_499494][noHits]
386: GCAAGAAATACCTGAGCTTGA [XM_499494][noHits]
387: CCAACCTGCATGGACTGTGAA [NM_001306][NM_001306.3]
388: CGACCGCAAGGACTACGTCTA [NM_001306][NM_001306.3]
389: GCGCTGGAGAAATACAACAAA [XM_497418][noHits]
390: CGGCGTCAAGGTGAAGATAAT [NM_133431][NM_133430.2,NR_033256.1,NM_020411.2]
391: ACGGCCATAACTAGGGAGGAA [NM_133431][NR_033256.1]
392: CTTCGATGATATTGCCAAATA [NM_174962][noHits]
393: CCAGAGAATCATCCCGAAGAT [NM_174961][NR_027250.1]
394: CTTCAATGATATTGCCACATA [NM_174961][NR_027250.1]
395: CCAGGGATGATGATAAAGCAT [NM_174961][NR_027250.1]
396: CCTGTTCTGAGGATTCCTCTT [NM_198694][NM_198694.2]
397: CCTGCTCTAAGGATTCCTCTT [NM_198694][NM_198694.2]
398: AGAGAGAGGGAGAGAAGAGTT [XM_499454][noHits]
399: CGCCCTCGTCATCATCAGCAT [NM_001305][NM_001305.3]
400: CCAAGTATTCTGCTGCCCGCT [NM_001305][NM_001305.3]
401: GCAACATTGTCACCTCGCAGA [NM_001305][NM_001305.3]
402: TACTTTCTATGAGAAGCGTAT [NM_001010][NM_001010.2]
403: CGGCATGGACGAGCTGTACAA [n/a][noHits]
404: CTCTCGGCATGGACGAGCTGT [n/a][noHits]
405: GCGACGTAAACGGCCACAAGT [n/a][noHits]
406: GCGCGATCACATGGTCCTGCT [n/a][noHits]
407: GTCGAGCTGGACGGCGACGTA [n/a][noHits]
408: GCCACAACATCGAGGACGGCA [n/a][noHits]
409: AGAATCGTCGTATGCAGTGAA [n/a][noHits]
410: TGAGTACTTCGAAATGTCCGT [n/a][noHits]
411: GCTGCAGAATATGCTAAACTT [NM_001010][NM_001010.2]
412: GCCAAGTACAAGTTGTGCAAA [NM_001007][NM_001007.4]
413: AGAATCGTCGTATGCAGTGAA [n/a][noHits]
414: CCGCCAGTATGTTGTAAGAAA [NM_001010][NM_001010.2]
415: CCTTCATTGTCCGCCTGGATT [NM_001013][NM_001013.3]
416: GCAAGATGAAGCTGGATTACA [NM_001013][NM_001013.3]
417: GCCTGAAGATAGAGGATTTCT [NM_001013][NM_001013.3]
418: CAAGCAAGTAGCCTCCGAGAT [NM_001348][NM_001348.1]
419: AGATTGTGAACTATGAGCCGC [NM_001348][NM_001348.1]
420: CGTCTGAAGGAGTACACCATC [NM_001348][NM_001348.1]

@kcs3
Collaborator

kcs3 commented Nov 14, 2019

In the above list of transcripts not found, the first entry is
1: CCTCGATACAGCATTGGGTTA [NM_001203][NM_001203.2]
I checked the protein source data, which is the file uniprot_sprot_human.dat. (Despite the .dat ending, it is a text file). There is an entry in that file for NM_001203, where it appears on line 661480(!) in the record:
DR RefSeq; NP_001194.1; NM_001203.2. [O00238-1]
This record corresponds to the gene BMPR1B:
GN Name=BMPR1B;
which is found in the Dashboard.
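For checking entries like this one in bulk, a small sketch that pulls the transcript accession out of a RefSeq cross-reference line may help. The regular expression is written against the line quoted above and is an assumption about the general layout of such lines; the function name is hypothetical.

```python
import re

# Matches e.g. "DR RefSeq; NP_001194.1; NM_001203.2. [O00238-1]"
REFSEQ_LINE = re.compile(
    r"^DR\s+RefSeq;\s*(?P<protein>[^;]+);\s*"
    r"(?P<transcript>[A-Z]{2}_\d+(?:\.\d+)?)\."
)


def parse_refseq_line(line):
    """Return protein ID, versioned transcript ID, and versionless accession.

    Returns None when the line is not a RefSeq cross-reference.
    """
    match = REFSEQ_LINE.match(line)
    if match is None:
        return None
    transcript = match.group("transcript")
    return {
        "protein": match.group("protein").strip(),
        "transcript": transcript,
        # Drop the ".N" version suffix to compare with IDs such as NM_001203.
        "accession": transcript.split(".", 1)[0],
    }
```

Applied to the line quoted above, the versionless accession is NM_001203, the same form as the nmId values in the shRNA file.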

So it would be interesting to know why the connection is breaking down, as searching on NM_001203 does find the shrna results.

@zhouji2013
Collaborator

Upon further investigation, I found that not all of the failed matches in the list I posted above are due to there being no match. Instead, many of them failed because there are multiple matches. For example, NM_001203 is a case of multiple matches. On the other hand, the original case that started this issue, NM_024924, is indeed a case of no match.

The reason for the multiple matches is surprising. A match is decided not by an exact match of refseqId but by the beginning of refseqId. For example, NM_001203 matches NM_001203247, NM_001203249, NM_001203248, etc., for a total of 13 matches. I don't know why it is done this way, but it is clearly intentional in the implementation.
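The difference between the two matching strategies can be shown with a short sketch. It is not the Dashboard's query code: the function names are hypothetical, and the sample list is a small illustrative subset built from the IDs mentioned above, not the actual database contents.

```python
def strip_version(refseq_id):
    """Drop a trailing '.N' version suffix, if any."""
    return refseq_id.split(".", 1)[0]


def prefix_matches(query, known_ids):
    """Match by the beginning of the ID, as described above."""
    return [known for known in known_ids if known.startswith(query)]


def exact_matches(query, known_ids):
    """Match on the full accession, ignoring version suffixes."""
    wanted = strip_version(query)
    return [known for known in known_ids if strip_version(known) == wanted]


# Illustrative sample only.
SAMPLE_IDS = ["NM_001203", "NM_001203247", "NM_001203248", "NM_001203249"]
```

On this sample, `prefix_matches("NM_001203", SAMPLE_IDS)` returns all four IDs, while `exact_matches("NM_001203", SAMPLE_IDS)` returns only `NM_001203`. Comparing version-stripped accessions is one possible alternative if the prefix match exists to tolerate version suffixes, which is itself an assumption.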

@zhouji2013
Collaborator

This issue has 'evolved' away from the original reported problem. The original title was accurate but now has little to do with the discussion in the comments. We should re-organize what we want to change here, preferably as new issues or a new proposal.

@zhouji2013
Collaborator

We should close this issue and create new ones that have more specific goals.
