do we need an SRA importer? #207

Open
bradfordcondon opened this issue Feb 28, 2019 · 1 comment


Casey was loading PRJDB4532. No BioSample is loaded because it's not linked to the project in the returned XML.

Maybe we want to be able to load the SRA experiment? And by loading that you'd load the linked project and biosample?

Here's the XML:

<?xml version="1.0" ?>
<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <EXPERIMENT alias="DRX049157" center_name="NIFTS" accession="DRX049157">
      <IDENTIFIERS>
        <PRIMARY_ID>DRX049157</PRIMARY_ID>
      </IDENTIFIERS>
      <TITLE>454 GS FLX+ sequencing of SAMD00046318</TITLE>
      <STUDY_REF refname="DRP003980" refcenter="NIFTS" accession="DRP003980">
        <IDENTIFIERS>
          <PRIMARY_ID>DRP003980</PRIMARY_ID>
          <EXTERNAL_ID namespace="BioProject" label="BioProject ID">PRJDB4532</EXTERNAL_ID>
        </IDENTIFIERS>
      </STUDY_REF>
      <DESIGN><DESIGN_DESCRIPTION/>
        <SAMPLE_DESCRIPTOR refname="DRS057276" refcenter="NIFTS" accession="DRS057276">
          <IDENTIFIERS>
            <PRIMARY_ID>DRS057276</PRIMARY_ID>
            <EXTERNAL_ID namespace="BioSample" label="BioSample ID">SAMD00046318</EXTERNAL_ID>
          </IDENTIFIERS>
        </SAMPLE_DESCRIPTOR>
        <LIBRARY_DESCRIPTOR><LIBRARY_NAME/>
          <LIBRARY_STRATEGY>WGS</LIBRARY_STRATEGY>
          <LIBRARY_SOURCE>GENOMIC</LIBRARY_SOURCE>
          <LIBRARY_SELECTION>RANDOM</LIBRARY_SELECTION>
          <LIBRARY_LAYOUT><SINGLE/></LIBRARY_LAYOUT><LIBRARY_CONSTRUCTION_PROTOCOL/></LIBRARY_DESCRIPTOR>
        <SPOT_DESCRIPTOR>
          <SPOT_DECODE_SPEC>
            <SPOT_LENGTH>677</SPOT_LENGTH>
            <READ_SPEC>
              <READ_INDEX>0</READ_INDEX>
              <READ_CLASS>Application Read</READ_CLASS>
              <READ_TYPE>Forward</READ_TYPE>
              <BASE_COORD>1</BASE_COORD>
            </READ_SPEC>
          </SPOT_DECODE_SPEC>
        </SPOT_DESCRIPTOR>
      </DESIGN>
      <PLATFORM>
        <LS454>
          <INSTRUMENT_MODEL>454 GS FLX+</INSTRUMENT_MODEL>
        </LS454>
      </PLATFORM>
    </EXPERIMENT>
    <SUBMISSION lab_name="Genome Unit, NARO Institute of Fruit Tree Science" alias="DRA004360" center_name="NIFTS" accession="DRA004360">
      <IDENTIFIERS>
        <PRIMARY_ID>DRA004360</PRIMARY_ID>
      </IDENTIFIERS>
    </SUBMISSION>
    <Organization type="center">
      <Name abbr="NIFTS">NIFTS</Name>
    </Organization>
    <STUDY center_name="NIFTS" alias="DRP003980" accession="DRP003980">
      <IDENTIFIERS>
        <PRIMARY_ID>DRP003980</PRIMARY_ID>
        <EXTERNAL_ID namespace="BioProject" label="primary">PRJDB4532</EXTERNAL_ID>
      </IDENTIFIERS>
      <DESCRIPTOR>
        <STUDY_TITLE>Genome sequencing of mango (Mangifera indica) cultivar ''Irwin''</STUDY_TITLE><STUDY_TYPE existing_study_type="Whole Genome Sequencing"/>
        <STUDY_ABSTRACT>This genome was sequenced to search and construct mango genomic DNA markers. Cultivar ''Irwin'' is leading cultivar in Japan.</STUDY_ABSTRACT>
      </DESCRIPTOR>
    </STUDY>
    <SAMPLE alias="SAMD00046318" accession="DRS057276">
      <IDENTIFIERS>
        <PRIMARY_ID>DRS057276</PRIMARY_ID>
        <EXTERNAL_ID namespace="BioSample">SAMD00046318</EXTERNAL_ID>
      </IDENTIFIERS>
      <TITLE>Irwin</TITLE>
      <SAMPLE_NAME>
        <TAXON_ID>29780</TAXON_ID>
        <SCIENTIFIC_NAME>Mangifera indica</SCIENTIFIC_NAME>
      </SAMPLE_NAME>
      <SAMPLE_ATTRIBUTES>
        <SAMPLE_ATTRIBUTE>
          <TAG>sample_name</TAG>
          <VALUE>HXXQCLF01</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>cultivar</TAG>
          <VALUE>Irwin</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>biomaterial_provider</TAG>
          <VALUE>Okinawa Prefectural Agricultural Research Center</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>collection_date</TAG>
          <VALUE>2013</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>env_biome</TAG>
          <VALUE>subtropical</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>env_feature</TAG>
          <VALUE>farm</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>env_material</TAG>
          <VALUE>soil</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>geo_loc_name</TAG>
          <VALUE>Japan</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>lat_lon</TAG>
          <VALUE>26.1108 N 127.6861 E</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>project_name</TAG>
          <VALUE>DNA marker identification from DNA sequences</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>isol_growth_condt</TAG>
          <VALUE>23341750</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>num_replicons</TAG>
          <VALUE>20</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>estimated_size</TAG>
          <VALUE>400 Mbp</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>ploidy</TAG>
          <VALUE>diploid</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>propagation</TAG>
          <VALUE>asexual</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>health_disease_stat</TAG>
          <VALUE>health</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>trophic_level</TAG>
          <VALUE>photosynthetic</VALUE>
        </SAMPLE_ATTRIBUTE>
        <SAMPLE_ATTRIBUTE>
          <TAG>BioSampleModel</TAG>
          <VALUE>MIGS.eu</VALUE>
        </SAMPLE_ATTRIBUTE>
      </SAMPLE_ATTRIBUTES>
    </SAMPLE>
    <Pool>
      <Member member_name="" accession="DRS057276" sample_name="SAMD00046318" sample_title="Irwin" spots="1513701" bases="1650906230" tax_id="29780" organism="Mangifera indica">
        <IDENTIFIERS>
          <PRIMARY_ID>DRS057276</PRIMARY_ID>
          <EXTERNAL_ID namespace="BioSample">SAMD00046318</EXTERNAL_ID>
        </IDENTIFIERS>
      </Member>
    </Pool>
    <RUN_SET>
      <RUN alias="DRR054308" center_name="NIFTS" accession="DRR054308" total_spots="724319" total_bases="790234730" size="1854856875" load_done="true" published="2018-01-10 04:26:33" is_public="true" cluster_name="public" static_data_available="1">
        <IDENTIFIERS>
          <PRIMARY_ID>DRR054308</PRIMARY_ID>
        </IDENTIFIERS>
        <TITLE>454 GS FLX+ sequencing of SAMD00046318</TITLE><EXPERIMENT_REF refname="DRX049157" refcenter="NIFTS" accession="DRX049157"/>
        <Pool>
          <Member member_name="" accession="DRS057276" sample_name="SAMD00046318" sample_title="Irwin" spots="724319" bases="790234730" tax_id="29780" organism="Mangifera indica">
            <IDENTIFIERS>
              <PRIMARY_ID>DRS057276</PRIMARY_ID>
              <EXTERNAL_ID namespace="BioSample">SAMD00046318</EXTERNAL_ID>
            </IDENTIFIERS>
          </Member>
        </Pool>
        <Statistics nreads="1" nspots="724319"><Read index="0" count="724319" average="1091.00" stdev="197.60"/></Statistics>
        <Bases cs_native="false" count="790234730"><Base value="A" count="252681330"/><Base value="C" count="133152082"/><Base value="G" count="139436462"/><Base value="T" count="253440531"/><Base value="N" count="11524325"/></Bases>
      </RUN>
      <RUN alias="DRR054307" center_name="NIFTS" accession="DRR054307" total_spots="789382" total_bases="860671500" size="2009178726" load_done="true" published="2018-01-10 04:26:33" is_public="true" cluster_name="public" static_data_available="1">
        <IDENTIFIERS>
          <PRIMARY_ID>DRR054307</PRIMARY_ID>
        </IDENTIFIERS>
        <TITLE>454 GS FLX+ sequencing of SAMD00046318</TITLE><EXPERIMENT_REF refname="DRX049157" refcenter="NIFTS" accession="DRX049157"/>
        <Pool>
          <Member member_name="" accession="DRS057276" sample_name="SAMD00046318" sample_title="Irwin" spots="789382" bases="860671500" tax_id="29780" organism="Mangifera indica">
            <IDENTIFIERS>
              <PRIMARY_ID>DRS057276</PRIMARY_ID>
              <EXTERNAL_ID namespace="BioSample">SAMD00046318</EXTERNAL_ID>
            </IDENTIFIERS>
          </Member>
        </Pool>
        <Statistics nreads="1" nspots="789382"><Read index="0" count="789382" average="1090.31" stdev="191.37"/></Statistics>
        <Bases cs_native="false" count="860671500"><Base value="A" count="277391866"/><Base value="C" count="144641177"/><Base value="G" count="150659961"/><Base value="T" count="276069965"/><Base value="N" count="11908531"/></Bases>
      </RUN>
    </RUN_SET>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>
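
A minimal sketch of how an importer could follow these links, not the module's actual code: start from the EXPERIMENT element and read the BioProject and BioSample accessions from the EXTERNAL_ID entries under STUDY_REF and SAMPLE_DESCRIPTOR. The function names and the trimmed `SAMPLE_XML` string are hypothetical; the element and attribute names come from the XML above. Python is used for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the package above; only the elements used for linking are kept.
SAMPLE_XML = """<EXPERIMENT_PACKAGE_SET>
  <EXPERIMENT_PACKAGE>
    <EXPERIMENT accession="DRX049157">
      <STUDY_REF accession="DRP003980">
        <IDENTIFIERS>
          <PRIMARY_ID>DRP003980</PRIMARY_ID>
          <EXTERNAL_ID namespace="BioProject">PRJDB4532</EXTERNAL_ID>
        </IDENTIFIERS>
      </STUDY_REF>
      <DESIGN>
        <SAMPLE_DESCRIPTOR accession="DRS057276">
          <IDENTIFIERS>
            <PRIMARY_ID>DRS057276</PRIMARY_ID>
            <EXTERNAL_ID namespace="BioSample">SAMD00046318</EXTERNAL_ID>
          </IDENTIFIERS>
        </SAMPLE_DESCRIPTOR>
      </DESIGN>
    </EXPERIMENT>
    <RUN_SET>
      <RUN accession="DRR054308"/>
      <RUN accession="DRR054307"/>
    </RUN_SET>
  </EXPERIMENT_PACKAGE>
</EXPERIMENT_PACKAGE_SET>"""


def external_id(parent, namespace):
    """Return the EXTERNAL_ID text with the given namespace under parent, or None."""
    if parent is None:
        return None
    for node in parent.iter("EXTERNAL_ID"):
        if node.get("namespace") == namespace:
            return node.text
    return None


def linked_accessions(xml_text):
    """Map each experiment accession to the project, sample, and runs it references."""
    root = ET.fromstring(xml_text)
    links = {}
    for package in root.iter("EXPERIMENT_PACKAGE"):
        experiment = package.find("EXPERIMENT")
        if experiment is None:
            continue
        links[experiment.get("accession")] = {
            "bioproject": external_id(experiment.find("STUDY_REF"), "BioProject"),
            "biosample": external_id(
                experiment.find("DESIGN/SAMPLE_DESCRIPTOR"), "BioSample"
            ),
            "runs": [run.get("accession") for run in package.iterfind("RUN_SET/RUN")],
        }
    return links


print(linked_accessions(SAMPLE_XML))
```

Starting from the experiment means the project and BioSample are both reachable from one record, which is the case the project-based load misses here.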

bradfordcondon commented Mar 1, 2019

Unlike an assembly, an SRA record is typically one of many records that are all grouped together.

For a single SRA record (which is defined as an ...) we have a run, a library, a sample (BioSample), and a study.

https://www.ncbi.nlm.nih.gov/sra/SRX5431186[accn]

https://www.ncbi.nlm.nih.gov/Traces/study/?WebEnv=NCID_1_23786308_130.14.22.76_5555_1551459537_3256482250_0MetA0_S_HStore&query_key=5

https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?study=SRP187033

So SRP187033 (the project study) is part of project PRJNA552953. It consists of 6 experiments and 6 runs.
https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP187033
The runs are SRR8643699..704, with different BioSamples and experiments (SRX5441940...) as well.
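
For reference, the shorthand run range above can be expanded into full accessions with a small helper. This is only a sketch: `expand_accession_range` is a hypothetical name, and it assumes the `START..SUFFIX` notation used in this comment, where the suffix replaces the last digits of the start accession.

```python
def expand_accession_range(spec):
    """Expand shorthand like 'SRR8643699..704' into a list of full accessions."""
    start, _, end_suffix = spec.partition("..")
    # Split the letter prefix (e.g. 'SRR') from the numeric part.
    prefix = start.rstrip("0123456789")
    start_digits = start[len(prefix):]
    # The suffix replaces the trailing digits of the start number.
    end_digits = start_digits[: len(start_digits) - len(end_suffix)] + end_suffix
    width = len(start_digits)
    return [
        f"{prefix}{number:0{width}d}"
        for number in range(int(start_digits), int(end_digits) + 1)
    ]


runs = expand_accession_range("SRR8643699..704")
print(len(runs), runs[0], runs[-1])
```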

SRA also defines what it calls an analysis, for example https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?analysis=DRZ000001 . This is part of a study (in this case DRP000072).

Mango example

Project PRJDB4532
HAS
SRA EXPERIMENT record DRX049157
HAS

Study: DRP003980

"SRA Sample" DRS057276 (not linked from the SRA record, but it can be found associated with the sample through a broken linkout).
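
Since an importer would see all of these accession types, here is a rough sketch of telling them apart by prefix. It relies on the INSDC naming convention (first letter S, E, or D for the archive, `R`, then a type letter; `PRJ...` for BioProject; `SAM...` for BioSample). The function name `record_type` and the exact regular expressions are assumptions for illustration, not anything the module defines.

```python
import re

# Third letter of an SRA accession identifies the record type.
SRA_RECORD_TYPES = {
    "A": "submission",
    "P": "study",
    "X": "experiment",
    "S": "sample",
    "R": "run",
    "Z": "analysis",
}


def record_type(accession):
    """Guess the record type of an accession from its prefix, or return None."""
    if re.fullmatch(r"PRJ[NED][A-Z]\d+", accession):
        return "bioproject"
    if re.fullmatch(r"SAM[NED][A-Z]?\d+", accession):
        return "biosample"
    match = re.fullmatch(r"[SED]R([APXSRZ])\d+", accession)
    if match:
        return SRA_RECORD_TYPES[match.group(1)]
    return None


for acc in ["PRJDB4532", "DRX049157", "DRP003980", "DRS057276", "SAMD00046318"]:
    print(acc, record_type(acc))
```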
