- Introduction
- Environment Variables
- Admin Properties
- Taxonomy List
- Subject Data
- Submission Data
- Submission Metadata
- Admin Tool Usage
This page describes how to properly build and use the CTD^2 Dashboard admin tool.
The following environment variables are referenced in this document and should be defined for the proper functioning of the admin tool:
- CTD2_HOME: points to the directory in which the entire dashboard source code repository has been downloaded.
- CTD2_DATA_HOME: points to the directory which contains dashboard data to be imported.
To make environment variables available to your shell, run the export command:
#!shell
export CTD2_HOME=/path/to/ctd2-dashboard
export CTD2_DATA_HOME=/path/to/ctd2-dashboard-data
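Before building or running the admin tool, it can help to confirm that both variables are set and point to real directories. The following is a minimal sketch; the function names check_dir_var and check_ctd2_env are illustrative and are not part of the dashboard distribution.

```shell
# check_dir_var NAME VALUE: report a problem when VALUE is empty or is not a directory.
check_dir_var() {
    if [ -z "$2" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$2" ]; then
        echo "$1 does not point to a directory: $2" >&2
        return 1
    fi
    return 0
}

# check_ctd2_env: succeed only when both variables used in this document are usable.
check_ctd2_env() {
    check_dir_var CTD2_HOME "${CTD2_HOME:-}" || return 1
    check_dir_var CTD2_DATA_HOME "${CTD2_DATA_HOME:-}" || return 1
}
```

Calling check_ctd2_env after the export commands prints a message and returns a non-zero status if either variable is missing or wrong.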
In order for the admin tool to properly load data into the CTD^2 dashboard database, it needs to know the location of the data. This is the role of the $CTD2_HOME/admin/src/main/resources/META-INF/spring/admin.properties file. This file needs to exist and contain proper values before the admin tool is compiled. In the source distribution you will find $CTD2_HOME/admin/src/main/resources/META-INF/spring/admin.properties.example to use as a basis for your admin.properties file. More information about these properties can be found in the Subject Data and Submission Data sections of this document.
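One way to create admin.properties from the example file without overwriting an existing copy is sketched below. The helper name init_admin_properties is hypothetical; it takes the directory that holds admin.properties.example as its only argument.

```shell
# init_admin_properties DIR: copy admin.properties.example to admin.properties
# inside DIR, unless admin.properties is already there.
init_admin_properties() {
    dir=$1
    if [ -f "$dir/admin.properties" ]; then
        echo "admin.properties already exists; leaving it untouched"
        return 0
    fi
    if [ ! -f "$dir/admin.properties.example" ]; then
        echo "no admin.properties.example found in $dir" >&2
        return 1
    fi
    cp "$dir/admin.properties.example" "$dir/admin.properties"
}
```

After the copy, the values in admin.properties still need to be reviewed and edited by hand before compiling.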
All desired organisms should be listed in $CTD2_HOME/admin/src/main/resources/simple-taxonomy-list.txt. Each entry in this file follows a simple "name taxonomy_id" format:
#!shell
name taxonomy_id
Homo sapiens 9606
Mus musculus 10090
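A quick sanity check of the taxonomy list can catch entries with a missing or non-numeric ID. The sketch below assumes that the taxonomy ID is the last whitespace-separated field on each line and that an optional first line ending in taxonomy_id is a header; both are assumptions drawn from the sample above, and check_taxonomy_list is an illustrative name.

```shell
# check_taxonomy_list FILE: print every line whose last field is not a numeric
# taxonomy ID and return non-zero if any such line is found.
check_taxonomy_list() {
    awk '
        BEGIN { bad = 0 }
        NF == 0 { next }                              # ignore blank lines
        NR == 1 && $NF == "taxonomy_id" { next }      # tolerate a header row
        NF < 2 || $NF !~ /^[0-9]+$/ {
            printf "line %d: last field is not a numeric taxonomy ID: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```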
Subjects in the CTD^2 Dashboard are those entities that play various roles in experiments conducted by CTD^2 network centers which result in submission data that you will find in the CTD^2 Dashboard. Subject data includes gene, protein, and compound data. Subject data needs to be imported into the Dashboard database before CTD^2 network center data can be imported. With the exception of gene and protein data, all the required subject data can be found in the ctd2-dashboard-seed.zip distribution. This file should be downloaded and unzipped into $CTD2_DATA_HOME.
The following subject data and sources are supported for import by the admin tool:
- Gene: Gene data as provided by Entrez. This data can be downloaded via FTP at the following URL: ftp://ftp.ncbi.nih.gov//gene/DATA/GENE_INFO/. The gene_info file should be downloaded into $CTD2_DATA_HOME/subject_data/gene. If this file is placed in any other directory, the following entry in admin.properties needs to be updated:
#!shell
gene.data.location=file:${CTD2_DATA_HOME}/subject_data/gene/*.gene_info
- Animal Model: Animal Model data as provided by the Clemons Group at the Broad Institute. After downloading and unzipping ctd2-dashboard-seed.zip, this data can be found in $CTD2_DATA_HOME/subject_data/animal_model. The following entry in admin.properties specifies the location of animal model data:
#!shell
animal.model.location=file:${CTD2_DATA_HOME}/subject_data/animal_model/animal_model.txt
- Cell Line: Cell Line data as provided by the Clemons Group at the Broad Institute. After downloading and unzipping ctd2-dashboard-seed.zip, this data can be found in $CTD2_DATA_HOME/subject_data/cell_sample. The following entries in admin.properties specify the location of cell line data:
#!shell
cell.line.name.type.location=file:${CTD2_DATA_HOME}/subject_data/cell_sample/cell_name_type.txt
cell.line.annotation.type.location=file:${CTD2_DATA_HOME}/subject_data/cell_sample/cell_anno_type.txt
cell.line.annotation.name.location=file:${CTD2_DATA_HOME}/subject_data/cell_sample/cell_anno_name.txt
cell.line.annotation.source.location=file:${CTD2_DATA_HOME}/subject_data/cell_sample/cell_anno_source.txt
cell.line.annotation.sample.location=file:${CTD2_DATA_HOME}/subject_data/cell_sample/cell_anno.txt
cell.line.id.location=file:${CTD2_DATA_HOME}/subject_data/cell_sample/cell_sample.txt
cell.line.name.location=file:${CTD2_DATA_HOME}/subject_data/cell_sample/cell_sample_name.txt
- Compounds: Compound data as provided by the Clemons Group at the Broad Institute. After downloading and unzipping ctd2-dashboard-seed.zip, this data can be found in $CTD2_DATA_HOME/subject_data/compound. The following entries in admin.properties specify the location of compound data:
#!shell
compounds.location=file:${CTD2_DATA_HOME}/subject_data/compound/Compounds.txt
compound.synonyms.location=file:${CTD2_DATA_HOME}/subject_data/compound/CompoundSynonyms.txt
- Proteins: Protein data as provided by UniProt. This data can be downloaded via FTP at the following URL: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/taxonomic_divisions/. The UniProt data files should be downloaded into $CTD2_DATA_HOME/subject_data/protein. If multiple UniProt files are downloaded, the following entry in admin.properties needs to be set:
#!shell
protein.data.location=file:${CTD2_DATA_HOME}/subject_data/protein/uniprot_sprot_*.dat
Otherwise, the specific file can be referenced:
#!shell
protein.data.location=file:${CTD2_DATA_HOME}/subject_data/protein/uniprot_sprot_human.dat
- shRNA: shRNA data as provided by the RNAi Consortium at the Broad Institute. After downloading and unzipping ctd2-dashboard-seed.zip, a subset of this data can be found in $CTD2_DATA_HOME/subject_data/shrna. The following entries in admin.properties specify the location of shRNA data:
#!shell
trc.shrna.data.location=file:${CTD2_DATA_HOME}/subject_data/shrna/trc_public.05Apr11.txt
trc.shrna.filter.location=file:${CTD2_DATA_HOME}/subject_data/shrna/trc-shrnas-filter.txt
- siRNA: After downloading and unzipping ctd2-dashboard-seed.zip, a subset of this data can be found in $CTD2_DATA_HOME/subject_data/sirna. The following entry in admin.properties specifies the location of siRNA data:
#!shell
sirna.reagents.location=file:${CTD2_DATA_HOME}/subject_data/sirna/siRNA_reagents.txt
- Tissue Sample: Tissue Sample data as provided by the Clemons Group at the Broad Institute. After downloading and unzipping ctd2-dashboard-seed.zip, this data can be found in $CTD2_DATA_HOME/subject_data/tissue_sample. The following entry in admin.properties specifies the location of tissue sample data:
#!shell
tissue.sample.data.location=file:${CTD2_DATA_HOME}/subject_data/tissue_sample/tissue_sample_name.txt
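Because every subject data location is a file: path rooted at ${CTD2_DATA_HOME}, the entries can be checked against the file system before an import is attempted. The sketch below is one possible approach and not part of the admin tool; check_data_locations is a hypothetical helper that takes the path to admin.properties and the value of CTD2_DATA_HOME. It assumes one key=value entry per line and paths without spaces.

```shell
# check_data_locations PROPS_FILE DATA_HOME: report every "key=file:..." entry
# whose path (after substituting DATA_HOME) matches no existing file.
check_data_locations() {
    props_file=$1
    data_home=$2
    placeholder='${CTD2_DATA_HOME}'      # literal text as written in admin.properties
    missing=0
    while IFS='=' read -r key value || [ -n "$key" ]; do
        case $value in
            file:*) ;;                   # only file: locations are checked
            *) continue ;;
        esac
        path=${value#file:}
        case $path in
            "$placeholder"*) path=$data_home${path#"$placeholder"} ;;
        esac
        # Left unquoted on purpose so wildcards such as *.gene_info expand.
        set -- $path
        if [ ! -e "$1" ]; then
            echo "missing: $key -> $path"
            missing=1
        fi
    done < "$props_file"
    return "$missing"
}
```

A call such as check_data_locations /path/to/admin.properties "$CTD2_DATA_HOME" lists the entries that do not resolve to a file and returns a non-zero status if there are any.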
As previously noted, the data that results from the experiments performed by CTD^2 network centers and makes its way into the CTD^2 dashboard database is called submission data. After downloading and unzipping ctd2-dashboard-seed.zip, submission data can be found in $CTD2_DATA_HOME/submissions. For each center-submission pair, there is a property within admin.properties that specifies the location of the data:
#!shell
broad.cmp.sens.lineage.enrich.data.location=file:${CTD2_DATA_HOME}/submissions/20130328-broad_cpd_sens_lineage_enrich-MST-312/20130328-broad_cpd_sens_lineage_enrich-MST-312.txt
broad.cmp.sens.mutation.enrich.data.location=file:${CTD2_DATA_HOME}/submissions/20130328-broad_cpd_sens_mutation_enrich-navitoclax/20130328-broad_cpd_sens_mutation_enrich-navitoclax.txt
broad.tier3.navitoclax.story.data.location=file:${CTD2_DATA_HOME}/submissions/20130402-broad_tier3_navitoclax_story/20130402-broad_tier3_navitoclax_story.txt
columbia.marina.analysis.data.location=file:${CTD2_DATA_HOME}/submissions/20130402-columbia_marina_analysis-T-ALL/20130402-columbia_marina_analysis-T-ALL.txt
columbia.mra.fet.analysis.data.location=file:${CTD2_DATA_HOME}/submissions/20130403-columbia_mra_fet_analysis-glioma/20130403-columbia_mra_fet_analysis-glioma.txt
columbia.joint.mr.shrna.diff.analysis.data.location=file:${CTD2_DATA_HOME}/submissions/20130401-columbia_joint_mr_shrna_diff-T-ALL/20130401-columbia_joint_mr_shrna_diff-T-ALL.txt
columbia.tier4.glioma.story.data.location=file:${CTD2_DATA_HOME}/submissions/20130401-columbia_tier4_glioma_story/20130401-columbia_tier4_glioma_story.txt
cshl.tier4.fgf19.story.data.location=file:${CTD2_DATA_HOME}/submissions/20130403-cshl_tier4_fgf19_story/20130403-cshl_tier4_fgf19_story.txt
dfci.tier4.beta.catenin.story.data.location=file:${CTD2_DATA_HOME}/submissions/20130401-dfci_tier4_beta-catenin_story/20130401-dfci_tier4_beta-catenin_story.txt
dfci.reporter.analysis.data.location=file:${CTD2_DATA_HOME}/submissions/20130426-dfci_reporter_analysis-bcat/20130426-dfci_reporter_analysis-bcat.txt
dfci.ataris.analysis.data.location=file:${CTD2_DATA_HOME}/submissions/20130429-dfci_ataris_analysis/20130429-dfci_ataris_analysis.txt
dfci.ovarian.analysis.data.location=file:${CTD2_DATA_HOME}/submissions/20130426-dfci_ovarian_analysis/20130426-dfci_ovarian_analysis.txt
dfci.pax8.tier3.data.location=file:${CTD2_DATA_HOME}/submissions/20130429-dfci_pax8_tier3/20130429-dfci_pax8_tier3.txt
emory.ppi-raf1.data.location=file:${CTD2_DATA_HOME}/submissions/20131220-emory_PPI_analysis-RAF1/20131220-emory_PPI_analysis-RAF1.txt
fhcrc.tier1.cst.profiling.data.location=file:${CTD2_DATA_HOME}/submissions/20131117-fhcrc-m_tier1_cst_profiling-SOC/20131117-fhcrc-m_tier1_cst_profiling-SOC.txt
utsw.tier2.discoipyrroles.story.data.location=file:${CTD2_DATA_HOME}/submissions/20130921-utsw_discoipyrrole_tier2_story/20130921-utsw_discoipyrrole_tier2_story.txt
utsw.tier4.discoipyrroles.story.data.location=file:${CTD2_DATA_HOME}/submissions/20130503-utsw_tier4_discoipyrroles_story/20130503-utsw_tier4_discoipyrroles_story.txt
ucsf.differential-expression.data.location=file:${CTD2_DATA_HOME}/submissions/20140124-ucsf_differential_expression/20140124-ucsf_differential_expression.txt
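Every entry listed above follows the same pattern: the data file carries the name of its submission directory plus a .txt extension. As an illustration, the property value for a new submission could be generated from the directory name alone. The helper name submission_location is hypothetical, and the pattern is inferred from the entries above rather than from a documented rule.

```shell
# submission_location NAME: print the admin.properties value for a submission
# stored as submissions/NAME/NAME.txt. The ${CTD2_DATA_HOME} text is printed
# literally, exactly as it appears in admin.properties.
submission_location() {
    printf 'file:${CTD2_DATA_HOME}/submissions/%s/%s.txt\n' "$1" "$1"
}
```

For example, submission_location 20131220-emory_PPI_analysis-RAF1 prints the value used for emory.ppi-raf1.data.location.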
The dashboard imports metadata for each submission. After downloading and unzipping ctd2-dashboard-seed.zip, this metadata can be found in the following two files:
- $CTD2_DATA_HOME/dashboard-CV-per-template.txt: Every dashboard submission is derived from an underlying template. dashboard-CV-per-template.txt is the file that contains metadata for all submission templates known to the dashboard. For each template, it contains the following information:
- $CTD2_DATA_HOME/dashboard-CV-per-column.txt: This file describes the experimental data and the relationships between the experimental data that each submission data template was designed to capture.
The CTD^2 dashboard pipeline has been developed using the Spring Batch framework. For each new submission the following Spring Batch configuration files need to be modified:
- $CTD2_HOME/admin/src/main/resources/META-INF/spring/observationDataApplicationContext.xml: This file configures a Spring Batch reader to read the new submission data. For example, here is a snippet from this file which defines the Emory University PPI analysis submission reader:
#!xml
<bean name="emoryPPIRAF1Reader" class="org.springframework.batch.item.file.MultiResourceItemReader">
    <property name="resources" value="${emory.ppi-raf1.data.location}" />
    <property name="delegate">
        <bean class="org.springframework.batch.item.file.FlatFileItemReader">
            <property name="lineMapper" ref="emoryPPIRAF1LineMapper" />
            <property name="linesToSkip" value="7" />
        </bean>
    </property>
</bean>
The important thing to note is the resource location, emory.ppi-raf1.data.location. This should correspond to an entry in admin.properties as described above. In addition, we have a reference to emoryPPIRAF1LineMapper. This is a reference to the Spring bean which is responsible for the parsing of each line in the Emory University submission. This mapper is defined in the following section.
- $CTD2_HOME/admin/src/main/resources/META-INF/spring/observationDataSharedApplicationContext.xml: This file configures the overall Spring Batch job in addition to the individual submission line mappers and tokenizers. It also configures the mappings between the Spring Batch submission processors and the DashboardDao data access class. Continuing with the Emory University example, within the "observationDataImportJob" job definition you will find the following snippet:
#!xml
<batch:step id="emoryPPIRAF1Step" parent="observationDataStep" next="mskccForetinibStep">
    <batch:tasklet>
        <batch:chunk reader="emoryPPIRAF1Reader" processor="observationDataProcessor" writer="observationDataWriter"/>
    </batch:tasklet>
</batch:step>
Here we are defining the Emory submission processing step within the overall Dashboard submission processing Spring Batch job (observationDataImportJob). Following the job description are the definitions for each submission line mapper and tokenizer. The following snippet defines the Emory University submission line mapper and tokenizer:
#!xml
<bean name="emoryPPIRAF1LineMapper" class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
    <property name="fieldSetMapper" ref="observationDataMapper" />
    <property name="lineTokenizer" ref="emoryPPIRAF1LineTokenizer" />
</bean>
<bean name="emoryPPIRAF1LineTokenizer" class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer" >
    <property name="delimiter" value="\u0009"/>
    <property name="names" value="dummy,submission_name,submission_date,template_name,cell_line,gene_symbol_1,gene_symbol_2,assay_type,number_of_measurements,average_fold_over_control_value,p_value,nci_portal"/>
</bean>
The important thing to note here is the "names" property. It lists all column headers found in the Emory University submission template. The first entry is always a dummy placeholder to take into account the metadata labels found in column one of each submission template.
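Since the "names" list has to line up with the tab-delimited columns of the submission file, a rough pre-check can compare the two before running an import. The sketch below is an assumption-laden illustration, not part of the pipeline: count_names and check_column_count are hypothetical helpers, and the number of leading lines to skip is passed in to mirror the linesToSkip setting of the reader.

```shell
# count_names LIST: number of comma-separated entries in a "names" value.
count_names() {
    printf '%s\n' "$1" | awk -F',' '{ print NF }'
}

# check_column_count FILE NAMES SKIP: after skipping the first SKIP lines,
# report every line whose number of tab-separated fields differs from the
# number of entries in NAMES. Returns non-zero if any line is reported.
check_column_count() {
    file=$1
    names=$2
    skip=$3
    expected=$(count_names "$names")
    awk -F'\t' -v expected="$expected" -v skip="$skip" '
        BEGIN { bad = 0 }
        NR > skip && NF != expected {
            printf "line %d: %d fields, expected %d\n", NR, NF, expected
            bad = 1
        }
        END { exit bad }
    ' "$file"
}
```

For the Emory University tokenizer, the names value has 12 entries (including the dummy placeholder) and the reader skips 7 lines, so the call would be check_column_count followed by the data file, the names value, and 7.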
Finally, the mapping between the Spring Batch submission processors and the DashboardDao is defined within the "observationTemplateMap" bean, which is defined after all the line mappers and tokenizers. Here is a snippet which defines these mappings for the Emory University submission:
#!xml
<entry key="emory_PPI_analysis:cell_line" value="subject:findSubjectsBySynonym" />
<entry key="emory_PPI_analysis:gene_symbol_1" value="subject:findGenesBySymbol" />
<entry key="emory_PPI_analysis:gene_symbol_2" value="subject:findGenesBySymbol" />
<entry key="emory_PPI_analysis:assay_type" value="evidence:readString:createObservedLabelEvidence" />
<entry key="emory_PPI_analysis:number_of_measurements" value="evidence:readInt:createObservedNumericEvidence" />
<entry key="emory_PPI_analysis:average_fold_over_control_value" value="evidence:readDouble:createObservedNumericEvidence" />
<entry key="emory_PPI_analysis:p_value" value="evidence:readDouble:createObservedNumericEvidence" />
<entry key="emory_PPI_analysis:nci_portal" value="evidence:readString:createObservedUrlEvidence" />
For each column header in the Emory University submission template, there is an entry in the observationTemplateMap. The key combines the submission template name and the column header, separated by a colon. The value is a colon-separated combination of the following attributes:
- Submission Attribute Type: The type of the submission attribute, either 'evidence' or 'subject'.
If the submission attribute type is 'subject':
- DashboardDao Method: The DashboardDao method used to find the subject in the database. Typically one of the following:
- findCompoundsByName
- findTissueSampleByName
- findGenesByEntrezId
- findGenesBySymbol
- findSubjectsBySynonym (used to find cell lines)
- findAnimalModelByName
If the submission attribute type is 'evidence':
- Evidence Read Method: One of readString, readDouble, or readInt.
- Evidence Constructor: The name of the method used to create the observed evidence entry in the database. The following methods are supported: createObservedLabelEvidence, createObservedNumericEvidence, createObservedFileEvidence, and createObservedUrlEvidence.
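To make the structure of these values concrete, the sketch below splits a mapping value on its colons and describes each part. It only illustrates the format described above; describe_mapping is a hypothetical helper and does not reflect how the dashboard code itself parses the map.

```shell
# describe_mapping VALUE: explain a value taken from observationTemplateMap.
#   subject:<DashboardDao method>
#   evidence:<read method>:<evidence constructor>
describe_mapping() {
    kind=${1%%:*}      # text before the first colon
    rest=${1#*:}       # text after the first colon
    case $kind in
        subject)
            echo "subject, looked up with DashboardDao method $rest"
            ;;
        evidence)
            echo "evidence, read with ${rest%%:*}, stored with ${rest#*:}"
            ;;
        *)
            echo "unrecognized attribute type: $kind" >&2
            return 1
            ;;
    esac
}
```

For instance, describe_mapping evidence:readDouble:createObservedNumericEvidence reports readDouble as the read method and createObservedNumericEvidence as the constructor.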
The admin tool is a command-line Java application. After building the admin tool from the source distribution, dashboard-admin.jar can be found within $CTD2_HOME/admin/target. A list of commands that are recognized by the admin tool can be found by running the following command:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -h
The following commands are recognized by the admin tool:
The -am command imports animal model subject data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -am
The -cl command imports cell line subject data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -cl
The -cp command imports compound subject data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -cp
The -cv command imports submission metadata.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -cv
The -g command imports gene subject data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -g
The -i command creates a Lucene index for free-text searching.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -i
The -r command ranks subjects based on their observations (a pre-processing step for the web site).
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -r
The -o command imports submission data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -o
The -p command imports protein subject data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -p
The -sh command imports shRNA subject data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -sh
The -si command imports siRNA subject data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -si
The -t command imports taxonomy data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -t
The -ts command imports tissue sample subject data.
Example usage:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -ts
In a typical dashboard database build, the commands are run in the following order, so that taxonomy and subject data are in place before submission metadata and submission data are imported:
#!shell
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -t
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -am
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -cl
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -ts
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -cp
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -g
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -p
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -sh
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -si
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -cv
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -o
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -i
$JAVA_HOME/bin/java -jar $CTD2_HOME/admin/target/dashboard-admin.jar -r
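The same sequence can be wrapped in a small loop that stops at the first failing step, so a later import never runs against an incomplete database. This is a sketch under assumptions: run_dashboard_build is a hypothetical helper, it assumes the admin tool returns a non-zero exit status on failure, and the ADMIN_RUNNER variable is introduced here only so the loop can be tried out without Java (for example with ADMIN_RUNNER=true).

```shell
# run_dashboard_build JAR: run the admin tool once per flag, in the order
# listed above, and stop as soon as one invocation fails.
run_dashboard_build() {
    jar=$1
    # ADMIN_RUNNER is a hypothetical override; by default the JVM is used.
    runner=${ADMIN_RUNNER:-${JAVA_HOME:-}/bin/java}
    for flag in -t -am -cl -ts -cp -g -p -sh -si -cv -o -i -r; do
        echo "running admin tool with $flag"
        if ! "$runner" -jar "$jar" "$flag"; then
            echo "admin tool failed on $flag; stopping" >&2
            return 1
        fi
    done
}
```

A full build would then be started with run_dashboard_build "$CTD2_HOME/admin/target/dashboard-admin.jar".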