SoybeanGDB is a comprehensive genome database to accelerate functional genomic and population genetic studies in soybean.
SoybeanGDB integrates 39 high-quality soybean genomes, together with 15,446,616 high-quality SNPs (single nucleotide polymorphisms) and 4,136,231 high-quality Indels (small insertions/deletions) identified among 2898 soybean accessions from next-generation sequencing data. To help users explore these data, a variety of analytic tools, including JBrowse, BLAST, GO/KEGG annotation, GO/KEGG enrichment analysis, and primer design, are implemented in SoybeanGDB. SoybeanGDB is deployed at https://venyao.xyz/SoybeanGDB/ for online use.
The homepage of SoybeanGDB displays the main functionalities of SoybeanGDB (Figure 1).
(1) Search 39 high-quality soybean genomes by gene IDs.
(2) Search 39 high-quality soybean genomes by genome locations.
(3) Search 39 high-quality soybean genomes using BLAST.
(4) Browse transcription factors/regulators in a genome.
(5) Browse syntenic regions between different soybean genomes.
(6) Browse structural variations by location.
(7) JBrowse visualization of 39 high-quality soybean genomes.
(8) Browse and visualize high-quality SNPs among 2898 soybean accessions.
(9) Search and retrieve SNPs among 2898 soybean accessions.
(10) Conduct linkage disequilibrium analysis between SNPs in a genomic region.
(11) Conduct nucleotide diversity analysis among different groups of soybean accessions to identify genes under selection during domestication and modern breeding.
(12) Calculate and visualize allele frequency of user-input SNP sites.
(13) Search and retrieve INDELs among 2898 soybean accessions.
(14) Conduct expression and co-expression analysis of genes in the genome of Zhonghuang 13.
(15) Design primers based on the Zhonghuang 13 genome targeting SNPs and Indels in a user-input genomic region.
(16) Search and retrieve orthologous gene groups among 39 soybean genomes.
(17) Conduct GO (gene ontology) annotation/enrichment analysis of user-input protein-coding genes in any of the 39 genomes.
(18) Conduct KEGG annotation/enrichment analysis of user-input protein-coding genes in any of the 39 genomes.
Users can input multiple gene IDs for any of the 39 soybean genomes to obtain the annotations of all input genes. The annotations can be viewed in a table, which can be exported as a CSV or Excel file (Figure 2). The gene, CDS, cDNA, and protein sequences can be downloaded as plain text files.
Users can input a genomic region to view and retrieve information on the genes and transposable elements in that region for any of the 39 soybean genomes. The example input data show the expected format of a genomic region, and the steps to search a genome by location are shown in Figure 3.
Users can input multiple gene IDs for any of the 39 soybean genomes to visualize the chromosome location of all input genes (Figure 4).
Users can browse transcription factors/regulators annotated in any of the 39 soybean genomes. Steps to browse the annotated transcription factors/regulators are shown in Figure 5.
This function is designed for users to browse the syntenic regions between two soybean genomes. Syntenic blocks will be displayed in a table in the main panel. Genes in the syntenic blocks of the reference/query genome will be listed (Figure 6).
This function is designed for users to browse the structural variations identified between two soybean genomes. Different types of structural variations, including duplications, inversions, tandem repeats, translocations, deletions, and insertions, will be displayed in different panels (Figure 7).
JBrowse instances were built for all 39 high-quality soybean genomes to display the genome sequence, protein-coding genes, transposable elements, and the GO and KEGG annotations of protein-coding genes (Figure 8; Figure 9). This information is displayed in separate JBrowse tracks.
Users can browse and download SNPs among 2898 soybean accessions by inputting a single gene ID or a genomic region in the appropriate format, such as "SoyZH13_09G103313" or "chr1:29765419-29793053" (Figure 10). After clicking the "Submit" button, the SNPs are visualized in the main panel as inverted triangles, with colors distinguishing different SNPs. The result can be further filtered by selecting soybean accessions or by setting the mutation effect of SNPs. The result can also be downloaded using the "Genotype data", "SNPs information" and "PDF-file" buttons at the top of the main panel.
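For users who prepare inputs in a script before pasting them into the site, the region format shown above can be checked locally. The following is a minimal Python sketch, not part of SoybeanGDB: the `Region` type and `parse_region` function are hypothetical helpers, and the pattern assumes that chromosome names begin with "chr" as in the examples of this manual.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval written as chrom:start-end (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


# Assumption: chromosome names start with "chr", as in "chr1:29765419-29793053".
_REGION_RE = re.compile(r"^(chr\w+):(\d+)-(\d+)$")


def parse_region(text):
    """Parse 'chr1:29765419-29793053' into a Region, or raise ValueError."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2))
    end = int(match.group(3))
    if start > end:
        raise ValueError(f"start is greater than end in {text!r}")
    return Region(chrom, start, end)


region = parse_region("chr1:29765419-29793053")
print(region.chrom, region.start, region.end)
```

A gene ID such as "SoyZH13_09G103313" does not match the pattern and raises `ValueError`, so a script can use this check to decide whether an input is a region or a gene ID.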
Users can search and retrieve SNPs among 2898 soybean accessions by inputting a single gene ID or a genomic region in the appropriate format, such as "chr7:29560705-29573051" (Figure 11). After clicking the "Submit" button, the genotypes of the selected soybean accessions at SNP sites located in the user-input genomic region are displayed in the main panel. In addition, the SNP information, genotype data and gene annotations in the user-input genomic region can be downloaded using the download buttons at the top of the main panel.
In this menu, a heatmap can be created to display the linkage disequilibrium between pairwise SNP sites in a user-input genomic region. Essential steps to conduct linkage disequilibrium analysis are shown in Figure 12. Several options are provided to tune the appearance of the heatmap, including figure flipping and colors (Figure 12). Finally, the heatmap can be downloaded in PDF or SVG format.
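To illustrate what each cell of such a heatmap can represent, the sketch below computes r², a widely used linkage disequilibrium statistic, for every pair of SNPs. This manual does not state which LD measure SoybeanGDB reports, so the choice of r² is an assumption, as are the genotype coding (0, 1 or 2 copies of the alternative allele per accession) and the function names.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("genotype vectors must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a monomorphic site carries no LD information
    return cov * cov / (var_x * var_y)


def pairwise_ld(genotypes):
    """Return {(snp_a, snp_b): r2} for every unordered pair of SNPs."""
    return {
        (a, b): r_squared(genotypes[a], genotypes[b])
        for a, b in combinations(sorted(genotypes), 2)
    }


# Toy data: three hypothetical SNP positions genotyped in four accessions.
toy = {
    "chr1:100": [0, 0, 2, 2],
    "chr1:250": [0, 0, 2, 2],
    "chr1:900": [0, 2, 0, 2],
}
for pair, value in pairwise_ld(toy).items():
    print(pair, round(value, 3))
```

In the toy data the first two SNPs always co-occur (r² = 1), while the third is independent of both (r² = 0); a heatmap would color these cells at the two ends of its scale.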
The "Diversity" submenu under the "SNPs" menu provides the functionality to calculate nucleotide diversity among groups of soybean accessions in a user-input genomic region. Taking SoyZH13_12G067900 as an example, the results can be adjusted by sequentially setting the widgets "Number of SNPs in each window", "Ecotypes to calculate diversity", "Numerator ecotype", "Denominator ecotype", "Mutation effect" and "Upstream/Downstream". After clicking the "Submit!" button, the results are visualized in the main panel (Figure 13). The results can also be downloaded as a PDF, SVG or TXT file.
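The roles of the "Number of SNPs in each window", "Numerator ecotype" and "Denominator ecotype" widgets can be pictured with a small calculation. The sketch below is only an illustration under stated assumptions: the manual does not give the estimator used by SoybeanGDB, so the per-site formula (expected heterozygosity with a sample-size correction), the averaging over SNPs rather than base pairs, and all function names are assumptions.

```python
def site_pi(alleles):
    """Per-site diversity for a biallelic SNP; alleles are 0/1, None = missing."""
    called = [a for a in alleles if a is not None]
    n = len(called)
    if n < 2:
        return 0.0
    p = sum(called) / n
    return 2 * p * (1 - p) * n / (n - 1)


def windowed_pi(sites, window):
    """Mean per-site diversity in consecutive windows of `window` SNPs."""
    values = [site_pi(site) for site in sites]
    result = []
    for i in range(0, len(values), window):
        chunk = values[i:i + window]
        result.append(sum(chunk) / len(chunk))
    return result


def diversity_ratio(numerator, denominator):
    """Window-by-window ratio; NaN where the denominator window has no diversity."""
    return [a / b if b > 0 else float("nan") for a, b in zip(numerator, denominator)]


# Toy data: four SNPs, four accessions per group (hypothetical values).
wild = [[0, 1, 0, 1], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 1, 0]]
cultivar = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1]]

pi_wild = windowed_pi(wild, window=2)
pi_cultivar = windowed_pi(cultivar, window=2)
print(diversity_ratio(pi_wild, pi_cultivar))
```

A ratio well above 1 in a window means the numerator group retains much more diversity there than the denominator group, which is the kind of signal used to flag regions that may have been under selection.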
In this menu, the allele frequency of user-input SNP sites across different soybean ecotypes (improved cultivar, landrace, and G. soja) can be calculated and visualized (Figure 14). Several parameters are provided to tune the appearance of the result plot, including the colors used for the major and minor alleles and the plot size. After clicking the "Submit!" button, the results are visualized in the main panel and can be exported in PDF or SVG format.
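The calculation behind such a plot can be sketched in a few lines of Python. The sketch assumes a biallelic SNP with one allele call per accession, defines the major allele as the most common allele over all groups pooled, and uses a hypothetical function name; none of these details are specified in this manual.

```python
from collections import Counter


def allele_frequencies(calls_by_ecotype):
    """Return (major_allele, {ecotype: {"major": f, "minor": 1 - f}}).

    calls_by_ecotype maps an ecotype name to a list of allele calls
    (for example "A" or "G"); None marks a missing call.
    """
    pooled = Counter()
    for calls in calls_by_ecotype.values():
        pooled.update(c for c in calls if c is not None)
    if not pooled:
        raise ValueError("no allele calls available")
    major = pooled.most_common(1)[0][0]

    frequencies = {}
    for ecotype, calls in calls_by_ecotype.items():
        called = [c for c in calls if c is not None]
        if not called:
            frequencies[ecotype] = None  # nothing genotyped in this group
            continue
        major_freq = called.count(major) / len(called)
        frequencies[ecotype] = {"major": major_freq, "minor": 1.0 - major_freq}
    return major, frequencies


# Toy calls at one hypothetical SNP site.
calls = {
    "Improved cultivar": ["A", "A", "A", "G"],
    "Landrace": ["A", "G", "G", None],
    "G. soja": ["G", "G", "G", "G"],
}
major_allele, freqs = allele_frequencies(calls)
print(major_allele)
for ecotype, value in freqs.items():
    print(ecotype, value)
```

In this toy example the allele that is fixed in G. soja is rare in improved cultivars, the kind of contrast between ecotypes that the plot is designed to reveal.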
Users can search and retrieve high-quality Indels among 2898 soybean accessions for any input gene ID or genomic region. Steps to search Indels are shown in Figure 15.
The expression levels of protein-coding genes in the genome of Zhonghuang 13 across 27 different tissues/stages were collected in SoybeanGDB. Using the "Gene expression analysis" functionality, the expression level of any user-input gene can be retrieved as a table and visualized as a heatmap (Figure 16). The expression level of the input gene can be downloaded as a csv or excel file. The size of the heatmap can be adjusted using provided widgets in the sidebar panel.
Based on the expression levels of protein-coding genes in the genome of Zhonghuang 13 across 27 different tissues/stages, we implemented a functionality in SoybeanGDB for users to perform co-expression analysis of input genes. For a user-input gene list, the expression correlation coefficients between the input genes and all expressed genes are calculated and displayed in a table, which can be downloaded as a CSV or Excel file (Figure 17). The expression correlation coefficients can also be visualized as a heatmap. Essential steps to conduct co-expression analysis are shown in Figure 17.
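As an illustration of the correlation step, the sketch below ranks genes by their Pearson correlation with a query gene across tissues. The manual does not say which correlation coefficient SoybeanGDB uses, so Pearson is an assumption; the gene names, expression values and function names are made up for the example, and only four tissues are used instead of 27 to keep it short.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # a flat profile has no defined correlation
    return cov / (sd_x * sd_y)


def coexpressed(query, expression, top=5):
    """Rank all other genes by correlation with `query`, highest first."""
    target = expression[query]
    scores = []
    for gene, profile in expression.items():
        if gene == query:
            continue
        r = pearson(target, profile)
        if r == r:  # skip NaN results from flat profiles
            scores.append((gene, r))
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical expression values in four tissues.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [5.0, 5.0, 5.0, 5.0],
}
for gene, r in coexpressed("geneA", expression):
    print(gene, round(r, 3))
```

Genes with a flat expression profile are dropped in this sketch because their correlation is undefined; this mirrors the restriction to expressed genes in the table, although the exact filter used by the site is not documented here.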
Under this menu, users can search one or multiple soybean genomes by sequence similarity using BLAST (Figure 18). The input sequences must be in FASTA format. The genome sequences, gene sequences, CDS sequences, and protein sequences of one or more of the 39 high-quality genomes can be searched by BLAST. After clicking the "Submit" button, the BLAST alignment is conducted and the results can be viewed in the Output panel (Figure 19).
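Because the query must be in FASTA format, it can be useful to check a file before pasting it into the form. The following Python sketch is a hypothetical local check and is not how SoybeanGDB itself validates input; it only enforces the basic FASTA layout, in which each record starts with a ">" header line followed by one or more sequence lines.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs, or raise ValueError."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            if not header:
                raise ValueError("empty FASTA header")
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("no FASTA records found")
    for name, sequence in records:
        if not sequence:
            raise ValueError(f"record {name!r} has no sequence")
    return records


# A made-up two-record query used only to exercise the parser.
query = ">query1 example\nATGGCT\nTTGACC\n>query2\nATGAAA\n"
for name, sequence in parse_fasta(query):
    print(name, len(sequence))
```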
Using this functionality, users can design primers for any input genomic region or gene locus in the genome of Zhonghuang 13, targeting SNPs and Indels in this region (Figure 20). Primer3 is used to design the primers. Many Primer3 options are exposed in the graphical interface so that users can set appropriate parameters for primer design. The five best candidate primers are displayed in a result table, with each column giving one feature of the designed primers. Detailed information on the best primers is displayed below the table, including the template sequence, the primer sequences, and the SNPs/Indels in the input region.
With this functionality, users can search orthologous groups among the 39 soybean genomes by inputting a single gene ID from any of the 39 genomes (Figure 21). After clicking the "Submit!" button, the orthologs of the input gene in the other genomes are displayed in the main panel.
This functionality is used to perform GO annotation of a user-input gene list from any of the 39 soybean genomes. Steps to conduct GO annotation are shown in Figure 22. The full annotation result is displayed as a table, which can be downloaded. The top 30 GO terms with the largest number of genes are displayed as a bar plot (Figure 22).
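The ranking behind the bar plot can be pictured as counting, for each GO term, how many input genes are annotated with it and keeping the 30 largest counts. The sketch below shows this counting in Python; the gene names and annotations are invented for the example, and the function name is hypothetical.

```python
from collections import Counter


def top_terms(gene_to_terms, limit=30):
    """Return up to `limit` (term, gene_count) pairs, largest counts first."""
    counts = Counter()
    for terms in gene_to_terms.values():
        # Count each term once per gene, even if it is listed twice.
        counts.update(sorted(set(terms)))
    return counts.most_common(limit)


# Invented annotations for three placeholder genes.
annotations = {
    "geneA": ["GO:0006355", "GO:0003677"],
    "geneB": ["GO:0006355", "GO:0003677", "GO:0005634"],
    "geneC": ["GO:0006355", "GO:0006355"],
}
for term, count in top_terms(annotations):
    print(term, count)
```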
This functionality is used to perform GO enrichment analysis of a user-input gene list from any of the 39 soybean genomes. Steps to conduct GO enrichment analysis are shown in Figure 23. Enrichment analysis is conducted separately for each GO category (Molecular Function, Biological Process, Cellular Component). For each category, the full enrichment result is displayed as a table, and the significant enrichment result is displayed in two figures. The enrichment result can be further filtered by other parameters, including adjusted P value, Q value, etc. (Figure 23).
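To clarify what the P value and adjusted P value in such a table mean, the sketch below implements a common form of over-representation analysis: a one-sided hypergeometric test per term, followed by Benjamini-Hochberg adjustment. This manual does not state which test or correction SoybeanGDB applies, so both are assumptions, and the function names and numbers are illustrative only.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Illustrative numbers: 20 input genes from a background of 1000 genes,
# tested against three terms as (hits in input, genes with term in background).
terms = {"term1": (8, 40), "term2": (2, 60), "term3": (1, 300)}
raw = [hypergeom_pvalue(k, 20, K, 1000) for k, K in terms.values()]
for name, p, q in zip(terms, raw, benjamini_hochberg(raw)):
    print(name, f"{p:.3g}", f"{q:.3g}")
```

Filtering on the adjusted value rather than the raw P value controls the false discovery rate when many terms are tested at once, which is why enrichment tables usually offer both columns.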
This functionality is used to perform KEGG pathway annotation of a user-input gene list from any of the 39 soybean genomes. Steps to conduct KEGG annotation are shown in Figure 24. The full annotation result is displayed as a table, which can be downloaded. The top 30 largest KEGG pathways are displayed as a bar plot (Figure 24).
This functionality is used to perform KEGG enrichment analysis of a user-input gene list from any of the 39 soybean genomes. Steps to conduct KEGG enrichment analysis are shown in Figure 25. The full enrichment result is displayed as a table, and the significant enrichment result is displayed as a figure. The enrichment result can be further filtered by other parameters, including adjusted P value, Q value, etc. (Figure 25).
A total of 2898 soybean accessions were collected in SoybeanGDB. The detailed information of all 2898 accessions is displayed in the "Accessions" submenu under the "Help" menu of SoybeanGDB, which can be downloaded as a csv or excel file. In the sidebar panel of the "Accessions" submenu, a widget is provided for users to select one or multiple accessions to view the detailed information in the main panel (Figure 26).