This repository contains the scripts we wrote to process the results data for our article "Generic and Queryable Data Integration Schema for Transcriptomics and Epigenomics studies". We used datasets from two recent articles: one on human health (hypertrophic cardiomyopathy, HCM) and one on caste differentiation in insects (HoneyBee).
Dataset from the following article: "Integrative analysis of transcriptome, DNA methylome and chromatin accessibility reveals candidate therapeutic targets in hypertrophic cardiomyopathy", published in 2024 by Gao et al.
Python script that processes the differentially expressed genes (DEGs).
Python script that processes the differentially methylated regions (DMRs).
Python script that processes the transcription factors (TFs) and their binding sites.
Required table listing the name and corresponding ID of each gene. TF_HCM.py loads it into a dictionary.
SPARQL query used to create the gene_names_ID.tsv table.
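As an illustration only (this is not the code of TF_HCM.py), here is a minimal Python sketch of how a two-column gene_names_ID.tsv could be loaded into a name-to-ID dictionary. It assumes that the first column holds the gene name and the second the gene ID, that the first row is a header, and that the file may have been exported in the SPARQL TSV results format, where literals are wrapped in double quotes and IRIs in angle brackets. The function names `clean_term`, `build_name_to_id` and `load_name_to_id` are hypothetical.

```python
from typing import Dict, Iterable


def clean_term(term: str) -> str:
    """Strip SPARQL TSV decorations: double quotes around literals (plus any
    trailing @lang or ^^datatype suffix) and angle brackets around IRIs."""
    term = term.strip()
    if term.startswith('"'):
        end = term.rfind('"')
        return term[1:end] if end > 0 else term[1:]
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    return term


def build_name_to_id(lines: Iterable[str], has_header: bool = True) -> Dict[str, str]:
    """Map gene name -> gene ID from a two-column, tab-separated table.
    Assumption: column 1 is the gene name, column 2 is the gene ID."""
    rows = (line.rstrip("\r\n").split("\t") for line in lines)
    if has_header:
        next(rows, None)  # skip the header row, e.g. "?name<TAB>?id"
    name_to_id: Dict[str, str] = {}
    for row in rows:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        name, gene_id = clean_term(row[0]), clean_term(row[1])
        # Assumption: if a name appears twice, keep the first ID seen.
        name_to_id.setdefault(name, gene_id)
    return name_to_id


def load_name_to_id(path: str = "gene_names_ID.tsv") -> Dict[str, str]:
    """Read the table from disk and build the dictionary."""
    with open(path, encoding="utf-8") as handle:
        return build_name_to_id(handle)
```

Keeping the first ID for a duplicated name is an arbitrary choice made for this sketch; if a gene name maps to several IDs in the real table, the actual script may resolve this differently.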
Dataset from the following article: "The diverging epigenomic landscapes of honeybee queens and workers revealed by multiomic sequencing", published in 2023 by Zhang et al.
Python script that processes the ATAC peaks.
Python script that processes the ChIP peaks.
Python script that processes the differentially expressed genes, lncRNAs and co-differentially expressed genes.
Python script that processes the KEGG enrichment results.
Python script that processes the transcription factors (TFs) and their binding sites.
Required table giving the genomic position of each gene associated with a TF binding site. TF_positions_HBee.py loads it into a dictionary.
SPARQL query used to create the gene_positions.tsv table.
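Again as a hedged illustration rather than the actual TF_positions_HBee.py code, this minimal Python sketch turns gene_positions.tsv into a dictionary of gene positions. It assumes four tab-separated columns (gene identifier, chromosome or scaffold, start, end) with a header row, and allows for SPARQL TSV quoting such as `"100"^^<...#integer>`. The names `GenePosition`, `clean_term` and `build_gene_positions` are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class GenePosition(NamedTuple):
    chrom: str
    start: int
    end: int


def clean_term(term: str) -> str:
    """Drop SPARQL TSV quoting: "lit", "lit"^^<type>, "lit"@lang, <iri>."""
    term = term.strip()
    if term.startswith('"'):
        end = term.rfind('"')
        return term[1:end] if end > 0 else term[1:]
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    return term


def build_gene_positions(lines: Iterable[str], has_header: bool = True) -> Dict[str, GenePosition]:
    """Map gene -> (chromosome, start, end) from an assumed 4-column TSV."""
    rows = (line.rstrip("\r\n").split("\t") for line in lines)
    if has_header:
        next(rows, None)  # skip the header row
    positions: Dict[str, GenePosition] = {}
    for row in rows:
        if len(row) < 4:
            continue  # ignore blank or malformed lines
        gene, chrom, start, end = (clean_term(v) for v in row[:4])
        # A gene linked to several binding sites may be listed more than
        # once; this sketch keeps its first position.
        positions.setdefault(gene, GenePosition(chrom, int(start), int(end)))
    return positions
```

Converting start and end to integers means the stored positions can be compared directly with binding-site coordinates, and keying the dictionary by gene gives constant-time lookup for each binding site's target gene.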