My name is @YoannPageaud (on Mastodon / on Twitter).
I am a researcher and a developer in cancer bioinformatics.
Welcome to my repository dedicated to computational genomics and epigenomics, where I share everything I produce that can be made publicly available.
Here you will find:
- Some custom annotation tracks,
- Some tables containing data related to topics I cover,
- Some scripts useful for genomics and epigenomics data analysis,
- And of course, direct links to all the tools and packages I have developed.
- extract_genomic_regions.R: Reads a file and extracts the 'chromosome', 'start', and 'end' columns. It supports almost any format, and the file can contain any number of columns. The function is not limited to extracting genomic regions from files in bed format.
- get_dataset_metrics.R: Computes metrics from a coverage matrix and a beta-value matrix.
- get_meth_data.R: Retrieves methylation calls from a given directory, automatically removes SNPs from the data, formats and saves them in bed files, and computes various statistics on them.
- handle_directories.R: Dependency of get_meth_data.R. Retrieves sample directories into a matrix.
- load_Meth_Data.R: Loads methylation data from a specific bisulfite sequencing dataset.
- make_beta_matrix.R: Creates a matrix of beta values from a bisulfite sequencing dataset.
- make_coverage_matrix.R: Creates a matrix of coverage values from a bisulfite sequencing dataset.
- methylome_tiling.R: Calculates the average methylation value of genome tiles.
- OTP_Quality_Control_Reports: Shiny App giving quality control metrics on the bisulfite sequencing dataset.
BiocompR is an R package built upon ggplot2 and using data.table. It improves some visualisations commonly used in biology and genomics for data comparison and dataset exploration, introduces new kinds of plots, provides a toolbox of functions to work with ggplot2 and grid objects, and ultimately allows users to customize the plots produced into publication-ready figures.
NCBI.BLAST2DT is an R package allowing you to submit DNA sequences to NCBI BLAST servers directly from the console, to retrieve potential hits on a genome or sequence database, and to collect all results within an R data.table.
It makes use of the R package hoardeR to submit sequences to the NCBI BLAST API, then parses the returned XML BLAST results and loads them as an R data.table, making it easier to query, sort, order and subset the resulting hits.
methview.qc allows you to run quality control analysis on your methylation array dataset, and to collect all results in neat ready-to-publish plots.
EpiAnnotator is an R Package accompanied by a web interface. It contains regularly updated annotations from 4 public databases: Blueprint, RoadMap, GENCODE and the UCSC Genome Browser. Annotations are hosted locally or in a server environment and automatically updated by scripts of our own design. Thousands of tracks are available, reflecting data on a variety of tissues, cell types and cell lines from the human and mouse genomes. Users need to upload sets of selected and background regions. Results are displayed in customizable and easily interpretable figures.
The EpiAnnotator online web service is available here!
biotab.manager is an R package allowing you to download, manage, subset, and aggregate TCGA patient clinical data (biotabs) from the GDC portal. The package is built upon TCGAbiolinks to query TCGA databases, and makes use of R data.table to handle query results.
DTrsiv is an R package containing a collection of R data.table functions available to quickly and easily clean your data.
Gff3ToBed is a shell script using awk to extract specific genomic data contained in a GFF3 (1-based) file and format it as a BED (0-based) file.
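
The coordinate shift behind this kind of conversion can be sketched as follows. This is a minimal illustration and not the Gff3ToBed script itself: the function name `gff3_to_bed6`, the choice of a 6-column BED output, and the use of the `ID` attribute as the BED name are all assumptions made for the example. GFF3 coordinates are 1-based and inclusive, while BED coordinates are 0-based with an exclusive end, so only the start needs to be decremented.

```shell
#!/bin/sh
# Hypothetical helper (illustration only, not the actual Gff3ToBed script).
# Usage: gff3_to_bed6 FEATURE_TYPE < input.gff3 > output.bed
gff3_to_bed6() {
  awk -v feature="$1" 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                        # skip directives and comment lines
    NF < 9 || $3 != feature { next }     # keep well-formed lines of the requested type
    {
      name = "."
      # Assumption: use the ID attribute as the BED name when it is present
      n = split($9, attrs, ";")
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^ID=/) { name = substr(attrs[i], 4); break }
      # 1-based inclusive start becomes 0-based; the end is left unchanged
      # The GFF3 score column is copied as is
      print $1, $4 - 1, $5, name, $6, $7
    }'
}
```

For example, `gff3_to_bed6 gene < annotation.gff3 > genes.bed` would keep only the `gene` features, where `annotation.gff3` is a placeholder file name. A feature spanning positions 11 to 20 in GFF3 is written as start 10 and end 20 in BED, which preserves its length of 10 bases.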