R packages for my daily scientific work

My personal curated list of useful R packages and extensions for my work as a researcher in the life sciences, with a focus on (pharmaco)genetics, clinical studies, omics and Shiny apps.



Overview

workhorse

Packages I load at the beginning of almost every session

  • tidyverse - bundle of useful packages
  • janitor - tools for cleaning and examining "dirty" data
  • readxl - read data from Excel into R quickly and easily
  • glue - enhanced string manipulation
  • furrr - purrr's mapping functions but in parallel mode
  • fst - saving and loading data extremely fast
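
A minimal sketch of how some of these packages might combine at the start of a session. The table `raw` and its column names are made up for illustration; only `clean_names()` from janitor and `glue()` are exercised, and the tidyverse, janitor and glue packages are assumed to be installed.

```r
library(tidyverse)  # attaches dplyr, tibble, the %>% pipe, ...
library(janitor)
library(glue)

# Hypothetical "dirty" table with inconsistent column names
raw <- tibble(
  `Patient ID` = c(1, 2, 3),
  `Dose (mg)`  = c(10, 20, 40)
)

# clean_names() converts the column names to snake_case
clean <- raw %>% clean_names()
names(clean)

# glue() interpolates column values into a string, row by row
labelled <- clean %>%
  mutate(label = glue("patient {patient_id}: {dose_mg} mg"))
labelled
```

After cleaning, the columns can be referred to without backticks, which is most of the reason janitor is loaded so early.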

statistics

Science without p-values is barely possible

general

  • rstatix - (tidyverse) pipe-friendly framework for basic statistical tests
  • gtsummary - elegant and flexible way to create publication-ready analytical and summary tables
  • pROC - tools for visualizing, smoothing and comparing ROC curves
  • quantreg - quantile regression for non-parametric data
  • coin - conditional inference procedures (permutation tests) plus functions to transform data
  • breakerofchains - Run your chain until the cursor line. Add the addin by using a shortcut like 'Ctrl + shift + b'.
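
To show what "pipe-friendly" means in practice, here is a small sketch using rstatix on the built-in `ToothGrowth` data. The dataset and the comparison are chosen only for illustration and assume dplyr and rstatix are installed.

```r
library(dplyr)
library(rstatix)

# Two-sample t-test of tooth length by supplement type;
# the result comes back as a tibble instead of an htest object
res <- ToothGrowth %>%
  t_test(len ~ supp) %>%
  add_significance()   # appends a significance-symbol column
res
```

Because the result is a plain data frame, it can be filtered, joined or passed on to plotting helpers like any other table.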

genomics

  • haplo.stats - analysis of haplotypes.
  • SNPassoc - useful for small genetic association studies
  • SKAT - gene-based association tests and other burden tests
  • ggfastman - plotting tons of p-values using Manhattan plots
  • pathfindR - pathway enrichment analysis via active subnetworks

machine learning

  • caret - toolset for classification and regression models
  • caretEnsemble - ensembles of caret models; also supports bootstrapping
  • SuperLearner - easily estimate the performance of multiple machine learning models using cross-validation
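
As an illustration of the caret workflow, the following sketch fits a linear model with 5-fold cross-validation on the built-in `mtcars` data. The formula and the choice of `method = "lm"` are assumptions picked to avoid extra model dependencies, not a recommendation.

```r
library(caret)

set.seed(42)  # make the fold assignment reproducible

# 5-fold cross-validation as the resampling scheme
ctrl <- trainControl(method = "cv", number = 5)

# Fit a linear regression and estimate its performance by resampling
fit <- train(
  mpg ~ wt + hp,
  data      = mtcars,
  method    = "lm",
  trControl = ctrl
)

fit$results  # resampled RMSE, R squared and MAE
```

Swapping `method` for another model type keeps the rest of the call unchanged, which is the main convenience of caret.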

ggplot2

ggplot2 extensions for further and easy customisation

  • ggpubr - easy-to-use functions for creating ggplot2-based, publication-ready plots, including statistical add-ons
  • ggsignif - visualization of statistical differences
  • ggbeeswarm - beeswarm plots aka scatter plots or violin/boxplot plots with points
  • cowplot - creating publication-quality figures
  • ggrepel - annotations without overlap
  • ggannotate - point-and-click tool to annotate plots in the last production step
  • plotly - interactive plots
  • ggtext - add nice text, labels and boxes to your ggplots
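
A short sketch of how one of these extensions slots into a regular ggplot2 call, here ggrepel for non-overlapping labels. The `mtcars` data and the `model` column are used only for illustration, and ggplot2 and ggrepel are assumed to be installed.

```r
library(ggplot2)
library(ggrepel)

# Copy the built-in data and keep the car names as a label column
cars <- mtcars
cars$model <- rownames(mtcars)

p <- ggplot(cars, aes(x = wt, y = mpg, label = model)) +
  geom_point() +
  geom_text_repel() +   # labels are pushed away from each other and from the points
  theme_bw()

p
```

The extension adds a single layer; everything else is standard ggplot2, so it combines freely with the other packages in this list.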

shiny

There are better lists, as linked above, but I use the following packages very often

others

Packages that do not fit neatly into the categories above, plus other command-line tools

R packages

  • clinPK - functions for clinical pharmacokinetics and clinical pharmacology
  • valr - compare and manipulate genome intervals
  • gplots - useful for heatmaps and venn diagrams
  • disgenet2r - information about the genetic basis of human diseases

cmdline tools

  • plink - whole genome association analysis toolset
  • snptest - genome-wide association analysis of SNPs
  • regenie - whole-genome regression modelling for very large GWAS
  • vep - genetic variant annotation including effects and functions
  • SnpEff - genomic variant annotation and functional effect prediction toolbox
  • pypgx - pharmacogenomics profiles from NGS & SNP array data
  • aldy - pharmacogenomics profiles from NGS data
  • samtools - toolbox for high-throughput sequencing data
  • openai - API to access GPT-3
  • celltypist - automated cell type annotation for scRNA-seq
  • salmon - transcript quantification from RNA-seq data
  • impute - imputation server
  • gatk - genome analysis toolkit
  • conda - environment management
  • slurm - workload management
  • nf-core - easy-to-use bioinformatics analysis pipelines built using Nextflow