This study, a collaboration with Calico Life Sciences and Gary Churchill, aimed to identify circulating molecules that are correlated with age and/or lifespan. To do this, plasma from each of 110 mice was profiled at three ages using proteomics, metabolomics, and lipidomics.
This repository hosts the codebase used to carry out the study's major analyses and also serves as a landing page for other resources (apps and data) associated with the manuscript.
Any of the ~2,300 molecular features described in this study can be visualized using our Shiny App. This application allows you to look at how a molecule varies with age, fraction of life lived, lifespan, etc., and to stratify results by possible confounders such as experimental batch.
Analyses are divided into two major phases:
- Bioinformatics: Code and notebooks that generated the data present in Google Cloud Storage (GCS) but that require infrastructure (e.g., a SLURM cluster for bootstrapping) or logistics (e.g., interfacing between data in Google Drive, GCS, and Cromwell) that end-users cannot easily reproduce. For clarity, the notebooks have been rendered and can be seen below:
  - 1_wdl_processes: metabolomics and lipidomics informatics run through Cromwell
  - 2_non_wdl_processes: manually run notebooks which aggregate proteomics results and perform differential expression (with bootstrapping in SLURM)
  - 3_interactive: organize results and upload them to GCS to support major_analyses and the Shiny app
- Major Analyses: Reproducible notebooks that regenerate the vast majority of figures and tables, starting from data that end-users can download from Google Cloud Platform (GCP). These notebooks roughly mirror the structure of the manuscript, with each notebook indicating the section of the manuscript it supports:
  - Figure1 - Study Design
  - Figure2 - Exploratory Data Analysis
    a. Figure2 - Genetics
    b. Figure2 - Batch Effects
  - Figure3 - Statistics and Functional Enrichments
    a. Figure3 - Power Analysis
  - Figure4 - Aging Archetypes
    a. Figure3 - Volcano Plots
  - Figure5 - Molecular Architecture of Longevity
  - Figure6 - Mechanisms
- Set config.json:
  - cache_dir: set this to a location on your computer where you want raw data and cached intermediate files to be stored.
  - update_figure: leave this as false (this option was only used when preparing the manuscript).
- Open hackett-doomics.Rproj in RStudio. This sets your working directory to the project directory, so that paths resolve consistently, and sets the project R environment to use pre-set package versions via renv.
- Run renv::restore() to install the packages required by all notebooks. This environment currently expects R 4.3.1.
- Running any of the notebooks in major_analyses will download and cache the study's intermediate data files.
- The notebooks in major_analyses can be run in order to generate nearly all figures and tables in the manuscript. To do this, run figure_1.qmd > figure_2.qmd > figure_3.qmd, etc. Analyses with a suffix (e.g., figure_2_batch.qmd) are meant to be run after the corresponding primary file (in this case, figure_2.qmd). Rendered versions of all notebooks can be seen above.
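
The run order described in the last step can be sketched in base R. This is a minimal sketch and not part of the repository: `order_notebooks` is a hypothetical helper, `figure_2_genetics.qmd` is an illustrative file name, and the optional rendering step assumes that the Quarto command-line tool is on your PATH and that the notebooks sit in a `major_analyses` directory under the project root.

```r
# Hypothetical helper (not from the repository): order notebooks so that each
# primary file (e.g., figure_2.qmd) comes before its suffixed companions
# (e.g., figure_2_batch.qmd).
order_notebooks <- function(files) {
  stems <- sub("\\.qmd$", "", basename(files))
  # Figure number: the digits that follow "figure_"
  fig_num <- as.integer(sub("^figure_([0-9]+).*$", "\\1", stems))
  # Suffix: whatever follows the figure number ("" for a primary file)
  suffix <- sub("^figure_[0-9]+_?", "", stems)
  # Sort by figure number, then primary files first, then suffix
  files[order(fig_num, suffix != "", suffix)]
}

# Illustrative file names; figure_2_genetics.qmd is an assumed name.
notebooks <- c("figure_2_batch.qmd", "figure_1.qmd", "figure_3.qmd",
               "figure_2.qmd", "figure_2_genetics.qmd")
run_order <- order_notebooks(notebooks)
print(run_order)

# Set to TRUE to actually render the notebooks. Assumes the Quarto CLI is
# installed and the notebooks live in "major_analyses" (both assumptions).
render_now <- FALSE
if (render_now) {
  for (nb in run_order) {
    status <- system2("quarto", c("render", file.path("major_analyses", nb)))
    if (status != 0) stop("Rendering failed for ", nb)
  }
}
```

Rendering is off by default so the ordering can be inspected before anything is downloaded or cached. Stopping at the first failure keeps a suffixed notebook from running when its primary notebook did not complete.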