- Introduction
- Quick Start
- Install pleioFDR
- Data downloads
- Data preparation
- Run pleioFDR
- pleioFDR results
- FUMA-defined loci
- Octave support
Pleiotropy-informed conditional and conjunctional false discovery rate allows to boost loci discovery in low-powered GWAS by levereging pleiotropic enrichment with a larger GWAS on related phenotype, and to identify genetic loci joinly associated with two phenotypes.
If you use pleioFDR software for your research publication, please cite the following paper(s):
- Andreassen, O.A. et al. Improved detection of common variants associated with schizophrenia and bipolar disorder using pleiotropy-informed conditional false discovery rate. PLoS Genet 9, e1003455 (2013).
The pleioFDR software may not be used in medical applications.
To install and run pleioFDR on a small example, constrained to chromosome 21:
git clone https://github.com/precimed/pleiofdr && cd pleiofdr
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/pleioFDR_demo_data.tar.gz
tar -xzvf pleioFDR_demo_data.tar.gz
matlab -nodisplay -nosplash < runme.m
To install and run pleioFDR using full example:
git clone https://github.com/precimed/pleiofdr && cd pleiofdr
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/ref9545380_1kgPhase3eur_LDr2p1.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/CTG_COG_2018.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/SSGAC_EDU_2016.mat
cp config_default.txt config.txt
matlab -nodisplay -nosplash < runme.m
For the description of the data, see here.
For the results, inspect the results
folder.
Prerequisites:
- matlab (tested with versions >= 2015)
- workstation with at least 16GB of RAM
The following step by step instruction assumes you are using Linux, however the same can be done in Windows or Mac with minimal modifications.
Download pleioFDR software by going to https://github.com/precimed/pleiofdr in your favorite internet browser, use "Clone or download" button , and "Download zip" do get the latest code.
Alternatively, you may get the code by cloning git repository from command line:
git clone https://github.com/precimed/pleiofdr && cd pleiofdr
Download reference data from here. The reference is based on 1000 Genomes phase 3 data (May 2, 2013 release). Variant calls (vcf files) for 22 autosomes were downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502 . We kept only samples of European ancestry (IBS, TSI, GBR, CEU, FIN populations) with missing call rate below 10% and only biallelic variants with non-duplicated ids, minor allele frequency above 1%, missing call rate below 10% and Hardy-Weinberg equilibrium exact test p-values greater than 1.E-20. The filtering was performed with PLINK 1.9. Resulted template contained 503 samples and 9,545,380 variants. Further details are available in about.txt.
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/about.txt
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/ref9545380_1kgPhase3eur_LDr2p1.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/CTG_COG_2018.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/SSGAC_EDU_2016.mat
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/ref9545380_bfile.tar.gz
wget https://precimed.s3-eu-west-1.amazonaws.com/pleiofdr/9545380.ref
Those at NORMENT with access to NIRD can also download these data from SUMSTAT/misc/9545380_ref
and SUMSTAT/TMP/mat_9545380
.
Here we explain how to convert raw summary statistics to pleioFDR format.
Feel free to skip this step if you would like to try pleioFDR on CTG_COG_2018.mat
and SSGAC_EDU_2016.mat
,
or if you downloaded input data from the internal NORMENT SUMSTATS
inventory.
Prerequisites:
- python >= 2.7 (but < 3) with numpy, scipy and pandas libraries
Downloads:
- Download code from https://github.com/precimed/python_convert, either via web browser, or
git clone https://github.com/precimed/python_convert
. - Download educational attainment and subjective well-being summary statistics
from SSGAC consortium to traitfolder:
wget http://ssgac.org/documents/EduYears_Main.txt.gz -P traitfolder wget http://ssgac.org/documents/SWB_Full.txt.gz -P traitfolder
Conversion steps:
- Use sumstats.py script to standartizize downloaded summary statistics (csv)
and prepare input files for cond/conj fdr analysis (mat):
In the first and second commands --n-val argument indicates sample size. The number is taken from original papers [Okbay et al. (2016)].
python src/converter/sumstats.py csv --auto --sumstats traitfolder/EduYears_Main.txt.gz --n-val 328917 --out traitfolder/ssgac.edu.csv --force python src/converter/sumstats.py csv --auto --sumstats traitfolder/SWB_Full.txt.gz --n-val 298420 --out traitfolder/ssgac.swb.csv --force python src/converter/sumstats.py mat --sumstats traitfolder/ssgac.edu.csv --ref 9545380.ref --out traitfolder/ssgac.edu.mat python src/converter/sumstats.py mat --sumstats traitfolder/ssgac.swb.csv --ref 9545380.ref --out traitfolder/ssgac.swb.mat
- For more details on input arguments please check:
python src/converter/sumstats.py --help python src/converter/sumstats.py csv --help python src/converter/sumstats.py mat --help
Create a configuration file by copying config_default.txt
file, located in the root of pleioFDR repository.
cp config_default.txt config.txt
Edit config.txt
so that
reffile
points to theref9545380_1kgPhase3eur_LDr2p1.mat
filetraitfolder
points to folder containingCTG_COG_2018.mat
andSSGAC_EDU_2016.mat
- set
randprune_n=500
instead of the defaultrandprune_n=20
You may also want to changetraitfile1
andtraitfiles
options.
Start matlab.
Change current folder to the root of pleiofdr
repository (i.e. a folder containing pleiotropy_analysis.m
).
Execute runme
command, which should trigger pleiofdr analysis.
To run pleioFDR from console:
matlab -nodisplay -nosplash < runme.m
Results are placed in an output folder, defined in config.txt
file. By default it is named results
.
Results contain:
- table with LD-independent significant loci
- table with all analyzed variants and their cond/conj FDR values
- conditional qq plots and enrichment plots
- Manhattan plot (by default only .fig file, so you need to open it with matlab and save in another format separately)
- log file
results.mat
file containing condFDR or conjFDR values for all SNPs
Loci tables generated in the step above use custom non-standard logic to clump results based on LD structure.
You may want to re-generate loci using sumstats.py clump
script, which implements the same logic as in FUMA.
To do so, convert results.mat
into a text file (for example using scipy.io.loadmat
and pandas.DataFrame.to_csv
), and then perform sumstats.py clump
.
At this step you may use ref9545380_bfile.tar.gz
as a reference to preform clumping.
NB! Octave support is experimental and not officially supported.
install additional packages:
octave --no-gui <(echo "pkg install -forge io statistics")
octave --no-gui <(echo "pkg install -forge nan")
install gammainc function:
wget http://savannah.gnu.org/bugs/download.php?file_id=37342 -O gammainc.m
wget http://savannah.gnu.org/bugs/download.php?file_id=37341 -O __gammainc_lentz.cc
mkoctfile __gammainc_lentz.cc
run:
octave --no-gui runme.m
- last tested with octave version 4.0.2
- *.mat files need to be saved with "-v7" max:
save('ref9545380_1kgPhase3eur_LDr2p1.mat', '-v7')