Genotype Quality Control

This repository contains three workflows for performing genotype data QC for the eQTL Catalogue project.

Parts of this workflow have been merged into the eQTL-Catalogue/geimpute workflow. This workflow is no longer maintained independently.

Dependencies

Most of the software dependencies for the pipelines are listed in the conda environment file (environment.yml). A Docker container with all of these dependencies can be obtained from DockerHub.

The pipelines also require Genotype Harmonizer and LDAK5, which need to be downloaded separately. A script for downloading them (download_binaries.sh) is included in the repository.

1. Pre-imputation QC (pre-imputation_qc.nf)

This workflow prepares genotype data for imputation to the 1000 Genomes Phase 3 reference panel with the Michigan Imputation Server. We have installed the imputation server locally.

QC steps:

Align raw genotypes to the reference panel with Genotype Harmonizer.
Convert the genotypes to the VCF format with PLINK.
Exclude variants with Hardy-Weinberg p-value < 1e-6, missingness > 0.05 or minor allele frequency < 0.01 with bcftools.
Calculate individual-level missingness using vcftools.
Create separate VCF files for each chromosome.
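
The variant-level thresholds listed above can be illustrated with a minimal Python sketch. This is not the pipeline's code: the pipeline applies these filters with bcftools, and the function names below are hypothetical. The sketch also assumes a chi-square approximation (1 degree of freedom) for the Hardy-Weinberg test, which may differ from the test the actual tools use, so p-values near the threshold will not necessarily match.

```python
import math


def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """Approximate Hardy-Weinberg p-value from genotype counts.

    Assumption: chi-square goodness-of-fit test with 1 degree of freedom,
    used here only for illustration.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic variant, nothing to test
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 df
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_variant_qc(n_aa, n_ab, n_bb, n_missing,
                      hwe_min=1e-6, miss_max=0.05, maf_min=0.01):
    """Return True if a variant survives all three filters."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0:
        return False  # no genotype calls at all
    missingness = n_missing / total
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (hwe_chisq_pvalue(n_aa, n_ab, n_bb) >= hwe_min
            and missingness <= miss_max
            and maf >= maf_min)


# A common variant in Hardy-Weinberg proportions with no missing calls is kept
print(passes_variant_qc(810, 180, 10, 0))  # True
```

A variant is dropped as soon as it fails any one of the three criteria, which is why the conditions are combined with `and` on the "keep" side.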

Execution:

nextflow run pre-imputation_qc.nf -profile eqtl_catalogue -resume\
 --bfile /gpfs/hpc/projects/genomic_references/CEDAR/genotypes/PLINK_100718_1018/CEDAR\
 --output_name CEDAR_GRCh37_genotyped\
 --outdir CEDAR

2. Convert imputed genotypes to GRCh38 coordinates (crossmap_genotypes.nf)

3. Project individuals to 1000 Genomes Project reference populations (pop_assign.nf)

Input

Genotype data imputed to the 1000 Genomes Phase 3 reference panel.

Analysis steps

Perform LD pruning on the reference dataset with PLINK.
Perform PCA and project new samples to the reference principal components with LDAK.
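
The projection step above amounts to centering a new sample's genotypes with the reference panel's per-variant means and taking dot products with the reference principal-component loadings. The sketch below shows only that core idea in plain Python. It is an illustration under assumptions, not LDAK's implementation: LDAK applies its own genotype scaling, and the function name, argument names and toy values here are all hypothetical.

```python
def project_sample(dosages, ref_means, loadings):
    """Project one sample onto reference principal components.

    dosages   -- alternative-allele dosage per variant for the new sample
    ref_means -- per-variant mean dosage computed on the reference panel
    loadings  -- one list of per-variant weights for each reference PC
    """
    if len(ref_means) != len(dosages):
        raise ValueError("ref_means and dosages must cover the same variants")
    if any(len(pc) != len(dosages) for pc in loadings):
        raise ValueError("every PC needs one weight per variant")
    # Center with reference means so the new sample shares the reference origin
    centered = [d - m for d, m in zip(dosages, ref_means)]
    # The score on each PC is the dot product of centered genotypes and weights
    return [sum(c * w for c, w in zip(centered, pc)) for pc in loadings]


# Toy example with three variants and two reference PCs
ref_means = [1.0, 0.5, 1.5]
loadings = [[1.0, 0.0, 0.0],
            [0.0, 1.0, -1.0]]
print(project_sample([2.0, 0.5, 0.5], ref_means, loadings))  # [1.0, 1.0]
```

Because the loadings and means come from the reference panel only, new samples do not influence the principal components, which keeps coordinates comparable across studies.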

Execution:

nextflow run pop_assign.nf -profile pop_assign --vcf <path_to_vcf.vcf.gz> --data_name <study_name>

Authors

The initial version of the population assignment pipeline was implemented by Katerina Peikova and Marija Samoviča and later modified by Nurlan Kerimov and Kaur Alasoo.
