Add additional columns from the large gnomAD VCF file (#14)
* generalize column level
* fix tests
* adjust version
* fix OOM error
* add column information from the gnomAD VCF
* new columns
* new create file
* genome version
* data folder
* add info for config
* Update README.md
* Update README.md
* Update README.md
* reduce number of columns to reduce db size
* columns
* new test set
* Update README.md
* Update README.md
* Update setup.py

Co-authored-by: Kalin Nonchev <[email protected]>
1 parent: 2009f6c; commit: fea2b54

Showing 16 changed files with 242 additions and 123 deletions.
Binary file not shown.
```diff
@@ -22,4 +22,4 @@ dependencies:
 - joblib
 - pytest
 - nbformat>=5.1
-- joblib
+- joblib
```

The deleted and re-added `joblib` lines have identical visible text, so this change is presumably whitespace-only (for example, a newline added at the end of the file).
```diff
@@ -0,0 +1,46 @@
+base_columns:
+- CHROM
+- POS
+- REF
+- ALT
+- FILTER
+Grch37:
+- AC # Alternate allele count for samples
+- AN # Total number of alleles in samples
+- AF # Alternate allele frequency in samples
+- rf_tp_probability # Random forest prediction probability for a site being a true variant
+- MQ # Root mean square of the mapping quality of reads across all samples
+- QD # Variant call confidence normalized by depth of sample reads supporting a variant
+- ReadPosRankSum # Z-score from Wilcoxon rank sum test of alternate vs. reference read position bias
+- DP # Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered
+- VQSLOD # Log-odds ratio of being a true variant versus being a false positive under the trained VQSR Gaussian mixture model
+- AC_popmax # Allele count in the population with the maximum AF
+- AN_popmax # Total number of alleles in the population with the maximum AF
+- AF_popmax # Maximum allele frequency across populations (excluding samples of Ashkenazi
+- AF_eas
+- AF_oth
+- AF_nfe
+- AF_fin
+- AF_afr
+- AF_asj
+Grch38:
+- AC # Alternate allele count for samples
+- AN # Total number of alleles in samples
+- AF # Alternate allele frequency in samples
+- InbreedingCoeff # Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation
+- MQ # Root mean square of the mapping quality of reads across all samples
+- QD # Variant call confidence normalized by depth of sample reads supporting a variant
+- ReadPosRankSum # Z-score from Wilcoxon rank sum test of alternate vs. reference read position bias
+# - DP # Depth of informative coverage for each sample; reads with MQ=255 or with bad mates are filtered
+- VarDP
+- AS_VQSLOD
+# - VQSLOD # Log-odds ratio of being a true variant versus being a false positive under the trained VQSR Gaussian mixture model
+- AC_popmax # Allele count in the population with the maximum AF
+- AN_popmax # Total number of alleles in the population with the maximum AF
+- AF_popmax # Maximum allele frequency across populations (excluding samples of Ashkenazi
+- AF_eas
+- AF_oth
+- AF_nfe
+- AF_fin
+- AF_afr
+- AF_asj
```
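The new column file pairs a shared `base_columns` list with one list of INFO fields per genome build. A minimal sketch of how such a file could be resolved into the final column list for one build is shown below. It assumes the YAML has already been loaded into a plain dict (with a parser such as PyYAML, omitted so the sketch is stdlib-only); `COLUMN_CONFIG` and `resolve_columns` are hypothetical names, not the gnomad_db API. The commented-out `DP` and `VQSLOD` entries under `Grch38` are simply absent, as a YAML parser would drop them.

```python
from typing import Dict, List

# Hypothetical in-memory copy of the column file above. In a real setup it would be
# read from disk with a YAML parser (e.g. PyYAML's yaml.safe_load); that step is
# omitted here so the sketch only needs the standard library.
COLUMN_CONFIG: Dict[str, List[str]] = {
    "base_columns": ["CHROM", "POS", "REF", "ALT", "FILTER"],
    "Grch37": [
        "AC", "AN", "AF", "rf_tp_probability", "MQ", "QD", "ReadPosRankSum",
        "DP", "VQSLOD", "AC_popmax", "AN_popmax", "AF_popmax",
        "AF_eas", "AF_oth", "AF_nfe", "AF_fin", "AF_afr", "AF_asj",
    ],
    "Grch38": [
        "AC", "AN", "AF", "InbreedingCoeff", "MQ", "QD", "ReadPosRankSum",
        "VarDP", "AS_VQSLOD", "AC_popmax", "AN_popmax", "AF_popmax",
        "AF_eas", "AF_oth", "AF_nfe", "AF_fin", "AF_afr", "AF_asj",
    ],
}


def resolve_columns(config: Dict[str, List[str]], genome: str) -> List[str]:
    """Return the shared base columns followed by the INFO fields for one build.

    Hypothetical helper for illustration; not part of the gnomad_db API.
    """
    builds = sorted(key for key in config if key != "base_columns")
    if genome not in builds:
        raise ValueError(f"unknown genome {genome!r}; expected one of {builds}")
    # Concatenate into a new list so the config itself is never mutated.
    columns = config["base_columns"] + config[genome]
    # A name listed in both sections would produce two identically named columns.
    if len(columns) != len(set(columns)):
        raise ValueError(f"duplicate column names for {genome}")
    return columns


if __name__ == "__main__":
    cols = resolve_columns(COLUMN_CONFIG, "Grch38")
    print(len(cols), cols[:6])
```

Keeping the build-specific lists separate lets one codebase serve both releases even though their INFO fields differ: in this file `rf_tp_probability`, `DP` and `VQSLOD` appear only under `Grch37`, while `InbreedingCoeff`, `VarDP` and `AS_VQSLOD` appear only under `Grch38`.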
```diff
@@ -1,5 +1,6 @@
-database_location: "test_out"
-gnomad_vcf_location: "data"
-tables_location: "test_out"
-script_locations: "test_out"
-KERNEL: "gnomad_db"
+database_location: "test_out" # where to create the database, make sure you have space on your device.
+gnomad_vcf_location: "data" # where are your *.vcf.bgz located
+tables_location: "test_out" # where to store the preprocessed intermediate files, you can leave it like this
+script_locations: "test_out" # where to store the scripts, where you can check the progress of your jobs, you can leave it like this
+genome: "Grch37" # genome version of the gnomAD vcf file (2.1.1 = Grch37, 3.1.1 = Grch38)
+KERNEL: "gnomad_db"
```
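The config now carries a `genome` key that names the gnomAD release the VCFs come from (2.1.1 = Grch37, 3.1.1 = Grch38). A minimal pre-flight check over such a config might look like the sketch below. It assumes the YAML has already been loaded into a plain dict; `check_config` and `SUPPORTED_GENOMES` are hypothetical names for illustration, not part of gnomad_db.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Builds named in the config comment: gnomAD 2.1.1 = Grch37, 3.1.1 = Grch38.
SUPPORTED_GENOMES = {"Grch37", "Grch38"}


def check_config(config: Dict[str, str]) -> List[Path]:
    """Validate a config dict shaped like the file above; return the VCFs to import.

    Hypothetical pre-flight check for illustration; not part of gnomad_db.
    """
    genome = config.get("genome")
    if genome not in SUPPORTED_GENOMES:
        raise ValueError(
            f"genome must be one of {sorted(SUPPORTED_GENOMES)}, got {genome!r}"
        )
    vcf_dir = Path(config["gnomad_vcf_location"])
    # The config comment says the inputs are block-gzipped VCFs named *.vcf.bgz.
    vcfs = sorted(vcf_dir.glob("*.vcf.bgz"))
    if not vcfs:
        raise FileNotFoundError(f"no *.vcf.bgz files found in {vcf_dir}")
    return vcfs


if __name__ == "__main__":
    # Demo against a throwaway directory holding one empty placeholder file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "example.vcf.bgz").touch()
        config = {"gnomad_vcf_location": tmp, "genome": "Grch37"}
        print([p.name for p in check_config(config)])
```

The check is case-sensitive, so a value such as `GRCh37` is rejected; failing fast on a misspelled build name is presumably cheaper than discovering the mismatch partway through a long import.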