DITTO using Neural Networks and SHAP #21

Merged on Feb 2, 2024 (144 commits)
b858e1b
first pass
tkmamidi Apr 26, 2023
822e9f0
until dbnsp
tkmamidi Apr 26, 2023
dcbca0b
training config ready
tkmamidi Apr 26, 2023
b83a2c4
fixing clinvar config to get ID
tkmamidi Apr 26, 2023
3039724
update col_config and python notebook for generating train and test f…
tkmamidi Apr 27, 2023
0abd693
training models
tkmamidi Apr 27, 2023
658279e
fixing weird instance with fatHMM values in parsing
tkmamidi Apr 27, 2023
937f1fa
changing NN script to use argparse instead of hardcoded file paths
tkmamidi May 17, 2023
442a3ab
adding DExTR to parsing stuff
tkmamidi May 18, 2023
9f34f68
adding DExTR file
tkmamidi May 18, 2023
698dfe8
parser can read and write
tkmamidi May 18, 2023
b7955a1
fixing duplicate columns in config
tkmamidi May 19, 2023
a2f9cd4
adjusting clinvar review column values
tkmamidi May 19, 2023
db66be8
removing mitomap as there were no scores for training data
tkmamidi May 20, 2023
cb30221
Merge branch 'training' of https://github.com/uab-cgds-worthey/DITTO …
tkmamidi May 20, 2023
f78fc1a
adding class weight to NN
tkmamidi May 21, 2023
4dfbc59
working DITTO NN
tkmamidi May 22, 2023
794656b
concat training is working. need to test shap
tkmamidi May 22, 2023
7afa53b
DeepExplainer works but can't get shap_values. need to debug
tkmamidi May 23, 2023
b1cd211
DExTR scaled
tkmamidi May 24, 2023
07d0435
NN on GPU and benchmarking for so
tkmamidi May 25, 2023
a77063e
added try:except to skip errors like sample size errors
tkmamidi May 26, 2023
ad68913
updated config file adding varity predictions
tkmamidi Jun 7, 2023
7920c1b
adding more columns for clinvar parsing
tkmamidi Jun 7, 2023
74304ab
new clinvar filtering with the actual clinvar column
tkmamidi Jun 7, 2023
2dadb5b
trying out GPU but it didn't work
tkmamidi Jun 8, 2023
6936244
roc_auc and shap scores
tkmamidi Jun 9, 2023
1462faf
split train and test by variants and retraining
tkmamidi Jun 12, 2023
fe3233f
adds benchmark output as an excel sheet for all tools
tkmamidi Jun 14, 2023
453ed83
fixing roc scores
tkmamidi Jun 15, 2023
15ab5e8
SHAP, confusion matrix and roc_prc curves working
tkmamidi Jun 15, 2023
0566aab
preparing for test parsing
tkmamidi Jun 17, 2023
b16e816
adding neural network to repo. Testing code works but not correctly c…
tkmamidi Jun 17, 2023
2e1da04
adding opencravat command to job script
tkmamidi Jun 20, 2023
b23e12a
Merge branch 'training' of https://github.com/uab-cgds-worthey/DITTO …
tkmamidi Jun 20, 2023
c68ce42
predictions script
tkmamidi Jun 21, 2023
0a77284
Fixing bugs in predict to deal with null columns in test data. adding…
tkmamidi Jun 21, 2023
29a3a96
lovd and train-test rework
tkmamidi Jul 13, 2023
994d4fc
parse and predict working but very slow
tkmamidi Jul 13, 2023
b788530
one script is enough now for parse and predict
tkmamidi Jul 14, 2023
6381f30
restructuring scripts
tkmamidi Jul 14, 2023
a7455af
fixed parsing function before predictions
tkmamidi Jul 15, 2023
8b5f403
parsing and predictions separately
tkmamidi Jul 19, 2023
69c1ab4
adding array job script
tkmamidi Jul 19, 2023
dd9f267
notebook to analyze example case
tkmamidi Jul 25, 2023
f709a2c
testing nextflow
tkmamidi Jul 25, 2023
bad7ab8
basic pipeline start
tkmamidi Jul 26, 2023
99340e9
Add bcftools conda env.
sdhutchins Jul 26, 2023
b7776bb
Add normalizeVCF process
sdhutchins Jul 26, 2023
640395e
updated test vcf and pipeline
tkmamidi Jul 26, 2023
83ca117
Update test vcf to correct format.
sdhutchins Jul 26, 2023
03251e8
Correct base name of vcf.
sdhutchins Jul 26, 2023
7989029
environment file for nextflow
tkmamidi Jul 26, 2023
6cc79e4
Add removeHomRefSites process.
sdhutchins Jul 26, 2023
aff9ea0
Add removeHomRefSites process.
sdhutchins Jul 26, 2023
098db60
Merge branch 'training' into bcftools
sdhutchins Jul 27, 2023
f765dd6
Remove 'output_dir' option.
sdhutchins Jul 27, 2023
f700650
adding pytabix
tkmamidi Jul 27, 2023
1c0b8e3
removes taking only head variants from testing pipeline
tkmamidi Jul 27, 2023
62a6cf0
cleanup and adding one more test file
tkmamidi Jul 27, 2023
769121f
Merge branch 'training' into bcftools
sdhutchins Jul 27, 2023
06c9db1
Remove conda env from each process.
sdhutchins Jul 27, 2023
e95ea40
file name change while making predictions
tkmamidi Jul 27, 2023
48d7bc7
moved opencravat package from pip to conda
tkmamidi Jul 27, 2023
e839ccf
Remove unneeded conda env.
sdhutchins Jul 27, 2023
54f5259
Merge branch 'training' of github.com:uab-cgds-worthey/DITTO into bcf…
sdhutchins Jul 27, 2023
a63fcf2
Add config with conda environment.
sdhutchins Jul 27, 2023
3ecebfe
config and pipeline
tkmamidi Jul 27, 2023
36232d9
Add envs by process.
sdhutchins Jul 27, 2023
3ea3add
conda env fixed and pipeline working as expected
tkmamidi Jul 27, 2023
e4d51b6
Merge branch 'bcftools' of https://github.com/uab-cgds-worthey/DITTO …
tkmamidi Jul 27, 2023
792802b
Merge pull request #19 from uab-cgds-worthey/bcftools
tkmamidi Jul 27, 2023
18cefa9
managing conda envs
tkmamidi Jul 27, 2023
12e0a1f
test data change
tkmamidi Jul 27, 2023
299c36c
oc env change to pip
tkmamidi Jul 27, 2023
962226b
added ability to parse multisample vcf annotations from opencravat
wilkb777 Jul 28, 2023
885ed73
pipeline can handle multi-sample VCF
tkmamidi Jul 28, 2023
d8a66dc
pipeline perfectly working with cheaha
tkmamidi Jul 29, 2023
7064783
adjusting resources in pipeline and running udn-Y sample again
tkmamidi Jul 31, 2023
cbd11d4
fix config to use partition
tkmamidi Aug 1, 2023
2b10be5
adjusting resource limits for processes
tkmamidi Aug 1, 2023
2348e41
pipeline working
tkmamidi Aug 2, 2023
faea267
trying out CAGI6 proband-9 as a trial
tkmamidi Aug 3, 2023
d9cd536
running 35 CAGI6 train samples
tkmamidi Aug 4, 2023
9b4a545
add details to readme
tkmamidi Aug 7, 2023
2190e9d
running workflow starting from openCravat
tkmamidi Aug 8, 2023
6521e0f
adding build info to sbatch script
tkmamidi Aug 8, 2023
5b545ff
renaming the config file
tkmamidi Aug 11, 2023
92fc132
adding model to the directory. moved it from data directory
tkmamidi Aug 12, 2023
1d58527
pipeline to split by 1M lines and opencravat using 5 CPUs
tkmamidi Aug 17, 2023
7844b25
scripts for cohort level analysis
tkmamidi Aug 18, 2023
e8db6a3
adds forked opencravat to env for correct CPU detection
tkmamidi Aug 23, 2023
ed92f46
adjusting pipeline resources for SNV predictions mass submission
tkmamidi Aug 23, 2023
92e6dec
read files from folder
tkmamidi Aug 23, 2023
3fc37a7
--mp is working
tkmamidi Aug 30, 2023
3efff1c
only DITTO scores with nextflow report at the end
tkmamidi Aug 30, 2023
1468e77
new config
tkmamidi Aug 30, 2023
5cf5833
removed resume tag from job scripts
tkmamidi Aug 30, 2023
fd3c1eb
bumping up memory for nextflow jobs
tkmamidi Aug 31, 2023
a3246d9
filter variants by gnomad and DITTO filters
tkmamidi Sep 7, 2023
8e63d59
new pipeline with samplesheet
tkmamidi Sep 13, 2023
f5b5082
adding oc install docs and modified pipeline for CADD snv predictions
tkmamidi Sep 19, 2023
1ff869a
copied OC database to /local for nodes and started the pipeline point…
tkmamidi Oct 19, 2023
2b7e823
update readme
tkmamidi Oct 20, 2023
b1f14fd
changes to cheaha config
tkmamidi Nov 1, 2023
b91e808
updated config
tkmamidi Dec 8, 2023
6772fdf
gather variants per chromosome
tkmamidi Dec 10, 2023
5ca2691
postprocessing scripts
tkmamidi Dec 26, 2023
1cb0bc6
DITTO training database
tkmamidi Dec 30, 2023
5e2d964
cleanup part-1 for PR
tkmamidi Dec 30, 2023
bae3393
moving files around
tkmamidi Dec 31, 2023
ca6306c
adding results and data files to the repo
tkmamidi Jan 2, 2024
1e297f2
updates readme and file paths
tkmamidi Jan 3, 2024
63cf927
adding comments to scripts
tkmamidi Jan 4, 2024
f30ef45
adding variant type classes config file
tkmamidi Jan 4, 2024
407eef8
rewriting opencravat install instructions
tkmamidi Jan 5, 2024
9c80e10
removed a line
tkmamidi Jan 5, 2024
2d49eeb
adding image to readme
tkmamidi Jan 5, 2024
db9edc0
better quality image
tkmamidi Jan 5, 2024
994ab4a
building ditto readme
tkmamidi Jan 5, 2024
273d783
format change
tkmamidi Jan 5, 2024
2ec8aa5
title and gif side by side
tkmamidi Jan 5, 2024
fe185da
generic format
tkmamidi Jan 5, 2024
bab6d8b
bullet fires
tkmamidi Jan 5, 2024
bef0bcd
building DITTO docs
tkmamidi Jan 6, 2024
65b82ee
adding oc logs and readme files
tkmamidi Jan 6, 2024
2afb006
adding benchmark details to readme
tkmamidi Jan 6, 2024
d79232f
removing all benchmarking plots
tkmamidi Jan 6, 2024
f889b75
rewording readme
tkmamidi Jan 25, 2024
cecda38
fixes links
tkmamidi Jan 25, 2024
7e5e303
adds linting workflow
tkmamidi Jan 25, 2024
a150750
formatting document
tkmamidi Jan 25, 2024
23bee66
linting json
tkmamidi Jan 25, 2024
0383375
line length fix
tkmamidi Jan 25, 2024
24e19ab
skipping some links
tkmamidi Jan 25, 2024
dc8d4c1
python notebook link check
tkmamidi Jan 26, 2024
c5d370c
ignores python notebook links for linting
tkmamidi Jan 26, 2024
cba0db1
PR change request from mana - 1
tkmamidi Jan 31, 2024
9910642
contact info and mypackage info update
tkmamidi Feb 1, 2024
9e0dc66
Update docs/build_DITTO.md
tkmamidi Feb 1, 2024
e324714
PR requests part-2
tkmamidi Feb 1, 2024
2424301
removes link check disabler
tkmamidi Feb 2, 2024
87bfa1f
link fix
tkmamidi Feb 2, 2024
db07a13
disable broken links
tkmamidi Feb 2, 2024
14 changes: 14 additions & 0 deletions .editorconfig
@@ -0,0 +1,14 @@
# top-most EditorConfig file
root = true

# global definitions
[*]
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
indent_size = 4
indent_style = space

# override indents for specific filetypes
[*.{md,yml,yaml,html,css,scss,js,cff}]
indent_size = 2
26 changes: 26 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,26 @@
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Environment used**
OS, relevant tool versions, etc.

**Additional context**
Add any other context about the problem here.
20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
22 changes: 22 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,22 @@
* **Please check if the PR fulfills these requirements**
- [ ] Tested as per the documentation and they passed
- [ ] Docs have been added / updated (for bug fixes / features)


* **What kind of change does this PR introduce?** (Bug fix, feature, docs update, ...)



* **What is the current behavior?** (You can also link to an open issue here)



* **What is the new behavior (if this is a feature change)?**



* **Does this PR introduce a breaking change?** (What changes might users need to make in their application due to this PR?)



* **Other information**:
40 changes: 40 additions & 0 deletions .github/workflows/linting.yml
@@ -0,0 +1,40 @@
name: Linting- Markdown,Shell

on:
  push:
  workflow_dispatch:

jobs:
  markdown-linting:
    name: Markdown lint
    runs-on: ubuntu-22.04

    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: markdownlint-cli
        uses: nosborn/[email protected]
        with:
          files: .
          config_file: ".markdownlint.json"
          # dot: false
          # ignore_files: '".git*/**"'


  markdown-check-links:
    name: Markdown - checking links
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - uses: gaurav-nelson/github-action-markdown-link-check@v1
        with:
          config-file: ".markdownlint.json"

  shellcheck:
    name: Shellcheck
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Run ShellCheck
        uses: ludeeus/action-shellcheck@master
27 changes: 6 additions & 21 deletions .gitignore
@@ -25,12 +25,7 @@ dask-worker-space/
*.egg
*.err
*.out
*.db
*.py*.sh
*.tsv
*.csv
*.gz*

.DS_Store

# PyInstaller
# Usually these files are written by a python script from a template
@@ -51,7 +46,6 @@ htmlcov/
nosetests.xml
coverage.xml
*,cover
*.pdf

# Translations
*.mo
@@ -72,9 +66,6 @@ target/
# conda
.conda

# Database
*.db
*.rdb

# Pycharm
.idea
@@ -83,17 +74,13 @@ target/
.ipynb_checkpoints/

# exclude data from source control by default
/data/
cagi*/
work

#snakemake
#snakemake or nextflow
.snakemake/
# data/
variant_annotation/data/

# exclude test data used for development
to_be_deleted/test_data/data/ref
to_be_deleted/test_data/data/reads
.nextflow/
.nextflow*
report*

#logs
logs/
@@ -103,5 +90,3 @@ logs/

# .java/fonts dir get created when creating fastqc conda env
.java/

/.vscode/settings.json
10 changes: 10 additions & 0 deletions .markdownlint.json
@@ -0,0 +1,10 @@
{
  "default": true,
  "MD007": {
    "indent": 2
  },
  "MD013": {
    "line_length": 120,
    "tables": false
  }
}
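
To illustrate what the `MD013` block in `.markdownlint.json` asks for, here is a rough Python sketch that flags lines longer than `line_length` and skips table rows when `tables` is `false`. It is not how markdownlint itself works: the function name `long_lines`, the sample text, and the "starts with `|`" table test are assumptions made for illustration, and the real rule has further options and exceptions (for example, handling of code blocks and headings) that this sketch ignores.

```python
import json

# Copy of the .markdownlint.json settings above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
  "default": true,
  "MD007": {"indent": 2},
  "MD013": {"line_length": 120, "tables": false}
}
"""


def long_lines(markdown_text, config):
    """Return (line_number, length) pairs for lines over the MD013 limit.

    Assumption: a line counts as a table row if it starts with '|' after
    leading whitespace is stripped. Real markdownlint parses the document.
    """
    md013 = config.get("MD013", {})
    limit = md013.get("line_length", 80)
    check_tables = md013.get("tables", True)
    hits = []
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        is_table_row = line.lstrip().startswith("|")
        if is_table_row and not check_tables:
            continue  # "tables": false means table rows are exempt
        if len(line) > limit:
            hits.append((number, len(line)))
    return hits


config = json.loads(CONFIG_TEXT)

# Hypothetical sample: a short line, a 125-character prose line, a long table row.
sample = "short line\n" + "word " * 25 + "\n| " + "cell " * 40 + "|\n"
print(long_lines(sample, config))
```

With the settings above, only the long prose line is reported; the long table row is skipped because `tables` is `false`.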