This repository consists of python scripts and notebooks used to select best imunogenetic epitopes of the nucleocapsid and spike proteins of SARS-Cov-2 by combining solvent accessebility, predicted peptides and mutation data.
Created for scientific project at Latvian biomedical and reaserch center .
!! IMPORTANT !!
Curently this repository is under development and might not have all needed files and examples.
- Download SARS-Cov-2 variant mutation data
- Load and process predicted epitope data tables for B-cell, T-cell data (MHC-I and MHC-II) and structural data.
- Add to B-cell and T-cel data tables structural and mutation data.
- Create epitope lists in excel files for variants of concern with structural and mutation statistics.
- SARS-Cov-2 sequnce allignment and mmino acid substitution statistics from GISAID data. (Currently partially impllemented)
Combine mutation data, predicted SARS-Cov-2 variant epitops and structuraldata count_mutations.ipynb
Transform brewery 5.0 result structurald data files to JSON structures parseStruct.ipynb
Count mutation from allignment parse.ipynb
For detailed descriptions of used piepeline look at count_mutations.ipynb
count_mutations.ipynb rely on JSON files created with parseStruct.ipynb
parse.ipynb currently is not used in epitope analysis pipeline, but was left on this project if there is need to make mutation frequency data from full SARS-Cov-2 S and N protein sequnce data.
-
Plotly
-
Pandas
-
BioPython
-
All CSV files are expected to have delimination symbol ";"
-
Scripts were created and tested on python 3.10.4 64 bit version
-
Hugging faces Transformer api for ESM AI model.
spikeProt and NProt directories contain data files used in analyasis and epitope result tables.
- Linear Epitopes - folder with scripts for linear epitope 3D protein structure generation using ESM machine learning model and visualization.
In "data" foleder B-cell, MHC-I, MHC-II hold csv input files with predicted B and T cell epitope data. Epitope data were predicted using tools from IEDB:
- B-cell http://tools.iedb.org/bcell/
- MHC-I http://tools.iedb.org/mhci/
- MHC-II http://tools.iedb.org/mhcii/
project
│ README.md
│ count_muation.ipynb
| parse.ipynb
| parse.py
| parseStruct.ipynb
│
└───sProt
│ │ S_epitope_alpha.xlsx
│ │ S_epitope_beta.xlsx
│ │ ...
| |
│ └───data
│ └───B-cell
| | alpha.csv
| | beta.csv
| | ...
|
│ └───MHC-I
| | alpha.csv
| | beta.csv
| | ...
│ └───MHC-II
| | alpha.csv
| | beta.csv
| | ...
| └───reference_sequences
| | alpha_S.fasta
| | beta_S.fasta
| | ...
| └───struct
│
└───nProt
│ │ S_epitope_alpha.xlsx
│ │ S_epitope_beta.xlsx
│ │ ...
| |
│ └───data
| └───B-cell
| | ...
| └───MHC-I
| └─── ...
|
|___Linear epitopes
|___longLinkers
|
|___shortLinkers
|
| generate-3D-struct-combinations.ipynb
| show_best_models.ipynb
- Run python script
- Results are saved in rez.csv
python parse.py example.fasta
NOT IMPLEMENTED: Pass referece sequence seperately.
python parse.py example.fasta reference.fasta
To see help run command without any parameters
python parse.py
Name: Edgars Liepa
email: [email protected]
LinkedIn: Edgars Liepa
Twitter: @liepa_edgars
- 0.1
- Initial Release
This project is licensed under the [NAME HERE] License - see the LICENSE.md file for details
- add that you can specify reference optinally in other file
- Crete tests
Inspiration, code snippets, etc.
A helpful checklist to gauge how your README is coming along:
- One-liner explaining the purpose of the module
- Necessary background context & links
- Potentially unfamiliar terms link to informative sources
- Clear, runnable example of usage
- Installation instructions
- Extensive API documentation
- Performs cognitive funneling
- Caveats and limitations mentioned up-front
- Doesn't rely on images to relay critical information
- License