This repository provides tools and scripts for extracting annotations and adding them to EMDB entries, enriching the metadata associated with EM datasets.
- Installation
- Configuration
- Usage
- Contributing
- License
To install the necessary dependencies, run: pip install -r requirements.txt
The tools are configured through a config.ini file, which is not included in the repository. Create this file in the root directory of the project with the following structure:
[file_paths]
uniprot_tab: <path_to_file>/uniprot.tsv
CP_ftp: <path_to_file>/complextab
components_cif: <path_to_file>/components.cif
chem_comp_list: <path_to_file>/chem_comp_list.xml
pmc_ftp_gz: <path_to_file>/PMID_PMCID_DOI.csv.gz
pmc_ftp: <path_to_file>/PMID_PMCID_DOI.csv
emdb_pubmed: <path_to_file>/emdb_pubmed.log
emdb_orcid: <path_to_file>/emdb_orcid.log
assembly_ftp: <path_to_file>/assembly/
BLAST_DB: <path_to_file>/ncbi-blast-2.13.0+/database/uniprot_sprot
BLASTP_BIN: blastp
sifts_GO: <path_to_file>/pdb_chain_go.csv
GO_obo: <path_to_file>/go.obo
GO_interpro: /nfs/ftp/pub/databases/GO/goa/external2go/interpro2go
sifts: <path_to_file>/split_xml/
alphafold_ftp: <path_to_file>/accession_ids.txt
rfam_ftp: <path_to_file>/rfam_files_combined.txt
[api]
pmc: https://www.ebi.ac.uk/europepmc/webservices/rest/searchPOST
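As an illustration of how a file with this structure can be read, here is a minimal sketch using Python's standard configparser module. It is not taken from the repository's scripts: the helper names load_config and missing_keys are hypothetical, and the sample text is a shortened stand-in for a real config.ini with placeholder paths.

```python
import configparser

# Shortened sample mirroring the structure above; the path is a placeholder.
SAMPLE_CONFIG = """
[file_paths]
uniprot_tab: /data/uniprot.tsv
BLASTP_BIN: blastp
[api]
pmc: https://www.ebi.ac.uk/europepmc/webservices/rest/searchPOST
"""


def load_config(text):
    """Parse config.ini-style text and return a ConfigParser (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def missing_keys(parser, section, required):
    """Return the required keys that are absent from a section (hypothetical helper)."""
    if not parser.has_section(section):
        return list(required)
    return [key for key in required if not parser.has_option(section, key)]


config = load_config(SAMPLE_CONFIG)

# configparser accepts both "key: value" and "key = value", and key lookups
# are case-insensitive by default, so BLASTP_BIN and blastp_bin are equivalent.
print(config.get("file_paths", "uniprot_tab"))
print(config.get("file_paths", "blastp_bin"))
print(config.get("api", "pmc"))

# Report keys that still need to be filled in before running the scripts.
print(missing_keys(config, "file_paths", ["uniprot_tab", "GO_obo"]))
```

To check a real file instead of the sample text, create the parser the same way and call parser.read("config.ini") in place of read_string.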
To use the tools and scripts in this repository, clone it and make sure the config.ini file is configured as described above.
Execute the scripts independently in the following recommended order:
fetch_empiar.py: python fetch_empiar.py -w <output_dir_to_store_annotated_empiar_files> -f <path_to_empiar_metadata_files>
fetch_pubmed.py: python fetch_pubmed.py -w <output_dir_to_store_annotated_pubmed_files> -f <path_to_emdb_metadata_files>
added_annotations.py: python added_annotations.py -w <output_dir_to_store_added_annotations> -f <path_to_emdb_metadata_files> --all -t <number_of_threads>
fetch_afdb.py: python fetch_afdb.py -w <output_dir_to_store_annotated_alphafdb_files>
write_xml.py: python write_xml.py <output_dir_to_store_EMICSS_xml_files>
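The recommended order above can also be scripted. The following is a minimal sketch, assuming the scripts are run from the repository root with exactly the flags listed above; the function names, the directory names in the usage lines, and the dry_run switch are hypothetical and not part of the repository.

```python
import subprocess
import sys


def build_commands(empiar_out, empiar_in, pubmed_out, emdb_in,
                   annotations_out, afdb_out, xml_out, threads):
    """Return the five script invocations in the recommended order."""
    py = sys.executable  # run the scripts with the current interpreter
    return [
        [py, "fetch_empiar.py", "-w", empiar_out, "-f", empiar_in],
        [py, "fetch_pubmed.py", "-w", pubmed_out, "-f", emdb_in],
        [py, "added_annotations.py", "-w", annotations_out, "-f", emdb_in,
         "--all", "-t", str(threads)],
        [py, "fetch_afdb.py", "-w", afdb_out],
        [py, "write_xml.py", xml_out],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    check=True stops the sequence at the first script that exits with an error.
    """
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Placeholder directory names; replace them with real paths.
commands = build_commands(
    empiar_out="empiar_out", empiar_in="empiar_metadata",
    pubmed_out="pubmed_out", emdb_in="emdb_metadata",
    annotations_out="annotations_out", afdb_out="afdb_out",
    xml_out="xml_out", threads=4,
)
run_pipeline(commands, dry_run=True)
```

With dry_run=True the sketch only prints the commands, which makes it easy to verify the arguments before calling run_pipeline(commands, dry_run=False).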
For more information about EMICSS, visit the official EMICSS website (https://www.ebi.ac.uk/emdb/emicss), which describes the EMDB/EMICSS project in detail.