Add fetching from the AlphaFold structure database #492
- add module for fetching from AlphaFold DB
- add tests for module
- more to come
Hi, thanks for implementing this long-sought feature. I know this is only a draft, but I had some time for a review, so I can already share some thoughts 😃.
@@ -0,0 +1,34 @@
name: Python Package using Conda
I think this part of the PR is out of scope. Furthermore, the conda build is already tested in the Biotite feedstock on conda-forge. I am in favor of flake8 formatting, but to me that is a separate topic. In particular, I would prefer to keep it strict and require proper flake8 formatting in all source files, which would take some effort, because the existing flake8 findings in all source files would need to be fixed first.
_fetch_url = "https://alphafold.com/api/prediction"


def fetch(ids, target_path=None, format="pdb", overwrite=False, verbose=False):
This is purely optional, but I think it would be nice to directly support the AlphaFold identifiers as well. Furthermore, accepting the AlphaFold identifiers given for the PDB computational models would even enable synergy between the rcsb and alphafold interfaces.
While using official AlphaFold identifiers should be straightforward (only the v4 would need to be updated when a new version appears), using the identifier from the PDB would require some string editing. Taken from the PDB documentation:
Each CSM is assigned a specific ID in its source database and a prefix indicates the source of the model (“AF” for AlphaFold DB, "MA" for ModelArchive). AlphaFold DB identifiers are then followed by the UniProt accession number for the protein and by the fragment number (usually “F1”). However, in order to enable compatibility of the IDs with many of our services, including all of our APIs and visualization tools, we identify CSMs on RCSB.org using a modified version of the ID. This ID is used on the structure summary page, in searching for structures, in the search results page, and in various tools for 3D structure visualization and analysis. For example, for the AlphaFold structure AF-B3EWR1-F1, the RCSB.org assigned CSM ID is AF_AFB3EWR1F1 and is used in the query results page as shown in Figure 4.
Which type of identifier is present can be detected from whether it starts with AF_ (PDB), AF- (AlphaFold DB) or something else (UniProt).
I would let you decide whether we want this feature right from the beginning.
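To make the suggestion concrete, here is a minimal sketch of how the prefix detection could look. The helper name to_uniprot_accession and the regular expression are my own assumptions and not part of this PR; the sketch also assumes that the RCSB-style ID is always AF_AF, followed by the accession, followed by F and the fragment number, as in the documentation example quoted above.

```python
import re

# Hypothetical helper, not part of this PR.
# Assumed RCSB CSM layout: "AF_AF" + UniProt accession + "F" + fragment number.
_RCSB_CSM_PATTERN = re.compile(r"^AF_AF([A-Z0-9]+?)F(\d+)$")


def to_uniprot_accession(identifier):
    """Reduce any of the three identifier styles to a UniProt accession."""
    identifier = identifier.strip()
    if identifier.startswith("AF_"):
        # RCSB-style CSM ID, e.g. "AF_AFB3EWR1F1"
        match = _RCSB_CSM_PATTERN.match(identifier)
        if match is None:
            raise ValueError(f"Malformed RCSB CSM ID: '{identifier}'")
        return match.group(1)
    if identifier.startswith("AF-"):
        # AlphaFold DB ID, e.g. "AF-B3EWR1-F1"; the accession is the second field
        parts = identifier.split("-")
        if len(parts) < 3 or not parts[1]:
            raise ValueError(f"Malformed AlphaFold DB ID: '{identifier}'")
        return parts[1]
    # Anything else is taken to be a UniProt accession already
    return identifier
```

With this, "AF_AFB3EWR1F1", "AF-B3EWR1-F1" and "B3EWR1" would all resolve to the same accession, so fetch() could call such a helper on each entry of ids before building the request URL.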
Hi, are you planning to continue work on this PR? If not, I could finish the AFDB interface.
Would be amazing if you’d finish it up @padix-key
This PR adds the ability to fetch structures from the AlphaFold structure database by querying on UniProt accession.
- biotite.database.alphafold module for fetching structures via UniProt accession
- tests/database/test_alphafold.py