Request training set data #124

Open
Jennyhu666 opened this issue Dec 24, 2024 · 3 comments

Comments

@Jennyhu666

I noticed that the authors have released pre-processed PDB files, about 60 GB in total. Were all of these files used to train the model? I ask because the archive also contains structures released after 2021. Could you share the PDB data that was actually used for training, or tell me where I can download it?

@gcorso
Collaborator

gcorso commented Dec 24, 2024

Hi @Jennyhu666, the raw data was downloaded directly from the PDB website. You can select the data used for training by applying the following filters:

  • Number of chains between 1 and 300
  • Release date before 2021-09-30
  • Resolution below 9 Å
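
As a rough illustration, the three filters above can be written as a predicate over per-entry metadata. This is only a sketch: the `EntryMeta` record, its field names, and `passes_training_filters` are hypothetical and not taken from the repository. Treating the chain bounds as inclusive, the date and resolution bounds as strict, and entries without a reported resolution as excluded are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Thresholds taken from the filter list above.
RELEASE_CUTOFF = date(2021, 9, 30)
MIN_CHAINS, MAX_CHAINS = 1, 300
MAX_RESOLUTION = 9.0  # in angstroms


@dataclass
class EntryMeta:
    """Hypothetical per-entry metadata record (not a structure from the repository)."""
    pdb_id: str
    num_chains: int
    release_date: date
    resolution: Optional[float]  # None when the entry reports no resolution


def passes_training_filters(entry: EntryMeta) -> bool:
    """Return True if the entry satisfies all three filters."""
    if entry.resolution is None:
        # Assumption: entries without a reported resolution are excluded.
        return False
    return (
        MIN_CHAINS <= entry.num_chains <= MAX_CHAINS
        and entry.release_date < RELEASE_CUTOFF
        and entry.resolution < MAX_RESOLUTION
    )
```

A target whose structure was released on or after 2021-09-30 fails the date check, so under these filters it would not have been selected for training.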

@Jennyhu666
Author

Thank you. I want to test the model in practice, and if the PDB structure of my target was not released before 2021, then it is not in the model's training set. Is there, for example, a txt file that lists all the PDB structures used in the training set?

@gcorso
Collaborator

gcorso commented Dec 27, 2024

No, unfortunately we don't have such a list, but it shouldn't be too hard to obtain one using the PDB API with the filters above.
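
One possible way to build such a list is sketched below, using the RCSB PDB Search API. This is not the authors' script. The mapping of each filter to a search attribute (`rcsb_accession_info.initial_release_date`, `rcsb_entry_info.resolution_combined`, `rcsb_entry_info.deposited_polymer_entity_instance_count`) is an assumption, as is reading "number of chains" as the count of polymer chain instances, so the result may not match the training set exactly. The helper names are hypothetical.

```python
import json
import urllib.request

SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def _terminal(attribute, operator, value):
    """Build one attribute condition for the text search service."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def build_training_query():
    """Combine the three filters with a logical AND (attribute mapping is assumed)."""
    chain_range = {"from": 1, "to": 300, "include_lower": True, "include_upper": True}
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",
            "nodes": [
                _terminal("rcsb_accession_info.initial_release_date", "less", "2021-09-30"),
                _terminal("rcsb_entry_info.resolution_combined", "less", 9),
                _terminal(
                    "rcsb_entry_info.deposited_polymer_entity_instance_count",
                    "range",
                    chain_range,
                ),
            ],
        },
        "return_type": "entry",
        "request_options": {"return_all_hits": True},
    }


def fetch_entry_ids(query):
    """POST the query and return the matching entry IDs (requires network access)."""
    request = urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        if response.status == 204:  # the service returns no body when nothing matches
            return []
        payload = json.load(response)
    return [hit["identifier"] for hit in payload.get("result_set", [])]
```

Calling `fetch_entry_ids(build_training_query())` and writing the returned IDs one per line would give a txt file of candidate training entries. Note that a resolution condition drops entries with no reported resolution, and it is not stated in this thread how those were handled.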
