Thanks @tJala!
Polaris Link
https://polarishub.io/datasets/adaptyv-bio/egfr-binders-v0
README
This dataset contains 202 designed EGFR-binding protein sequences, along with experimental binding affinity results tested by the Adaptyv Bio team as part of the EGFR Binders Design Competition Round 1.
An additional data package containing raw lab data and kinetic curves can be downloaded here:
Methods
Metrics
PAE Scores
The designs were first assessed using the PAE_interaction metric. To calculate this, we began by generating a structural prediction using ColabFold (with 2 models, 5 recycles, and no initial guess or templates). The Predicted Aligned Error (PAE) of the top-ranked prediction was then averaged across residue pairs, where one residue belongs to the target and the other to the binder, as done here.
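The averaging step described above can be sketched as follows. This is a minimal illustration, not the competition's script: the function name `pae_interaction`, the list-of-rows matrix layout, and the assumption that target residues come first in the complex are all hypothetical. It averages both off-diagonal blocks (target-to-binder and binder-to-target), which is one possible reading of "residue pairs where one residue belongs to the target and the other to the binder".

```python
def pae_interaction(pae, target_len):
    """Mean PAE over residue pairs that span the two chains.

    Assumes `pae` is a square matrix (list of rows) over the whole complex,
    with the first `target_len` residues belonging to the target and the
    remaining residues to the binder (an assumed ordering).
    """
    n = len(pae)
    if not 0 < target_len < n:
        raise ValueError("target_len must split the complex into two chains")
    total = 0.0
    count = 0
    for i in range(n):
        for j in range(n):
            # Keep only pairs with exactly one residue in each chain.
            if (i < target_len) != (j < target_len):
                total += pae[i][j]
                count += 1
    return total / count
```

Lower values suggest a more confidently predicted interface; the within-chain blocks are ignored on purpose so that a well-folded binder alone does not improve the score.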
pLDDT Scores
For each design we also computed the predicted Local Distance Difference Test (pLDDT) scores from AlphaFold2. Rather than scoring the entire protein complex, we averaged the pLDDT over the residues of the binder chain only.
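As a rough sketch of the binder-only average, assuming per-residue pLDDT values and a parallel list of chain identifiers are already available (the function name and the default chain label "B" are hypothetical, not taken from the dataset's scripts):

```python
def mean_binder_plddt(plddt, chain_ids, binder_chain="B"):
    """Average pLDDT over the residues of one chain.

    `plddt` and `chain_ids` are parallel per-residue lists; the chain
    label used for the binder is an assumption of this sketch.
    """
    values = [score for score, chain in zip(plddt, chain_ids)
              if chain == binder_chain]
    if not values:
        raise ValueError("no residues found for the binder chain")
    return sum(values) / len(values)
```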
Sequence Similarity Check
We checked each sequence against several databases of known sequences. As part of the initial competition rules, only proteins that were more than 10 amino acids (AA) away from a known binder were considered valid and counted in the final leaderboard. The results of that similarity search are stored in the “similarity_check” column. The similarity check metric is calculated as identity * coverage, where:
• Identity is the percentage of matching amino acids between a subsequence of the query and a subsequence of the database entry.
• Coverage is the proportion of the query sequence that aligns with a database entry.
Proteins with less than 10 amino acid distance to a database entry were excluded from the competition. A similarity_check value of “null” indicates that no matches were found in any of the databases.
The databases that we checked are SwissProt, THPdb, USPTO and binders designed by Cao et al. (2022). The scripts can be found in the scripts folder.
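To make the identity * coverage product concrete, here is a minimal sketch that takes an already-computed pairwise alignment. It is not the script from the scripts folder: the function name, the use of "-" for gaps, expressing both terms as fractions between 0 and 1, and dividing matches by the full alignment length are all assumptions of this example.

```python
def similarity_check(aligned_query, aligned_subject, query_length):
    """Return identity * coverage for one gapped pairwise alignment.

    `aligned_query` and `aligned_subject` are equal-length strings with
    "-" marking gaps; `query_length` is the full query sequence length.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned strings must have the same length")
    if not aligned_query or query_length <= 0:
        raise ValueError("alignment and query must be non-empty")
    # Identity: fraction of alignment columns with matching residues.
    matches = sum(1 for q, s in zip(aligned_query, aligned_subject)
                  if q == s and q != "-")
    identity = matches / len(aligned_query)
    # Coverage: fraction of the query that takes part in the alignment.
    aligned_residues = sum(1 for q in aligned_query if q != "-")
    coverage = aligned_residues / query_length
    return identity * coverage
```

For example, a 4-residue local alignment with 3 matches against an 8-residue query gives 0.75 * 0.5 = 0.375, so a short, near-identical hit scores lower than a full-length one.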
Experimental Workflow
DNA Design
The submitted protein sequences were reverse-translated, and the corresponding DNA sequences were optimized using Adaptyv's internal pipeline. This process considered several parameters, including optimal codon usage for cell-free systems, mRNA secondary structure stability, and synthesizability factors. Additionally, non-coding regions at the 5' and 3' ends, optimized for cell-free expression, were incorporated into the coding sequences. Suitable gene constructs were successfully generated for all submitted protein sequences.
Protein Synthesis
Protein synthesis was carried out using an optimized cell-free expression system, suitable for a wide range of proteins. The template DNA was added, and protein expression was conducted over a defined period. During the competition, at least two expression batches were performed for each sequence entry, with some sequences tested up to four times under varying conditions. Protein synthesis success was assessed via a label-free quantification assay. Sequences that yielded less than 0.02 µg/mL of protein were excluded from further experimental characterization.
Binding Assay
The binding assay was conducted using Bio-Layer Interferometry (BLI), a label-free technology for biomolecular interaction measurement. A multi-cycle kinetic assay was performed against the target antigen. Expressed ligands were immobilized on the probe surface using tag-specific chemistry, and several concentrations of the antigen (ranging from 316.2 nM to 10 nM) were flowed over the probe. The experiments were performed in duplicate using a PBS-T buffer with 0.02% BSA at 25°C.
Data Analysis
The binding signals were baseline-corrected and globally fitted using a 1:1 binding model across all tested concentrations for each replicate (Global Fitting). This approach allowed us to extract the kinetic rates (association and dissociation) and calculate the affinity constants (KD) for each ligand. The predicted binding curves were generated based on the fit parameters, ensuring an accurate representation of the interaction dynamics. In cases where the maximum signal fell below the quantifiable threshold, or when the interaction kinetics were too fast relative to the device's temporal resolution, we employed equilibrium analysis to estimate the dissociation constant (KD). Each experimental replicate was analyzed independently.
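For readers unfamiliar with the 1:1 binding model, the standard textbook (Langmuir) form of its response curves is sketched below. This is only the forward model, not Adaptyv's fitting code; the function names and the units (analyte concentration in M, ka in 1/(M·s), kd in 1/s) are assumptions of this example. In a global fit, a single ka and kd pair would be shared across all analyte concentrations of a replicate.

```python
import math

def association(t, conc, ka, kd, rmax):
    """Response at time t (s) after analyte at `conc` (M) is introduced."""
    kobs = ka * conc + kd              # observed rate constant
    req = rmax * ka * conc / kobs      # steady-state response at this conc
    return req * (1.0 - math.exp(-kobs * t))

def dissociation(t, r0, kd):
    """Response at time t (s) after analyte is removed, starting from r0."""
    return r0 * math.exp(-kd * t)

def affinity(ka, kd):
    """Equilibrium dissociation constant KD (M) from the kinetic rates."""
    return kd / ka
```

When the analyte concentration equals KD, the steady-state response is half of rmax, which is the relationship an equilibrium analysis relies on when the kinetic rates themselves cannot be resolved.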
Dataset Source
Adaptyv Bio (https://beta.adaptyvbio.com/)
Dataset Curation
https://github.com/polaris-hub/polaris-recipes/tree/adaptyv/org-AdaptyvBio/EGFR_binders/v0
Dataset Completeness
readme, source and curation_reference fields for my Polaris dataset.
Anything else we should know?
No response