Adaptyv EGFR Binders Competition (Round 1) #20

tJala · 2024-09-27T08:24:05Z

Polaris Link

https://polarishub.io/datasets/adaptyv-bio/egfr-binders-v0

README

This dataset contains 202 designed EGFR-binding protein sequences, along with experimental binding affinity results tested by the Adaptyv Bio team as part of the EGFR Binders Design Competition Round 1.

An additional data package containing raw lab data and kinetic curves can be downloaded here:

https://api.adaptyvbio.com/storage/v1/object/public/egfr_design_competition/package.zip

Methods

Metrics

PAE Scores

The designs were first assessed using the PAE_interaction metric. To calculate this, we began by generating a structural prediction using ColabFold (with 2 models, 5 recycles, and no initial guess or templates). The Predicted Aligned Error (PAE) of the top-ranked prediction was then averaged across residue pairs, where one residue belongs to the target and the other to the binder, as done here.
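The averaging step can be sketched as follows. This is a minimal illustration, not Adaptyv's script: it assumes the PAE matrix is square and that the binder residues come first, followed by the target residues. The function name `pae_interaction` and the toy matrix are hypothetical.

```python
from statistics import mean


def pae_interaction(pae, binder_len):
    """Mean PAE over all residue pairs that span the binder and the target.

    Assumes `pae` is a square list of lists whose first `binder_len`
    rows/columns belong to the binder and the rest to the target.
    """
    n = len(pae)
    if not 0 < binder_len < n:
        raise ValueError("binder_len must split the matrix into two chains")
    # Binder rows against target columns.
    binder_to_target = [
        pae[i][j] for i in range(binder_len) for j in range(binder_len, n)
    ]
    # Target rows against binder columns (PAE is not symmetric).
    target_to_binder = [
        pae[i][j] for i in range(binder_len, n) for j in range(binder_len)
    ]
    return mean(binder_to_target + target_to_binder)


# Toy complex: 1 binder residue followed by 2 target residues.
toy_pae = [
    [0.5, 4.0, 6.0],
    [5.0, 0.5, 1.0],
    [7.0, 1.0, 0.5],
]
score = pae_interaction(toy_pae, binder_len=1)
```

Only the two off-diagonal blocks contribute, so a confident fold of either chain on its own does not lower the score. Lower values indicate a more confidently predicted interface.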

pLDDT Scores

For each design we also computed predicted Local Distance Difference Test (pLDDT) scores from AlphaFold2. Rather than scoring the entire protein complex, we averaged the pLDDT over the residues of the binder chain only.
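As a rough sketch of the binder-only average, assuming per-residue pLDDT values and chain labels are available as parallel lists (the function name, the chain label "A" for the binder, and the numbers are illustrative assumptions):

```python
from statistics import mean


def binder_mean_plddt(plddt, chain_ids, binder_chain="A"):
    """Average per-residue pLDDT over the binder chain only."""
    binder_scores = [s for s, c in zip(plddt, chain_ids) if c == binder_chain]
    if not binder_scores:
        raise ValueError(f"no residues found for chain {binder_chain!r}")
    return mean(binder_scores)


# Toy complex: three binder residues (chain A), two target residues (chain B).
plddt = [90.0, 80.0, 70.0, 40.0, 50.0]
chain_ids = ["A", "A", "A", "B", "B"]
binder_plddt = binder_mean_plddt(plddt, chain_ids)
```

The target residues are ignored entirely, so a poorly predicted target region does not affect the binder score.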

Sequence Similarity Check

We checked each sequence against several databases of known sequences. As part of the initial competition rules, only proteins that were more than 10 amino acids (AA) away from a known binder were considered valid and counted in the final leaderboard. The results of that similarity search are stored in the “similarity_check” column. The similarity check metric is calculated as identity * coverage, where:

• Identity is the percentage of matching amino acids between a subsequence of the query and a subsequence of the database entry.

• Coverage is the proportion of the query sequence that aligns with a database entry.

Proteins with less than 10 amino acid distance to a database entry were excluded from the competition. A similarity_check value of “null” indicates that no matches were found in any of the databases.
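The identity * coverage product can be illustrated with a small sketch. The argument names and the example hit are hypothetical. Both terms are computed here as fractions between 0 and 1, which is an assumption: the README does not say whether the stored value is a fraction or a percentage.

```python
def similarity_check(n_matches, alignment_len, aligned_query_len, query_len):
    """Return identity * coverage for one query/database alignment.

    identity: matching residues / alignment length (fraction, 0 to 1)
    coverage: aligned query residues / full query length (fraction, 0 to 1)
    """
    if alignment_len <= 0 or query_len <= 0:
        raise ValueError("lengths must be positive")
    identity = n_matches / alignment_len
    coverage = aligned_query_len / query_len
    return identity * coverage


# Hypothetical hit: 45 identical residues in a 50-residue alignment
# that covers 50 residues of a 100-residue query.
score = similarity_check(
    n_matches=45, alignment_len=50, aligned_query_len=50, query_len=100
)
```

Multiplying the two terms penalizes short local matches: a perfect match over a small fragment of the query still yields a low score.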

The databases that we checked are SwissProt, THPdb, USPTO and binders designed by Cao et al. (2022). The scripts can be found in the scripts folder.

Experimental Workflow

DNA Design

The submitted protein sequences were reverse-translated, and the corresponding DNA sequences were optimized using Adaptyv's internal pipeline. This process considered several parameters, including optimal codon usage for cell-free systems, mRNA secondary structure stability, and synthesizability factors. Additionally, non-coding regions at the 5' and 3' ends, optimized for cell-free expression, were incorporated into the coding sequences. Suitable gene constructs were successfully generated for all submitted protein sequences.

Protein Synthesis

Protein synthesis was carried out using an optimized cell-free expression system, suitable for a wide range of proteins. The template DNA was added, and protein expression was conducted over a defined period. During the competition, at least two expression batches were performed for each sequence entry, with some sequences tested up to four times under varying conditions. Protein synthesis success was assessed via a label-free quantification assay. Sequences that yielded less than 0.02 µg/mL of protein were excluded from further experimental characterization.

Binding Assay

The binding assay was conducted using Bio-Layer Interferometry (BLI), a label-free technology for biomolecular interaction measurement. A multi-cycle kinetic assay was performed against the target antigen. Expressed ligands were immobilized on the probe surface using tag-specific chemistry, and several concentrations of the antigen (ranging from 316.2 nM to 10 nM) were flowed over the probe. The experiments were performed in duplicate using a PBS-T buffer with 0.02% BSA at 25°C.

Data analysis

The binding signals were baseline-corrected and globally fitted using a 1:1 binding model across all tested concentrations for each replicate (Global Fitting). This approach allowed us to extract the kinetic rates (association and dissociation) and calculate the affinity constants (KD) for each ligand. The predicted binding curves were generated from the fit parameters. In cases where the maximum signal fell below the quantifiable threshold, or when the interaction kinetics were too fast relative to the device's temporal resolution, we employed equilibrium analysis to estimate the dissociation constant (KD). Each experimental replicate was analyzed independently.
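For readers unfamiliar with the 1:1 model, the textbook equations behind such a fit can be sketched as below. This is not Adaptyv's analysis code, and the fitting procedure itself is not shown. The function names and the rate constants in the example are illustrative assumptions. Units are assumed to be M for concentrations, 1/(M*s) for the association rate `ka`, and 1/s for the dissociation rate `kd`.

```python
import math


def kd_from_rates(ka, kd):
    """Equilibrium dissociation constant KD (M) from the kinetic rates."""
    return kd / ka


def equilibrium_response(conc, KD, rmax):
    """Steady-state signal at analyte concentration `conc` (M)."""
    return rmax * conc / (conc + KD)


def association_response(t, conc, ka, kd, rmax):
    """Signal t seconds into the association phase of a 1:1 interaction."""
    k_obs = ka * conc + kd  # observed rate constant
    r_eq = equilibrium_response(conc, kd_from_rates(ka, kd), rmax)
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r0, kd):
    """Signal t seconds after the analyte is removed, starting from r0."""
    return r0 * math.exp(-kd * t)


# Hypothetical binder: ka = 1e5 1/(M*s), kd = 1e-3 1/s, so KD is about 10 nM.
ka, kd, rmax = 1e5, 1e-3, 1.0
KD = kd_from_rates(ka, kd)
# At a concentration equal to KD, half of the sites are occupied at equilibrium.
half_max = equilibrium_response(KD, KD, rmax)
plateau = association_response(1e6, KD, ka, kd, rmax)
```

In a global fit, a single `ka` and `kd` pair is shared across the curves from all concentrations of a replicate. The equilibrium analysis mentioned above uses only the steady-state relation and therefore yields KD without the individual rates.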

Dataset Source

Adaptyv Bio (https://beta.adaptyvbio.com/)

Dataset Curation

https://github.com/polaris-hub/polaris-recipes/tree/adaptyv/org-AdaptyvBio/EGFR_binders/v0

Dataset Completeness

I confirm that I filled out at least the readme, source and curation_reference fields for my Polaris dataset.

Anything else we should know?

No response

zhu0619 (Maintainer) · 2024-10-03T14:13:30Z

Thanks @tJala!
The dataset adaptyv-bio/egfr-binders-v0 has been certified.
