- The T-cell receptor (TCR) sits on the surface of the T cell and recognizes foreign peptides.
- Peptides are presented by the major histocompatibility complex (MHC) on the surface of tumor cells or virus-infected cells.
- Common datasets for studying TCR-peptide interactions contain peptide sequences and the CDR3 sequences of the TCR β chain.
- Nearest neighbor (SwarmTCR [cite/ft/f:@ehrlichSwarmTCRComputationalApproach2021])
- Distance-based minimization (TCRdist [cite/ft/f:@dashQuantifiablePredictiveFeatures2017])
- PCA with decision tree [cite/ft/f:@tongSETESequencebasedEnsemble2020]
- Random Forest [cite/ft/f:@gielisDetectionEnrichedCell2019a; @deneuterFeasibilityMiningCD82018]
- Deep Learning [cite/ft/f:@luDeepLearningbasedPrediction2021; @jianTCellReceptorPeptideInteraction2022]
- Format
- Positive (TCR, Peptide, MHC)
- And lots of TCRs
- Dataset
- VDJdb [cite/ft/f:@bagaevVDJdb2019Database2020]
- McPAS-TCR [cite/ft/f:@tickotskyMcPASTCRManuallyCurated2017]
\LARGE T-Cell Receptor-Peptide Interaction Prediction with Physical Model Augmented Pseudo-Labeling \normalsize [cite/ft/f:@jianTCellReceptorPeptideInteraction2022]
\Large Current datasets for training deep learning models on this task remain limited and lack diverse TCRs and peptides.
\Large Extend training dataset
- Data-augmented pseudo-labeling of TCR-peptide pairs
- Use teacher model to generate pseudo-labels and retrain the model with them
- Physical modeling of TCR-peptide interaction
- Molecular dynamics (MD)
- Docking energy
Docking is a computational method for predicting the structure of a protein complex (e.g., a dimer of two molecules) given the structure of each monomer. It searches over configurations of the complex by minimizing an energy scoring function.
In this work, the authors use the final docking energy (that of the optimal complex structure) between a TCR and a peptide as a surrogate binding label for the TCR-peptide pair.
- Dataset \(\mathcal{D}\)
- VDJdb [cite/ft/f:@bagaevVDJdb2019Database2020]
- McPAS-TCR [cite/ft/f:@tickotskyMcPASTCRManuallyCurated2017]
- Labeled (training dataset, \(\mathcal{D}_{\text{train}}\))
- TCR-peptide pairs with known binding labels (1 = positive, 0 = negative)
- Unlabeled
- TCRs from TCRdb (no peptide information), paired with peptides from \(\mathcal{D}\).
- \(\mathcal{D}_{\text{auxiliary}}\)
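The construction of the unlabeled pairs can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `build_auxiliary_pairs` is hypothetical, the sequences other than KRWIILGLNK are illustrative placeholders, and drawing one peptide uniformly at random per TCR is an assumption, since the outline only says that TCRdb TCRs are combined with peptides from \(\mathcal{D}\).

```python
import random

def build_auxiliary_pairs(tcrs, peptides, rng=None):
    """Pair every unlabeled TCR with one peptide drawn from the labeled set.

    Assumption: uniform random sampling, one peptide per TCR.
    """
    rng = rng or random.Random()
    return [(tcr, rng.choice(peptides)) for tcr in tcrs]

# Illustrative placeholder sequences, not taken from any database.
example_tcrs = ["CASSLGQAYEQYF", "CASSPDRGGYEQYF", "CASSIRSSYEQYF"]
example_peptides = ["KRWIILGLNK", "PLACEHOLDERPEP"]
auxiliary = build_auxiliary_pairs(example_tcrs, example_peptides, random.Random(0))
```

The resulting tuples carry no binding label; labels are supplied later by the docking energy or by the teacher model.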
A single training iteration has four steps:
- Learning from labeled dataset \(\mathcal{L}_{\text{label}}\)
- Learning from physical modeling \(\mathcal{L}_{\text{phy}}\)
- Learning from data-augmented pseudo-labeling \(\mathcal{L}_{\text{pseudo-label}}\)
- Look ahead meta-update
- \(\text{pred} = f_\theta(t, p)\)
- \(t\) is the TCR
- \(p\) is the peptide
- TCR and peptide embeddings follow ERGO [cite/ft/f:@springerPredictionSpecificTCRPeptide2020].
- TCRs are encoded with an LSTM or an autoencoder (AE)
- Peptides are encoded with an LSTM
- \(f_\theta\) is the model
- \(f_\theta = \mathrm{MLP}(\mathrm{concat}(t, p))\)
- \(\mathcal{L}_{\text{label}} = \mathrm{BCE}(\text{pred}, y)\)
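The model definition above can be sketched in plain Python. This is only a toy under stated assumptions: the embeddings are short lists of floats standing in for the ERGO encoders, the MLP has a single hidden layer with ReLU, and the layer sizes, initialization range, and helper names (`init_layer`, `f_theta`, `bce`) are all hypothetical.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer (assumed init)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f_theta(t_emb, p_emb, params):
    """pred = MLP(concat(t, p)) with one hidden ReLU layer and a sigmoid output."""
    (w1, b1), (w2, b2) = params
    x = t_emb + p_emb                       # list concatenation = concat(t, p)
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return sigmoid(linear(hidden, w2, b2)[0])

def bce(pred, y, eps=1e-12):
    """Binary cross-entropy for one prediction and a 0/1 label."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(y * math.log(pred) + (1.0 - y) * math.log(1.0 - pred))

rng = random.Random(0)
params = [init_layer(6, 4, rng), init_layer(4, 1, rng)]
pred = f_theta([0.2, -0.1, 0.5], [0.3, 0.0, -0.4], params)
loss_label = bce(pred, 1.0)
```

Concatenating the two embeddings before the MLP means the network sees both sequences jointly, so it can learn pair-specific interactions rather than scoring the TCR and the peptide separately.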
- Molecular dynamics (MD): accurate but slow
- Docking energy: HDOCK [cite/ft/f:@yanHDOCKServerIntegrated2020a]
- TCR/Peptide -> BLAST+ -> MSA -> MODELLER -> Structure -> Docking energy
- Pairs in the top 25% of docking energies: labeled negative
- Pairs in the bottom 25% of docking energies: labeled positive
- \(\text{pred}' = f_\theta(t', p')\)
- \((t', p')\) become tuples in \(\mathcal{D}_{\text{auxiliary}}\)
- \(\mathcal{L}_{\text{phy}} = \mathrm{BCE}(\text{pred}', y)\), with \(y\) the docking-derived label
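The quartile rule for turning docking energies into surrogate labels can be sketched like this. The function name `label_by_docking_energy` and the energy values are hypothetical, and leaving the middle half of the pairs unlabeled (`None`) is an assumption, since the outline only states how the top and bottom quartiles are labeled.

```python
def label_by_docking_energy(energies, fraction=0.25):
    """Return 1 for the lowest-energy fraction, 0 for the highest, None otherwise."""
    n = len(energies)
    k = int(n * fraction)
    order = sorted(range(n), key=lambda i: energies[i])   # ascending energy
    labels = [None] * n
    for i in order[:k]:
        labels[i] = 1          # bottom quartile: treated as binders
    for i in order[n - k:]:
        labels[i] = 0          # top quartile: treated as non-binders
    return labels

# Made-up docking scores for eight (t', p') pairs, for illustration only.
example_energies = [-300.0, -120.0, -250.0, -80.0, -200.0, -150.0, -90.0, -270.0]
example_labels = label_by_docking_energy(example_energies)
```

Only the pairs that receive a 0 or 1 would then contribute to \(\mathcal{L}_{\text{phy}}\) under this assumption.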
- \(\text{prob} = f_{\text{teacher}}(t', p')\)
- \(\text{pred}' = f_\theta(t', p')\)
- \(\mathcal{L}_{\text{pseudo-label}} = \mathrm{KL}(\text{pred}', \text{prob})\)
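For a binary prediction, the pseudo-label loss reduces to a KL divergence between two Bernoulli distributions. The sketch below assumes the direction KL(teacher ‖ student), which is one common reading of \(\mathrm{KL}(\text{pred}', \text{prob})\); the outline does not specify the direction, and the helper names are hypothetical.

```python
import math

def bernoulli_kl(q, p, eps=1e-12):
    """KL(Bernoulli(q) || Bernoulli(p)) with clamping for numerical safety."""
    q = min(max(q, eps), 1.0 - eps)
    p = min(max(p, eps), 1.0 - eps)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))

def pseudo_label_loss(student_preds, teacher_probs):
    """Mean KL(teacher || student) over a batch of (t', p') pairs."""
    pairs = list(zip(teacher_probs, student_preds))
    return sum(bernoulli_kl(q, p) for q, p in pairs) / len(pairs)
```

The loss is zero when the student reproduces the teacher's probabilities exactly and grows as the two disagree, so the student is pulled toward the teacher's soft labels rather than toward hard 0/1 targets.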
- Learning from labeled dataset
- \(\text{out} = \text{model}(t, p)\)
- \(\mathcal{L}_{\text{label}} = \mathrm{BCE}(\text{out}, y)\)
- \(\text{model.update}(\mathcal{L}_{\text{label}})\)
- Learning from data-augmented pseudo-labeling
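- \(\text{out} = \text{model}(t', p')\)
- \(\text{out}' = \text{model}_{\text{teacher}}(t', p')\)
- \(\mathcal{L}_{\text{pseudo-label}} = \mathrm{KL}(\text{out}, \text{out}')\)
- \(\text{model.update}(\mathcal{L}_{\text{pseudo-label}})\)
- \(\text{param} = \text{model.param}\)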
- Learning from physical modeling
- \(\text{out} = \text{model}(t', p')\)
- \(\mathcal{L}_{\text{phy}} = \mathrm{BCE}(\text{out}, y)\)
- \(\text{model.update}(\mathcal{L}_{\text{phy}})\)
- Look ahead meta-update
- Learning Rate * 2
- \(\mathcal{L} = \mathrm{BCE}(\text{model}(t, p), y)\)
- If \(\mathcal{L} > \mathcal{L}_{\text{label}}\)
- \(\text{model.param} = \text{param}\) (revert to the saved parameters)
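The four steps above can be put together in a small runnable sketch. It is a toy under several assumptions: a one-layer logistic model on feature vectors stands in for \(f_\theta\) and for the teacher, plain gradient descent stands in for `model.update`, and every function name is hypothetical. The learning-rate doubling is left out because the outline does not say which update it applies to.

```python
import math

def predict(w, x):
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))

def bce(p, y, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def mean_loss(w, batch):
    return sum(bce(predict(w, x), y) for x, y in batch) / len(batch)

def sgd_step(w, batch, lr):
    """One gradient step; for a sigmoid output, BCE and KL(target || pred)
    share the same gradient (pred - target) * x."""
    grad = [0.0] * len(w)
    for x, target in batch:
        err = predict(w, x) - target
        for i, xi in enumerate(x):
            grad[i] += err * xi / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def training_iteration(w, labeled, pseudo, physical, teacher_w, lr=0.1):
    # 1. Learning from the labeled dataset (loss recorded before the update).
    loss_label = mean_loss(w, labeled)
    w = sgd_step(w, labeled, lr)
    # 2. Learning from pseudo-labels produced by the teacher.
    soft = [(x, predict(teacher_w, x)) for x in pseudo]
    w = sgd_step(w, soft, lr)
    checkpoint = list(w)                    # param = model.param
    # 3. Learning from docking-derived labels.
    w = sgd_step(w, physical, lr)
    # 4. Look-ahead: keep the physical update only if the labeled loss
    #    did not get worse than it was at the start of the iteration.
    if mean_loss(w, labeled) > loss_label:
        w = checkpoint
    return w
```

The look-ahead check treats the physical-modeling update as tentative: noisy docking labels can only change the model when they do not hurt performance on the trusted labeled data.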
- The rare peptide KRWIILGLNK has an AUC score of only 52.8,
- while this method achieves 68.1.
- Note that the average AUC over all peptides is 54.4.
- Goal: Improve the prediction of TCR-peptide interactions
- Solution:
- Docking energies as physical properties of TCR-peptide pairs
- Data-augmented pseudo-labeling
- Look ahead meta-update
- Experiments on VDJdb and McPAS datasets