T-Cell Receptor-Peptide Interaction Prediction with Physical Model Augmented Pseudo-Labeling (slides by Nasy, paper is not our work)

uta-smile/jianTCellReceptorPeptideInteraction2022-slides

T-Cell Receptor-Peptide Interaction Prediction with Physical Model Augmented Pseudo-Labeling

1 Introduction

1.1 TCR Peptide Interaction Prediction

1.1.1 TCR Peptide Interaction

  • T-cell receptors (TCRs) lie on the surface of T cells and recognize foreign peptides.
  • Peptides are presented by the major histocompatibility complex (MHC) found on the surface of tumor cells or virus-infected cells.
  • Common datasets for studying TCR-peptide interactions contain peptide sequences and CDR3 sequences of the TCR β chain.

1.1.2 Illustration of T-cell receptors (TCR) and peptide binding

./p1.png

1.2 Methods

  • Nearest neighbor (SwarmTCR [cite/ft/f:@ehrlichSwarmTCRComputationalApproach2021])
  • Distance-based minimization (TCRdist [cite/ft/f:@dashQuantifiablePredictiveFeatures2017])
  • PCA with decision tree [cite/ft/f:@tongSETESequencebasedEnsemble2020]
  • Random Forest [cite/ft/f:@gielisDetectionEnrichedCell2019a; @deneuterFeasibilityMiningCD82018]
  • Deep Learning [cite/ft/f:@luDeepLearningbasedPrediction2021; @jianTCellReceptorPeptideInteraction2022]

1.3 Datasets

  • Format
    • Positive triples (TCR, peptide, MHC)
    • Plus a large number of TCR sequences with no paired peptide
  • Dataset
    • VDJdb [cite/ft/f:@bagaevVDJdb2019Database2020]
    • McPAS-TCR [cite/ft/f:@tickotskyMcPASTCRManuallyCurated2017]

2 T-Cell Receptor-Peptide Interaction Prediction with Physical Model Augmented Pseudo-Labeling

2.1 Paper

T-Cell Receptor-Peptide Interaction Prediction with Physical Model Augmented Pseudo-Labeling [cite/ft/f:@jianTCellReceptorPeptideInteraction2022]

2.2 Problem

Current datasets for training deep learning models on this task remain limited and lack diverse TCRs and peptides.

2.3 Solution

Extend the training dataset

2.4 Solution

  • Data-augmented pseudo-labeling of TCR-peptide pairs
    • Use a teacher model to generate pseudo-labels and retrain the model with them
  • Physical modeling of TCR-peptide interaction
    • Molecular dynamics (MD)
    • Docking energy

2.5 What is docking energy?

Docking is a computational method for predicting the structure of a protein complex (e.g., a dimer of two molecules) given the structure of each monomer. It searches for the configuration of the complex by minimizing an energy scoring function.

In this work, they use the final docking energy (of the optimal structure of the complex) between a TCR and peptide as the surrogate binding label for the TCR-peptide pair.

2.6 Dataset

  • Dataset \(\mathcal{D}\)
    • VDJdb [cite/ft/f:@bagaevVDJdb2019Database2020]
    • McPAS-TCR [cite/ft/f:@tickotskyMcPASTCRManuallyCurated2017]
  • Labeled (training dataset, \(\mathcal{D}_{train}\))
    • TCR-peptide pairs with a known binding label (1 positive, 0 negative)
  • Unlabeled
    • TCRs from TCRdb (which has no peptide information), paired with peptides from \(\mathcal{D}\)
    • These pairs form \(\mathcal{D}_{auxiliary}\)
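
The construction of the auxiliary set can be sketched as follows. The slides do not say how TCRs and peptides are matched, so uniform random pairing is an assumption here, and `build_auxiliary_pairs` and the placeholder sequence names are hypothetical.

```python
import random


def build_auxiliary_pairs(unlabeled_tcrs, labeled_triples, n_pairs, seed=0):
    """Pair unlabeled TCRs with peptides that occur in the labeled dataset.

    Uniform random sampling is an assumption made for this sketch only.
    """
    rng = random.Random(seed)
    # Peptide pool: every distinct peptide seen in the labeled data D.
    peptides = sorted({peptide for _tcr, peptide, _label in labeled_triples})
    return [(rng.choice(unlabeled_tcrs), rng.choice(peptides))
            for _ in range(n_pairs)]


# Placeholder identifiers, not real sequences.
labeled = [("TCR_A", "PEPTIDE_1", 1), ("TCR_B", "PEPTIDE_2", 0)]
tcrdb = ["TCR_X", "TCR_Y", "TCR_Z"]
aux = build_auxiliary_pairs(tcrdb, labeled, n_pairs=4)
```

Each element of `aux` is an unlabeled `(t', p')` tuple that later receives either a docking-derived label or a teacher pseudo-label.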

2.7 Method

Each training iteration consists of four steps:

  • Learning from labeled dataset \(\mathcal{L}_{label}\)
  • Learning from physical modeling \(\mathcal{L}_{phy}\)
  • Learning from data-augmented pseudo-labeling \(\mathcal{L}_{\text{pseudo-label}}\)
  • Look ahead meta-update

2.7.1 Overview

./p2.png

2.7.2 ERGO

./p3.png

2.8 Learning from labeled dataset \(\mathcal{L}_{label}\)

  • \(pred = f_\theta(t, p)\)
    • \(t\) is the TCR
    • \(p\) is the peptide
    • TCR and peptide embeddings follow ERGO [cite/ft/f:@springerPredictionSpecificTCRPeptide2020].
      • TCRs use an LSTM or an autoencoder (AE)
      • Peptides use an LSTM
    • \(f_\theta\) is the model
      • \(f_\theta = \mathrm{MLP}(\mathrm{concat}(t, p))\)
  • \(\mathcal{L}_{label} = \mathrm{BCE}(pred, y)\)
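
A minimal sketch of the labeled-data loss, assuming the TCR and peptide have already been embedded as fixed-length vectors. The one-hidden-layer MLP, its hand-picked weights, and the function names are illustrative assumptions, not the ERGO or paper implementation.

```python
import math


def mlp_predict(t_emb, p_emb, w_hidden, b_hidden, w_out, b_out):
    """Concatenate the embeddings, apply one ReLU layer, return a probability."""
    x = t_emb + p_emb  # list concatenation plays the role of concat(t, p)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid


def bce(pred, y, eps=1e-12):
    """Binary cross-entropy for one prediction and a label y in {0, 1}."""
    pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(pred) + (1 - y) * math.log(1.0 - pred))


# Toy 2-dimensional embeddings and weights, chosen so the logit is exactly 0.
t_emb, p_emb = [1.0, 0.0], [0.0, 1.0]
w_hidden = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
b_hidden = [0.0, 0.0]
w_out, b_out = [1.0, -1.0], -2.0

pred = mlp_predict(t_emb, p_emb, w_hidden, b_hidden, w_out, b_out)
loss_label = bce(pred, y=1)
```

With these weights the model is maximally uncertain (`pred` is 0.5), so the loss for a positive pair equals `log(2)`.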

2.9 Learning from physical modeling \(\mathcal{L}_{phy}\)

  • Molecular dynamics (MD): accurate but slow
  • Docking energy: computed with HDOCK [cite/ft/f:@yanHDOCKServerIntegrated2020a]
  • Pipeline: TCR/Peptide -> BLAST+ -> MSA -> MODELLER -> Structure -> Docking energy
    • Top 25% of docking energies: labeled negative
    • Bottom 25% of docking energies: labeled positive
  • \(pred' = f_\theta(t', p')\)
    • \((t', p')\) are tuples in \(\mathcal{D}_{auxiliary}\)
  • \(\mathcal{L}_{phy} = \mathrm{BCE}(pred', y)\), where \(y\) is the docking-derived label
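
The quartile thresholding above can be sketched as follows. It assumes docking energies are already computed (the values below are made up for illustration) and that pairs in the middle 50% are simply left unlabeled; the function name is hypothetical.

```python
def label_by_docking_energy(energies, fraction=0.25):
    """Map each (tcr, peptide) pair in the extreme quantiles to a binary label.

    energies: dict mapping a pair to its docking energy (lower = stronger
    predicted binding). Pairs between the two cut-offs get no label.
    """
    ranked = sorted(energies.items(), key=lambda item: item[1])  # lowest first
    k = int(len(ranked) * fraction)
    labels = {}
    for pair, _energy in ranked[:k]:
        labels[pair] = 1  # bottom 25% of energies: surrogate positive
    for pair, _energy in ranked[len(ranked) - k:]:
        labels[pair] = 0  # top 25% of energies: surrogate negative
    return labels


# Illustrative energies only; these are not HDOCK outputs.
toy_energies = {
    ("TCR_1", "PEP"): -300.0, ("TCR_2", "PEP"): -250.0,
    ("TCR_3", "PEP"): -200.0, ("TCR_4", "PEP"): -150.0,
    ("TCR_5", "PEP"): -100.0, ("TCR_6", "PEP"): -50.0,
    ("TCR_7", "PEP"): -20.0, ("TCR_8", "PEP"): -10.0,
}
surrogate = label_by_docking_energy(toy_energies)
```

Here the two lowest-energy pairs become positives, the two highest-energy pairs become negatives, and the remaining four pairs are dropped.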

2.10 Learning from data-augmented pseudo-labeling \(\mathcal{L}_{\text{pseudo-label}}\)

  • \(prob = f_{teacher}(t', p')\)
  • \(pred' = f_\theta(t', p')\)
  • \(\mathcal{L}_{\text{pseudo-label}} = \mathrm{KL}(pred', prob)\)
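
For a binary output, the pseudo-label loss reduces to a KL divergence between two Bernoulli distributions. The sketch below assumes the direction KL(teacher || student), which the slide does not specify, and the function name is hypothetical.

```python
import math


def bernoulli_kl(p_teacher, p_student, eps=1e-12):
    """KL(teacher || student) between two Bernoulli binding probabilities."""
    p = min(max(p_teacher, eps), 1.0 - eps)  # clamp to keep the logs finite
    q = min(max(p_student, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


agree = bernoulli_kl(0.7, 0.7)     # student matches the teacher
disagree = bernoulli_kl(0.9, 0.5)  # student is less confident than the teacher
```

The loss is zero when the student reproduces the teacher's probability and grows as the two diverge, so minimizing it pulls the student toward the teacher's soft label.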

2.11 Look Ahead Meta-Update

  • Learning from labeled dataset
    • \(out = model(t, p)\)
    • \(\mathcal{L}_{label} = \mathrm{BCE}(out, y)\)
    • \(model.update(\mathcal{L}_{label})\)
  • Learning from data-augmented pseudo-labeling
    • \(out = model(t', p')\)
    • \(out' = model_{teacher}(t', p')\)
    • \(\mathcal{L}_{\text{pseudo-label}} = \mathrm{KL}(out, out')\)
    • \(model.update(\mathcal{L}_{\text{pseudo-label}})\)
    • \(param = model.param\)
  • Learning from physical modeling
    • \(out = model(t', p')\)
    • \(\mathcal{L}_{phy} = \mathrm{BCE}(out, y)\), with \(y\) the docking-derived label
    • \(model.update(\mathcal{L}_{phy})\)
  • Look ahead meta-update
    • Learning rate × 2
    • \(\mathcal{L} = \mathrm{BCE}(model(t, p), y)\)
    • If \(\mathcal{L} > \mathcal{L}_{label}\)
      • \(model.param = param\) (discard the physical-modeling update)
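
The control flow of these steps can be sketched with a toy one-parameter model. The update functions, the quadratic stand-in for the labeled loss, and `look_ahead_step` are hypothetical. The learning-rate doubling is omitted because the slide does not say which update it applies to.

```python
import copy


def look_ahead_step(params, update_label, update_pseudo, update_phy, labeled_loss):
    """Run one iteration in the slide's order and roll back a harmful physical update."""
    loss_label = labeled_loss(params)   # L_label, measured before any update
    params = update_label(params)       # step on the labeled loss
    params = update_pseudo(params)      # step on the pseudo-label loss
    checkpoint = copy.deepcopy(params)  # param = model.param
    params = update_phy(params)         # step on the physical-modeling loss
    if labeled_loss(params) > loss_label:
        params = checkpoint             # look-ahead check failed: restore
    return params


# Toy setting: a single weight w whose labeled loss is minimized at w = 1.
def toy_loss(w):
    return (w - 1.0) ** 2


def step_label(w):
    return w - 0.1 * 2.0 * (w - 1.0)  # gradient step with learning rate 0.1


def step_pseudo(w):
    return w + 0.1


rolled_back = look_ahead_step(0.0, step_label, step_pseudo, lambda w: w - 5.0, toy_loss)
kept = look_ahead_step(0.0, step_label, step_pseudo, lambda w: w + 0.2, toy_loss)
```

In the first call the physical update pushes `w` far from the optimum, so the parameters revert to the checkpoint taken after the pseudo-label step. In the second call the update lowers the labeled loss and is kept.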

2.12 Look Ahead Meta-Update

./pm.png

2.13 Results McPAS

2.13.1 LSTM

./p4.png

2.13.2 AE

./p5.png

2.14 Results VDJdb

2.14.1 LSTM

./p6.png

2.14.2 AE

./p7.png

2.15 Results Rare Peptides

  • For the rare peptide KRWIILGLNK, the baseline AUC is only 52.8, while this method achieves 68.1.
  • Note that the average AUC for all peptides is 54.4.

./p8.png

3 Conclusion

  • Goal: Improve the prediction of TCR-peptide interactions
  • Solution:
    • Docking energies as the physical properties between TCR-peptide pairs
    • Data-augmented pseudo-labeling
    • Look ahead meta-update
    • Experiments on VDJdb and McPAS datasets

4 References

4.1 References
