Answer to first question #13
Comments
Obviously this is a problem of great complexity on several levels. As you yourself mentioned, we are very, very far from purely reverse engineering an organism, but every small step in that direction brings us a little closer.
What approach could work in nature? I'm thinking that the real way forward with this is deep learning powered molecular dynamics simulators. My understanding is that the main thing holding them back for protein folding is that proteins fold about 1000x slower than we can currently simulate (on the ms timescale, not µs). http://www.ks.uiuc.edu/Research/folding/

Compute for that villin headpiece folding: (76 aa * 19.20 atoms/aa)**2 * 10 ms * 1e10 steps/ms ~ 1e17

While the people who wrote these simulators are probably great at physics and chemistry, I'm not sure how good they are at programming (I judge this based on reading the docs, though I may be off base). I'm also not sure what fidelity the simulation needs to be at to get results. Do you know of any studies on this?

I wouldn't want to simulate one protein in isolation either, I want to simulate the whole thing: ~1M atoms for 1 s. At a timestep of 1/500 ps, which was the default in the OpenMM tutorial, that's 10^15 steps * (10^6)^2 atom pairs (squared for all the interactions). That's 10^27, or about 2^89, beyond what computers are capable of right now, and perhaps why people don't try this. But what fidelity is actually required? How sparse is the matrix of atom to atom interactions? If it's more like 10^9 * 10^6 * 10^3, that is much more doable. I have access to a petaflop, which can do that in about 15 minutes.
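The back-of-envelope numbers in this comment can be reproduced with a few lines of Python. This is only a sketch: it uses the figures quoted above as given, the function names are made up for illustration, and it assumes one pairwise interaction costs about one operation, which is almost certainly optimistic.

```python
# Rough cost model for the estimates in the comment above.
# Assumption: one pairwise interaction ~ one operation (optimistic).

def dense_cost(n_atoms, n_steps):
    """All-pairs cost: every atom interacts with every other atom at each step."""
    return n_steps * n_atoms ** 2


def sparse_cost(n_atoms, n_steps, neighbors_per_atom):
    """Cutoff-style cost: each atom only interacts with a fixed number of neighbors."""
    return n_steps * n_atoms * neighbors_per_atom


def seconds_at(ops, ops_per_second):
    """Wall-clock seconds needed to run `ops` operations at a given throughput."""
    return ops / ops_per_second


# Villin headpiece figures as quoted: 76 aa * 19.2 atoms/aa, 10 ms, 1e10 steps/ms.
villin_ops = dense_cost(76 * 19.2, 10 * 1e10)

# Whole-system figures as quoted: 1M atoms, 1e15 steps (1 s at 1/500 ps per step).
whole_dense_ops = dense_cost(1e6, 1e15)

# The "more doable" case as quoted: 1e9 steps, 1e6 atoms, 1e3 neighbors per atom.
whole_sparse_ops = sparse_cost(1e6, 1e9, 1e3)

PETAFLOP = 1e15  # operations per second
sparse_minutes = seconds_at(whole_sparse_ops, PETAFLOP) / 60

print(f"villin, dense:       {villin_ops:.2e} ops")
print(f"whole system, dense: {whole_dense_ops:.2e} ops")
print(f"whole system, sparse:{whole_sparse_ops:.2e} ops")
print(f"sparse case on a petaflop: {sparse_minutes:.1f} minutes")
```

The gap between the dense and sparse totals is the whole argument: how many neighbors each atom really needs to see decides whether the run is out of reach or an afternoon job.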
Of course this is the main problem out there (mine too; I can only write rough scripts).
Speaking of DL approaches to 3D protein structure, maybe the best one out there is DeepMind's project called AlphaFold (paper here).
Here comes the problem. Proteins are not composed of amino acids alone: inside organisms they are usually modified by the addition of carbohydrates, lipids or other molecules (sometimes also inorganic ions, as in hemoglobin). The well-known spike protein of SARS-CoV-2 is also highly glycosylated. The only way to know which kind of post-translational modification occurs is to do a chemical investigation (such as mass spectrometry and NMR) to model the 3D structure. There are some pattern discovery methods out there, but they're not accurate (they are based mainly on hidden Markov models).
This is not needed. Proteins are polymers made of 20 monomers (amino acids), assembled following chemical rules. Protein folding is determined mainly by the charged parts (anions, cations) of each residue, located on the side chain (R in the image). So the question here is to identify how these residues are positioned in 3D space, in order to identify a "chemical force" (in terms of positive and negative charge) that could be used to work out how each side chain is placed relative to the other amino acids (also in terms of 3D coordinates). This is the "real true" approach, which could determine how a "pure" protein (with no post-translational modifications) is built; maybe the computational effort is so huge that we currently use heuristic approaches instead.
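To make the "chemical force" idea a bit more concrete, here is a minimal sketch of a pairwise electrostatic (Coulomb) energy over point charges. It is only an illustration of the charge-based picture described above: the residue charges and coordinates are invented, the function name is hypothetical, and real force fields add other terms (bonded terms, van der Waals, solvent effects) that this leaves out.

```python
import math
from itertools import combinations

# Coulomb constant in common MD units: kJ mol^-1 nm e^-2.
COULOMB_KJ_NM = 138.935458


def coulomb_energy(charges, positions, rel_permittivity=1.0):
    """Sum of pairwise Coulomb energies (kJ/mol) for point charges.

    charges: charges in units of the elementary charge e
    positions: (x, y, z) coordinates in nm, one per charge
    rel_permittivity: relative dielectric constant (1.0 = vacuum)
    """
    energy = 0.0
    for (qi, ri), (qj, rj) in combinations(zip(charges, positions), 2):
        distance = math.dist(ri, rj)  # Euclidean distance in nm
        energy += COULOMB_KJ_NM * qi * qj / (rel_permittivity * distance)
    return energy


# Invented example: one positive and two negative side chains (not real data).
example_charges = [+1.0, -1.0, -1.0]
example_positions = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.8, 0.0)]

print(coulomb_energy(example_charges, example_positions))
```

Note that this loop visits every pair of charges, so its cost grows with the square of the number of charges, which is the same scaling problem raised in the compute estimate earlier in the thread.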
From https://www.cell.com/neuron/pdf/S0896-6273(18)30684-6.pdf
"For systems with about 50,000 atoms (typical for a moderately sized, solvated protein), one GPU can currently simulate a microsecond in a few days."

Speeding this up by 1000x really shouldn't be too hard, and by shouldn't be too hard, I mean a startup of 5 could do it in a year. I think there's a potential opportunity here; I see no reason why this can't fold proteins. How much is a solution to the protein structure prediction problem worth?

The best lab working on it today? https://zhanglab.ccmb.med.umich.edu/papers/2017_3.pdf
"...leaving simulation timescales as the main barrier for MD ab initio folding simulations"
The Chapter 12 it refers to: https://www.mpibpc.mpg.de/15873626/Kubitzki_2017_ProteinDynamics.pdf

Without a simulator, this project is hard to continue. It's a bunch of CAD models I can't render. Static analysis tools are limited to crude FLIRT signatures and neural nets being asked to do things a human can't. And I'm definitely not doing any pipetting.

The way forward for bio is simulation. It doesn't have to be cycle accurate, but it has to be good enough to see the emergent behavior. A good simulator should capture the whole stack, starting with physics based energy functions. Then you can learn functions to accelerate computation, with the ability to check the work against the lowest levels.
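For a sense of scale, the quoted throughput can be turned into a wall-clock estimate for a millisecond-scale fold. This is a rough sketch: "a few days" is taken as 3 days per microsecond, which is an assumption, the cost is assumed to scale linearly with simulated time, and the function name is made up for illustration.

```python
def wall_clock_days(target_us, days_per_us, speedup=1.0):
    """Days of wall-clock time to simulate `target_us` microseconds."""
    return target_us * days_per_us / speedup


DAYS_PER_US = 3.0      # assumption: "a few days" per simulated microsecond
ONE_MS_IN_US = 1000.0  # a millisecond-scale fold, expressed in microseconds

baseline_days = wall_clock_days(ONE_MS_IN_US, DAYS_PER_US)
fast_days = wall_clock_days(ONE_MS_IN_US, DAYS_PER_US, speedup=1000.0)

print(f"1 ms on one GPU today:  {baseline_days:.0f} days (~{baseline_days / 365:.1f} years)")
print(f"1 ms with 1000x speedup: {fast_days:.0f} days")
```

Under these assumptions a 1000x speedup is the difference between years per fold and days per fold, which is why the simulation timescale is called the main barrier in the paper quoted above.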
I don't understand why we're talking about molecular dynamics at this point, because these predictions are only used to learn how a protein folds by itself, and become useful in other applications. Viruses like SARS-CoV-2 encode only a small number of proteins, among which you can identify 3 main actors:
Although it might seem cool to predict what a virus does inside a cell (connecting to the reverse engineering theme), it would probably need a specific predictor calibrated with experimental knowledge, so it looks like a time-consuming approach with no real advantages. We already have powerful methods to track, at the molecular level, what a virus does from the moment of infection and how the cell responds; unlike predictors, these methods are replicable and evidence-based, so they are certainly more accepted.

Just to close the argument about 3D protein structure prediction and its applications: it is useful here not for understanding viral infection dynamics, but mainly for designing powerful drugs. An efficient antiviral is usually a molecule similar to a nucleotide, with a strong affinity for the RNA polymerase of the virus, so that it binds the protein and never leaves it. You'll certainly see everyone out there excited about hydroxychloroquine or ritonavir: they certainly have antiviral activity, but they're not really efficient. In the early 2000s several drug design programs were released and none of them did a proper job, so drug companies gradually lost interest in prediction software. Now, with the new machine learning golden age, who knows...

This work, released last year, is rapidly becoming a milestone for 3D protein structure prediction with proper use of NNs. To understand more about predictions for drug discovery, give a read here.
I don't think neural networks for structure prediction are the right approach. They are supervised, and this is nothing like how nature solves the problem. You need to simulate through the real trajectory.
We know the cleavage sites in the protein, as explained here.
Published in Cell, https://doi.org/10.1016/j.cell.2020.02.052
Besides that, as a biotechnologist I would recommend that you stop thinking this approach could work in nature. Most of your questions could be answered by an undergraduate with some knowledge and the ability to read and understand scientific papers.
We're able to engineer some organisms, sure, but we're very far from a pure "reverse engineering", because chemical interactions mean that no protein or molecule inside a cell can be treated as a standalone thing.