Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace PyRosetta with OpenMM in Fragmenstein #1078

Open
phraenquex opened this issue Jul 4, 2023 · 36 comments
Open

Replace PyRosetta with OpenMM in Fragmenstein #1078

phraenquex opened this issue Jul 4, 2023 · 36 comments

Comments

@phraenquex
Copy link
Collaborator

No description provided.

@matteoferla
Copy link
Collaborator

@phraenquex
Copy link
Collaborator Author

phraenquex commented Oct 10, 2023

@matteoferla:

Get Steph's use-case and see if there's a work-around.

Bug OpenMM community about the (1) funny numbers and (2) very slow running.

To finalise the ticket:
A. RMSD needs to be sensible. (Energy not important.)
B. Slow running (probably) okay. @alanbchristie check if Knitwork is well implented to exploit NextFlow. #944)

@matteoferla
Copy link
Collaborator

Get Steph's use-case and see if there's a work-around.

She said there was no issue in the Fragmenstein side

OpenMM

I could not get the calculated values to behave nor cut down the speed, it all works fine, but the values on testing are terrible.
However, I have "simply" fixed the RDKit minimization to be aware of the protein, but as a immobile entity.
(Technical detail: By include a frozen extracted protein neighbourhood, which resolves the clashes problem in all cases bar when the template ought to move (i.e. this is not flexible))

The values are a bit hairy, but the sign is somewhat consistent.
(Technical detail: on account it is internal energy U, and not a hybrid(=fudged) Gibbs potential. Plus in vacuum is not great, but could be argued to be a pedantic issue)

Even with PyRosetta the interactions were the main driver and the results get filtered by bad energy and sorted by interactions primarily at least in my hands.

Bad:

  • fixed template
  • in vacuo

Good:

  • takes a fraction of a second
  • it works

@phraenquex
Copy link
Collaborator Author

@matteoferla when is your RDkit fix going into production?

(@Waztom add to agenda for Thursday.)

@matteoferla
Copy link
Collaborator

matteoferla commented Nov 14, 2023

Unless the Squonk job has a frozen version of fragmenstein (doubtful), it should have been in production since last month (i.e. version 0.13.20)

@tdudgeon —Is this true? And if not, could the version of Fragmenstein be changed to the latest 0.13.34. I have access only to https://gitlab.com/informaticsmatters/squonk-fragmenstein

@phraenquex
Copy link
Collaborator Author

@tdudgeon there's question for you from @matteoferla, in the question above.

If this is indeed deployed, then the ticket can move to "In production."

@tdudgeon
Copy link
Collaborator

Currently Fragmenstein is used for 2 sets of jobs:

  1. original fragmenstein jobs (https://github.com/InformaticsMatters/squonk2-fragmenstein)
  2. Steph/Rubens fragment merges (https://github.com/stephwills/fragment_network_merges)

Both use the same base container that currently uses Fragmenstein 0.12.0 (https://github.com/InformaticsMatters/squonk2-fragmenstein/blob/main/requirements.txt#L2)

Updating this is simple and will just need the container images to be rebuilt.

But are there other impacts? If this means that the updated jobs would now use OpenMM for minimisation (rather that Pyrosetta (originally) or nothing (more recently)) do we need some extra options to allow this minimisation to be enabled and/or configured?

@matteoferla
Copy link
Collaborator

Thank you for the link: that is super helpful!

No, they will not use OpenMM for minimisation —rigid backbone only.

Custom options For the PyRosetta, the academic username and password are universal, so we could conceivably have an input box wherein the user provides the password which if its hash is correct runs the job. For some demo I did it that way.

Future I should say that there is another ticket, wherein I need to Squonkificate the normal Fragmenstein pipeline (bunk double or triple mergers, analog-by-catalog search, placement, clustering and ranking), but I need to change the analog-by-catalog search by copying Steph's FragmentKnitting via FragmentNetwork as oppose to use sw.docking.org. Steph is busy so I have been trying to reverse engineer it. I will talk to Rubén about how the Squonkification of Steph's code.

@tdudgeon
Copy link
Collaborator

So Fragmenstein version should be increased to 0.13.34, without any other changes being needed?

@phraenquex
Copy link
Collaborator Author

@matteoferla your answers don't look relevant to this ticket; if you need them actioned, please create new ticket. I suspect they pertain to Brownish release.

As for "custom options": no need for a ticket, the answer is categorical: there will be no PyRosetta in Fragalysis, or any support for custom licenses, certainly not in the forseeable future - we don't have the legal backup to make this happen.

@matteoferla
Copy link
Collaborator

@phraenquex Apologies. I will try to be more clear in future
@tdudgeon Yes, that is correct and sorry for simply pressing an emoji response in lieu of a written reply

@phraenquex
Copy link
Collaborator Author

@tdudgeon @matteoferla can I move this into Production swimlane now?

If not, what are the remaining actions?

@tdudgeon
Copy link
Collaborator

I still need to update the fragmenstein version and deploy.
This should not affect any functionality.

@phraenquex
Copy link
Collaborator Author

I should have asked: when is that happening? Is it a lot of work - or can it happen now, without slowing down green release?

@matteoferla you're on standby to test it, presumably?

@matteoferla
Copy link
Collaborator

matteoferla commented Nov 24, 2023

@phraenquex my help stops at saying to change this line to Fragmenstein == 0.13.36, or Fragmenstein >= 0.13.36: the creation of an image for Squonk is a sys admin task beyond my skillset or permissions

@tdudgeon
Copy link
Collaborator

Yes, I just need to make the change and rebuild the images. Just not had time to do so yet.

@phraenquex
Copy link
Collaborator Author

@tdudgeon that hasn't answered my question.

@matteoferla no ticket is complete until someone knowledgeable has confirmed that it "works". You're the only one that could credibly do that, at the moment. Unless you show @mwinokan how to do it.

@matteoferla
Copy link
Collaborator

@phraenquex: yes, as soon as @tdudgeon signals that the image rebuild is complete I will test it.
From the image rebuild side, I am guessing this fully completely depends if Tim's infrastructure in Diamond is working following the shutdown earlier this month.

@tdudgeon
Copy link
Collaborator

The fragmenstein images have now been rebuilt, and should be testable in Squonk.

Steph's fragment knitting images need to be rebuilt by @rsanchezgarc as I can't push to the XChem DockerHub repo.
Ruben, update the base container for your one to informaticsmatters/squonk2-fragmenstein-base/1.0.4 (see https://hub.docker.com/layers/informaticsmatters/squonk2-fragmenstein-base/1.0.4/images/sha256-01f71f20308a4fa2f3c3a3562557368f7ddc70b2e177334456bb0173410cc0ab?context=repo), or you can use the stable tag if you don't mind the base image potentially shifting over time. Once done that job can be tested too.

@phraenquex No, it shouldn't be moved to production until the above has been done. But really this isn't much to do with Fragalysis staging/production/etc as it's purely Squonk related, and Squonk will use the new images as soon as they become available.

@phraenquex
Copy link
Collaborator Author

Thanks @tdudgeon. @rsanchezgarc could you give an indication of when this can happen? (If you can't do it, we'll ask @mwinokan or @Waztom to take it on.)

Correction: They are absolutely staging/production related: the categories refer to what's available to users, not to some arcane technical details about repos and stuff. They always have.

@rsanchezgarc
Copy link
Collaborator

@tdudgeon @matteoferla . Shouldn't we also change the line of code in which we set up Victor or Wictor or whatever the new openmm class is called?

@matteoferla
Copy link
Collaborator

@rsanchezgarc —OpenVictor was a no go. OpenMM energy minimisation is very slow —it happens fully in cartesian space without internal space tricks and akin to an MD run at zero kelvin with a shaking sampler to add motion as far as I can tell. Template choice becomes critical in Wictor (without PyRosetta) as the neighbourhood is frozen. So RDKit is the only choice here.

@rsanchezgarc
Copy link
Collaborator

@matteoferla .Then, how do you use it? Just as our old v = Victor() ?

@rsanchezgarc
Copy link
Collaborator

@tdudgeon. I already had FROM informaticsmatters/squonk2-fragmenstein-base:stable in the Dockerfile
@matteoferla said that we need to use Wictor()
So I just built the new image and pushed it. Is that all @tdudgeon?

@tdudgeon
Copy link
Collaborator

If the job hasn't changed (only the fragmenstein implementation) then all that's needed is a new container image.

@rsanchezgarc
Copy link
Collaborator

@tdudgeon so it should be ready now

@phraenquex
Copy link
Collaborator Author

@tdudgeon thinks it was already done, needs to check - please do for green release.

@phraenquex phraenquex added the 2024-03-13 green Data dissemination label May 23, 2024
@mwinokan mwinokan moved this to Dev Done - Do review (DEV) in Fragalysis May 29, 2024
@mwinokan
Copy link
Collaborator

@tdudgeon can you confirm these changes were made? We will need to revisit the deployed algorithms in general in the orange release

@tdudgeon
Copy link
Collaborator

The jobs are currently using Fragmenstein version 0.13.36 which I believe is one that no longer uses PyRosetta, but @matteoferla should confirm this.
Also, the code seems to be exclusively using Wictor not Victor.

@mwinokan
Copy link
Collaborator

Thank you for confirming @tdudgeon

@mwinokan mwinokan moved this from Dev Done - Do review (DEV) to In production (Done) in Fragalysis Jun 12, 2024
@matteoferla
Copy link
Collaborator

@tdudgeon — I can confirm that is correct.
The current version is 1.0.6, but there were only minor bug fixes, so 0.13 should be okay, functionally.

As you said, Fragmenstein can be run

  • default (PyRosetta) — Victor class
  • RDKit only with PyRosetta —Wictor class. The one to use here.
  • openMM — OpenVictor class. Requires GPU and is way too slow for 1k+ mergers

The code implemented in Fragalysis only does a merger, which is likely not in catalogue space.
The pipeline normally consists of Fragmenstein mergers, analogue search in catalogues via NextMove Software's SmallWorld server hosted by John Irwin and analogues placed by Fragmenstein and results analysed. The middle bit needed changing to be open/independent. The other method I have used was OpenEye GraphSim, which is not open. I never did test an open analogue search —I had my eyes on faiss library as it can be run in CPU or GPU.

@mwinokan
Copy link
Collaborator

Thanks @matteoferla. We will revisit the deployed algorithms in a later release (see #1454), and I will get in touch regarding the analogue search (although we quite like the 'pure' merges for CAR anyway)

@matteoferla
Copy link
Collaborator

@mwinokan, great, using the unaltered mergers for the in-house synthesis pipeline makes total sense: jumping into catalogue space is infuriating (eg. heteroarenes to benzenes and loss of substituents) and that is even with the fact that SW runs not on FPs but on edit distance —the Astex Fragment network is an open copycat but with fewer of Roger and John's mathemagical tricks. 🤷

@phraenquex
Copy link
Collaborator Author

@mwinokan I've updated #1454 to include that headline from the Joint Meeting, to filter RDkite poses with PoseBusters.

That may be necessary for this ticket too - especially if it's not hard to do.

@mwinokan
Copy link
Collaborator

@phraenquex I think it's best to address it in #1454

@phraenquex
Copy link
Collaborator Author

Sure - in which case, we're probably better off moving this ticket to #won'tfix - life being too short, and there being loads of loose ends with the overall problem anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: In production (Done)
Development

No branches or pull requests

5 participants