Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to integrate custom atom features? #75

Open
feyhong1112 opened this issue Aug 1, 2024 · 5 comments
Open

How to integrate custom atom features? #75

feyhong1112 opened this issue Aug 1, 2024 · 5 comments
Assignees
Labels
question Request for help or information

Comments

@feyhong1112
Copy link

feyhong1112 commented Aug 1, 2024

Hi, may I ask if I can add the atom positions in atom_feature_utils.py? If I include the coordinates, does this mean I should directly sum the features in the node feature vector from the GNN graph? I apologize for the confusion, but I am not sure where the GNN model is located or where the features get embedded. If you could help me locate where the GNN model or the feature embedding process is, I would humbly thank you

@kmaziarz kmaziarz self-assigned this Aug 2, 2024
@kmaziarz kmaziarz added the question Request for help or information label Aug 2, 2024
@kmaziarz
Copy link
Collaborator

kmaziarz commented Aug 2, 2024

If you add an additional AtomFeatureExtractor in atom_feature_utils.py, and append it to the list of default featurizers, then that could work, but these featurizers only work on individual atoms, whereas your 3D position featurizer would need to take a look at the entire molecule to determine the conformation first.

Maybe a better approach would be to take a look at featurise_atoms in molecule_dataset_utils.py. As you can see, this function builds a NodeFeatures object by gathering features from the featurizers. At this point, the mol is in scope, so you could plug in your 3D position determination there, and then append the resulting positions to the all_atom_features list. Note that you may want to normalize (at least center) those positions before providing them, as the model would treat them like any other atom features, without any rototranslational invariance/equivariance.

I am not 100% sure everything will work out-of-the-box, but I did check that featurise_atoms is also called during decoding (see moler_decoding_utils.py), so plugging your change in that one place might just be sufficient.

@feyhong1112
Copy link
Author

Sir, Thank you so much.

@feyhong1112
Copy link
Author

Sorry to say that but what method you are use to embedded the coordinate into mol?
I have complicate error with
Converting graph sample for Cc1ccc(B(O)O)cc1Br failed - aborting!Converting graph sample for O=C(O)C(=O)Nc1sc2c(c1C(=O)O)CC[NH2+]C2 failed - aborting!

@feyhong1112
Copy link
Author

sorry to show my code which not very good structure format. Thank you to always help.

  1. convert 2D or 3D SDF to mol rdkit and use smile to calculate the sa_score, clogp and others
  2. extract the coordinate and emmbed to NodeFeature
    below here is my code
def featurise_atoms(
    mol: Mol,
    atom_feature_extractors: List[AtomFeatureExtractor],
    motif_vocabulary: Optional[MotifVocabulary] = None,
    motifs: List[MotifAnnotation] = [],
) -> NodeFeatures:
    
    if motif_vocabulary is not None:
        atom_type_feature_extractor = next(
            featuriser
            for featuriser in atom_feature_extractors
            if isinstance(featuriser, AtomTypeFeatureExtractor)
        )

        enclosing_motif_id: Dict[int, int] = {}
        for motif in motifs:
            motif_id = motif_vocabulary.vocabulary[motif.motif_type]

            for atom in motif.atoms:
                enclosing_motif_id[atom.atom_id] = motif_id

        num_motifs = len(motif_vocabulary.vocabulary)

        all_atom_class_ids = []
        num_atom_classes = atom_type_feature_extractor.feature_width + num_motifs
    else:
        assert not motifs

        all_atom_class_ids = None
        num_atom_classes = None

    all_atom_features = []
    for atom_id, atom in enumerate(mol.GetAtoms()):
        atom_symbol = get_atom_symbol(atom)

        atom_features = [
            atom_featuriser.featurise(atom) for atom_featuriser in atom_feature_extractors
        ]

        if motif_vocabulary is not None:
            motif_or_atom_id = enclosing_motif_id.get(
                atom_id, atom_type_feature_extractor.type_name_to_index(atom_symbol) + num_motifs
            )
            assert motif_or_atom_id < num_atom_classes
            all_atom_class_ids.append(motif_or_atom_id)

        # Ensure the molecule has a conformer
        if mol.GetNumConformers() == 0:
            AllChem.EmbedMolecule(mol)  # Generate a conformer if none exists

        # Extract coordinate
        ligand_conformer = mol.GetConformer()
        atom_coordinates = np.array([
            [ligand_conformer.GetAtomPosition(atom_id)[0], ligand_conformer.GetAtomPosition(atom_id)[1],
             ligand_conformer.GetAtomPosition(atom_id)[2]]], dtype=np.float32)
        
        atom_distances = np.sqrt(np.sum(np.square(atom_coordinates.reshape(-1, 1, 3) - atom_coordinates.reshape(1, -1, 3)), axis=-1))
        interaction = np.where(atom_distances < 4.0, 1., 0.)

        mol_interactions = [interaction.reshape(-1)]

        atom_features = np.concatenate(atom_features).astype(np.float32)
        mol_interactions = np.concatenate(mol_interactions).astype(np.float32)

        all_atom_features.append(atom_features)
        all_atom_features.append(mol_interactions)

    return NodeFeatures(
        real_valued_features=all_atom_features,
        categorical_features=all_atom_class_ids,
        num_categorical_classes=num_atom_classes,
    )

@kmaziarz
Copy link
Collaborator

kmaziarz commented Aug 22, 2024

The all_atom_features list is supposed to hold one feature vector per atom, yet in your changed code you append to it twice when iterating over the atoms. If you want to add extra atom features, you would need to add them to the atom_features list (before the np.concatenate(atom_features).astype(np.float32) is called), and keep the single append to all_atom_features.

Note that this function deals with features for atoms, not atom pairs or bonds. An example atom feature would be atom_coordinates computed in your code. The atom_distances and interaction matrices you construct are for pairs of atoms, which doesn't really fit as atom features. You could start by using atom_coordinates (only the part corresponding to the particular atom that is being considered in the loop), and perhaps also some atom-level statistics of the interaction matrix (e.g. number of other atoms closer than a given distance), and see if then you can get the code to run.

@kmaziarz kmaziarz changed the title direct sum Atom conformer in atom_feature_utils.py How to integrate custom atom features? Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Request for help or information
Projects
None yet
Development

No branches or pull requests

2 participants