How to integrate custom atom features? #75

feyhong1112 · 2024-08-01T21:48:34Z

Hi, may I ask whether I can add atom positions in atom_feature_utils.py? If I include the coordinates, should I directly sum them into the node feature vector from the GNN graph? Sorry for the confusion, but I am not sure where the GNN model is located or where the features get embedded. I would be grateful if you could point me to the GNN model or the feature embedding step.

kmaziarz · 2024-08-02T12:58:46Z

If you add an additional AtomFeatureExtractor in atom_feature_utils.py, and append it to the list of default featurizers, then that could work, but these featurizers only work on individual atoms, whereas your 3D position featurizer would need to take a look at the entire molecule to determine the conformation first.

Maybe a better approach would be to take a look at featurise_atoms in molecule_dataset_utils.py. As you can see, this function builds a NodeFeatures object by gathering features from the featurizers. At this point, the mol is in scope, so you could plug in your 3D position determination there, and then append the resulting positions to the all_atom_features list. Note that you may want to normalize (at least center) those positions before providing them, as the model would treat them like any other atom features, without any rototranslational invariance/equivariance.
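As a minimal sketch of the centering step mentioned above (not code from this repository): the helper name center_positions and the toy coordinates are made up for illustration, and the array is assumed to have one row of x, y, z per atom. Centering only removes the translation; it does nothing about rotations.

```python
import numpy as np


def center_positions(positions: np.ndarray) -> np.ndarray:
    """Translate a (num_atoms, 3) coordinate array so its centroid sits at the origin."""
    positions = np.asarray(positions, dtype=np.float32)
    if positions.ndim != 2 or positions.shape[1] != 3:
        raise ValueError("expected an array of shape (num_atoms, 3)")
    return positions - positions.mean(axis=0, keepdims=True)


# Toy coordinates standing in for a conformer. With RDKit they could come from
# mol.GetConformer().GetPositions() once the molecule has a conformer.
toy_positions = np.array([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [2.0, 3.0, 0.0]])
centered = center_positions(toy_positions)
```

Each row of centered could then serve as the three extra real-valued features for the corresponding atom.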

I am not 100% sure everything will work out-of-the-box, but I did check that featurise_atoms is also called during decoding (see moler_decoding_utils.py), so plugging your change in that one place might just be sufficient.

feyhong1112 · 2024-08-06T21:34:33Z

Thank you so much.

feyhong1112 · 2024-08-22T10:31:10Z

Sorry to ask again, but what method do you use to embed the coordinates into the mol?
I am getting the following errors:
Converting graph sample for Cc1ccc(B(O)O)cc1Br failed - aborting!
Converting graph sample for O=C(O)C(=O)Nc1sc2c(c1C(=O)O)CC[NH2+]C2 failed - aborting!

feyhong1112 · 2024-08-22T11:13:56Z

Sorry, my code is not very well structured. Thank you for always helping.

Convert the 2D or 3D SDF to an RDKit mol, and use the SMILES to calculate sa_score, clogp, and other properties
Extract the coordinates and embed them in NodeFeatures
My code is below:

def featurise_atoms(
    mol: Mol,
    atom_feature_extractors: List[AtomFeatureExtractor],
    motif_vocabulary: Optional[MotifVocabulary] = None,
    motifs: List[MotifAnnotation] = [],
) -> NodeFeatures:
    
    if motif_vocabulary is not None:
        atom_type_feature_extractor = next(
            featuriser
            for featuriser in atom_feature_extractors
            if isinstance(featuriser, AtomTypeFeatureExtractor)
        )

        enclosing_motif_id: Dict[int, int] = {}
        for motif in motifs:
            motif_id = motif_vocabulary.vocabulary[motif.motif_type]

            for atom in motif.atoms:
                enclosing_motif_id[atom.atom_id] = motif_id

        num_motifs = len(motif_vocabulary.vocabulary)

        all_atom_class_ids = []
        num_atom_classes = atom_type_feature_extractor.feature_width + num_motifs
    else:
        assert not motifs

        all_atom_class_ids = None
        num_atom_classes = None

    all_atom_features = []
    for atom_id, atom in enumerate(mol.GetAtoms()):
        atom_symbol = get_atom_symbol(atom)

        atom_features = [
            atom_featuriser.featurise(atom) for atom_featuriser in atom_feature_extractors
        ]

        if motif_vocabulary is not None:
            motif_or_atom_id = enclosing_motif_id.get(
                atom_id, atom_type_feature_extractor.type_name_to_index(atom_symbol) + num_motifs
            )
            assert motif_or_atom_id < num_atom_classes
            all_atom_class_ids.append(motif_or_atom_id)

        # Ensure the molecule has a conformer
        if mol.GetNumConformers() == 0:
            AllChem.EmbedMolecule(mol)  # Generate a conformer if none exists

        # Extract coordinate
        ligand_conformer = mol.GetConformer()
        atom_coordinates = np.array([
            [ligand_conformer.GetAtomPosition(atom_id)[0], ligand_conformer.GetAtomPosition(atom_id)[1],
             ligand_conformer.GetAtomPosition(atom_id)[2]]], dtype=np.float32)
        
        atom_distances = np.sqrt(np.sum(np.square(atom_coordinates.reshape(-1, 1, 3) - atom_coordinates.reshape(1, -1, 3)), axis=-1))
        interaction = np.where(atom_distances < 4.0, 1., 0.)

        mol_interactions = [interaction.reshape(-1)]

        atom_features = np.concatenate(atom_features).astype(np.float32)
        mol_interactions = np.concatenate(mol_interactions).astype(np.float32)

        all_atom_features.append(atom_features)
        all_atom_features.append(mol_interactions)

    return NodeFeatures(
        real_valued_features=all_atom_features,
        categorical_features=all_atom_class_ids,
        num_categorical_classes=num_atom_classes,
    )

kmaziarz · 2024-08-22T17:51:59Z

The all_atom_features list is supposed to hold one feature vector per atom, yet in your changed code you append to it twice when iterating over the atoms. If you want to add extra atom features, you would need to add them to the atom_features list (before the np.concatenate(atom_features).astype(np.float32) is called), and keep the single append to all_atom_features.

Note that this function deals with features for atoms, not atom pairs or bonds. An example atom feature would be atom_coordinates computed in your code. The atom_distances and interaction matrices you construct are for pairs of atoms, which doesn't really fit as atom features. You could start by using atom_coordinates (only the part corresponding to the particular atom that is being considered in the loop), and perhaps also some atom-level statistics of the interaction matrix (e.g. number of other atoms closer than a given distance), and see if then you can get the code to run.
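A rough sketch of what such per-atom features could look like, assuming the coordinates are available as a (num_atoms, 3) array. The helper name per_atom_3d_features, the toy coordinates, and the layout of the output (centered x, y, z followed by a neighbour count) are illustrative choices and not part of the library; the 4.0 cutoff is taken from the code above.

```python
import numpy as np


def per_atom_3d_features(positions: np.ndarray, cutoff: float = 4.0) -> np.ndarray:
    """Return one row per atom: centered x, y, z plus the count of other atoms within `cutoff`."""
    positions = np.asarray(positions, dtype=np.float32)
    centered = positions - positions.mean(axis=0, keepdims=True)
    # Pairwise distances, shape (num_atoms, num_atoms).
    deltas = centered[:, None, :] - centered[None, :, :]
    distances = np.sqrt((deltas ** 2).sum(axis=-1))
    # Each atom is at distance 0 from itself, so subtract 1 to exclude it.
    neighbour_counts = (distances < cutoff).sum(axis=1) - 1
    return np.concatenate(
        [centered, neighbour_counts[:, None].astype(np.float32)], axis=1
    )


# Toy coordinates: two atoms close together and one far away.
toy_positions = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [10.0, 0.0, 0.0]])
extra_features = per_atom_3d_features(toy_positions)

# Inside the per-atom loop, the row for the current atom would join the other
# per-atom vectors before the single concatenate-and-append step, e.g.
#   atom_features.append(extra_features[atom_id])
```

Computing the array once before the loop, rather than per atom, keeps all_atom_features at exactly one vector per atom.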

kmaziarz self-assigned this Aug 2, 2024

kmaziarz added the question label (Request for help or information) Aug 2, 2024

kmaziarz changed the title ~~direct sum Atom conformer in atom_feature_utils.py~~ How to integrate custom atom features? Aug 22, 2024
