A way to represent MOF structures as binary vectors.
Written by Eric Taw
It takes MOFids, as generated by the mofid package, and transforms them into fixed-length bit vectors, much like fingerprinting molecules. In fact, under the hood, this package uses RDKit and its fingerprinting functions for part of its representations.
Bit vectors are simply vectors of zeros and ones. We use RDKit's ExplicitBitVect implementation as a memory-efficient way to store them. As for why we use them, the cheminformatics literature has long relied on bit vectors for similarity searches and machine learning. Moreover, in a supervised machine learning context, a slight modification of common similarity metrics can provide importance information for individual features, which makes the resulting models interpretable.
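To make the similarity-search idea concrete, here is a minimal sketch of the Tanimoto coefficient, one of the common similarity metrics for bit vectors. It is an illustration only and not this package's code: it stores each vector as a plain Python integer instead of an RDKit ExplicitBitVect so that it runs without RDKit, and the helper names `bits_to_mask` and `tanimoto`, the 16-bit length, and the bit positions are all made up for the example.

```python
def bits_to_mask(on_bits, n_bits):
    """Pack the indices of the set bits into one Python int used as a bit mask."""
    mask = 0
    for i in on_bits:
        if not 0 <= i < n_bits:
            raise ValueError(f"bit index {i} is outside a {n_bits}-bit vector")
        mask |= 1 << i
    return mask


def tanimoto(a, b):
    """Tanimoto coefficient: bits set in both vectors / bits set in either."""
    union = bin(a | b).count("1")
    if union == 0:
        # Two empty vectors share nothing; report 0.0 by convention here.
        return 0.0
    return bin(a & b).count("1") / union


# Hypothetical 16-bit fingerprints for two structures.
N_BITS = 16
fp_a = bits_to_mask([1, 4, 7, 9], N_BITS)
fp_b = bits_to_mask([1, 4, 8, 9], N_BITS)

# Three bits are shared (1, 4, 9) out of five set in total (1, 4, 7, 8, 9).
similarity = tanimoto(fp_a, fp_b)
print(similarity)
```

With RDKit installed, `DataStructs.TanimotoSimilarity` computes the same quantity directly on two ExplicitBitVect objects.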
- rdkit (note that pip will not check for this; install it from conda instead)
- tqdm
Clone this repo into a directory of your choice, navigate to it, and run pip install . (the trailing dot is part of the command).
Thanks to the developers of tobacco_3.0 for releasing their topology files! Fingerprinting topologies otherwise would've been much more difficult.