MOFBits

A way to represent MOF structures as binary vectors.

Written by Eric Taw

What does this do?

It takes MOFids, as generated by the mofid package, and transforms them into fixed-length bit vectors, much like molecular fingerprints. Under the hood, this package uses RDKit and its fingerprinting functions for part of its representations.
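To make the idea of turning a MOFid into a fixed-length vector concrete, here is a minimal sketch in plain Python. It is not this package's implementation and does not use RDKit. The MOFid layout it assumes (dot-separated SMILES fragments, a space, then a "MOFid-v1.topology.catenation" tag and an optional ";name" suffix), the example string, the 64-bit length, and the helper names split_mofid, stable_bit, and toy_fingerprint are all assumptions made for illustration.

```python
import hashlib

N_BITS = 64  # hypothetical vector length, kept small for readability


def split_mofid(mofid):
    """Split a MOFid string into (SMILES fragments, topology code).

    Assumed layout: "<smiles>.<smiles> MOFid-v1.<topology>.<catenation>;<name>"
    """
    body = mofid.split(";", 1)[0]           # drop the optional ";name" suffix
    smiles_part, tag = body.rsplit(" ", 1)  # SMILES strings contain no spaces
    fragments = smiles_part.split(".")      # "." separates building blocks
    topology = tag.split(".")[1]            # tag looks like "MOFid-v1.pcu.cat0"
    return fragments, topology


def stable_bit(token, n_bits=N_BITS):
    """Map a string to a bit position with a hash that is stable across runs."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_bits


def toy_fingerprint(mofid, n_bits=N_BITS):
    """Return a fixed-length list of 0/1 values for one MOFid (toy scheme)."""
    fragments, topology = split_mofid(mofid)
    bits = [0] * n_bits
    for token in fragments + ["topology:" + topology]:
        bits[stable_bit(token, n_bits)] = 1
    return bits


# Illustrative MOFid-style string, not taken from this repository.
example = "[Zn][Zn].[O-]C(=O)c1ccc(cc1)C(=O)[O-] MOFid-v1.pcu.cat0;MOF-5"
fragments, topology = split_mofid(example)
fingerprint = toy_fingerprint(example)
print(topology, len(fingerprint), sum(fingerprint))
```

Hashing whole fragments loses all chemical detail, which is why a real fingerprint works on substructures instead. The sketch only shows how variable-length input ends up in a vector whose length does not depend on the structure.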

What are bit vectors and why use them?

Bit vectors are simply vectors of zeros and ones. We use RDKit's ExplicitBitVect as a memory-efficient way to store them. As for why we use them: the cheminformatics literature has long relied on bit vectors for similarity searches and machine learning. Moreover, slightly modifying common similarity metrics in a supervised machine learning context can yield importance information for individual features, which helps make the resulting models interpretable.
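As an illustration of the similarity searches mentioned above, the following sketch computes the Tanimoto (Jaccard) similarity, a metric commonly used on fingerprints, for two small bit vectors. It uses plain Python lists rather than RDKit's ExplicitBitVect so that it runs without extra dependencies. The function name tanimoto, the example vectors, and the choice to return 1.0 for two all-zero vectors are assumptions of this sketch, not part of the package.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two equal-length 0/1 sequences.

    Defined as |a AND b| / |a OR b|. Two all-zero vectors are treated
    as identical here (returns 1.0), a convention chosen for this sketch.
    """
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    both = sum(x & y for x, y in zip(a, b))    # bits set in both vectors
    either = sum(x | y for x, y in zip(a, b))  # bits set in at least one
    return both / either if either else 1.0


# Two made-up 8-bit fingerprints.
fp_a = [1, 1, 0, 1, 0, 0, 1, 0]
fp_b = [1, 0, 0, 1, 1, 0, 1, 0]

similarity = tanimoto(fp_a, fp_b)  # 3 shared bits out of 5 set in total
print(similarity)
```

With RDKit installed, DataStructs.TanimotoSimilarity computes the same quantity directly on ExplicitBitVect objects.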

Dependencies

  • rdkit (pip will not check for this dependency; install it from conda instead)
  • tqdm

Installation

Clone this repo into a directory of your choice, navigate to it, and run: pip install .

Acknowledgements

Thanks to the developers of tobacco_3.0 for releasing their topology files! Fingerprinting topologies otherwise would've been much more difficult.
