NOTE: This package is no longer maintained, and trained SOMs may have stability issues. I will refrain from archiving this repository for the time being because I may end up releasing an updated version for PyTorch, which I will link to here before archiving. Like everything else, use this code at your own risk and please do some sanity checks. Thanks!
An implementation of the Kohonen self-organizing map [1] for TensorFlow 1.5 and Python 3.6. A TensorFlow 2 version has been contributed by Dragan Avramovski and is in the tfv2 branch. (Thanks Dragan!) This was initially based on Sachin Joglekar's code but has a few key modifications:
- Uses TensorFlow broadcasting semantics instead of `tf.pack` and `for` loops.
- Input data is expected as a `Tensor` rather than a `tf.placeholder`, allowing for use with faster and more complex input data pipelines (see the sketch after this list).
- Training uses the batch algorithm rather than the online one, providing a major speed boost if you have the GPU RAM. Also, as a result of that, I added...
- Multi-GPU support (for single machines with multiple GPUs; there is no multi-node training).
- Some summary operations for TensorBoard visualization.
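For instance, here is a minimal sketch of feeding input from a `tf.data` pipeline as a `Tensor` instead of a `tf.placeholder`. The SOM class name and constructor arguments below are hypothetical; see `example.py` for the actual API:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x

# Build an input pipeline that yields a Tensor instead of a tf.placeholder.
data = np.random.rand(1000, 3).astype(np.float32)
dataset = tf.data.Dataset.from_tensor_slices(data).batch(128).repeat()
input_tensor = dataset.make_one_shot_iterator().get_next()

# Hypothetical usage; check example.py for the real class and arguments:
# som = SelfOrganizingMap(m=20, n=20, dim=3, input_tensor=input_tensor)
```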
`example.py` contains a simple example of its usage, training a SOM on a three-cluster toy dataset. The resulting u-matrix should look something like this:

(Figure: u-matrix of the SOM trained on the three-cluster toy dataset)
Note that the example requires scikit-learn to be installed.
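A three-cluster toy dataset like the one in the example could be generated with scikit-learn roughly as follows (the exact parameters in `example.py` may differ):

```python
from sklearn.datasets import make_blobs

# Three Gaussian clusters; the parameters here are illustrative.
data, labels = make_blobs(n_samples=1000, centers=3, n_features=3, random_state=0)
data = data.astype('float32')
```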
I was going to write a blog post about this but I ended up just repeating everything I wrote in the comments, so please read them if you'd like to understand the code. For reference, the batch formula for SOMs is

$$w_j = \frac{\sum_i \theta(b(x_i), j)\, x_i}{\sum_i \theta(b(x_i), j)}$$

where $\theta$ is the neighborhood function, $x_i$ is the $i$-th input vector, and $b(x_i)$ is its best-matching unit.
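As a concrete illustration of that formula (not the repo's TensorFlow code), here is a minimal NumPy sketch of one batch update, using a 1-D grid of units for brevity:

```python
import numpy as np

def batch_update(weights, inputs, sigma):
    """One batch SOM step: w_j = sum_i theta(b(x_i), j) x_i / sum_i theta(b(x_i), j)."""
    # Best-matching unit (BMU) for every input vector.
    dists = np.linalg.norm(inputs[:, None, :] - weights[None, :, :], axis=2)
    bmus = dists.argmin(axis=1)                        # shape (n_samples,)

    # Gaussian neighborhood between each BMU and every unit (1-D grid distance).
    grid = np.arange(weights.shape[0])
    theta = np.exp(-(grid[None, :] - bmus[:, None]) ** 2 / (2.0 * sigma ** 2))

    # Neighborhood-weighted average of the inputs for each unit.
    return (theta.T @ inputs) / theta.sum(axis=0)[:, None]

# 10 units on a line, 100 two-dimensional inputs.
w = batch_update(np.random.rand(10, 2), np.random.rand(100, 2), sigma=1.5)
```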
The activity function turns the distance between each weight vector and an input vector into a value between 0 and 1, i.e. similar weights elicit higher activity. It is parameterized by the `output_sensitivity` variable: when this value is close to zero, a wider range of distances elicits high activity, and vice versa.
Here is an example of a few different values of the output sensitivity (`-c` here):

(Figure: activity curves for several values of the output sensitivity)
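The README doesn't spell out the exact formula, but an exponential decay of the form `exp(c * d)` with `c = output_sensitivity < 0` matches the behavior described above, as this sketch shows:

```python
import numpy as np

# Assumed form: activity = exp(c * d), c < 0. Smaller |c| -> a wider range of
# distances d still produces activity near 1.
d = np.linspace(0.0, 5.0, 6)                 # weight-to-input distances
for c in (-0.25, -1.0, -4.0):
    print(c, np.round(np.exp(c * d), 3))
```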
There was a really dumb bug in commits before 2a0ee25 where the learning rate (alpha) was set extremely incorrectly: instead of shrinking from n to 0, it grew from n to 1. If you had bad luck with this module previously, this fix may help. Sorry for the goof!
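For context, a correctly decaying schedule looks something like the following (illustrative only; the repo's actual schedule may differ):

```python
# Linear decay from an initial alpha toward 0 over training, rather than growth.
alpha_0, max_epochs = 0.3, 100
alphas = [alpha_0 * (1.0 - epoch / max_epochs) for epoch in range(max_epochs)]
```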