This repo was used to prepare the talk given by Alex Mann at Cognitect's 2016 Conj Conference. It includes a standard implementation of tSNE, examples of data rendered this way, a novel implementation of interop between Clojure and Python, a number of datasets which can be rendered into Clojure objects, and some examples of generative testing.
I want to start by citing the sources that helped me get this far. This list is by no means exhaustive as there are many blogs and whitepapers I consumed where the information remains and the name has fled.
- Original whitepaper by Hinton and van der Maaten
- Laurens van der Maaten's tSNE resource website
- Joseph Turian's modifications/code for tSNE
- Original whitepaper detailing architecture of SENNA by Collobert and Weston
- SENNA website
I lifted datasets from the following places:
- MNIST from Turian's GitHub repo (link above)
- 130,000 word embeddings from Collobert's SENNA site download (link above)
- Places from hiiamrohit's countries-states-cities-database GitHub repo
- 3000 most common words were copied and pasted from http://www.ef.com/english-resources/english-vocabulary/top-3000-words/
To run the tests:
lein test
I got sick of starting a headless REPL by hand, so the following starts a session on port 54321.
lein nrepl
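One way a shortcut like this can be wired up is as a Leiningen alias in project.clj. The fragment below is a minimal sketch under that assumption; the project name, version, and alias body are placeholders for illustration and may not match how this repo actually defines the task.

```clojure
;; project.clj (sketch; project name, version, and alias body are assumptions)
(defproject example "0.1.0-SNAPSHOT"
  ;; With this alias, `lein nrepl` expands to `lein repl :headless :port 54321`.
  :aliases {"nrepl" ["repl" ":headless" ":port" "54321"]})
```

Once the server is up, an editor or another terminal can attach with `lein repl :connect 54321`.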
There are examples of SVG rendering in the comments of the core namespace. The gist, though, is to run data through tSNE, then pipe it into spit-svg. Pretty straightforward!
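To give a feel for the rendering half of that pipeline, here is a self-contained sketch that turns 2D points into an SVG file. It does not call this repo's code: `embedded-points`, `scale`, and `points->svg` are hypothetical names, the points are made-up stand-ins for tSNE output, and the real spit-svg may take different arguments and produce different markup.

```clojure
(ns example.svg-sketch
  (:require [clojure.string :as str]))

;; Made-up stand-in for tSNE output: a seq of [x y] pairs.
(def embedded-points [[0.0 0.0] [1.5 -2.0] [-3.0 4.0]])

(defn scale
  "Linearly maps v from [lo, hi] onto [0, size]; returns size/2 when lo = hi."
  [v lo hi size]
  (if (== lo hi)
    (/ size 2.0)
    (double (* size (/ (- v lo) (- hi lo))))))

(defn points->svg
  "Returns an SVG string with one small circle per [x y] point,
  rescaled to fit a size-by-size canvas."
  [points size]
  (let [xs (map first points)
        ys (map second points)
        x-lo (apply min xs) x-hi (apply max xs)
        y-lo (apply min ys) y-hi (apply max ys)
        circles (for [[x y] points]
                  (format "<circle cx=\"%.1f\" cy=\"%.1f\" r=\"2\"/>"
                          (scale x x-lo x-hi size)
                          (scale y y-lo y-hi size)))]
    (str "<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"" size
         "\" height=\"" size "\">"
         (str/join circles)
         "</svg>")))

;; Write the rendered points to disk, analogous in spirit to spit-svg.
(spit "sketch.svg" (points->svg embedded-points 400))
```

Opening sketch.svg in a browser shows the three points; with real embeddings the same shape of code would plot one circle per embedded item.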