- Installed dependencies: the `tensorflow` and `opencv` libraries.

  ```
  pip install tensorflow
  conda install opencv
  ```
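
  A quick optional sanity check (a minimal sketch, nothing repo-specific) that both libraries import correctly:

  ```python
  # Sanity check: both libraries should import and report a version.
  import tensorflow as tf
  import cv2

  print("TensorFlow:", tf.__version__)
  print("OpenCV:", cv2.__version__)
  ```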
- Correct `PYTHONPATH=full_path_to_folder/scr` (see the `export_embeddings.py` step below).
- Pretrained FaceNet model in the folder `./models/`. You can download it here.
- Images dataset in `./data/folder_with_images/`. I used the LFW dataset. Unzip it to `./data/lfw/`. The directory should be in the correct openface format:
  ```
  my_database
  └───a_person
  │   │   image00.jpg
  │   │   image01.jpg
  │
  └───b_person
  │   │   image00.jpg
  │
  └───c_person
  │   │   image00.jpg
  │   │   image01.jpg
  │   │   image02.jpg
  │   │   image03.jpg
  ```
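
  For reference, here is a minimal sketch of how this layout maps to image paths and person labels (the helper name `list_dataset` is mine, not something from the repo):

  ```python
  import os

  def list_dataset(root):
      """Walk an openface-style dataset and yield (image_path, person_name) pairs."""
      for person in sorted(os.listdir(root)):
          person_dir = os.path.join(root, person)
          if not os.path.isdir(person_dir):
              continue
          for image_name in sorted(os.listdir(person_dir)):
              yield os.path.join(person_dir, image_name), person

  # Example usage:
  # pairs = list(list_dataset("./data/lfw"))
  ```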
- Clean the folders `./np_embeddings`, `./data/clustered`, and `./data/sorted`.
- Run `export_embeddings.py`. This will generate embeddings and labels for the images. To make it work, add `PYTHONPATH=full_path_to_folder/scr` to your sources. If you are using PyCharm, simply add `PYTHONPATH` with the value `full_path_to_folder/scr` to the Environment Variables.
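
  Conceptually, this step looks roughly like the sketch below. It is only an illustration: it assumes the pretrained FaceNet model can be loaded as a Keras model mapping 160x160 RGB face crops to 128-d embeddings, and the output file names are placeholders, not necessarily what the script actually uses.

  ```python
  from pathlib import Path

  import cv2
  import numpy as np
  import tensorflow as tf

  # Assumption: the pretrained FaceNet model in ./models/ can be loaded as a Keras
  # model mapping a 160x160 RGB face crop to a 128-d embedding.
  model = tf.keras.models.load_model("./models/facenet")

  def embed(image_path):
      # OpenCV reads BGR; convert to RGB, resize to the model input, scale to [0, 1].
      bgr = cv2.imread(str(image_path))
      rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
      face = cv2.resize(rgb, (160, 160)).astype("float32") / 255.0
      return model.predict(face[None, ...], verbose=0)[0]

  # Collect (path, label) pairs from the openface-style layout and embed every image.
  image_paths = sorted(Path("./data/lfw").glob("*/*.jpg"))
  labels = np.array([p.parent.name for p in image_paths])
  embeddings = np.stack([embed(p) for p in image_paths])

  # Placeholder output names; the actual script may use different ones.
  np.save("./np_embeddings/embeddings.npy", embeddings)
  np.save("./np_embeddings/labels.npy", labels)
  np.save("./np_embeddings/paths.npy", np.array([str(p) for p in image_paths]))
  ```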
- Run `Distance_matrix.py`. This step gives you the matrix of all the Euclidean distances between faces. The NumPy array is saved to `./np_embeddings/embeddings.npy`. It will take some time because the matrix has size NxN with zeros on the main diagonal. The file is quite big (1.4 GB). We need it to sort the faces.
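
  The distance computation itself boils down to something like this sketch (the input and output file names are assumptions based on the paths above, not necessarily what `Distance_matrix.py` uses):

  ```python
  import numpy as np

  # Load the 128-d embeddings produced in the previous step (assumed file name).
  embeddings = np.load("./np_embeddings/embeddings.npy")  # shape (N, 128)

  # Pairwise Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a.b,
  # giving an NxN matrix with zeros on the main diagonal.
  sq_norms = np.sum(embeddings ** 2, axis=1)
  sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embeddings @ embeddings.T
  distances = np.sqrt(np.maximum(sq_dists, 0.0))

  # Placeholder output name; for an LFW-sized N this float64 matrix is roughly 1.4 GB.
  np.save("./np_embeddings/distance_matrix.npy", distances)
  ```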
- Run `Cluster_faces.py`. First, it will sort all of the face images based on the closest distance and save the sorted images to `./data/sorted`. Second, it will cluster the images using the k-means algorithm. The number of clusters is 30 by default; you can change it if you like. This is what you get in the end. KMeans does a pretty good job of clustering the 128-dimensional image embeddings.
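
  A minimal sketch of the clustering part, assuming scikit-learn's KMeans and a saved array of image paths (the `paths.npy` name is a placeholder; the actual script may organize this differently):

  ```python
  import shutil
  from pathlib import Path

  import numpy as np
  from sklearn.cluster import KMeans

  # Load the embeddings and the corresponding image paths (placeholder file names).
  embeddings = np.load("./np_embeddings/embeddings.npy")   # shape (N, 128)
  image_paths = np.load("./np_embeddings/paths.npy")       # one path per embedding

  # Cluster the 128-d embeddings into 30 groups (the default mentioned above).
  kmeans = KMeans(n_clusters=30, random_state=0).fit(embeddings)

  # Copy each image into a subfolder named after its cluster id.
  for path, cluster_id in zip(image_paths, kmeans.labels_):
      out_dir = Path("./data/clustered") / str(cluster_id)
      out_dir.mkdir(parents=True, exist_ok=True)
      shutil.copy(str(path), out_dir)
  ```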
This work builds on FaceNet. You can check the FaceNet model and papers here.