Uses genetic programming to optimize the distance function for k-means clustering. The algorithm in this repository improved results by up to 458% (v_score) relative to the commonly used Euclidean distance. It can be adapted to any dataset with little customization.
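To make the idea concrete, here is a minimal sketch (not the repository's implementation) of k-means with a pluggable distance function, scored with a V-measure. It assumes that "v_score" refers to the V-measure (harmonic mean of homogeneity and completeness). The names `kmeans`, `v_measure`, `euclidean` and `manhattan` are hypothetical, and `manhattan` only stands in for whatever expression the genetic programming search might evolve.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Baseline distance: straight-line distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    """Hypothetical stand-in for an evolved distance expression."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def kmeans(points, centroids, distance, max_iter=100):
    """Lloyd-style k-means where `distance` is any callable(a, b) -> float."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the supplied distance.
        labels = [
            min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def _entropy(counts, n):
    """Shannon entropy (natural log) of a collection of counts summing to n."""
    return -sum(c / n * math.log(c / n) for c in counts if c)


def v_measure(truth, pred):
    """Harmonic mean of homogeneity and completeness of a clustering."""
    n = len(truth)
    h_c = _entropy(Counter(truth).values(), n)             # H(C)
    h_k = _entropy(Counter(pred).values(), n)              # H(K)
    h_ck = _entropy(Counter(zip(truth, pred)).values(), n)  # joint H(C, K)
    # H(C|K) = H(C, K) - H(K) and H(K|C) = H(C, K) - H(C)
    homogeneity = 1.0 if h_c == 0 else 1.0 - (h_ck - h_k) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - (h_ck - h_c) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Usage: cluster six toy points and score the result against known labels.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
truth = [0, 0, 0, 1, 1, 1]
labels, _ = kmeans(points, centroids=[points[0], points[3]], distance=euclidean)
print(v_measure(truth, labels))
```

Swapping `distance=euclidean` for another callable is all it takes to evaluate a candidate distance, which is the quantity a genetic programming search would use as fitness. Note that the mean-based update step is only guaranteed to converge for Euclidean distance, so `max_iter` bounds the loop for other distances.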
- Python 3.8.2
- Venv (sudo apt install python3.8-venv)
- pip (sudo apt install python3-pip)
- Create environment:
python3 -m venv ./code/venv
- Activate environment:
cd code && source venv/bin/activate
- Install requirements:
pip install -r requirements.txt
The optimized parameters are defined at the beginning of the main file (main.py).
- Activate environment:
cd code && source venv/bin/activate
- Execute:
python3 main.py
- Exit:
deactivate
When the algorithm runs, it prints the function of the best individual, expressed in the variables x, y and z, along with that individual's v-score on the test set of the chosen dataset. It also generates a training_v-score.json file that records the training results for each generation in the following order: best value, average value and worst value.
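The generated file can be inspected with a few lines of Python. The sketch below is an assumption about the layout, not a description of the actual file: it supposes that training_v-score.json holds a JSON list with one `[best, average, worst]` entry per generation, and that the file sits in the current working directory. The helper names `load_training_history` and `best_generation` are hypothetical.

```python
import json


def load_training_history(path="training_v-score.json"):
    """Load per-generation results as a list of dicts.

    Assumes the file is a JSON list of [best, average, worst] entries,
    one per generation (an assumption about the file layout).
    """
    with open(path) as fh:
        history = json.load(fh)
    return [
        {"generation": i, "best": best, "average": average, "worst": worst}
        for i, (best, average, worst) in enumerate(history)
    ]


def best_generation(rows):
    """Return the row whose best value is highest (first one on ties)."""
    return max(rows, key=lambda row: row["best"])
```

For example, `best_generation(load_training_history())` would return the generation at which the best training value peaked, provided the file matches the assumed layout.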