The objective of openpharma is to provide a neutral home for open source software related to the pharmaceutical industry that is not tied to one company or institution. http://openpharma.pharmaverse.org/
📨 For any questions, feel free to reach out by email: [email protected]
You are in the front-end repository of openpharma. The overall project includes 3 repositories:
- ⚙️ Data crawler: https://github.com/openpharma/openpharma.github.io
- 🤖 ML for search bar and data categorization: https://github.com/openpharma/openpharma_ml
- 📊 Front-end: https://github.com/openpharma/opensource_dashboard
We divided our list of packages into 5 main categories: Plots, Tables, Stats, CDISC and Utilities. For the classification, I use the title and the description of each package. To clean the data, I use the spaCy library. The classification method is based on binary matching between the list of keywords for a category and the description/title of the package: a package is assigned to a category when at least one of that category's keywords appears in its text.
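To make the keyword matching concrete, here is a minimal sketch in Python. It is not the project's actual code: the keyword lists, the `tokenize` helper and the `classify` function are hypothetical, and a plain regular expression stands in for the spaCy cleaning step so that the example runs without extra dependencies.

```python
import re

# Hypothetical keyword lists, for illustration only.
# The real lists used by openpharma are not shown in this README.
CATEGORY_KEYWORDS = {
    "Plots": {"plot", "plots", "graph", "chart", "ggplot"},
    "Tables": {"table", "tables", "listing", "listings"},
    "Stats": {"model", "regression", "bayesian", "survival"},
    "CDISC": {"cdisc", "sdtm", "adam"},
    "Utilities": {"utility", "utilities", "tool", "tools", "helper"},
}


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens.

    Assumption: this simple regex replaces the spaCy-based cleaning.
    """
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(title, description):
    """Return every category with at least one keyword in the title/description."""
    tokens = tokenize(f"{title} {description}")
    # Binary matching: a category matches if the token set and keyword set overlap.
    return [cat for cat, keywords in CATEGORY_KEYWORDS.items() if tokens & keywords]


labels = classify("survplot", "Kaplan-Meier survival plots for ADaM datasets")
print(labels)
```

Because each category is tested independently, a single package can receive several labels, or none at all when no keyword matches.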
We measure performance on a test dataset containing 115 examples: 10 Plots, 8 Tables, 88 Stats, 2 CDISC and 15 Utilities (the counts sum to 123 rather than 115 because this is a multilabel classification, so one package can belong to several categories). The accuracy is shown in the following figure. ⚠️ Because the dataset is strongly imbalanced, accuracy is not always a relevant metric. For better insight, you can calculate precision, recall and F1-score for each category.
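As a rough illustration of how these per-category metrics could be computed for a multilabel problem, the sketch below uses plain Python. The function name `per_label_metrics` and the toy data are hypothetical and are not taken from the openpharma test dataset.

```python
def per_label_metrics(y_true, y_pred, labels):
    """Compute (precision, recall, F1) for each label of a multilabel problem.

    y_true and y_pred are lists of label sets, one set per example.
    A metric is reported as 0.0 when its denominator is zero.
    """
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


# Toy data, invented for this example only.
y_true = [{"Stats"}, {"Stats", "Plots"}, {"CDISC"}, {"Stats"}]
y_pred = [{"Stats"}, {"Stats"}, {"Stats"}, {"Plots"}]

results = per_label_metrics(y_true, y_pred, ["Plots", "Stats", "CDISC"])
for label, (precision, recall, f1) in results.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting the metrics per category makes weak spots visible on rare classes such as CDISC, which an overall accuracy dominated by Stats would hide.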