This repository contains the source code for a chatbot application built using OpenAI's Completions and Embedding APIs, as well as a dataset of 257k doctor-patient dialogs. Access the chatbot application here: https://docgpt.herokuapp.com/
DocGPT is currently only a prototype. The accuracy of its responses has not yet been thoroughly validated. The project was conceived as a potential intervention to be piloted and evaluated for the final project of a graduate course on Big Data and Development taught at UChicago in Winter 2023.
The accompanying research proposal can be accessed here: https://github.com/dustinmarshall/DocGPT/blob/main/research_design.pdf
To replicate this chatbot application, follow these steps (requires a Kaggle account for access to the dataset, a Pinecone account for storing the embeddings, and a Heroku account for hosting the application):
- Clone this repository to your local machine
- Open a terminal and navigate to where the repository is saved
- Run the following commands to store the credentials for your Kaggle, OpenAI, and Pinecone accounts as environment variables in your current shell session (this keeps them out of the source code, but they must be set again in each new session):
export KAGGLE_KEY=YOUR-KEY-HERE
export OPENAI_API_KEY=YOUR-KEY-HERE
export PINECONE_API_KEY=YOUR-KEY-HERE
export PINECONE_ENVIRONMENT=YOUR-ENVIRONMENT-HERE
- To download the dataset from Kaggle, run the following code from the command line:
kaggle datasets download -d dsxavier/diagnoise-me
- To clean the doctor-patient dialog data, run the following code from the command line:
python3 embeddings/clean_data.py
- To compute the embeddings and store them in your Pinecone Index, run the following code from the command line:
python3 embeddings/compute_embeddings.py
- To create the app on Heroku and link it to your existing GitHub repo, run the following code:
python3 application/create_app.py
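
Before running the scripts above, it can help to confirm that the four environment variables are actually set in the current shell. The following is a minimal sketch of such a check and is not part of this repository; the helper name `missing_variables` is hypothetical, and the variable names are simply the ones used in the `export` commands above.

```python
import os

# Variable names copied from the export commands in the steps above.
REQUIRED_VARS = (
    "KAGGLE_KEY",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENVIRONMENT",
)


def missing_variables(environ=None):
    """Return the required variable names that are unset or blank.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without touching the shell.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_variables()` with no arguments inspects the current environment and returns an empty list when every variable is set; any names it returns still need to be exported.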
If you have any questions or concerns, please feel free to reach out to Dustin Marshall ([email protected]).