batmen-lab/BioMANIA

BioMANIA

Welcome to BioMANIA! This guide explains how to set up, run, and interact with the BioMANIA chatbot interface, which connects to various APIs to deliver information across numerous libraries and frameworks.

Project Overview:

🌟 We warmly invite you to share your trained models and datasets in our issues section, making it easier for others to utilize and extend your work, thus amplifying its impact. Feel free to explore and provide feedback on tools shared by other contributors as well! 🚀🔍

If you encounter any problems during your exploration, we welcome 🤗 you to refer to the Q&A section and to open issues for discussion! 🧐 👨‍💻

Video demo

Our demonstration shows how to use a chatbot to work with scanpy and squidpy in a single conversation, including loading data, invoking functions for analysis, and presenting outputs as code, images, and tables.

We also offer a command-line interface (CLI) demo through the terminal.

Web access online demo

We provide an online demo hosted on our server!

(240929: For the online demo, note that there may be connection delays when multiple users are active. We check that the demo is running every day, and any issues will be fixed by the next day. For now we recommend asking questions in English, as the corpus is designed for English and results will therefore be more accurate.)

Quick start

We provide several ways to run the service: Python script, terminal CLI, Docker, and Colab demo. Among these, the terminal CLI is the easiest way to start.

Setup dataset and models

# set up the environment
pip install git+https://github.com/batmen-lab/BioMANIA.git --index-url https://pypi.org/simple
# set up OPENAI_API_KEY
echo 'OPENAI_API_KEY="sk-proj-xxxx"' >> .env
# (optional) set up the GitHub token
echo "GITHUB_TOKEN=your_github_token" >> .env
# download data, retriever, and resources from Google Drive, and put them in
# - data/standard_process/{LIB},
# - hugging_models/retriever_model_finetuned/{LIB}, and
# - ../../resources/
pip install gdown
# the URL is quoted so that shells such as zsh do not treat "?" as a glob
gdown "https://drive.google.com/uc?id=1nT28pIJ_dsdvi2yD8ffWt_aePXsSWdqI"
sh download_data_model.sh
# set up the PYTHONPATH
export PYTHONPATH=$PYTHONPATH:$(pwd)

Run with terminal CLI or gradio app (stable on Linux)

# CLI service quick start!
pip install gradio
python -m BioMANIA.deploy.cli_demo
# or the gradio app (TODO 240509: image display is still under development!)
#python -m BioMANIA.deploy.cli_gradio

Run with Docker

For ease of use, we provide a Docker image containing scanpy, squidpy, ehrapy, and snapatac2. You can find the detailed tool list on Docker Hub.

# Pull back-end service and front-end UI service with:
# 241016 updated
sudo docker pull chatbotuibiomania/biomania-together:v1.1.12-cuda12.6-ubuntu22.04

Start service with

# run on gpu
sudo docker run -e LIB=scanpy -e OPENAI_API_KEY=[your_openai_api_key] -e GITHUB_TOKEN=[github_pat_xxx] --gpus all -d -p 3000:3000 chatbotuibiomania/biomania-together:v1.1.12-cuda12.6-ubuntu22.04
# or on cpu
sudo docker run -e LIB=scanpy -e OPENAI_API_KEY=[your_openai_api_key] -e GITHUB_TOKEN=[github_pat_xxx] -d -p 3000:3000 chatbotuibiomania/biomania-together:v1.1.12-cuda12.6-ubuntu22.04

Then open the UI service at http://localhost:3000/en.
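The two docker run commands above differ only in the --gpus all flag. As a minimal sketch, a hypothetical helper (docker_run_command is not part of BioMANIA) can assemble either variant from the same inputs. The image tag, environment variable names, and port are taken from the commands above; everything else is illustrative.

```python
import shlex

# Image tag copied from the pull command above.
IMAGE = "chatbotuibiomania/biomania-together:v1.1.12-cuda12.6-ubuntu22.04"


def docker_run_command(lib, openai_api_key, github_token, use_gpu=False, host_port=3000):
    """Build the `docker run` argument list shown above (hypothetical helper)."""
    cmd = [
        "sudo", "docker", "run",
        "-e", f"LIB={lib}",
        "-e", f"OPENAI_API_KEY={openai_api_key}",
        "-e", f"GITHUB_TOKEN={github_token}",
    ]
    if use_gpu:
        # Only the GPU variant passes --gpus all.
        cmd += ["--gpus", "all"]
    # The container serves the UI on port 3000; only the host side is configurable here.
    cmd += ["-d", "-p", f"{host_port}:3000", IMAGE]
    return cmd


# Print the CPU variant as a copy-pasteable shell command (placeholders, not real credentials).
print(shlex.join(docker_run_command("scanpy", "your_openai_api_key", "github_pat_xxx")))
```

Returning a list rather than a single string keeps the arguments safe to pass to subprocess.run without extra shell quoting.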

Important Tips for Running Docker Without Bugs:

  • To run Docker on a GPU, you need to install nvidia-docker and the NVIDIA Container Toolkit. Run docker info | grep "Default Runtime" to check whether your device can run Docker with a GPU.
  • Feel free to adjust the CUDA image version inside the Dockerfile to match a CUDA setup that is compatible with your device.

We understand the desire to run the service on a server and view it locally. You can start the ngrok service by running this command on your server:

ngrok http 3000

Then copy the resulting URL, which looks like https://[ngrok_id].ngrok-free.app, into Chrome to start!

Run with script

This section is for users who want to customize the service more flexibly.

We take scanpy as an example. Detailed library support information can be found in the Q&A.

Setting up the environment

To prepare your environment for the BioMANIA project, follow these steps:

  1. Clone the repository and install dependencies:
git clone https://github.com/batmen-lab/BioMANIA.git
cd BioMANIA
conda create -n biomania python=3.9
conda activate biomania
pip install -r requirements.txt --index-url https://pypi.org/simple
export PYTHONPATH=$PYTHONPATH:$(pwd)
  2. Set up your OpenAI API key in the BioMANIA/.env file.
echo 'OPENAI_API_KEY="sk-proj-xxxx"' >> .env
  • For inference purposes, a standard OpenAI API key is sufficient.
  • If you intend to use functionalities such as instruction generation or GPT API predictions, a paid OpenAI account is required, as these may otherwise hit the rate limit.
  • Feel free to switch to model_name='gpt-3.5-turbo-0125' or gpt-4-0125-preview in src/models/model.py if you want.
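Before starting the service, it can help to confirm that the key actually landed in .env. The following is a minimal standard-library sketch, not how BioMANIA itself loads the file. The helper names read_env and has_openai_key are hypothetical, and the parser only handles the simple KEY=VALUE lines written by the echo commands above.

```python
from pathlib import Path


def read_env(path=".env"):
    """Parse simple KEY=VALUE lines; return an empty dict if the file is missing."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes such as "sk-proj-xxxx".
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_openai_key(path=".env"):
    """Return True if OPENAI_API_KEY is present and non-empty."""
    return bool(read_env(path).get("OPENAI_API_KEY"))
```

Calling has_openai_key() from the BioMANIA/ folder returns False when the file or the key is missing, which is a quick way to catch a misplaced .env.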

Prepare for Data and Model

Download the necessary data and models from our Google Drive link. For the library data, you can download only the libraries you need.

We provide a script that downloads the models and data from Google Drive, using scanpy as an example. This works if you have access to Google.

gdown "https://drive.google.com/uc?id=1nT28pIJ_dsdvi2yD8ffWt_aePXsSWdqI"
sh download_data_model.sh

Organize the downloaded files under BioMANIA/data and BioMANIA/hugging_models as follows (the base folder is required):

data
├── conversations
├── others-data
└── standard_process
    ├── base
    │   ├── API_composite.json
    │   └── ...
    ├── scanpy
    │   ├── API_composite.json
    │   └── ...
    ├── {LIB}
    │   ├── API_composite.json
    │   └── ...
    └── ...

hugging_models
└── retriever_model_finetuned
    ├── {LIB}
    └── ...

../../resources

After these steps, all the essential data and models are organized for the project.
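A short check can confirm that the layout above is in place before the server is launched. This is a minimal sketch under the assumption that the tree shown above is complete for the chosen library. The function name missing_paths is hypothetical, and the ../../resources folder is left out because its absolute location depends on where the repository is cloned.

```python
from pathlib import Path


def missing_paths(root, lib):
    """Return the expected files and folders that do not exist under root."""
    root = Path(root)
    expected = [
        # base is required regardless of the library in use
        root / "data" / "standard_process" / "base" / "API_composite.json",
        root / "data" / "standard_process" / lib / "API_composite.json",
        root / "hugging_models" / "retriever_model_finetuned" / lib,
    ]
    return [str(path) for path in expected if not path.exists()]
```

For example, missing_paths(".", "scanpy") run from the BioMANIA/ folder returns an empty list when everything listed above is present.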

We also offer some demo chats, which you can find in ./examples. Note that these demo chats are converted from the Read the Docs tutorials of the PyPI packages. You can find the original tutorial links in tutorial_links.txt.

Prepare for front-end UI service

The front-end is compatible with Node.js version 19.

# Under folder BioMANIA/chatbot_ui_biomania
npm install && npm run build

Inference with pretrained models

Start both services for back-end and front-end UI with:

# Under folder `BioMANIA/`
# backend, in one terminal
python -m src.deploy.inference_dialog_server
# frontend, in another terminal
cd chatbot_ui_biomania/
npm run dev 

Your chatbot server is now operational at http://localhost:3000/en, primed to process user queries.
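If the page does not load, a quick first check is whether anything is listening on port 3000 at all. The sketch below uses only the Python standard library. The function name is_port_open is hypothetical, and a successful connection shows only that the port accepts TCP connections, not that the chatbot is fully initialized.

```python
import socket


def is_port_open(host="localhost", port=3000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling is_port_open() with the defaults checks the front-end address used in this guide.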

When you select a different library on the UI page, the retriever's path is automatically changed to match the selected library.

DIY

For users who wish to customize functionality more deeply, we provide a script example that demonstrates direct interaction with the BioMANIA library via a Python script. In this example, users can

  • switch the initially loaded library
  • change the LLM type to either an Ollama-supported model (e.g., llama3) or an OpenAI-supported model (e.g., gpt-3.5-turbo)
  • manage the conversation state, either continuing a previously saved session or starting a new conversation

This method is particularly suited for developers and researchers who want to quickly adjust and test different data processing strategies based on specific research needs.

# under BioMANIA/
from src.deploy.model import Model
conversation_started = True
model = Model(logger=None, device='cpu', model_llm_type='llama3')
user_input = "Could you load the built in dataset?"
library = "scanpy"
# for the first turn of a dialog, use conversation_started=True; use conversation_started=False for the following turns
# to continue a previous session, reuse the same session_id and set conversation_started=False
model.run_pipeline(user_input, library, top_k=1, files=[], conversation_started=conversation_started, session_id="")
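The conversation_started rule in the comments above extends naturally to a multi-turn dialog. The following is a minimal sketch: run_dialog and its resume parameter are hypothetical, and only the run_pipeline call and its keyword arguments are taken from the script above. The return value of run_pipeline is not described in this guide, so the sketch simply collects whatever each call returns.

```python
def run_dialog(model, turns, library, session_id, top_k=1, resume=False):
    """Send each user input in turns through model.run_pipeline.

    The first turn uses conversation_started=True unless resume is set,
    in which case every turn continues the session given by session_id.
    """
    results = []
    for index, user_input in enumerate(turns):
        first_turn = index == 0 and not resume
        results.append(
            model.run_pipeline(
                user_input,
                library,
                top_k=top_k,
                files=[],
                conversation_started=first_turn,
                session_id=session_id,
            )
        )
    return results
```

With the model object from the script above, run_dialog(model, ["Could you load the built in dataset?"], "scanpy", session_id="") reproduces the single-turn call, and passing resume=True together with an earlier session_id continues a saved session.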

Build your APP!

Please refer to the separate README for tutorials on converting different coding tools to our APP.

Share your APP!

If you want to share your pretrained APP with others, there are two ways.

Share docker

You can build a Docker image, push it to Docker Hub, and share the image URL in our issues section. To set up the environment for your tool, add the env files under BioMANIA/docker_utils/{LIB}/, or modify the Dockerfile to build your environment.

# cd BioMANIA
sudo docker build --build-arg LIB=[your_tool_name] -t [docker_image_name] -f Dockerfile ./
# (optional) push to Docker Hub
sudo docker push [your_docker_repo]/[docker_image_name]:[tag]

Note that if you want to include data inside the Docker image, you must modify the Dockerfile carefully to copy the folders to /app. Also add your PyPI or Git pip install URL to requirements.txt before packaging for Docker.

Share data/models

You can simply share your data folder, hugging_models folder, and logo image via a drive link in our issues section.

Reference and Acknowledgments

We extend our gratitude to the following references:

Thank you for choosing BioMANIA. We hope this guide assists you in navigating through our project with ease.

Version History

  • v1.1.12 (2024-10-16)
    • Update code scripts & upload data and models & update docker which are aligned with paper.
    • Will renew the scripts for generating report, documents for Git2APP, R2APP soon.
    • Update report generation.
    • Update R2APP and Git2APP document.

See version_history for more details!

Citation

Please cite our paper if you find our data, model, or code useful.

@article{dong2023biomania,
  title={BioMANIA: Simplifying bioinformatics data analysis through conversation},
  author={Dong, Zhengyuan and Zhong, Victor and Lu, Yang},
  journal={bioRxiv},
  pages={2023--10},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}