This README framework provides a step-by-step guide for integrating R packages into the BioMANIA app. The process is streamlined for R, leveraging its capabilities for direct package usage and documentation access. Python scripts are still used for data preparation and model training.
Before starting, ensure that R and rpy2
are installed. rpy2
allows for running R within Python, bridging the two environments.
- Install R: Download and install the latest version of R from CRAN.
- Install rpy2: In your Python environment, run
pip install rpy2
Use a Python script to generate the ./data/R/{Your_R_Library_Name}/API_init.json
file. This file contains initial information about the APIs in the R package.
- Run the following command:
python dataloader/get_API_init_from_sourcecode_R.py --LIB [Your_R_Library_Name]
cp -r ./data/R/{Your_R_Library_Name}/API_init.json /data/R/{Your_R_Library_Name}/API_composite.json
Prepare the data required for the retriever model using another Python script. This process generates the instruction under ./data/R/{your_r_lib_name}/API_inquiry.json
file and ./data/R/{your_r_lib_name}/retriever_train_data/
folder.
- Execute the command:
python dataloader/prepare_retriever_data.py --LIB [Your_R_Library_Name]
Train the chat classification and retriever models. These models are crucial for the app's functionality.
- Chitchat Classification Model:
- Train the model using the following command:
python models/chitchat_classification.py --LIB ${LIB}
- Retriever Model:
- Train the model with this command (adjust the command with necessary parameters):
export LIB=scanpy
CUDA_VISIBLE_DEVICES=0
mkdir ./hugging_models/retriever_model_finetuned/${LIB}
python models/train_retriever.py \
--data_path ./data/standard_process/${LIB}/retriever_train_data/ \
--model_name bert-base-uncased \
--output_path ./hugging_models/retriever_model_finetuned/${LIB} \
--num_epochs 25 \
--train_batch_size 32 \
--learning_rate 1e-5 \
--warmup_steps 500 \
--max_seq_length 256 \
--optimize_top_k 3 \
--plot_dir ./plot/${LIB}/retriever/
Start back-end UI service with:
export LIB=ggplot2
CUDA_VISIBLE_DEVICES=0 \
python deploy/inference_dialog_server_R.py \
--retrieval_model_path ./hugging_models/retriever_model_finetuned/${LIB}/assigned \
--top_k 3
Install and start the front-end service in a new terminal with:
cd src/chatbot_ui_biomania
npm i # install
export BACKEND_URL="https://[ngrok_id].ngrok-free.app" # "http://localhost:5000";
npm run dev # run
Your chatbot server is now operational at http://localhost:3000/en
, primed to process user queries.
- Library Loading: In R, use
library(LIB)
to load packages directly. There's no need to modifyLib_cheatsheet.py
as in Python. - Documentation Access: R documentation can be accessed through
help()
,??
, or the.__doc__
attribute after converting R functions to Python viarpy2
. - Arguments Information: R documentation didn't always provide
type
information for parameters. - Simplified Process: The process for R integration is more straightforward, focusing primarily on data preparation and model training, without the need to adjust library settings extensively.
This framework outlines the key steps and differences for integrating an R package into BioMANIA. Adjust the Python commands and paths according to your package's specifics. If you have any questions or need assistance with specific steps, feel free to reach out!