We provide all the resources needed to reproduce our results and to evaluate any MLLM compatible with vLLM.
Model | Repo ID in HF 🤗 | Domain | Base Model | Training Data | Evaluation Benchmark |
---|---|---|---|---|---|
AdaMLLM-med-2B | AdaptLLM/biomed-Qwen2-VL-2B-Instruct | Biomedicine | Qwen2-VL-2B-Instruct | biomed-visual-instructions | biomed-VQA-benchmark |
AdaMLLM-food-2B | AdaptLLM/food-Qwen2-VL-2B-Instruct | Food | Qwen2-VL-2B-Instruct | food-visual-instructions | food-VQA-benchmark |
AdaMLLM-med-8B | AdaptLLM/biomed-LLaVA-NeXT-Llama3-8B | Biomedicine | open-llava-next-llama3-8b | biomed-visual-instructions | biomed-VQA-benchmark |
AdaMLLM-food-8B | AdaptLLM/food-LLaVA-NeXT-Llama3-8B | Food | open-llava-next-llama3-8b | food-visual-instructions | food-VQA-benchmark |
AdaMLLM-med-11B | AdaptLLM/biomed-Llama-3.2-11B-Vision-Instruct | Biomedicine | Llama-3.2-11B-Vision-Instruct | biomed-visual-instructions | biomed-VQA-benchmark |
AdaMLLM-food-11B | AdaptLLM/food-Llama-3.2-11B-Vision-Instruct | Food | Llama-3.2-11B-Vision-Instruct | food-visual-instructions | food-VQA-benchmark |
To simplify evaluation on the domain-specific tasks, we have uploaded the templatized test sets for each task (see the Evaluation Benchmark column in the table above).
The dataset loading script is embedded in the inference code, so you can directly run the following commands to evaluate MLLMs.
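If you would like to inspect a test set before running evaluation, you can also download it from the Hub yourself. This is optional, and the snippet below is only a minimal sketch: it assumes the benchmark datasets are hosted under the AdaptLLM organization (matching the model repos above) and that `huggingface-cli` is installed.
# Optional: fetch the templatized test sets for manual inspection (repo IDs assumed)
huggingface-cli download AdaptLLM/biomed-VQA-benchmark --repo-type dataset --local-dir ./biomed-VQA-benchmark
huggingface-cli download AdaptLLM/food-VQA-benchmark --repo-type dataset --local-dir ./food-VQA-benchmark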
Our code can directly evaluate models such as LLaVA-v1.6 (the open-source version), Qwen2-VL, and Llama-3.2-Vision. To evaluate other MLLMs, refer to this guide for modifying the `BaseTask` class in the `vllm_inference/utils/task.py` file. Feel free to reach out to us for assistance!
conda activate vllm
cd QA-Synthesizer/vllm_inference
RESULTS_DIR=./eval_results # Directory for saving evaluation scores
# Choose from ['med', 'PMC_VQA', 'VQA_RAD', 'SLAKE', 'PathVQA']
# 'med' runs inference on all biomedicine tasks; others run on a single task
DOMAIN='med'
# 1. LLaVA-v1.6-8B
MODEL_TYPE='llava'
MODEL=AdaptLLM/biomed-LLaVA-NeXT-Llama3-8B # HuggingFace repo ID for AdaMLLM-med-8B
OUTPUT_DIR=./output/AdaMLLM-med-LLaVA-8B_${DOMAIN}
# Run inference with data parallelism; adjust CUDA devices as needed
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
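# If fewer GPUs are available, list only the devices you want to use; for example, a single GPU:
# CUDA_VISIBLE_DEVICES='0' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}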
# 2. Qwen2-VL-2B
MODEL_TYPE='qwen2_vl'
MODEL=Qwen/Qwen2-VL-2B-Instruct # HuggingFace repo ID for Qwen2-VL
OUTPUT_DIR=./output/Qwen2-VL-2B-Instruct_${DOMAIN}
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
MODEL=AdaptLLM/biomed-Qwen2-VL-2B-Instruct # HuggingFace repo ID for AdaMLLM-med-2B
OUTPUT_DIR=./output/AdaMLLM-med-Qwen-2B_${DOMAIN}
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
# 3. Llama-3.2-11B
MODEL_TYPE='mllama'
MODEL=meta-llama/Llama-3.2-11B-Vision-Instruct # HuggingFace repo ID for Llama-3.2-11B-Vision-Instruct
OUTPUT_DIR=./output/Llama-3.2-11B-Vision-Instruct_${DOMAIN}
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
MODEL=AdaptLLM/biomed-Llama-3.2-11B-Vision-Instruct # HuggingFace repo ID for AdaMLLM-med-11B
OUTPUT_DIR=./output/AdaMLLM-med-Llama3.2-11B_${DOMAIN}
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
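# To run a single biomedicine task instead of the full suite, set DOMAIN to one of the
# task names listed above and rerun; for example, with the model configured just above:
# DOMAIN='VQA_RAD'
# OUTPUT_DIR=./output/AdaMLLM-med-Llama3.2-11B_${DOMAIN}
# CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}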
# Choose from ['food', 'Recipe1M', 'Nutrition5K', 'Food101', 'FoodSeg103']
# 'food' runs inference on all food tasks; others run on a single task
DOMAIN='food'
# 1. LLaVA-v1.6-8B
MODEL_TYPE='llava'
MODEL=AdaptLLM/food-LLaVA-NeXT-Llama3-8B # HuggingFace repo ID for AdaMLLM-food-8B
OUTPUT_DIR=./output/AdaMLLM-food-LLaVA-8B_${DOMAIN}
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
# 2. Qwen2-VL-2B
MODEL_TYPE='qwen2_vl'
MODEL=Qwen/Qwen2-VL-2B-Instruct # HuggingFace repo ID for Qwen2-VL
OUTPUT_DIR=./output/Qwen2-VL-2B-Instruct_${DOMAIN}
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
MODEL=AdaptLLM/food-Qwen2-VL-2B-Instruct # HuggingFace repo ID for AdaMLLM-food-2B
OUTPUT_DIR=./output/AdaMLLM-food-Qwen-2B_${DOMAIN}
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
# 3. Llama-3.2-11B
MODEL_TYPE='mllama'
MODEL=meta-llama/Llama-3.2-11B-Vision-Instruct # HuggingFace repo ID for Llama-3.2-11B-Vision-Instruct
OUTPUT_DIR=./output/Llama-3.2-11B-Vision-Instruct_${DOMAIN}
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
MODEL=AdaptLLM/food-Llama-3.2-11B-Vision-Instruct # HuggingFace repo ID for AdaMLLM-food-11B
OUTPUT_DIR=./output/AdaMLLM-food-Llama3.2-11B_${DOMAIN}
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
The evaluation results are stored in `./eval_results`, and the model prediction outputs are stored in `./output`.
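Once the runs finish, you can browse both directories to check the saved scores and raw predictions. A minimal sketch (the exact file names inside each directory depend on the task and model):
# List the saved evaluation scores and model predictions
ls -R ./eval_results
ls -R ./output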