Sahal Shaji Mullappilly*, Mohammed Irfan K*, Sara Pieri, Fahad Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, and Hisham Cholakkal
*Equally contributing first authors
Introducing BiMediX2, the first bilingual medical LLM and LMM based on Llama3.1, designed for seamless interaction in both English and Arabic. BiMediX2 supports a wide range of medical interactions, including multi-turn chat, multiple-choice question answering, and open-ended question answering, and it can also understand and analyze medical images. Thanks to a high-quality Arabic-English bilingual healthcare dataset and instruction sets, our model outperforms GPT-4 on English medical benchmarks and achieves state-of-the-art results on various medical multimodal evaluations.
Our models and datasets will be publicly released on Hugging Face 🤗.
Our contributions are as follows:
Arabic-English Bilingual Healthcare LLM
- BiMediX2 is the first medical LLM based on Llama3.1 to achieve excellent results on English, Arabic, and bilingual text-based medical LLM benchmarks.
- Our BiMediX2 LLM outperforms GPT-4 by more than 8% on the USMLE benchmark.
- We created high-quality Arabic-English bilingual medical instruction sets using a semi-automated translation pipeline with Llama3 and GPT-3.5, complemented by manual verification. BiMediX2 is instruction-tuned with these bilingual instruction sets.
- Like the previous BiMediX version, BiMediX2 can ask follow-up questions to gather more information about a patient's symptoms.
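The README does not specify how the semi-automated translation pipeline scores translations or decides which items go to manual verification. The sketch below only illustrates the generic pattern (machine-translate, auto-score, route low-scoring items to human reviewers). The `translate` and `score` callables, the `threshold` value, and the toy stubs are hypothetical stand-ins, not the BiMediX2 implementation; in practice `translate` would call an LLM such as Llama3 or GPT-3.5.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TranslatedItem:
    english: str
    arabic: str
    score: float
    needs_review: bool


def translate_with_review(
    texts: List[str],
    translate: Callable[[str], str],       # hypothetical: English -> Arabic translator
    score: Callable[[str, str], float],    # hypothetical: quality score in [0, 1]
    threshold: float = 0.8,                # hypothetical cut-off for human review
) -> Tuple[List[TranslatedItem], List[TranslatedItem]]:
    """Translate each text, score it, and split results into accepted vs. review queue."""
    accepted: List[TranslatedItem] = []
    review_queue: List[TranslatedItem] = []
    for text in texts:
        arabic = translate(text)
        s = score(text, arabic)
        item = TranslatedItem(text, arabic, s, needs_review=s < threshold)
        # Low-scoring translations are routed to manual verification.
        (review_queue if item.needs_review else accepted).append(item)
    return accepted, review_queue


# Toy stand-ins for illustration only: a tiny dictionary "translator" and a
# scorer that flags any text the translator left unchanged.
toy_dictionary = {"fever": "حمى", "cough": "سعال"}


def toy_translate(text: str) -> str:
    return toy_dictionary.get(text, text)


def toy_score(src: str, tgt: str) -> float:
    return 0.0 if src == tgt else 1.0


accepted, review = translate_with_review(
    ["fever", "cough", "tachycardia"], toy_translate, toy_score
)
print([i.english for i in accepted], [i.english for i in review])
```

Keeping the translator and scorer as pluggable callables lets the same routing logic work with any model or quality metric, while the human-review queue stays explicit.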
BiMediX2 VLM: Extension to Medical Image Modalities
- We created an Arabic-English instruction set with 120k image-text pairs across different medical image modalities to train BiMediX2 VLM.
- Our model accepts multiple medical image modalities as input, allowing users to upload medical images and ask questions about them in either Arabic or English.
- We developed the first Arabic medical VLM evaluation benchmark that evaluates VLMs across different imaging modalities.
- Our BiMediX2 VLM achieves state-of-the-art results on English and Arabic medical VLM evaluation benchmarks.