Mobile-MMLU is a comprehensive benchmark designed to evaluate mobile-compatible Large Language Models (LLMs) across 80 diverse fields including Education, Healthcare, and Technology. Our benchmark is redefining mobile intelligence evaluation for a smarter future, with a focus on real-world applicability and performance metrics that matter in mobile environments.
- Comprehensive Coverage: Spans 80 distinct fields with carefully curated questions
- Mobile-Optimized: Specifically designed for evaluating mobile-compatible LLMs
- 16,186 Questions: Extensive dataset including scenario-based questions
- Rigorous Evaluation: Systematic assessment of performance, efficiency, and accuracy
- Real-world Applications: Focus on practical use cases in everyday scenarios
Visit our live leaderboard to see the latest performance rankings of various mobile LLMs across different categories and metrics.
We currently support the following backends for model inference:

- `hf`: HF Transformers
- `gptqmodel`: GPTQModel for GPTQ quantized models
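To illustrate how such backend dispatch can work, here is a minimal sketch. It is not the actual source of `generate_answers.py`: the `load_model` helper is hypothetical, and the exact GPTQModel keyword arguments may vary by installed version.

```python
# Hypothetical sketch of backend dispatch; not the actual
# generate_answers.py source.
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_model(model_name: str, backend: str = "hf", device: str = "cpu"):
    """Load a model and tokenizer with the requested backend."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if backend == "hf":
        # Plain HF Transformers checkpoint.
        model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    elif backend == "gptqmodel":
        # GPTQModel loads GPTQ-quantized checkpoints; exact kwargs may
        # differ across gptqmodel versions (assumption, check your install).
        from gptqmodel import GPTQModel
        model = GPTQModel.load(model_name)
    else:
        raise ValueError(f"Unsupported backend: {backend}")
    return model, tokenizer
```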
- Install required packages:

  ```bash
  pip install torch transformers datasets pandas tqdm
  ```
- Generate responses using your model:

  ```bash
  python generate_answers.py \
      --model_name your_model_name \
      --batch_size 32 \
      --device cuda
  ```
The script supports the following arguments (a parsing sketch follows the list):

- `--model_name`: Name or path of the model (required)
- `--batch_size`: Batch size for processing (default: 32)
- `--device`: Device to run the model on (default: `auto` = use CUDA if available, else CPU)
- `--backend`: Backend to load the model with (default: `hf`). Use `gptqmodel` for GPTQ quantized models.
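For orientation, the arguments above could be wired up roughly as follows. This assumes the script uses `argparse`; it is a sketch, not the actual source.

```python
import argparse

import torch

parser = argparse.ArgumentParser(description="Generate Mobile-MMLU answers")
parser.add_argument("--model_name", required=True,
                    help="Name or path of the model")
parser.add_argument("--batch_size", type=int, default=32,
                    help="Batch size for processing")
parser.add_argument("--device", default="auto",
                    help="Device to run the model on")
parser.add_argument("--backend", default="hf", choices=["hf", "gptqmodel"],
                    help="Backend used to load the model")
args = parser.parse_args()

# Resolve "auto" to CUDA when available, otherwise fall back to CPU.
if args.device == "auto":
    args.device = "cuda" if torch.cuda.is_available() else "cpu"
```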
The script will generate a CSV file with the following format:

```csv
question_id,predicted_answer
q1,A
q2,B
q3,C
...
```
Each row contains:

- `question_id`: The unique identifier for each question
- `predicted_answer`: The model's prediction (A, B, C, or D)
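If you post-process predictions yourself, one minimal way to produce this format with pandas (already in the install list) is sketched below; the `predictions` dict and the output filename are hypothetical example data, not fixed by the benchmark.

```python
import pandas as pd

# Hypothetical example predictions mapping question IDs to answer letters.
predictions = {"q1": "A", "q2": "B", "q3": "C"}

df = pd.DataFrame({
    "question_id": list(predictions.keys()),
    "predicted_answer": list(predictions.values()),
})
# index=False keeps the file to exactly the two required columns.
df.to_csv("answers.csv", index=False)
```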
- After generating the CSV file with your model's predictions, submit it through our evaluation portal at link