Mobile-MMLU is a comprehensive benchmark designed to evaluate mobile-compatible Large Language Models (LLMs) across 80 diverse fields including Education, Healthcare, and Technology. Our benchmark is redefining mobile intelligence evaluation for a smarter future, with a focus on real-world applicability and performance metrics that matter in mobile environments.
- Comprehensive Coverage: Spans 80 distinct fields with carefully curated questions
- Mobile-Optimized: Specifically designed for evaluating mobile-compatible LLMs
- 16,186 Questions: Extensive dataset including scenario-based questions
- Rigorous Evaluation: Systematic assessment of performance, efficiency, and accuracy
- Real-world Applications: Focus on practical use cases in everyday scenarios
Visit our live leaderboard to see the latest performance rankings of various mobile LLMs across different categories and metrics.
We currently support the following backends for model inference:

- `hf`: HF Transformers
- `gptqmodel`: GPTQModel for GPTQ quantized models
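To illustrate how such backend dispatch can work, here is a minimal sketch. It is not the actual source of `generate_answers.py`: the `load_model` helper is hypothetical, and the exact GPTQModel keyword arguments may vary by installed version.

```python
# Hypothetical sketch of backend dispatch; not the actual
# generate_answers.py source.
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_model(model_name: str, backend: str = "hf", device: str = "cpu"):
    """Load a model and tokenizer with the requested backend."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if backend == "hf":
        # Plain HF Transformers checkpoint.
        model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
    elif backend == "gptqmodel":
        # GPTQModel loads GPTQ-quantized checkpoints; exact kwargs may
        # differ across gptqmodel versions (assumption, check your install).
        from gptqmodel import GPTQModel
        model = GPTQModel.load(model_name)
    else:
        raise ValueError(f"Unsupported backend: {backend}")
    return model, tokenizer
```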
- Install required packages:

  ```bash
  pip install torch transformers datasets pandas tqdm
  ```
- Generate responses using your model:

  ```bash
  python generate_answers.py \
      --model_name your_model_name \
      --batch_size 32 \
      --device cuda
  ```
The script supports the following arguments (a parsing sketch follows the list):

- `--model_name`: Name or path of the model (required)
- `--batch_size`: Batch size for processing (default: 32)
- `--device`: Device to run the model on (default: `auto` = use CUDA if available, else CPU)
- `--backend`: Backend to load the model with (default: `hf`). Use `gptqmodel` for GPTQ quantized models.
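For orientation, the arguments above could be wired up roughly as follows. This assumes the script uses `argparse`; it is a sketch, not the actual source.

```python
import argparse

import torch

parser = argparse.ArgumentParser(description="Generate Mobile-MMLU answers")
parser.add_argument("--model_name", required=True,
                    help="Name or path of the model")
parser.add_argument("--batch_size", type=int, default=32,
                    help="Batch size for processing")
parser.add_argument("--device", default="auto",
                    help="Device to run the model on")
parser.add_argument("--backend", default="hf", choices=["hf", "gptqmodel"],
                    help="Backend used to load the model")
args = parser.parse_args()

# Resolve "auto" to CUDA when available, otherwise fall back to CPU.
if args.device == "auto":
    args.device = "cuda" if torch.cuda.is_available() else "cpu"
```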
The script will generate a CSV file with the following format:

```csv
question_id,predicted_answer
q1,A
q2,B
q3,C
...
```
Each row contains:

- `question_id`: The unique identifier for each question
- `predicted_answer`: The model's prediction (A, B, C, or D)
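If you post-process predictions yourself, one minimal way to produce this format with pandas (already in the install list) is sketched below; the `predictions` dict and the output filename are hypothetical example data, not fixed by the benchmark.

```python
import pandas as pd

# Hypothetical example predictions mapping question IDs to answer letters.
predictions = {"q1": "A", "q2": "B", "q3": "C"}

df = pd.DataFrame({
    "question_id": list(predictions.keys()),
    "predicted_answer": list(predictions.values()),
})
# index=False keeps the file to exactly the two required columns.
df.to_csv("answers.csv", index=False)
```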
- After generating the CSV file with your model's predictions, submit it through our evaluation portal at link