
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark

Overview

Mobile-MMLU is a comprehensive benchmark designed to evaluate mobile-compatible Large Language Models (LLMs) across 80 diverse fields, including Education, Healthcare, and Technology. It emphasizes real-world applicability and performance metrics that matter in mobile environments.

Key Features

  • Comprehensive Coverage: Spans 80 distinct fields with carefully curated questions
  • Mobile-Optimized: Specifically designed for evaluating mobile-compatible LLMs
  • 16,186 Questions: Extensive dataset including scenario-based questions
  • Rigorous Evaluation: Systematic assessment of performance, efficiency, and accuracy
  • Real-world Applications: Focus on practical use cases in everyday scenarios

Leaderboard

Visit our live leaderboard to see the latest performance rankings of various mobile LLMs across different categories and metrics.

Getting Started

Backends

The inference backend is selected with the --backend argument. Supported values include:

  • hf: the default backend
  • gptqmodel: for GPTQ-quantized models

Response Generation

  1. Install required packages:
pip install torch transformers datasets pandas tqdm
  2. Generate responses using your model:
python generate_answers.py \
    --model_name your_model_name \
    --batch_size 32 \
    --device cuda

The script accepts the following arguments:

  • --model_name: Name or path of the model (required)
  • --batch_size: Batch size for processing (default: 32)
  • --device: Device to run the model on (default: auto, which uses cuda if available and cpu otherwise)
  • --backend: Backend used to load the model (default: hf). Use gptqmodel for GPTQ-quantized models.
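For readers wiring the script into their own tooling, the documented arguments and the auto device rule can be sketched with Python's argparse. This is a hypothetical reconstruction based only on the argument list, not the actual source of generate_answers.py. The helper names build_parser and resolve_device are illustrative, and the real script may validate arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented CLI (not the script's real source)."""
    parser = argparse.ArgumentParser(description="Generate Mobile-MMLU answers")
    parser.add_argument("--model_name", required=True,
                        help="Name or path of the model")
    parser.add_argument("--batch_size", type=int, default=32,
                        help="Batch size for processing")
    parser.add_argument("--device", default="auto",
                        help="'auto' picks cuda if available, else cpu")
    parser.add_argument("--backend", default="hf",
                        help="Inference backend; use gptqmodel for GPTQ-quantized models")
    return parser


def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map 'auto' to a concrete device, mirroring the documented default."""
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested


if __name__ == "__main__":
    args = build_parser().parse_args(["--model_name", "my-org/my-model"])
    print(args.batch_size, args.device, args.backend)  # 32 auto hf
    # In a real script, cuda_available would typically come from torch.cuda.is_available().
    print(resolve_device(args.device, cuda_available=False))  # cpu
```

Passing cuda_available as a parameter keeps the device rule testable on machines without PyTorch or a GPU.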

Response Format

The script will generate a CSV file with the following format:

question_id,predicted_answer
q1,A
q2,B
q3,C
...

Each row contains:

  • question_id: The unique identifier for each question
  • predicted_answer: The model's prediction (A, B, C, or D)
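If you produce predictions with your own inference pipeline instead of generate_answers.py, a file in the format above can be written with Python's standard csv module. The sketch assumes your predictions are already held in a dict that maps question IDs to answer letters; write_predictions is a hypothetical helper, not part of the repository.

```python
import csv
from typing import Mapping

VALID_ANSWERS = {"A", "B", "C", "D"}


def write_predictions(path: str, predictions: Mapping[str, str]) -> None:
    """Write {question_id: answer_letter} to a CSV with the documented header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["question_id", "predicted_answer"])
        for question_id, answer in predictions.items():
            # Normalize stray whitespace and lowercase letters before checking.
            answer = answer.strip().upper()
            if answer not in VALID_ANSWERS:
                raise ValueError(f"{question_id}: invalid answer {answer!r}")
            writer.writerow([question_id, answer])


# Usage: write_predictions("predictions.csv", {"q1": "A", "q2": "B", "q3": "C"})
```

Raising on an invalid letter, rather than silently skipping it, surfaces parsing problems in your pipeline before the file is submitted.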

Submission

  1. After generating the CSV file with your model's predictions, submit it through our evaluation portal at link
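Before uploading, a quick local check can catch malformed files. This minimal sketch only checks the documented format (exact header, non-empty and unique question IDs, answers limited to A, B, C, or D). validate_predictions is a hypothetical helper, and the portal may apply additional rules, such as requiring an answer for every question, that are not described here.

```python
import csv
from typing import List

VALID_ANSWERS = {"A", "B", "C", "D"}


def validate_predictions(path: str) -> List[str]:
    """Return a list of problems found in a predictions CSV (empty if none)."""
    problems: List[str] = []
    seen = set()
    # utf-8-sig tolerates a byte-order mark that some spreadsheet tools add.
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["question_id", "predicted_answer"]:
            problems.append(f"unexpected header: {reader.fieldnames}")
            return problems
        # Count data rows from 2 because line 1 of the file is the header
        # (assumes no field spans multiple lines).
        for line_no, row in enumerate(reader, start=2):
            qid = (row["question_id"] or "").strip()
            answer = (row["predicted_answer"] or "").strip()
            if not qid:
                problems.append(f"line {line_no}: missing question_id")
            elif qid in seen:
                problems.append(f"line {line_no}: duplicate question_id {qid!r}")
            seen.add(qid)
            if answer not in VALID_ANSWERS:
                problems.append(f"line {line_no}: invalid answer {answer!r}")
    return problems
```

Returning every problem at once, instead of stopping at the first, makes it easier to fix a file in a single pass.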