Human-Centric-MLLM (HERM)

A Multimodal Large Language Model for human-centric tasks.

Introduction

Tuned on our self-constructed human-centric annotations, HERM excels at a wide range of human-centric vision-language tasks and substantially surpasses existing MLLMs on human-centric understanding.

[Overview figure]

Installation

  • Prerequisites: Python 3.10, CUDA >= 11.6 (we used 11.7)
  • Install PyTorch and the remaining dependencies:
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1
pip install -r requirements.txt
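
If you manage environments with conda, a typical setup looks like the sketch below; the environment name herm is our placeholder, not something the repo mandates:

conda create -n herm python=3.10 -y
conda activate herm
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1
pip install -r requirements.txt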

Dataset Preparation

TBD

Training

We conduct two-stage training: the first stage pre-trains on human-centric captioning and grounding tasks, and the second stage performs instruction tuning on free-form human-centric question-answering pairs.

  • Stage 1: Pre-training. Set your configuration in train_configs/hcm_multitask/minigptv2_hcm_multitask.yaml (a sketch of the config layout follows this list) and run:
CUDA_VISIBLE_DEVICES=<your device numbers> torchrun \
  --master_port <your port> --nproc_per_node <your process numbers> \
  train.py --cfg-path train_configs/hcm_multitask/minigptv2_hcm_multitask.yaml
  • Stage 2: Instruction tuning. Set your configuration in train_configs/hcm_multitask/minigptv2_hcm_instruct_tuning.yaml and run:
CUDA_VISIBLE_DEVICES=<your device numbers> torchrun \
  --master_port <your port> --nproc_per_node <your process numbers> \
  train.py --cfg-path train_configs/hcm_multitask/minigptv2_hcm_instruct_tuning.yaml
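
Both stage configs define model, dataset, and run settings. The sketch below is illustrative only: the section and key names follow the MiniGPT-v2-style config layout this project builds on, and every key and value here is an assumption; the shipped YAML files are the authoritative schema.

model:
  arch: minigpt_v2                        # assumed architecture identifier
  llama_model: /path/to/llama-2-7b-chat   # base LLM weights (placeholder path)
  ckpt: /path/to/init_checkpoint.pth      # checkpoint to initialize from (placeholder)
run:
  init_lr: 1e-5                           # illustrative learning rate
  max_epoch: 1
  batch_size_train: 2
  output_dir: output/hcm_multitask        # where checkpoints and logs land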

Both single-GPU and multi-GPU training are supported.
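
For example, a 4-GPU Stage 1 run looks like the following; the device IDs and port are placeholders of our choosing:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun \
  --master_port 29500 --nproc_per_node 4 \
  train.py --cfg-path train_configs/hcm_multitask/minigptv2_hcm_multitask.yaml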

Inference

We support batched inference on your own data. First, set your dataset and other configurations in eval_configs/minigptv2_free_evaluation.yaml. Then run the inference script:

CUDA_VISIBLE_DEVICES=<your device numbers> torchrun \
  --master_port <your port> --nproc_per_node <your process numbers> \
  eval_inference.py --cfg-path eval_configs/minigptv2_free_evaluation.yaml
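
The exact contents of the evaluation YAML are defined by the repo; as an illustration only, every key below is an assumption modeled on MiniGPT-v2-style evaluation configs and has not been checked against this file:

model:
  ckpt: /path/to/finetuned_checkpoint.pth   # weights to evaluate (placeholder)
evaluation_datasets:
  my_dataset:                               # hypothetical dataset entry
    img_path: /path/to/images               # image folder for your data
    eval_file_path: /path/to/questions.json # questions/prompts to run
    batch_size: 8
    max_new_tokens: 100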

Currently, we only support inference on a single GPU.
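
In practice, single-GPU inference means one visible device and one process; the values here are placeholders:

CUDA_VISIBLE_DEVICES=0 torchrun \
  --master_port 29500 --nproc_per_node 1 \
  eval_inference.py --cfg-path eval_configs/minigptv2_free_evaluation.yaml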

Acknowledgements

This project is built on the awesome codebase of MiniGPT-v2: Large Language Model as a Unified Interface for Vision-Language Multi-task Learning.
