Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

🚀 Introduction

Meissonic is a non-autoregressive masked image modeling (MIM) model for text-to-image synthesis. It generates high-resolution images and is designed to run on consumer graphics cards.

Key Features:

🖼️ High-resolution image generation (up to 1024x1024)
💻 Designed to run on consumer GPUs
🎨 Versatile applications: text-to-image, image-to-image

🛠️ Prerequisites

Step 1: Clone the repository

git clone https://github.com/viiika/Meissonic/
cd Meissonic

Step 2: Create virtual environment

conda create --name meissonic python
conda activate meissonic
pip install -r requirements.txt

Step 3: Install diffusers

git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .

💡 Usage

Gradio Web UI

python app.py

Command-line Interface

Text-to-Image Generation

python inference.py --prompt "Your creative prompt here"
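
To render several prompts in one go, the command above can be wrapped in a small Python driver. This is a minimal sketch, assuming it is run from the repository root. Only the `--prompt` flag comes from this README; `build_command` and `run_prompts` are hypothetical helper names, not part of Meissonic.

```python
import subprocess
import sys
from typing import List


def build_command(prompt: str) -> List[str]:
    """Build the argv list for one text-to-image run (only --prompt is used)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # sys.executable keeps the run inside the active (conda) environment.
    return [sys.executable, "inference.py", "--prompt", prompt]


def run_prompts(prompts: List[str]) -> None:
    """Run inference.py once per prompt, stopping at the first failure."""
    for prompt in prompts:
        subprocess.run(build_command(prompt), check=True)
```

Calling `run_prompts` with the two Showcase prompts would launch the script twice, one process per prompt. Passing the prompt as a separate list element avoids shell quoting problems with prompts that contain quotes or commas.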

Inpainting and Outpainting

python inpaint.py --mode inpaint --input_image path/to/image.jpg
python inpaint.py --mode outpaint --input_image path/to/image.jpg

Advanced: FP8 Quantization

Reduce GPU memory usage with FP8 quantization (see Performance Benchmarks):

Requirements:

CUDA 12.4
PyTorch 2.4.1
TorchAO

Note: Windows users can install TorchAO with:

pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cpu
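
Before running the FP8 scripts, it can help to confirm what is actually installed. The following is a minimal sketch that reports the versions named in the requirements list. The helper names `installed_version` and `fp8_environment` are hypothetical, and the check only prints what it finds; it does not enforce the exact versions listed above.

```python
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def fp8_environment() -> Dict[str, Optional[str]]:
    """Collect the versions that the FP8 requirements list refers to."""
    report: Dict[str, Optional[str]] = {
        "torch": installed_version("torch"),
        "torchao": installed_version("torchao"),
        "cuda": None,
    }
    try:
        import torch  # imported lazily so the check also works without PyTorch

        # CUDA version PyTorch was built against; None on CPU-only builds.
        report["cuda"] = torch.version.cuda
    except ImportError:
        pass
    return report


for name, version in fp8_environment().items():
    print(f"{name}: {version or 'not found'}")
```

Compare the printed values with the requirements (CUDA 12.4, PyTorch 2.4.1) before launching `inference_fp8.py` or `app_fp8.py`.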

Command-line inference

python inference_fp8.py --quantization fp8

Gradio for FP8 (select the quantization method under Advanced settings)

python app_fp8.py

Performance Benchmarks

Precision (Steps=64, Resolution=1024x1024)	Batch Size=1 (Avg. Time)	Memory Usage
FP32	13.32s	12GB
FP16	12.35s	9.5GB
FP8	12.93s	8.7GB
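
To make the trade-off in the table easier to read, the sketch below computes the time and memory reductions relative to FP32. The figures are copied from the table above; the helper name `relative_saving` is hypothetical.

```python
# Figures copied from the benchmark table (64 steps, 1024x1024, batch size 1).
BENCHMARKS = {
    "FP32": {"seconds": 13.32, "memory_gb": 12.0},
    "FP16": {"seconds": 12.35, "memory_gb": 9.5},
    "FP8": {"seconds": 12.93, "memory_gb": 8.7},
}


def relative_saving(baseline: float, value: float) -> float:
    """Return the reduction relative to the baseline, as a percentage."""
    return (baseline - value) / baseline * 100.0


baseline = BENCHMARKS["FP32"]
for precision in ("FP16", "FP8"):
    row = BENCHMARKS[precision]
    time_saving = relative_saving(baseline["seconds"], row["seconds"])
    memory_saving = relative_saving(baseline["memory_gb"], row["memory_gb"])
    print(f"{precision}: {time_saving:.1f}% less time, "
          f"{memory_saving:.1f}% less memory than FP32")
```

With these numbers, FP16 comes out at about 7.3% less time and 20.8% less memory, and FP8 at about 2.9% less time and 27.5% less memory. In this table FP8 has the smallest memory footprint, while FP16 is the fastest.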

🎨 Showcase

"A pillow with a picture of a Husky on it."

"A white coffee mug, a solid black background"

📚 Citation

If you find this work helpful, please consider citing:

@article{bai2024meissonic,
  title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
  author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
  journal={arXiv preprint arXiv:2410.08261},
  year={2024}
}

🙏 Acknowledgements

We thank the community and contributors for their invaluable support in developing Meissonic. We thank apolinario@multimodal.art for making the Meissonic demo. We thank @NewGenAI and @飛鷹しずか@自称文系プログラマの勉強 for making YouTube tutorials. We thank @pprp for the FP8 and INT4 quantization. We thank @camenduru for making the Jupyter tutorial. We thank @chenxwh for making the Replicate demo and API. We thank Collov Labs for reproducing Monetico. We thank Shitong et al. for identifying effective design choices for enhancing visual quality.

Made with ❤️ by MeissonFlow Research
