An example that:
- downloads a vision language model (Qwen2-VL-2B)
- processes an image with a prompt
You will need to set the Team on the VLMEval target in order to build and run on macOS.
Some notes about the setup:
- This downloads models from Hugging Face, so the VLMEval target has "Outgoing Connections (Client)" enabled under Signing & Capabilities -> App Sandbox
- VLMs are large, so this example uses significant memory
- The example processes images and provides detailed analysis
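
For reference, the network capability corresponds to the standard App Sandbox client entitlement in the target's .entitlements file. A minimal fragment is shown below (normally you toggle this in the Signing & Capabilities UI rather than editing the plist by hand; the file name is whatever your target uses):

```xml
<!-- Fragment of the target's .entitlements plist (file name assumed) -->
<key>com.apple.security.app-sandbox</key>
<true/>
<key>com.apple.security.network.client</key>
<true/>
```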
The example application uses the Qwen2-VL-2B model by default; see ContentView.swift:
```swift
self.modelContainer = try await VLMModelFactory.shared.loadContainer(
    configuration: ModelRegistry.qwen2VL2BInstruct4Bit)
```
The application:
- Downloads a sample image
- Processes it through the vision language model
- Describes the image based on the prompt, providing a detailed analysis of its content, objects, colors, and composition (a condensed sketch of this flow follows below)
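
The core of that flow boils down to something like the following sketch. It reuses the configuration shown above; `UserInput`, `prepare(input:)`, and `generate(...)` are the MLXVLM / MLXLMCommon calls the example relies on, but treat the exact signatures here as assumptions that may differ between library versions.

```swift
import Foundation
import MLXLMCommon
import MLXVLM

/// Downloads the model (if needed), prepares an image + prompt, and returns
/// the generated description.
func describeImage(at imageURL: URL, prompt: String) async throws -> String {
    // Same configuration the example uses by default.
    let container = try await VLMModelFactory.shared.loadContainer(
        configuration: ModelRegistry.qwen2VL2BInstruct4Bit)

    return try await container.perform { context in
        // Tokenize the prompt and preprocess the image for the model.
        let input = UserInput(prompt: prompt, images: [.url(imageURL)])
        let lmInput = try await context.processor.prepare(input: input)

        // Stream tokens until the model finishes, then return the full text.
        let result = try MLXLMCommon.generate(
            input: lmInput,
            parameters: GenerateParameters(temperature: 0.6),
            context: context
        ) { _ in .more }

        return result.output
    }
}
```

In the app this runs inside an observable model object and the token callback updates the UI incrementally; the sketch simply returns the final string.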
If the program crashes with a very deep stack trace, you may need to build in the Release configuration. This seems to depend on the size of the model.
There are a couple of options:
- Build Release
- Force the model evaluation to run on the main thread, e.g. using @MainActor (see the sketch after this list)
- Build `Cmlx` with optimizations by modifying `mlx/Package.swift` and adding `.unsafeFlags(["-O3"]),`
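
For the @MainActor option, the change is to make the call site run on the main actor, for example (a sketch; `describeImage` is the hypothetical helper from the earlier snippet):

```swift
// Running the evaluation on the main thread gives it the larger main-thread
// stack, which avoids the deep-recursion crash with some model sizes.
@MainActor
func runEvaluation(imageURL: URL) async throws -> String {
    try await describeImage(at: imageURL, prompt: "Describe this image in detail.")
}
```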
You may find that running outside the debugger boosts performance. You can do this in Xcode by pressing cmd-opt-r and unchecking "Debug Executable".