Add openclip docs (#1131)
---------

Co-authored-by: Omar Sanseviero <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
3 people authored Nov 27, 2023
1 parent f376c2a commit a3fb686
Showing 3 changed files with 79 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/hub/_toctree.yml
@@ -75,6 +75,8 @@
     title: Keras
   - local: ml-agents
     title: ML-Agents
+  - local: open_clip
+    title: OpenCLIP
   - local: paddlenlp
     title: PaddleNLP
   - local: rl-baselines3-zoo
1 change: 1 addition & 0 deletions docs/hub/models-libraries.md
@@ -20,6 +20,7 @@ The table below summarizes the supported libraries and their level of integration
 | [MidiTok](https://github.com/Natooz/MidiTok) | Tokenizers for symbolic music / MIDI files. |||||
 | [ML-Agents](https://github.com/huggingface/ml-agents) | Enables games and simulations made with Unity to serve as environments for training intelligent agents. |||||
 | [NeMo](https://github.com/NVIDIA/NeMo) | Conversational AI toolkit built for researchers |||||
+| [OpenCLIP](https://github.com/mlfoundations/open_clip) | Library for open-source implementation of OpenAI's CLIP |||||
 | [PaddleNLP](https://github.com/PaddlePaddle/PaddleNLP) | Easy-to-use and powerful NLP library built on PaddlePaddle |||||
 | [Pyannote](https://github.com/pyannote/pyannote-audio) | Neural building blocks for speaker diarization. |||||
 | [PyCTCDecode](https://github.com/kensho-technologies/pyctcdecode) | Language model supported CTC decoding for speech recognition |||||
76 changes: 76 additions & 0 deletions docs/hub/open_clip.md
@@ -0,0 +1,76 @@
# Using OpenCLIP at Hugging Face

[OpenCLIP](https://github.com/mlfoundations/open_clip) is an open-source implementation of OpenAI's CLIP.

## Exploring OpenCLIP on the Hub

You can find OpenCLIP models by filtering at the left of the [models page](https://huggingface.co/models?library=open_clip&sort=trending).

All OpenCLIP models hosted on the Hub come with a model card that provides useful information about the model. Thanks to OpenCLIP's Hugging Face Hub integration, you can load these models with a few lines of code. You can also deploy them with [Inference Endpoints](https://huggingface.co/inference-endpoints).

## Installation

To get started, you can follow the [OpenCLIP installation guide](https://github.com/mlfoundations/open_clip#usage), or use the following one-line pip install:

```bash
$ pip install open_clip_torch
```
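
After installation, you can run a quick sanity check by listing a few of the pretrained checkpoints OpenCLIP ships with. This is a minimal sketch using `open_clip.list_pretrained()`, which returns `(architecture, pretrained_tag)` pairs:

```py
import open_clip

# Each entry is an (architecture, pretrained_tag) pair
for model_name, pretrained in open_clip.list_pretrained()[:5]:
    print(model_name, pretrained)
```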

## Using existing models

All OpenCLIP models can easily be loaded from the Hub:

```py
import open_clip

model, preprocess = open_clip.create_model_from_pretrained('hf-hub:laion/CLIP-ViT-g-14-laion2B-s12B-b42K')
tokenizer = open_clip.get_tokenizer('hf-hub:laion/CLIP-ViT-g-14-laion2B-s12B-b42K')
```
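
If you also need separate train-time and eval-time preprocessing (for example, for fine-tuning), `open_clip.create_model_and_transforms` accepts the same `hf-hub:` identifier; here is a minimal sketch with the same checkpoint. The zero-shot example below continues to use `model` and `preprocess` from the snippet above.

```py
import open_clip

# Returns the model plus separate train-time and eval-time preprocessing
model, preprocess_train, preprocess_val = open_clip.create_model_and_transforms(
    'hf-hub:laion/CLIP-ViT-g-14-laion2B-s12B-b42K'
)
```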

Once loaded, you can encode an image and a set of candidate text labels to do [zero-shot image classification](https://huggingface.co/tasks/zero-shot-image-classification):

```py
import requests
import torch
from PIL import Image

# Download an example image and apply the model's preprocessing
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)
image = preprocess(image).unsqueeze(0)
text = tokenizer(["a diagram", "a dog", "a cat"])

with torch.no_grad(), torch.cuda.amp.autocast():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product below is a cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    text_probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print("Label probs:", text_probs)
```

It outputs the probability of each possible class:

```text
Label probs: tensor([[0.0020, 0.0034, 0.9946]])
```
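
To map these probabilities back to a predicted label, pair them with the prompts you tokenized. A small follow-up sketch:

```py
labels = ["a diagram", "a dog", "a cat"]

# text_probs has shape (num_images, num_labels)
top_idx = text_probs.argmax(dim=-1).item()
print(f"Predicted label: {labels[top_idx]} ({text_probs[0, top_idx].item():.4f})")
```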

If you want to load a specific OpenCLIP model, you can click `Use in OpenCLIP` on the model card and you will get a working snippet!

<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/openclip_repo_light.png"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/openclip_repo.png"/>
</div>
<div class="flex justify-center">
<img class="block dark:hidden" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/openclip_snippet_light.png"/>
<img class="hidden dark:block" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/openclip_snippet.png"/>
</div>

## Additional resources

* OpenCLIP [repository](https://github.com/mlfoundations/open_clip)
* OpenCLIP [docs](https://github.com/mlfoundations/open_clip/tree/main/docs)
* OpenCLIP [models in the Hub](https://huggingface.co/models?library=open_clip&sort=trending)
