From 01d03838ce462dacb0e5656727f78af9e466d4c5 Mon Sep 17 00:00:00 2001 From: danieleguido Date: Tue, 29 Oct 2024 10:29:52 +0000 Subject: [PATCH] Apply automatic changes --- ...nguage-identification-with-impresso-hf.mdx | 103 +++++++++++------- .../ne-processing-with-impresso-api.mdx | 77 +++++++------ .../ne-processing-with-impresso-hf.mdx | 12 +- ...newsagency-processing-with-impresso-hf.mdx | 44 +++++--- .../search-multilingual-docs-impresso-hf.mdx | 6 +- 5 files changed, 141 insertions(+), 101 deletions(-) diff --git a/src/content/notebooks/language-identification-with-impresso-hf.mdx b/src/content/notebooks/language-identification-with-impresso-hf.mdx index 35d6497..45d521e 100644 --- a/src/content/notebooks/language-identification-with-impresso-hf.mdx +++ b/src/content/notebooks/language-identification-with-impresso-hf.mdx @@ -3,8 +3,8 @@ title: Language Identification using Floret githubUrl: https://github.com/impresso/impresso-datalab-notebooks/blob/main/annotate/language-identification_ImpressoHF.ipynb authors: - impresso-team -sha: d56a2704ce8f28179bd1c7ce592b0e55b6ac5866 -date: 2024-10-24T16:21:08Z +sha: 7b580a8b2a279aa6244afe7ff5d6dbc54fb11ac4 +date: 2024-10-27T14:04:06Z googleColabUrl: https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/annotate/language-identification_ImpressoHF.ipynb links: [] excerpt: This notebook demonstrates language identification using a pre-trained @@ -17,34 +17,49 @@ excerpt: This notebook demonstrates language identification using a pre-trained --- {/* cell:0 cell_type:markdown */} -Open In Colab +Open In Colab {/* cell:1 cell_type:markdown */} + +This notebook demonstrates how to use a pre-trained Floret language identification model downloaded from Hugging Face. +We'll load the model, input some text, and predict the language of the text. + +## What is this notebook about? 
+This notebook provides a hands-on demonstration of **language identification** (LID) using our Impresso LID model from Hugging Face. We will explore how to download and use this model to predict the language of Impresso-like text inputs. This notebook walks through the necessary steps to set up dependencies, load the model, and implement it for practical language identification tasks.
+
+## What will you learn in this notebook?
+By the end of this notebook, you will:
+- Understand how to install and configure the required libraries (`floret` and `huggingface_hub`).
+- Learn to load our trained Floret language identification model from Hugging Face.
+- Run the model to predict the dominant language (or the mix of languages) of a given text input.
+- Gain insight into the core functionality of language identification using machine learning models.
+
+{/* cell:2 cell_type:markdown */}
## 1. Install Dependencies

First, we need to install `floret` and `huggingface_hub` to work with the Floret language identification model and Hugging Face.

-{/* cell:2 cell_type:code */}
+{/* cell:3 cell_type:code */}
```python
!pip install floret
!pip install huggingface_hub
```

-{/* cell:3 cell_type:markdown */}
+{/* cell:4 cell_type:markdown */}
## 2. Model Information

In this example, we are using a language identification model hosted on the Hugging Face Hub: `impresso-project/impresso-floret-langident`.
The model can predict the language of a given text of reasonable length and supports the main impresso languages: German (de), French (fr), Luxembourgish (lb), Italian (it), and English (en).

-{/* cell:4 cell_type:markdown */}
+{/* cell:5 cell_type:markdown */}
## 3. Defining the FloretLangIdentifier Class

This class downloads the Floret model from Hugging Face and loads it for prediction. We use `huggingface_hub` to download the model locally.
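Before looking at the wrapper class, it may help to know what floret's raw predictions look like. floret follows the fastText convention: `predict()` returns labels of the form `__label__de` together with probabilities. The helper below is purely illustrative (it is not part of the notebook's class) and shows the kind of normalization step a wrapper performs on that raw tuple:

```python
# Illustrative sketch: turn a fastText/floret-style prediction tuple into a
# {language_code: probability} dict. The "__label__" prefix is the fastText
# labeling convention that floret inherits.

def normalize_prediction(labels, probs):
    """Map fastText-style labels and probabilities to a {lang_code: prob} dict."""
    return {
        label.replace("__label__", ""): round(float(prob), 3)
        for label, prob in zip(labels, probs)
    }

# Example with a made-up prediction tuple (not real model output):
mix = normalize_prediction(("__label__de", "__label__fr"), [0.85, 0.15])
print(mix)  # {'de': 0.85, 'fr': 0.15}
```

This is the shape of result you can expect from the mix-prediction method used later in the notebook.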
-{/* cell:5 cell_type:code */} +{/* cell:6 cell_type:code */} ```python from huggingface_hub import hf_hub_download import floret @@ -130,16 +145,16 @@ class FloretLangIdentifier: return None ``` -{/* cell:6 cell_type:markdown */} +{/* cell:7 cell_type:markdown */} ## 4. Using the Model for Prediction Now that the model is loaded, you can input your own text and predict the language. -{/* cell:7 cell_type:markdown */} +{/* cell:8 cell_type:markdown */} ### 4.1 Predict the main language of a document -{/* cell:8 cell_type:code */} +{/* cell:9 cell_type:code */} ```python # Define the repository and model file repo_id = "impresso-project/impresso-floret-langident" @@ -156,11 +171,11 @@ result = model.predict_language(text) print("Language:", result) ``` -{/* cell:9 cell_type:markdown */} +{/* cell:10 cell_type:markdown */} ### 4.2 Predict the language mix of a document -{/* cell:10 cell_type:code */} +{/* cell:11 cell_type:code */} ```python # Multi-output for predicting mixed-language documents # Example text for prediction @@ -171,10 +186,24 @@ result = model.predict_language_mix(text) print("Language mix:", result) ``` -{/* cell:11 cell_type:markdown */} -### 4.3 Interactive mode +{/* cell:12 cell_type:markdown */} +### 4.3 Predict the language mix of an impresso document + -{/* cell:12 cell_type:code */} +{/* cell:13 cell_type:code */} +```python +# source: https://impresso-project.ch/app/issue/onsjongen-1945-03-03-a/view?p=1&articleId=i0001&text=1 +text = " Lëtzeburger Zaldoten traine'èren an England Soldats luxembourgeois à l’entraînement en Angleterre" + +# Predict the language +result = model.predict_language_mix(text) +print("Language mix:", result) +``` + +{/* cell:14 cell_type:markdown */} +### 4.4 Interactive mode + +{/* cell:15 cell_type:code */} ```python # Interactive text input text = input("Enter a sentence for language identification: ") @@ -182,18 +211,18 @@ result = model.predict_language_mix(text) print("Prediction Result:", result) ``` -{/* 
cell:13 cell_type:markdown */}
-## 4. Why is Language identification important? An example
+{/* cell:16 cell_type:markdown */}
+## 5. Why is language identification important? An example

Many NLP models are trained on data from certain languages. To apply any further NLP processing, we often need to know the language.

Let us look at a concrete example: say that we want to count the nouns in a text. For this we load an NLP processor from the popular spaCy library, which (among other things) splits the text and tags the words with so-called part-of-speech tags.

-{/* cell:14 cell_type:markdown */}
-### 4.1 Build a simple Noun counter class
+{/* cell:17 cell_type:markdown */}
+### 5.1 Build a simple Noun counter class

-{/* cell:15 cell_type:code */}
+{/* cell:18 cell_type:code */}
```python

class NounCounter:
@@ -224,10 +253,10 @@ class NounCounter:
        return noun_count
```

-{/* cell:16 cell_type:markdown */}
-### 4.2 Noun counter: A first naive test
+{/* cell:19 cell_type:markdown */}
+### 5.2 Noun counter: A first naive test

-{/* cell:17 cell_type:code */}
+{/* cell:20 cell_type:code */}
```python
# Example text for prediction
text = "Das ist ein Testdokument. Ein Mann geht mit einem Hund im Park spazieren."

@@ -245,15 +274,15 @@
counter = NounCounter(nlp)

print("Text: \"{}\"\nNoun-count: {}".format(text, counter.count_nouns(text)))
```

-{/* cell:18 cell_type:markdown */}
-### 4.3 Noun counter: A second test
+{/* cell:21 cell_type:markdown */}
+### 5.3 Noun counter: A second test

-{/* cell:19 cell_type:markdown */}
+{/* cell:22 cell_type:markdown */}
Now let us assume that we knew the language of the input document: German. This would let us load a default German spaCy model.
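The counting itself is independent of spaCy: once every token carries a part-of-speech tag, counting nouns is a simple filter. Here is a toy stand-in for that logic; the `(text, pos)` pair representation and the universal `NOUN` tag are assumptions for illustration, since the real `NounCounter` presumably reads `token.pos_` from a spaCy `Doc`:

```python
# Toy sketch of the noun-counting step, decoupled from spaCy.
# Tokens are assumed to arrive as (text, pos) pairs, where pos uses the
# Universal POS tag set ("NOUN" for common nouns).

def count_nouns(tagged_tokens):
    """Count tokens whose part-of-speech tag is 'NOUN'."""
    return sum(1 for _, pos in tagged_tokens if pos == "NOUN")

# The German example sentence, tagged by hand the way a German model should tag it:
tagged = [
    ("Ein", "DET"), ("Mann", "NOUN"), ("geht", "VERB"), ("mit", "ADP"),
    ("einem", "DET"), ("Hund", "NOUN"), ("im", "ADP"), ("Park", "NOUN"),
    ("spazieren", "VERB"),
]
print(count_nouns(tagged))  # 3
```

The point of the next two tests is exactly this dependency: a model for the wrong language assigns the wrong tags, so the count goes wrong even though the counting logic is trivial.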
-{/* cell:20 cell_type:code */} +{/* cell:23 cell_type:code */} ```python # Need to download the German model spacy.cli.download("de_core_news_sm") @@ -268,14 +297,14 @@ counter = NounCounter(nlp) print("Text: \"{}\"\nNoun-count: {}".format(text, counter.count_nouns(text))) ``` -{/* cell:21 cell_type:markdown */} -### 4.4 Noun counter: Combining our knowledge +{/* cell:24 cell_type:markdown */} +### 5.4 Noun counter: Combining our knowledge -{/* cell:22 cell_type:markdown */} +{/* cell:25 cell_type:markdown */} We use our insights to build a language-informed spacy loader that uses our language identifier! -{/* cell:23 cell_type:code */} +{/* cell:26 cell_type:code */} ```python class LanguageAwareSpacyLoader: @@ -322,10 +351,10 @@ class LanguageAwareSpacyLoader: ``` -{/* cell:24 cell_type:markdown */} +{/* cell:27 cell_type:markdown */} Let's try it -{/* cell:25 cell_type:code */} +{/* cell:28 cell_type:code */} ```python # We initialize our language aware spacy loader loader = LanguageAwareSpacyLoader(model) @@ -340,11 +369,11 @@ counter = NounCounter(nlp) print("Noun-count: {}".format(counter.count_nouns(text))) ``` -{/* cell:26 cell_type:markdown */} +{/* cell:29 cell_type:markdown */} Let's start the interactive mode again. Input any text in some language, and the two-step model (lang-id + nlp) will count its nouns. -{/* cell:27 cell_type:code */} +{/* cell:30 cell_type:code */} ```python text = input("Enter a sentence for Noun counting: ") nlp = loader.load(text) @@ -352,8 +381,8 @@ counter = NounCounter(nlp) print("Noun-count: {}".format(counter.count_nouns(text))) ``` -{/* cell:28 cell_type:markdown */} -## 5. Summary and Next Steps +{/* cell:31 cell_type:markdown */} +## 6. Summary and Next Steps In this notebook, we used a pre-trained Floret language identification model to predict the language of input text. You can modify the input or explore other models from Hugging Face. 
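At its core, a language-aware loader like the one above needs a lookup from the codes predicted by the LID model to spaCy model names. A possible sketch of that table (the mapping and fallback below are illustrative and follow spaCy's published model-naming convention; the notebook's actual loader may hold a different table):

```python
# Illustrative mapping from ISO language codes (as predicted by the LID model)
# to spaCy model names. Names follow spaCy's "<lang>_core_news_sm" convention
# (English uses "web"-based training data, hence "en_core_web_sm").

SPACY_MODELS = {
    "de": "de_core_news_sm",
    "fr": "fr_core_news_sm",
    "it": "it_core_news_sm",
    "en": "en_core_web_sm",
}

def spacy_model_for(lang_code, default="xx_ent_wiki_sm"):
    """Return the spaCy model name for a language code, with a multilingual fallback."""
    return SPACY_MODELS.get(lang_code, default)

print(spacy_model_for("de"))  # de_core_news_sm
print(spacy_model_for("lb"))  # xx_ent_wiki_sm (no Luxembourgish model, so fall back)
```

A multilingual fallback keeps the two-step pipeline usable for languages, such as Luxembourgish, that spaCy does not cover with a dedicated model.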
diff --git a/src/content/notebooks/ne-processing-with-impresso-api.mdx b/src/content/notebooks/ne-processing-with-impresso-api.mdx
index d366b41..226bc73 100644
--- a/src/content/notebooks/ne-processing-with-impresso-api.mdx
+++ b/src/content/notebooks/ne-processing-with-impresso-api.mdx
@@ -3,8 +3,8 @@ title: Named Entity Processing with Impresso Models through the Impresso API
githubUrl: https://github.com/impresso/impresso-datalab-notebooks/blob/main/annotate/NE-processing_ImpressoAPI.ipynb
authors:
  - impresso-team
-sha: 44a3c9f14c74807de3722878701d97ed71fa3e05
-date: 2024-10-25T14:18:01Z
+sha: cc20b1b70db4da2aea4042c0e8d82a52e6ffb762
+date: 2024-10-27T13:47:15Z
googleColabUrl: https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/annotate/NE-processing_ImpressoAPI.ipynb
links:
  - href: https://github.com/impresso/impresso-datalab-notebooks/blob/main/annotate/NE-processing_ImpressoHF.ipynb
@@ -19,43 +19,49 @@ seealso:

{/* cell:0 cell_type:markdown */}
+
+
+Open In Colab
+
+
+{/* cell:1 cell_type:markdown */}
## What is this notebook about?

-This notebook is similar to the [NE-processing_ImpressoHF](https://github.com/impresso/impresso-datalab-notebooks/blob/main/annotate/NE-processing_ImpressoHF.ipynb) one, except that instead of loading the model from Hugging Face and executing them locally (or on Colab), here we use the annotation functionalities provided by the Impresso API, using the Impresso Python Library. Behind the scene the same models are used.
+This notebook is similar to the [NE-processing_ImpressoHF](https://github.com/impresso/impresso-datalab-notebooks/blob/main/annotate/NE-processing_ImpressoHF.ipynb) one, except that instead of loading the models from Hugging Face and executing them locally (or on Colab), here we use the annotation functionalities provided by the Impresso API, using the Impresso Python Library. Behind the scenes, the same models are used.

For more information on the models, please refer to the [NE-processing_ImpressoHF](https://github.com/impresso/impresso-datalab-notebooks/blob/main/annotate/NE-processing_ImpressoHF.ipynb) notebook (we advise starting with it).

For an introduction to the Impresso Python Library, please refer to the [basics_ImpressoAPI](https://github.com/impresso/impresso-datalab-notebooks/blob/main/starter/basics_ImpressoAPI.ipynb).

## What will you learn in this notebook?
-
By the end of this notebook, you will know how to call the NER and EL Impresso annotation services through the Impresso API, using the Impresso Python Library.

-{/* cell:1 cell_type:code */}
-
+{/* cell:2 cell_type:code */}
```python
!pip install --upgrade --force-reinstall impresso
from impresso import version
print(version)
```

-{/* cell:2 cell_type:code */}
-
+{/* cell:3 cell_type:code */}
```python
from impresso import connect
impresso_session = connect()
```

-{/* cell:3 cell_type:markdown */}
-
+{/* cell:4 cell_type:markdown */}
## Named entity recognition

-{/* cell:4 cell_type:code */}
-
+{/* cell:5 cell_type:code */}
```python
-text = """
-Hugging Face will offer the product through Amazon and Google's cloud computing services for $1 per hour and on Digital Ocean, a specialty cloud computing company. Companies will also be able to download the Hugging Face offering to run in their own data centers.
-"""
+# We define some test input
+text = """In the year 1789, King Louis XVI, ruler of France, convened the Estates-General at the Palace of Versailles,
+ where Marie Antoinette, the Queen of France, alongside Maximilien Robespierre, a leading member of the National Assembly,
+ debated with Jean-Jacques Rousseau, the famous philosopher, and Charles de Talleyrand, the Bishop of Autun,
+ regarding the future of the French monarchy. 
At the same time, across the Atlantic in Philadelphia, + George Washington, the first President of the United States, and Thomas Jefferson, the nation's Secretary of State, + were drafting policies for the newly established American government following the signing of the Constitution.""" + +print(text) result = impresso_session.tools.ner( text=text @@ -64,46 +70,39 @@ result = impresso_session.tools.ner( result.df.tail(10) ``` -{/* cell:5 cell_type:markdown */} - +{/* cell:6 cell_type:markdown */} ## Named entity linking -{/* cell:6 cell_type:code */} - +{/* cell:7 cell_type:code */} ```python -text = """ -Hugging Face will offer the product through [START] Amazon [END] and Google's cloud computing services for $1 per hour and on Digital Ocean, a specialty cloud computing company. Companies will also be able to download the Hugging Face offering to run in their own data centers. -""" -result = impresso_session.tools.nel( - text=text -) -result -``` +# We define some test input +text_with_markers = """In the year 1789, King Louis XVI, ruler of France, convened the Estates-General at the Palace of Versailles, + where [START] Marie Antoinette, the Queen of France [END], alongside Maximilien Robespierre, a leading member of the National Assembly, + debated with Jean-Jacques Rousseau, the famous philosopher, and Charles de Talleyrand, the Bishop of Autun, + regarding the future of the French monarchy. 
At the same time, across the Atlantic in Philadelphia, + George Washington, the first President of the United States, and Thomas Jefferson, the nation's Secretary of State, + were drafting policies for the newly established American government following the signing of the Constitution.""" -{/* cell:7 cell_type:code */} +print(text_with_markers) -```python -text = """ - Hugging Face proposera le produit via les services de cloud computing d'[START] Amazon [END] et de Google pour 1 dollar par heure, ainsi que sur Digital Ocean, une entreprise spécialisée dans le cloud computing. Les entreprises pourront également télécharger l'offre de Hugging Face pour l'exécuter dans leurs propres centres de données. - """ result = impresso_session.tools.nel( - text=text + text=text_with_markers ) -result.df +result ``` {/* cell:8 cell_type:markdown */} - ## Named entity processing {/* cell:9 cell_type:code */} - ```python -text = """ -Hugging Face will offer the product through Amazon and Google's cloud computing services for $1 per hour and on Digital Ocean, a specialty cloud computing company. Companies will also be able to download the Hugging Face offering to run in their own data centers. -""" result = impresso_session.tools.ner_nel( text=text ) result.df ``` + +{/* cell:10 cell_type:code */} +```python + +``` diff --git a/src/content/notebooks/ne-processing-with-impresso-hf.mdx b/src/content/notebooks/ne-processing-with-impresso-hf.mdx index 28c6ecd..3a2e226 100644 --- a/src/content/notebooks/ne-processing-with-impresso-hf.mdx +++ b/src/content/notebooks/ne-processing-with-impresso-hf.mdx @@ -7,8 +7,8 @@ excerpt: Trained on the [HIPE 2020](https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md) dataset, the Impresso models recognize both coarse and fine-grained named entities, linking mentions to knowledge bases when possible. 
-sha: dd13ddcc0ba2f4a2b24face9790c46595dc2ebca -date: 2024-10-27T13:19:55Z +sha: cc20b1b70db4da2aea4042c0e8d82a52e6ffb762 +date: 2024-10-27T13:47:15Z googleColabUrl: https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/annotate/NE-processing_ImpressoHF.ipynb seealso: - ne-processing-with-impresso-api @@ -19,6 +19,10 @@ links: label: HIPE-2022 dataset - href: https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md label: HIPE typology + - href: https://huggingface.co/dbmdz/bert-medium-historic-multilingual-cased + label: dbmdz/bert-medium-historic-multilingual-cased + - href: https://huggingface.co/facebook/mgenre-wiki + label: mGENRE, - href: https://huggingface.co/spaces/impresso-project/multilingual-named-entity-recognition label: NER space - href: https://huggingface.co/spaces/impresso-project/multilingual-entity-linking @@ -44,9 +48,9 @@ NER detects and classifies entities such as persons, locations, and organization In this notebook, both NER and EL are performed using models trained by Impresso and hosted on [Hugging Face](https://huggingface.co/impresso-project/) (thus the 'HF' suffix in the notebook name): -- The **Impresso NER model** is a Transformer model trained on the Impresso HIPE-2020 portion of the [HIPE-2022 dataset](https://github.com/hipe-eval/HIPE-2022-data). It recognizes entity types such as person, location, and organization while supporting the complete [HIPE typology](https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md), including coarse and fine-grained entity types as well as components like names, titles, and roles. +- The **Impresso NER model** is a Transformer model trained on the Impresso HIPE-2020 portion of the [HIPE-2022 dataset](https://github.com/hipe-eval/HIPE-2022-data). 
It recognizes entity types such as person, location, and organization while supporting the complete [HIPE typology](https://github.com/hipe-eval/HIPE-2022-data/blob/main/documentation/README-hipe2020.md), including coarse and fine-grained entity types as well as components like names, titles, and roles. Additionally, the NER model's backbone ([dbmdz/bert-medium-historic-multilingual-cased](https://huggingface.co/dbmdz/bert-medium-historic-multilingual-cased)) was trained on various European historical datasets, giving it a broader language capability. This training included data from the Europeana and British Library collections across multiple languages: German, French, English, Finnish, and Swedish. Due to this multilingual backbone, the NER model may also recognize entities in other languages beyond French and German. -- The **Impresso NEL model** links detected entities to unique identifiers in Wikipedia and Wikidata or assigns a 'NIL' label (indicating "not in list" in NLP) if no reference is found. +- The **Impresso NEL model** links detected entities to unique identifiers in Wikipedia and Wikidata or assigns a 'NIL' label (indicating "not in list" in NLP) if no reference is found. The NEL model was trained on various historical datasets (AjMC, CLEF-HIPE-2020, LeTemps, Living with Machines, NewsEye, SoNAR) across multiple languages, including German, French, English, Finnish, and Swedish, to support comprehensive entity linking (EL) and named entity recognition (NER). Its backbone, [mGENRE,](https://huggingface.co/facebook/mgenre-wiki) uses a multilingual text generation approach for Wikipedia entity prediction, trained on 105 languages from Wikipedia. Both models can also be tested interactively in Hugging Face spaces: the [NER space](https://huggingface.co/spaces/impresso-project/multilingual-named-entity-recognition) and the [EL space](https://huggingface.co/spaces/impresso-project/multilingual-entity-linking). 
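As the API examples in this patch show, the NEL model expects the mention to disambiguate to be wrapped in `[START]` / `[END]` markers inside the input text. A small helper for inserting those markers at known character offsets might look like this (illustrative only, not part of the notebooks):

```python
# Illustrative helper: wrap the span text[start:end] in the [START]/[END]
# markers used for entity linking, matching the spacing seen in the examples
# (e.g. "[START] Amazon [END]").

def mark_mention(text, start, end):
    """Return text with the span [start:end] wrapped in NEL markers."""
    return text[:start] + "[START] " + text[start:end] + " [END]" + text[end:]

sentence = "Marie Antoinette was Queen of France."
marked = mark_mention(sentence, 0, len("Marie Antoinette"))
print(marked)  # [START] Marie Antoinette [END] was Queen of France.
```

Only one mention is marked per call; to link several mentions, you would typically run the model once per marked span.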
diff --git a/src/content/notebooks/newsagency-processing-with-impresso-hf.mdx b/src/content/notebooks/newsagency-processing-with-impresso-hf.mdx index 7f0086c..310525a 100644 --- a/src/content/notebooks/newsagency-processing-with-impresso-hf.mdx +++ b/src/content/notebooks/newsagency-processing-with-impresso-hf.mdx @@ -5,8 +5,8 @@ excerpt: Impresso BERT-based pipeline, trained on Swiss and Luxembourgish newspapers from the Impresso project, identifies and links news agencies in historical articles. This notebook guides you through setting up a workflow to detect these entities in your own text. -sha: cc67ffaed97d3d02719878f01ffdcadce3def6a1 -date: 2024-10-24T19:33:03Z +sha: cc20b1b70db4da2aea4042c0e8d82a52e6ffb762 +date: 2024-10-27T13:47:15Z googleColabUrl: https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/annotate/newsagency-processing_ImpressoHF.ipynb authors: - impresso-team @@ -28,43 +28,50 @@ seealso: --- {/* cell:0 cell_type:markdown */} + + +{/* cell:1 cell_type:markdown */} Delivering swift and reliable news since the 1830s and 1840s, news agencies have played a pivotal role both nationally and internationally. However, understanding their precise impact on shaping news content has remained somewhat elusive. Our goal is to illuminate this aspect by identifying news agencies within historical newspaper articles. Using data from newspapers in Switzerland and Luxembourg as part of the Impresso project, we've trained our pipeline to recognize these entities. If you're here, you likely seek to detect news agency entities in your own text. This notebook will guide you through the process of setting up a workflow to identify specific newspaper or agency mentions within your text. You can also access our [News Agency Recognition](https://huggingface.co/spaces/impresso-project/multilingual-news-agency-recognition) demo app through [HuggingFace Spaces](https://huggingface.co/docs/hub/en/spaces). 
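Once a pipeline like the one set up below returns its entities, a common follow-up step is to keep only confident mentions. The sketch here assumes the usual Hugging Face token-classification output shape (a list of dicts with `word`, `entity`, and `score` keys); inspect the actual output of the newsagency pipeline before relying on these key names:

```python
# Sketch of a post-processing step: filter pipeline output by confidence.
# The dict keys and the sample entries below are assumptions for illustration,
# not real model output.

def confident_entities(entities, threshold=0.8):
    """Keep only entity dicts whose score meets the threshold."""
    return [e for e in entities if e.get("score", 0.0) >= threshold]

fake_output = [
    {"word": "Reuter", "entity": "org.ent.pressagency.Reuters", "score": 0.98},
    {"word": "Park", "entity": "org.ent.pressagency.unk", "score": 0.42},
]
print(confident_entities(fake_output))  # keeps only the high-confidence Reuter entry
```

A threshold of this kind is a pragmatic default for historical OCRed text, where low-confidence spans are often OCR noise rather than genuine agency mentions.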
-{/* cell:1 cell_type:markdown */}
+{/* cell:2 cell_type:markdown */}
Install necessary libraries (if not already installed) and download the necessary NLTK data.

-{/* cell:2 cell_type:code */}
+{/* cell:3 cell_type:code */}
```python
!pip install transformers
!pip install stopwordsiso
!pip install nltk
```

-{/* cell:3 cell_type:markdown */}
+{/* cell:4 cell_type:markdown */}
Now the fun part: this function will download the required model and give you the keys to successfully detect news agencies in your text.

-{/* cell:4 cell_type:code */}
+{/* cell:5 cell_type:code */}
```python
from transformers import pipeline

-newsagency_ner_pipeline = pipeline("newsagency-ner", model="impresso-project/ner-newsagency-bert-multilingual", trust_remote_code=True)
+newsagency_ner_pipeline = pipeline("newsagency-ner",
+                                   model="impresso-project/ner-newsagency-bert-multilingual",
+                                   trust_remote_code=True,
+                                   device='cpu')
```

-{/* cell:5 cell_type:markdown */}
+{/* cell:6 cell_type:markdown */}
Run the example below to see how it works.

-{/* cell:6 cell_type:code */}
+{/* cell:7 cell_type:code */}
```python
-sentence = """Apple est créée le 1er avril 1976 dans le garage de la maison
- d'enfance de Steve Jobs à Los Altos en Californie par Steve Jobs, Steve Wozniak
- et Ronald Wayne, puis constituée sous forme de société le 3 janvier 1977 à l'origine
- sous le nom d'Apple Computer, mais pour ses 30 ans et pour refléter la diversification
- de ses produits, le mot « computer » est retiré le 9 janvier 2015. (Reuter)"""
+sentence = """In the year 1789, King Louis XVI, ruler of France, convened the Estates-General at the Palace of Versailles,
+ where Marie Antoinette, the Queen of France, alongside Maximilien Robespierre, a leading member of the National Assembly,
+ debated with Jean-Jacques Rousseau, the famous philosopher, and Charles de Talleyrand, the Bishop of Autun,
+ regarding the future of the French monarchy. 
At the same time, across the Atlantic in Philadelphia, + George Washington, the first President of the United States, and Thomas Jefferson, the nation's Secretary of State, + were drafting policies for the newly established American government following the signing of the Constitution. (Reuter)""" # Function to print each entry nicely def print_nicely(data): @@ -74,10 +81,11 @@ def print_nicely(data): print() # Blank line between entries news_agencies = newsagency_ner_pipeline(sentence) + print_nicely(news_agencies) ``` -{/* cell:7 cell_type:markdown */} +{/* cell:8 cell_type:markdown */} ## About Impresso @@ -109,17 +117,17 @@ v3 or later.

-{/* cell:8 cell_type:code */} +{/* cell:9 cell_type:code */} ```python ``` -{/* cell:9 cell_type:code */} +{/* cell:10 cell_type:code */} ```python ``` -{/* cell:10 cell_type:code */} +{/* cell:11 cell_type:code */} ```python ``` diff --git a/src/content/notebooks/search-multilingual-docs-impresso-hf.mdx b/src/content/notebooks/search-multilingual-docs-impresso-hf.mdx index b688c9b..6250329 100644 --- a/src/content/notebooks/search-multilingual-docs-impresso-hf.mdx +++ b/src/content/notebooks/search-multilingual-docs-impresso-hf.mdx @@ -2,9 +2,9 @@ githubUrl: https://github.com/impresso/impresso-datalab-notebooks/blob/main/annotate/search_multilingual_docs-ImpressoHF.ipynb authors: - impresso-team -title: Searching Relevant texts within an Embedding spaces -sha: 1b1cb5c32ccd6a4f594ddaedc5b8bb2003031de5 -date: 2024-10-29T10:20:36Z +title: Searching Relevant texts within an Embedding space +sha: 1184b255bf12b4ab79462f435b2a9fa442e5b78a +date: 2024-10-29T10:28:35Z links: [] googleColabUrl: https://colab.research.google.com/github/impresso/impresso-datalab-notebooks/blob/main/annotate/search_multilingual_docs-ImpressoHF.ipynb ---