Minor clarifications on transformations docs (run-llama#9044)
seldo authored Nov 21, 2023
1 parent dfea98a commit 6d11718
Showing 4 changed files with 23 additions and 11 deletions.
10 changes: 9 additions & 1 deletion docs/module_guides/loading/ingestion_pipeline/root.md
@@ -4,7 +4,7 @@ An `IngestionPipeline` uses a concept of `Transformations` that are applied to i

## Usage Pattern

-At it's most basic level, you can quickly instantiate an `IngestionPipeline` like so:
+The simplest usage is to instantiate an `IngestionPipeline` like so:

```python
from llama_index import Document
@@ -26,6 +26,8 @@ pipeline = IngestionPipeline(
nodes = pipeline.run(documents=[Document.example()])
```

Note that in a real-world scenario, you would get your documents from `SimpleDirectoryReader` or another reader from Llama Hub.

## Connecting to Vector Databases

When running an ingestion pipeline, you can also choose to automatically insert the resulting nodes into a remote vector store.
@@ -63,6 +65,12 @@ from llama_index import VectorStoreIndex
index = VectorStoreIndex.from_vector_store(vector_store)
```

## Calculating embeddings in a pipeline

Note that in the above example, embeddings are calculated as part of the pipeline. If you are connecting your pipeline to a vector store, embeddings must be a stage of your pipeline or your later instantiation of the index will fail.

You can omit embeddings from your pipeline if you are not connecting to a vector store, i.e. just producing a list of nodes.

## Caching

In an `IngestionPipeline`, each node + transformation combination is hashed and cached. This saves time on subsequent runs that use the same data.
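The exact cache internals live in the library, but the idea can be sketched in plain Python (the `cache_key` helper and its string inputs are illustrative, not the library's real API):

```python
import hashlib


def cache_key(node_text: str, transform_repr: str) -> str:
    # Hash the node's content together with a description of the
    # transformation; identical inputs always map to the same key,
    # so a repeated run can reuse the cached output instead of recomputing.
    payload = (node_text + "::" + transform_repr).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first = cache_key("some document text", "SentenceSplitter(chunk_size=1024)")
repeat = cache_key("some document text", "SentenceSplitter(chunk_size=1024)")
changed = cache_key("edited document text", "SentenceSplitter(chunk_size=1024)")
```

Because `first == repeat`, a second run over unchanged data is a cache hit, while any change to the node text or the transformation yields a new key.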
10 changes: 5 additions & 5 deletions docs/module_guides/loading/ingestion_pipeline/transformations.md
@@ -2,12 +2,12 @@

A transformation is something that takes a list of nodes as an input, and returns a list of nodes. Each component that implements the `Transformation` base class has both a synchronous `__call__()` definition and an async `acall()` definition.

-Current;y, the following components are `Transformation` objects:
+Currently, the following components are `Transformation` objects:

-- `TextSplitter`
-- `NodeParser`
-- `MetadataExtractor`
-- `Embeddings`model
+- [`TextSplitter`](text_splitters)
+- [`NodeParser`](/module_guides/loading/node_parsers/modules.md)
+- [`MetadataExtractor`](/module_guides/loading/documents_and_nodes/usage_metadata_extractor.md)
+- `Embeddings` model (check our [list of supported embeddings](list_of_embeddings))
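The shared interface these components implement can be illustrated with a toy stand-in (plain strings instead of real `Node` objects; `UpperCaseTransform` is invented for illustration, not part of the library):

```python
import asyncio
from typing import List


class UpperCaseTransform:
    """Toy transformation: a callable that maps a list of 'nodes'
    (here just strings) to a new list, with sync and async entry points."""

    def __call__(self, nodes: List[str]) -> List[str]:
        return [n.upper() for n in nodes]

    async def acall(self, nodes: List[str]) -> List[str]:
        # The async variant delegates to the sync one in this sketch.
        return self(nodes)


transform = UpperCaseTransform()
sync_result = transform(["hello", "world"])
async_result = asyncio.run(transform.acall(["async too"]))
```

Anything with this call shape composes: the output node list of one transformation is the input of the next.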

## Usage Pattern

10 changes: 6 additions & 4 deletions docs/module_guides/loading/node_parsers/modules.md
@@ -57,7 +57,9 @@ parser = MarkdownNodeParser()
nodes = parser.get_nodes_from_documents(markdown_docs)
```

-## Text-Based Node Parsers
+(text_splitters)=
+
+## Text-Splitters

### CodeSplitter

@@ -66,7 +68,7 @@ Splits raw code-text based on the language it is written in.
Check the full list of [supported languages here](https://github.com/grantjenks/py-tree-sitter-languages#license).

```python
-from llama_index.node_parser import CodeSplitter
+from llama_index.text_splitter import CodeSplitter

splitter = CodeSplitter(
language="python",
@@ -94,7 +96,7 @@ nodes = parser.get_nodes_from_documents(documents)
The `SentenceSplitter` attempts to split text while respecting the boundaries of sentences.

```python
-from llama_index.node_parser import SentenceSplitter
+from llama_index.text_splitter import SentenceSplitter

splitter = SentenceSplitter(
chunk_size=1024,
@@ -132,7 +134,7 @@ A full example can be found [here in combination with the `MetadataReplacementNo
The `TokenTextSplitter` attempts to split text to a consistent chunk size according to raw token counts.

```python
-from llama_index.node_parser import TokenTextSplitter
+from llama_index.text_splitter import TokenTextSplitter

splitter = TokenTextSplitter(
chunk_size=1024,
4 changes: 3 additions & 1 deletion docs/module_guides/models/embeddings.md
@@ -188,7 +188,9 @@ embeddings = embed_model.get_text_embedding(
)
```

-## Modules
+(list_of_embeddings)=
+
+## List of supported embeddings

We support integrations with OpenAI, Azure, and anything LangChain offers.
