diff --git a/.gitignore b/.gitignore index c2658d7..2752eb9 100644 --- a/.gitignore +++ b/.gitignore @@ -1 +1,2 @@ node_modules/ +.DS_Store diff --git a/trace/README.md b/trace/README.md deleted file mode 100644 index 3582827..0000000 --- a/trace/README.md +++ /dev/null @@ -1,15 +0,0 @@ -# OpenInference Tracing - -> [!IMPORTANT] -> OpenInference Tracing is now being actively worked on via: [OpenInference](https://github.com/Arize-ai/openinference) -> This repository is maintained for historical purposes only. - -The OpenInference Tracing specification is edited in markdown files found in the [spec directory](./spec/README.md). - -OpenInference Tracing is a specification for capturing and storing LLM application executions. It's designed to provide insight into the invocation of LLMs and the surrounding application context such as retrieval from vector stores and the usage of external tools such as search engines or APIs. The specification is transport and file-format agnostic, and is intended to be used in conjunction with other specifications such as JSON, ProtoBuf, and DataFrames. - -Observability lets us understand a system from the outside, by letting us ask questions about that system without knowing its inner workings. Furthermore, it allows us to easily troubleshoot and handle novel problems (i.e. “unknown unknowns”), and helps us answer the question, “Why is this happening?” - -In order to be able to ask those questions of a system, the application must be properly instrumented. That is, the application code must emit signals such as traces, metrics, and logs. An application is properly instrumented when developers don’t need to add more instrumentation to troubleshoot an issue, because they have all of the information they need. - -OpenInference tracing is the mechanism by which an LLM application is instrumented, to help make a system observable. 
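As a sketch of what such instrumentation can look like, a minimal (hypothetical) tracer helper might wrap an operation and emit a span recording its name, timing, and status. None of these names come from an OpenInference SDK; they are illustrative only:

```python
# Hypothetical sketch of instrumentation: wrap an operation so that it
# emits a span recording its name, timing, and status. Helper and variable
# names are illustrative, not part of any OpenInference SDK.
import time
import uuid
from contextlib import contextmanager

spans = []  # stand-in for a trace exporter's destination

@contextmanager
def span(name):
    record = {
        "name": name,
        "context": {"span_id": str(uuid.uuid4())},
        "start_time": time.time(),
        "status_code": "OK",
    }
    try:
        yield record
    except Exception:
        record["status_code"] = "ERROR"  # mark the span on failure
        raise
    finally:
        record["end_time"] = time.time()
        spans.append(record)  # "export" the finished span

with span("query") as s:
    s["attributes"] = {"input.value": "Hello?", "input.mime_type": "text/plain"}
```

With instrumentation like this in place, every request leaves behind the signals needed to troubleshoot it after the fact.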
diff --git a/trace/spec/README.md b/trace/spec/README.md deleted file mode 100644 index 2bf3758..0000000 --- a/trace/spec/README.md +++ /dev/null @@ -1,48 +0,0 @@ -# OpenInference Tracing Specification - -This document covers the OpenInference Tracing specification for capturing and storing LLM application executions. It is designed to be a category of telemetry data that is used to understand the execution of LLMs and the surrounding application context, such as retrieval from vector stores and the usage of external tools such as search engines or APIs. - -## Spans - -A span represents a unit of work or operation. It tracks the specific operations that a request makes, painting a picture of what happened during the time in which that operation was executed. - -A span contains a name, time-related data, structured log messages, and other metadata (that is, Attributes) that provide information about the operation it tracks. - -## Traces - -A trace records the paths taken by requests (made by an application or end user) as they propagate through multiple steps. - -Without tracing, it is challenging to pinpoint the cause of performance problems in a system. - -Tracing improves the visibility of our application or system’s health and lets us debug behavior that is difficult to reproduce locally. Tracing is essential for LLM applications, which commonly have nondeterministic problems or are too complicated to reproduce locally. - -Tracing makes debugging and understanding LLM applications less daunting by breaking down what happens within a request as it flows through a system. - -A trace is made up of one or more spans. The first span is the root span; each root span represents a request from start to finish. The spans beneath the root provide more in-depth context about what occurs during a request (or what steps make up a request). 
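The structure described above, a root span plus child spans that all share one trace_id, can be sketched as plain data. The `make_span` helper below is hypothetical; the field names follow the span JSON used in this specification:

```python
# Sketch: a minimal trace as plain span records. Field names mirror the
# span JSON in this spec; the make_span helper itself is hypothetical.
import uuid

def make_span(name, trace_id, parent_id=None):
    """Build a span record; a span with parent_id=None is the root span."""
    return {
        "name": name,
        "context": {"trace_id": trace_id, "span_id": str(uuid.uuid4())},
        "parent_id": parent_id,
    }

trace_id = str(uuid.uuid4())
root = make_span("query", trace_id)  # root span: no parent_id
child = make_span("llm", trace_id, parent_id=root["context"]["span_id"])

# Every span in a trace shares the trace_id; children point at their parent.
assert child["context"]["trace_id"] == root["context"]["trace_id"]
assert child["parent_id"] == root["context"]["span_id"]
```

The parent/child links are what let a backend reconstruct the request from start to finish.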
- -## Specifications - -- [Traces](./traces.md) -- [Semantic Conventions](./semantic_conventions.md) - -## Notation Conventions and Compliance - -The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", -"SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in the -specification are to be interpreted as described in [BCP -14](https://tools.ietf.org/html/bcp14) -[[RFC2119](https://tools.ietf.org/html/rfc2119)] -[[RFC8174](https://tools.ietf.org/html/rfc8174)] when, and only when, they -appear in all capitals, as shown here. - -An implementation of the specification is not compliant if it fails to -satisfy one or more of the "MUST", "MUST NOT", "REQUIRED", "SHALL", or "SHALL -NOT" requirements defined in the specification. Conversely, an -implementation of the specification is compliant if it satisfies all the -"MUST", "MUST NOT", "REQUIRED", "SHALL", and "SHALL NOT" requirements defined in -the specification. - -## Project Naming - -- The official project name is "OpenInference Tracing" (with no space between "Open" and - "Inference"). diff --git a/trace/spec/semantic_conventions.md b/trace/spec/semantic_conventions.md deleted file mode 100644 index 7f04e84..0000000 --- a/trace/spec/semantic_conventions.md +++ /dev/null @@ -1,55 +0,0 @@ -# Semantic Conventions - -The **Semantic Conventions** define the keys and values which describe commonly observed concepts, protocols, and operations used by applications. These conventions are used to populate the `attributes` of `spans` and span `events`. 
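As a sketch of how such conventions are applied in practice, an LLM span's `attributes` might be populated like this (the keys come from this specification; the values are hypothetical examples):

```python
# Hypothetical example: span attributes keyed by the semantic conventions
# defined in this document. Values are made-up sample data.
attributes = {
    "llm.model_name": "gpt-3.5-turbo",
    "llm.token_count.prompt": 5,
    "llm.token_count.completion": 15,
    "llm.token_count.total": 20,
    "output.value": "Hello, World!",
    "output.mime_type": "text/plain",
}

# Attribute keys are non-null strings; values are strings, booleans,
# numbers, or arrays of these.
assert all(isinstance(k, str) for k in attributes)
assert attributes["llm.token_count.total"] == (
    attributes["llm.token_count.prompt"]
    + attributes["llm.token_count.completion"]
)
```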
- -## Reserved Attributes - -The following attributes are reserved and MUST be supported by all OpenInference Tracing SDKs: - -| Attribute | Type | Example | Description | | -------------------------------------- | --------------- | -------------------------------------------------------------------------- | ------------------------------------------------------------- | | `exception.type` | String | `"NullPointerException"` | The type of exception that was thrown | | `exception.message` | String | `"Null value encountered"` | Detailed message describing the exception | | `exception.escaped` | Boolean | `true` | Indicator if the exception has escaped the span's scope | | `exception.stacktrace` | String | `"at app.main(app.java:16)"` | The stack trace of the exception | | `output.value` | String | `"Hello, World!"` | The output value of an operation | | `output.mime_type` | String | `"text/plain"` or `"application/json"` | MIME type representing the format of `output.value` | | `input.value` | String | `"{'query': 'What is the weather today?'}"` | The input value to an operation | | `input.mime_type` | String | `"text/plain"` or `"application/json"` | MIME type representing the format of `input.value` | | `embedding.embeddings` | List of objects | `[{"embedding.vector": [...], "embedding.text": "hello"}]` | List of embedding objects including text and vector data | | `embedding.model_name` | String | `"BERT-base"` | Name of the embedding model used | | `embedding.text` | String | `"hello world"` | The text represented in the embedding | | `embedding.vector` | List of floats | `[0.123, 0.456, ...]` | The embedding vector consisting of a list of floats | | `llm.function_call` | JSON string | `"{function_name: 'add', args: [1, 2]}"` | Object recording details of a function call in models or APIs | | `llm.invocation_parameters` | JSON string | `"{model_name: 'gpt-3', temperature: 0.7}"` | Parameters used during the invocation of an LLM or API | | 
`llm.input_messages` | List of objects | `[{"message.role": "user", "message.content": "hello"}]` | List of messages sent to the LLM in a chat API request | | `llm.output_messages` | List of objects | `[{"message.role": "assistant", "message.content": "hello"}]` | List of messages received from the LLM in a chat API request | | `message.role` | String | `"user"` or `"system"` | Role of the entity in a message (e.g., user, system) | | `message.function_call_name` | String | `"multiply"` or `"subtract"` | The name of the function called in a message | | `message.function_call_arguments_json` | JSON string | `"{ 'x': 2 }"` | The arguments to the function call in JSON | | `message.content` | String | `"What's the weather today?"` | The content of a message in a chat | | `message.tool_calls` | List of objects | `[{"tool_call.function.name": "get_current_weather"}]` | List of tool calls (e.g. function calls) generated by the LLM | | `tool_call.function.name` | String | `"get_current_weather"` | The name of the function being invoked by a tool call | | `tool_call.function.arguments` | JSON string | `"{'city': 'London'}"` | The arguments for the function being invoked by a tool call | | `llm.model_name` | String | `"gpt-3.5-turbo"` | The name of the language model being utilized | | `llm.prompt_template.template` | String | `"Weather forecast for {city} on {date}"` | Template used to generate prompts as Python f-strings | | `llm.prompt_template.variables` | JSON string | `{ context: "", subject: "math" }` | JSON of key-value pairs applied to the prompt template | | `llm.prompt_template.version` | String | `"v1.0"` | The version of the prompt template | | `llm.token_count.prompt` | Integer | `5` | The number of tokens in the prompt | | `llm.token_count.completion` | Integer | `15` | The number of tokens in the completion | | `llm.token_count.total` | Integer | `20` | Total number of tokens, including prompt and completion | | `tool.name` | String | `"WeatherAPI"` | The name of the 
tool being utilized | | `tool.description` | String | `"An API to get weather data."` | Description of the tool's purpose and functionality | | `tool.parameters` | JSON string | `"{ 'a': 'int' }"` | The parameters definition for invoking the tool | | `retrieval.documents` | List of objects | `[{"document.id": "1", "document.score": 0.9, "document.content": "..."}]` | List of retrieved documents | | `document.id` | String/Integer | `"1234"` or `1` | Unique identifier for a document | | `document.score` | Float | `0.98` | Score representing the relevance of a document | | `document.content` | String | `"This is a sample document content."` | The content of a retrieved document | | `document.metadata` | Object | `{"author": "John Doe", "date": "2023-09-09"}` | Metadata associated with a document | | `reranker.input_documents` | List of objects | `[{"document.id": "1", "document.score": 0.9, "document.content": "..."}]` | List of documents as input to the reranker | | `reranker.output_documents` | List of objects | `[{"document.id": "1", "document.score": 0.9, "document.content": "..."}]` | List of documents output by the reranker | | `reranker.query` | String | `"How to format timestamp?"` | Query parameter of the reranker | | `reranker.model_name` | String | `"cross-encoder/ms-marco-MiniLM-L-12-v2"` | Model name of the reranker | | `reranker.top_k` | Integer | `3` | Top K parameter of the reranker | - -Note: the `object` type refers to a set of key-value pairs, also known as a `struct`, `mapping`, `dictionary`, etc. diff --git a/trace/spec/traces.md b/trace/spec/traces.md deleted file mode 100644 index 40ecabf..0000000 --- a/trace/spec/traces.md +++ /dev/null @@ -1,192 +0,0 @@ -# Traces - -Traces give us the big picture of what happens when a request is made to an LLM application. Whether your application is an agent or a chatbot, traces are essential to understanding the full "path" a request takes in your application. 
- -Let's explore this with two units of work, represented as Spans: - -query span: - -```json -{ - "name": "query", - "context": { - "trace_id": "ed7b336d-e71a-46f0-a334-5f2e87cb6cfc", - "span_id": "f89ebb7c-10f6-4bf8-8a74-57324d2556ef" - }, - "span_kind": "CHAIN", - "parent_id": null, - "start_time": "2023-09-07T12:54:47.293922-06:00", - "end_time": "2023-09-07T12:54:49.322066-06:00", - "status_code": "OK", - "status_message": "", - "attributes": { - "input.value": "Is anybody there?", - "input.mime_type": "text/plain", - "output.value": "Yes, I am here.", - "output.mime_type": "text/plain" - }, - "events": [] -} -``` - -This is the root span, denoting the beginning and end of the entire operation. Note that it has a trace_id field indicating the trace, but has no parent_id. That's how you know it's the root span. - -LLM span: - -```json -{ - "name": "llm", - "context": { - "trace_id": "ed7b336d-e71a-46f0-a334-5f2e87cb6cfc", - "span_id": "ad67332a-38bd-428e-9f62-538ba2fa90d4" - }, - "span_kind": "LLM", - "parent_id": "f89ebb7c-10f6-4bf8-8a74-57324d2556ef", - "start_time": "2023-09-07T12:54:47.597121-06:00", - "end_time": "2023-09-07T12:54:49.321811-06:00", - "status_code": "OK", - "status_message": "", - "attributes": { - "llm.input_messages": [ - { - "message.role": "system", - "message.content": "You are an expert Q&A system that is trusted around the world.\nAlways answer the query using the provided context information, and not prior knowledge.\nSome rules to follow:\n1. Never directly reference the given context in your answer.\n2. Avoid statements like 'Based on the context, ...' or 'The context information ...' or anything along those lines." - }, - { - "message.role": "user", - "message.content": "Hello?" - } - ], - "output.value": "assistant: Yes I am here", - "output.mime_type": "text/plain" - }, - "events": [] -} -``` - -This span encapsulates a subtask, like invoking an LLM, and its parent is the query span.
 Note that it shares the same trace_id as the root span, indicating it's a part of the same trace. Additionally, it has a parent_id that matches the span_id of the query span. - -These two blocks of JSON share the same trace_id, and the parent_id field represents a hierarchy. That makes them a Trace! - -Another thing you'll note is that each Span looks like a structured log. That's because it kind of is! One way to think of Traces is that they're a collection of structured logs with context, correlation, hierarchy, and more baked in. However, these "structured logs" can come from different parts of your application stack, such as a vector store retriever or a LangChain tool. This is what allows tracing to represent an end-to-end view of any system. - -To understand how tracing in OpenInference works, let's look at the components that play a part in instrumenting our code. - -**Tracer** -A Tracer creates spans containing more information about what is happening for a given operation, such as a request in a service. - -**Trace Exporters** -Trace Exporters send traces to a consumer. This consumer can be standard output for debugging during development, or an OpenInference Collector. - -## Spans - -A span represents a unit of work or operation. Spans are the building blocks of Traces.
 In OpenInference, they include the following information: - -- Name -- Parent span ID (empty for root spans) -- Start and End Timestamps -- Span Context -- Attributes -- Span Events -- Span Status - -Sample span: - -```json -{ - "name": "query", - "context": { - "trace_id": "ed7b336d-e71a-46f0-a334-5f2e87cb6cfc", - "span_id": "f89ebb7c-10f6-4bf8-8a74-57324d2556ef" - }, - "span_kind": "CHAIN", - "parent_id": null, - "start_time": "2023-09-07T12:54:47.293922-06:00", - "end_time": "2023-09-07T12:54:49.322066-06:00", - "status_code": "OK", - "status_message": "", - "attributes": { - "input.value": "Hello?", - "input.mime_type": "text/plain", - "output.value": "I am here.", - "output.mime_type": "text/plain" - }, - "events": [] -} -``` - -Spans can be nested, as is implied by the presence of a parent span ID: child spans represent sub-operations. This allows spans to more accurately capture the work done in an application. - -### Span Context - -Span context is an immutable object on every span that contains the following: - -- The Trace ID representing the trace that the span is a part of -- The span's Span ID - -Because Span Context contains the Trace ID, it is used when creating Span Links. - -### Attributes - -Attributes are key-value pairs of metadata that you can use to annotate a Span with information about the operation it is tracking. - -For example, if a span invokes an LLM, you can capture the model name, the invocation parameters, the token count, and so on. - -Attributes have the following rules: - -- Keys must be non-null string values -- Values must be a non-null string, boolean, floating point value, integer, or an array of these values - -Additionally, there are Semantic Attributes, which are known naming conventions for metadata that is typically present in common operations. It's helpful to use semantic attribute naming wherever possible so that common kinds of metadata are standardized across systems. 
See [semantic conventions](./semantic_conventions.md) for more information. - -### Span Events - -A Span Event can be thought of as a structured log message (or annotation) on a Span, typically used to denote a meaningful, singular point in time during the Span's duration. - -For example, consider two scenarios with an LLM: - -- Tracking an LLM's execution time -- Denoting when the first token is sent - -A Span is best suited to the first scenario because it's an operation with a start and an end. - -A Span Event is best used to track the second scenario because it represents a meaningful, singular point in time. - -### Span Status - -A status will be attached to a span. Typically, you will set a span status when there is a known error in the application code, such as an exception. A Span Status will be tagged as one of the following values: - -- Unset -- Ok -- Error - -When an exception is handled, a Span status can be set to Error. - -### Span Kind - -When a span is created, it is one of Chain, Retriever, Reranker, LLM, Embedding, Agent, or Tool. This span kind provides a hint to the tracing backend as to how the trace should be assembled. - -#### Chain - -A Chain is a starting point or a link between different LLM application steps. For example, a Chain span could be used to represent the beginning of a request to an LLM application or the glue code that passes context from a retriever to an LLM call. - -#### Retriever - -A Retriever is a span that represents a data retrieval step. For example, a Retriever span could be used to represent a call to a vector store or a database. - -#### Reranker - -A Reranker is a span that represents the reranking of a set of input documents. For example, a cross-encoder may be used to compute the input documents' relevance scores with respect to a user query, and the top K documents with the highest scores are then returned by the Reranker. - -#### LLM - -An LLM is a span that represents a call to an LLM.
 For example, an LLM span could be used to represent a call to OpenAI or Llama. - -#### Embedding - -An Embedding is a span that represents a call to an embedding model. For example, an Embedding span could be used to represent a call to OpenAI to get an ada-2 embedding for retrieval. - -#### Tool - -A Tool is a span that represents a call to an external tool such as a calculator or a weather API. - -#### Agent - -An Agent is a span that encompasses calls to LLMs and Tools. An agent describes a reasoning block that acts on tools using the guidance of an LLM.
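Putting the pieces together, a tracing backend might assemble a trace by grouping a flat list of spans by parent_id. The sketch below uses hypothetical span records with the span kinds defined above:

```python
# Sketch: group a flat list of span records into a tree, the way a tracing
# backend might assemble a trace. The span records are hypothetical examples.
from collections import defaultdict

spans = [
    {"span_id": "root", "parent_id": None, "span_kind": "CHAIN", "name": "query"},
    {"span_id": "s1", "parent_id": "root", "span_kind": "RETRIEVER", "name": "retrieve"},
    {"span_id": "s2", "parent_id": "root", "span_kind": "LLM", "name": "llm"},
]

children = defaultdict(list)
for s in spans:
    children[s["parent_id"]].append(s)

root = children[None][0]  # the span with no parent_id is the root span
assert root["name"] == "query"
assert [s["span_kind"] for s in children[root["span_id"]]] == ["RETRIEVER", "LLM"]
```

The span kinds then tell the backend how to render each node: a retrieval step, an LLM call, and so on.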