# Middleware

Middleware is a mechanism for fine-grained control over everything between calling `.chat` and Haverscript calling the LLM. As an example, consider the creation of a session.

```python
session = connect("mistral") | echo()
```

We use `|` to create a pipeline of actions. The reply from the LLM flows left to right, and the prompt to invoke the LLM flows right to left. In this case there is only `echo`, but consider two examples that also use `cache`.

```python
#  LLM <---- PROMPT GOES RIGHT TO LEFT <----- chat
connect("mistral") | echo() | cache("cache.db")
#  LLM ----> REPLY GOES LEFT TO RIGHT ------> Response
```

In this example, we first check the cache for an existing answer, and call `echo` only on a cache miss.

```python
#  LLM <---- PROMPT GOES RIGHT TO LEFT <----- chat
connect("mistral") | cache("cache.db") | echo()
#  LLM ----> REPLY GOES LEFT TO RIGHT ------> Response
```

In this second example, we always call `echo`, even when the cache hits.

The user is free to choose the middleware stack that suits their application.

There are two ways of defining middleware: as part of the overall session (as above), or as part of a single chat. In the second case, the middleware is appended only for this chat, and is not part of any context.

session.chat("Hello", middlware=echo())

## Middleware in Haverscript

Haverscript provides the following middleware:

| Middleware   | Purpose                                                     | Class          |
|--------------|-------------------------------------------------------------|----------------|
| `model`      | Request a specific model be used                            | configuration  |
| `options`    | Set specific LLM options (such as seed)                     | configuration  |
| `format`     | Set a specific format for output                            | configuration  |
| `dedent`     | Remove spaces from the prompt                               | configuration  |
| `echo`       | Print the prompt and reply                                  | observation    |
| `stats`      | Print basic stats about the LLM call                        | observation    |
| `trace`      | Log requests and responses                                  | observation    |
| `transcript` | Store a complete transcript of every call                   | observation    |
| `retry`      | Retry on failure (using tenacity)                           | reliability    |
| `validate`   | Fail under a given condition                                | reliability    |
| `cache`      | Store and/or query prompt-reply pairs in a DB               | efficiency     |
| `fresh`      | Request a fresh reply (not cached)                          | efficiency     |
| `meta`       | Support generalized prompt and response transformations     | generalization |

## Configuration Middleware

```python
def model(model_name: str) -> Middleware:
    """Set the name of the model to use. Typically this is automatically set inside connect."""

def options(**kwargs) -> Middleware:
    """Options to pass to the model, such as temperature and seed."""

def format(schema: Type[BaseModel] | None = None) -> Middleware:
    """Request the output in JSON, or parsed JSON."""

def dedent() -> Middleware:
    """Remove unnecessary spaces from the prompt."""
```
- `model` is automatically appended to the start of the middleware by the call to `connect`.
- `options` allows options, such as temperature, seed, and top_k, to be set.
- `format` requests that the output be formatted in JSON, and automatically parses the output. If no type is provided, the result is a JSON dict. If a `BaseModel` (from pydantic) type is provided, the schema of this specific `BaseModel` class is used, and the output is parsed into the class. In either case, `Response.value` is used to access the parsed result.
- `dedent` removes excess spaces from the prompt.

See options for an example of using options. See formatting for an example of using format.
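As a sketch of combining the configuration middleware, assuming a hypothetical `Haiku` schema (the field names are made up for illustration):

```python
from pydantic import BaseModel

class Haiku(BaseModel):
    # Hypothetical schema: the LLM is asked for JSON matching these fields.
    lines: list[str]
    syllables: int

session = connect("mistral") | options(seed=12345) | format(schema=Haiku)
response = session.chat("Write a haiku about the sea.")
haiku = response.value  # a parsed Haiku instance
```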

## Observation Middleware

There are four middleware adapters for observation.

```python
def echo(width: int = 78, prompt: bool = True, spinner: bool = True) -> Middleware:
    """Echo prompts and responses to stdout."""

def stats() -> Middleware:
    """Print stats to stdout."""

def trace(level: int = log.DEBUG) -> Middleware:
    """Log all requests and responses."""

def transcript(dirname: str) -> Middleware:
    """Write a full transcript of every interaction, in a subdirectory."""
```
- `echo` echoes the prompt and reply to stdout. A spinner (⠧) is shown while waiting for a response from the LLM.
- `stats` prints basic stats (token counts, etc.) to the screen.
- `trace` uses Python's logging to log all prompts and responses.
- `transcript` stores all prompt-response pairs, including context, in a subdirectory.
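These combine freely; a small sketch using only the parameters shown above:

```python
# Echo prompts and replies at 100 columns, without the spinner,
# and print basic stats after each reply.
session = connect("mistral") | echo(width=100, spinner=False) | stats()
session.chat("Why is the sky blue?")
```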

## Reliability Middleware

```python
def retry(**options) -> Middleware:
    """Retry uses tenacity to wrap the LLM request-response action in retry options."""

def validate(predicate: Callable[[str], bool]) -> Middleware:
    """Validate the response as middleware. Can raise an LLMResultError."""
```

## Efficiency Middleware

```python
def cache(filename: str, mode: str | None = "a+") -> Middleware:
    """Set the cache filename for this model."""

def fresh() -> Middleware:
    """Require any cached reply be ignored, and a fresh reply be generated."""
```

## Generalized Middleware

```python
class MetaModel(BaseModel):
    system: str | None

    @abstractmethod
    def chat(self, prompt, next: LanguageModel) -> Reply:
        """Promote a chat-with-prompt into a follow-on call of the next model."""

def meta(model: MetaModel) -> Middleware:
    """Provide a meta model as middleware."""
```

See meta model for a full example. The meta middleware is general and powerful: it can be used to build models that use extra compute to generate better answers.
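As a minimal sketch of a `MetaModel` subclass, assuming `next.chat` accepts the (transformed) prompt and returns a `Reply`:

```python
class StepByStep(MetaModel):
    # Hypothetical meta model: rewrites the prompt before delegating.
    system: str | None = None

    def chat(self, prompt, next: LanguageModel) -> Reply:
        # Transform the prompt, then hand the chat on to the next model.
        return next.chat(f"{prompt}\n\nThink step by step.")

session = connect("mistral") | meta(StepByStep())
```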