Maximum context length exceeded #95
Actually, it seems like the error is happening when I load resources. Here is how I am loading the resources:

```javascript
const loadResources = async (ragApplication, messages) => {
  console.log("RAG Application:", ragApplication);
  const loaderSummaries = await Promise.all(
    messages.map(async (message) => {
      console.log("Adding loader for:", message.subject);
      const loaderSummary = await ragApplication.addLoader(
        new JsonLoader({ object: message })
      );
      return loaderSummary;
    })
  );
  console.log(
    "\nLoader summaries:\n",
    loaderSummaries.map((summary) => JSON.stringify(summary)).join("\n")
  );
  return loaderSummaries;
};
```

The final console log is never called, so the error must be triggered during the loading step. Adding a stack trace in case that's useful:
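Since `Promise.all` rejects on the first failure without saying which input caused it, one way to narrow this down is to load the messages sequentially and log the failing one. This is only a debugging sketch, not part of the embedJS API; `makeLoader` is a hypothetical factory, e.g. `(message) => new JsonLoader({ object: message })`:

```javascript
// Debugging sketch (assumption: not embedJS API): load messages one at a
// time so the message that trips the context-length error is identified.
async function loadResourcesSequentially(ragApplication, messages, makeLoader) {
  const summaries = [];
  for (const message of messages) {
    try {
      const summary = await ragApplication.addLoader(makeLoader(message));
      summaries.push(summary);
    } catch (err) {
      // Surfaces exactly which email blew past the model's limit.
      console.error(`addLoader failed for subject "${message.subject}":`, err.message);
      throw err;
    }
  }
  return summaries;
}
```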
I think I'm zeroing in on the issue. I'm not exceeding the context limit for the main model, but for the embedding model. This implied to me that the preprocessing step doesn't apply to the embedding process. I'll keep poking.
Hey @ashryanbeats, yes - the preprocessing is not done for the embeddings. For embeddings, it's either all or nothing right now. The library usually breaks the loaded content into smaller chunks, but that is not done for the JSON loader. I am thinking we should have it auto-break JSON into smaller embedding documents if the text is too large. But what chunking strategy to use needs more thought.
I have thought about this and discussed with other library maintainers for similar projects in other languages. I think the best strategy is to break the JSON at the application end outside the library. But if you have more thoughts, let's open a discussion thread on this. |
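As a rough illustration of that application-side approach, here is a minimal sketch of splitting an oversized message before handing it to the library. The `subject`/`body` field names are assumptions about the email shape, and the character budget is only a crude stand-in for a real token count:

```javascript
// Hedged sketch of application-side chunking: split a message's body into
// pieces small enough for the embedding model before creating loaders.
// maxChars is a rough proxy for tokens (~4 chars/token for English text).
function chunkMessage(message, maxChars = 8000) {
  const { subject, body } = message;
  if (!body || body.length <= maxChars) return [message];
  const chunks = [];
  for (let i = 0; i < body.length; i += maxChars) {
    chunks.push({
      subject,
      body: body.slice(i, i + maxChars),
      part: chunks.length + 1, // 1-based chunk index
    });
  }
  return chunks;
}
```

Each chunk could then go through `addLoader` as before, e.g. `messages.flatMap((m) => chunkMessage(m))`.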
Background
From the readme:
Issue
I'm not finding that this preprocessing step is happening. I've run into these context length errors on both GPT 3.5 Turbo and GPT 4o.
I understand I can do these workarounds:
`setSearchResultCount()`
But in order to take advantage of the described preprocessor, is there something specific I need to do?
My setup
I can confirm my setup works when I load less data. Essentially, I'm loading sanitized emails as JSON objects. With 5 emails loaded, it's fine. With 10 emails loaded, I'm hitting the token limit.
My builder setup:
My loader:
My EmbedJS version:
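As a quick sanity check on the 5-vs-10 email observation above, a rough token estimate can show how close a batch gets to the embedding model's limit. This assumes ~4 characters per token, which is only a heuristic for English text; an exact count needs the model's tokenizer (e.g. tiktoken):

```javascript
// Rough estimate only (assumption: ~4 characters per token; this is a
// heuristic, not a real tokenizer count).
function estimateTokens(messages) {
  return messages.reduce(
    (sum, message) => sum + Math.ceil(JSON.stringify(message).length / 4),
    0
  );
}
```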