Maximum context length exceeded #95

Closed
ashryanbeats opened this issue Jul 20, 2024 · 4 comments

@ashryanbeats

ashryanbeats commented Jul 20, 2024

Background

From the readme:

When the number of documents fetched leads to a request above the token limit, the library uses the following strategy -

It runs a preprocessing step to select relevant sections from each document until the total number of tokens is less than the maximum number of tokens allowed by the model. It then uses the transformed documents as context to answer the question.
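As a rough illustration of the strategy the readme describes, a greedy "keep sections until the token budget is spent" selector might look like the sketch below. This is a hypothetical illustration, not EmbedJS's actual implementation: the function names are made up, and the 4-characters-per-token estimate is a common heuristic rather than a real tokenizer.

```javascript
// Rough token estimate: assumes ~4 characters per token, a common heuristic
// for English text. Swap in a real tokenizer for exact counts.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep sections in the order given (e.g. most relevant first) and stop at the
// first one that would push the running total past maxTokens.
function selectSectionsWithinBudget(sections, maxTokens) {
  const selected = [];
  let total = 0;
  for (const section of sections) {
    const cost = estimateTokens(section);
    if (total + cost > maxTokens) break;
    selected.push(section);
    total += cost;
  }
  return { selected, total };
}
```

Stopping at the first oversized section preserves relevance order; skipping it and continuing would pack in smaller, less relevant sections instead.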

Issue

This preprocessing step doesn't seem to be happening for me. I've run into these context length errors with both GPT-3.5 Turbo and GPT-4o.

I understand I can do these workarounds:

  • Set the search result limit with setSearchResultCount()
  • Roll my own preprocessor

But in order to take advantage of the described preprocessor, is there something specific I need to do?

My setup

I can confirm my setup works when I load less data. Essentially, I'm loading sanitized emails as JSON objects. With 5 emails loaded, it's fine; with 10, I hit the token limit.

My builder setup:

```javascript
const ragInstance = await new RAGApplicationBuilder()
    .setModel(SIMPLE_MODELS.OPENAI_GPT4_O)
    .setEmbeddingModel(new OpenAi3SmallEmbeddings())
    .setVectorDb(new HNSWDb())
    .setCache(new MemoryCache())
    .build();
```

My loader:

```javascript
// for each `message` object...
const loaderSummary = await ragApplication.addLoader(
  new JsonLoader({ object: message })
);
```

My EmbedJS version:

```
"@llm-tools/embedjs": "^0.0.91",
```

@ashryanbeats
Author

Actually, it seems like the error is happening when I load resources. Here is how I am loading the resources:

```javascript
const loadResources = async (ragApplication, messages) => {
  console.log("RAG Application:", ragApplication);

  const loaderSummaries = await Promise.all(
    messages.map(async (message) => {
      console.log("Adding loader for:", message.subject);

      const loaderSummary = await ragApplication.addLoader(
        new JsonLoader({ object: message })
      );

      return loaderSummary;
    })
  );

  console.log(
    "\nLoader summaries:\n",
    loaderSummaries.map((summary) => JSON.stringify(summary)).join("\n")
  );

  return loaderSummaries;
};
```

The final console log is never reached, so the error must be thrown during the addLoader() calls.
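`Promise.all` rejects as soon as any one of its promises rejects, which is why the summary log above never runs. To see which messages fail, one option is `Promise.allSettled`, which waits for every call. A minimal sketch follows; `loadAllAndReport` and `loadOne` are hypothetical names, and `loadOne` stands in for the `ragApplication.addLoader(new JsonLoader({ object: message }))` call from the thread.

```javascript
// Hypothetical helper: `loadOne` stands in for something like
// (message) => ragApplication.addLoader(new JsonLoader({ object: message })).
// Promise.allSettled waits for every call, so one failure doesn't hide the rest.
async function loadAllAndReport(messages, loadOne) {
  // The async wrapper turns synchronous throws into rejections too.
  const results = await Promise.allSettled(
    messages.map(async (message) => loadOne(message))
  );
  const summaries = [];
  const failures = [];
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      summaries.push(result.value);
    } else {
      const reason = result.reason;
      failures.push({
        subject: messages[i].subject,
        error: reason instanceof Error ? reason.message : String(reason),
      });
    }
  });
  return { summaries, failures };
}
```

The successful summaries are still returned, and each failure is paired with its email subject.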

Adding a stack trace in case that's useful:

```
BadRequestError: 400 This model's maximum context length is 8192 tokens, however you requested 10387 tokens (10387 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.
    at APIError.generate (file:///Users/ash/dev/email-rag/node_modules/openai/error.mjs:41:20)
    at OpenAI.makeStatusError (file:///Users/ash/dev/email-rag/node_modules/openai/core.mjs:268:25)
    at OpenAI.makeRequest (file:///Users/ash/dev/email-rag/node_modules/openai/core.mjs:311:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async file:///Users/ash/dev/email-rag/node_modules/@langchain/openai/dist/embeddings.js:268:29
    at async RetryOperation._fn (/Users/ash/dev/email-rag/node_modules/p-retry/index.js:50:12)
```

@ashryanbeats
Author

I think I'm zeroing in on the issue.

I'm not exceeding the context limit of the main model but that of the embedding model. This suggests to me that the preprocessing step doesn't apply to the embedding process.
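One way to catch this before any API call is a pre-flight size check on each message. The sketch below is hypothetical: it assumes the JSON loader embeds roughly the serialized object, takes the 8192-token limit from the error above, and uses a ~4 characters-per-token heuristic, so its counts are estimates only.

```javascript
// The 8192-token limit comes from the error above. CHARS_PER_TOKEN = 4 is a
// rough heuristic for English text; a real tokenizer gives exact counts.
const EMBEDDING_TOKEN_LIMIT = 8192;
const CHARS_PER_TOKEN = 4;

// Estimate tokens for a message by measuring its serialized JSON length
// (assumes the loader embeds something close to JSON.stringify(message)).
function estimateJsonTokens(obj) {
  return Math.ceil(JSON.stringify(obj).length / CHARS_PER_TOKEN);
}

// Return the index and estimated size of every message likely to exceed the limit.
function findOversizedMessages(messages, limit = EMBEDDING_TOKEN_LIMIT) {
  return messages
    .map((message, index) => ({ index, tokens: estimateJsonTokens(message) }))
    .filter(({ tokens }) => tokens > limit);
}
```

Because the heuristic can undercount, leaving some headroom below the limit is safer than checking against it exactly.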

I'll keep poking.

@adhityan
Collaborator

Hey @ashryanbeats, yes, the preprocessing is not done for the embeddings. For embedding, it's all or nothing right now. The library usually breaks loaded content into smaller chunks, but that is not done for the JSON loader.
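For readers unfamiliar with chunking, a minimal fixed-size character splitter with overlap looks like the sketch below. It is a generic illustration, not the splitter EmbedJS actually uses; `chunkText` and its parameters are hypothetical.

```javascript
// Split text into chunks of at most chunkSize characters. Consecutive chunks
// share `overlap` characters so context isn't cut off abruptly at boundaries.
function chunkText(text, chunkSize, overlap = 0) {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in [0, chunkSize)");
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once this chunk reaches the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

For example, `chunkText("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`. Each chunk is embedded separately, so no single request exceeds the model's limit.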

I'm thinking we should have it automatically break JSON into smaller embedding documents when the text is too large, but which chunking strategy to use needs more thought.
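As an illustration of one possible strategy (not something the library implements), a JSON object could be flattened into `path: value` lines and then packed into groups under a size budget. The names `flattenJson` and `groupLines` are hypothetical.

```javascript
// Flatten a JSON value into "path: value" lines, e.g. 'from.name: "Ada"'.
function flattenJson(value, path = "") {
  if (value !== null && typeof value === "object") {
    const entries = Array.isArray(value)
      ? value.map((v, i) => [`${path}[${i}]`, v])
      : Object.entries(value).map(([k, v]) => [path ? `${path}.${k}` : k, v]);
    return entries.flatMap(([p, v]) => flattenJson(v, p));
  }
  return [`${path}: ${JSON.stringify(value)}`];
}

// Pack lines into newline-joined groups whose length stays within maxChars.
// A single line longer than maxChars still becomes its own (oversized) group.
function groupLines(lines, maxChars) {
  const groups = [];
  let current = [];
  let size = 0;
  for (const line of lines) {
    const cost = line.length + 1; // +1 for the newline separator
    if (current.length > 0 && size + cost > maxChars) {
      groups.push(current.join("\n"));
      current = [];
      size = 0;
    }
    current.push(line);
    size += cost;
  }
  if (current.length > 0) groups.push(current.join("\n"));
  return groups;
}
```

The trade-off is that grouping by size can separate related fields, for example a long `body` from the `subject` it belongs to. That is the kind of question that makes a general-purpose strategy hard to pick.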

@adhityan
Collaborator

adhityan commented Oct 9, 2024

I have thought about this and discussed it with maintainers of similar libraries in other languages. I think the best strategy is to split the JSON on the application side, outside the library. But if you have more thoughts, let's open a discussion thread on this.
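Splitting on the application side could look like the sketch below: each email is broken into several smaller objects that repeat its metadata, and each part is loaded separately. The email shape `{ subject, from, date, body }` is an assumption (only `subject` appears in this thread), and `splitEmail` and its 6000-character default (roughly 1,500 tokens at ~4 chars/token) are hypothetical choices.

```javascript
// Hypothetical email shape { subject, from, date, body }: only `subject` appears
// in this thread, so the other field names are assumptions. maxBodyChars = 6000
// is an illustrative budget (~1,500 tokens at a rough 4 chars/token).
function splitEmail(message, maxBodyChars = 6000) {
  const { body = "", ...metadata } = message;
  const bodies = [];
  for (let start = 0; start < body.length; start += maxBodyChars) {
    bodies.push(body.slice(start, start + maxBodyChars));
  }
  if (bodies.length === 0) bodies.push(""); // keep metadata-only emails
  // Every part repeats the metadata so each embedded chunk stays self-describing.
  return bodies.map((chunk, i) => ({
    ...metadata,
    part: i + 1,
    totalParts: bodies.length,
    body: chunk,
  }));
}
```

Each part could then go through the same loader call used earlier in the thread, e.g. `await ragApplication.addLoader(new JsonLoader({ object: part }))` for every `part` of `splitEmail(message)`. Splitting by raw characters may cut mid-word; a splitter that breaks on paragraph or sentence boundaries would produce cleaner chunks.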

@adhityan adhityan closed this as completed Oct 9, 2024

2 participants