RAGControl.cs

suncloudsmoon edited this page Dec 8, 2024 · 1 revision

Overview

This file defines a custom Windows Forms user control named RAGControl within the TextForge namespace. The control is designed to handle operations related to indexing PDF files and managing their content for retrieval-augmented generation (RAG) tasks. It integrates with various libraries for PDF processing, vector database operations, and interaction with Microsoft Word documents.

Public Members

Variables

  • CHUNK_LEN: A constant integer giving the length, in characters, of the chunks into which PDF content is split during indexing. Its value is the character equivalent of 256 tokens, computed with CommonUtils.TokensToCharCount(256).
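
The relationship between a token budget and a character-based chunk length can be illustrated with a minimal sketch. It is written in Python rather than the control's C#, and it assumes a ratio of roughly 4 characters per token; the real CommonUtils.TokensToCharCount may use a different ratio. The names tokens_to_char_count and split_into_chunks are hypothetical.

```python
from typing import List

# Assumption: roughly 4 characters per token. The real
# CommonUtils.TokensToCharCount may use a different ratio.
CHARS_PER_TOKEN = 4


def tokens_to_char_count(tokens: int) -> int:
    """Convert a token budget into an approximate character budget."""
    return tokens * CHARS_PER_TOKEN


# Mirrors the idea behind CHUNK_LEN: a 256-token chunk expressed in characters.
CHUNK_LEN = tokens_to_char_count(256)


def split_into_chunks(text: str, chunk_len: int = CHUNK_LEN) -> List[str]:
    """Split text into consecutive pieces of at most chunk_len characters."""
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    return [text[i:i + chunk_len] for i in range(0, len(text), chunk_len)]
```

Expressing the chunk length in characters keeps the splitting step free of any tokenizer call, at the cost of the token count per chunk being only approximate.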

Functions

  • RAGControl(): The constructor for the user control. It initializes the components and registers a Load event handler that sets up the control's bindings.
  • AddButton_Click(object sender, EventArgs e): Handles the click event for adding PDF files. It opens a file dialog to select PDFs, adds them to a list, and initiates the indexing process.
  • RemoveButton_Click(object sender, EventArgs e): Handles the click event for removing selected documents from the list.
  • FileListBox_KeyDown(object sender, KeyEventArgs e): Handles key down events on the file list box, allowing deletion of selected documents via the Delete key.
  • FileListBox_MouseMove(object sender, MouseEventArgs e): Updates a tooltip to show the full path of the PDF file when the user hovers over an item in the file list box.
  • GetRAGContext(string query, int maxTokens): Retrieves contextually relevant information from the indexed documents based on a given query, limiting the response to a specified number of tokens.
  • AskQuestion(SystemChatMessage systemPrompt, IEnumerable messages, Word.Range context, float temperature, Word.Document doc = null): An asynchronous method that constructs a chat history and sends a question to a chat client, receiving a streaming response.
  • AskQuestionForImage(SystemChatMessage systemPrompt, IEnumerable messages, Word.Range context, Word.Document doc = null): Similar to AskQuestion, but generates an image response instead of text.
  • ProcessInformation(SystemChatMessage systemPrompt, IEnumerable messages, Word.Range context, Word.Document doc = null): Builds the chat history from the system prompt, the user messages, and contextual information taken from a Word document range.
  • GetUserPromptLen(IEnumerable messageList): Calculates the total length of user prompt messages in characters.
  • OptimizeConstraint(float maxPercentage, int contextLength, int promptTokenLen, int documentContentTokenLen): Computes constraints for the amount of document content and RAG context to include based on token limits.
  • GetWordDocumentAsRAG(string query, Word.Range context): Creates a RAG context from a Word document by splitting its content into chunks, indexing them, and retrieving relevant chunks based on a query.
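
The kind of token budgeting that OptimizeConstraint performs can be sketched as follows. This is a Python illustration under an assumed policy, not the control's actual logic: the prompt is paid for first, the document content receives at most half of the remaining budget, and the RAG context receives the rest. Only the parameter names come from the signature above; the return shape and the half-and-half split are assumptions.

```python
from typing import Tuple


def optimize_constraint(max_percentage: float, context_length: int,
                        prompt_token_len: int,
                        document_content_token_len: int) -> Tuple[int, int]:
    """Return (document_tokens, rag_tokens) under a shared token budget.

    Assumed policy (not taken from the source): the prompt is paid for
    first, the document content gets at most half of what is left, and
    the RAG context gets the remainder.
    """
    if not 0.0 <= max_percentage <= 1.0:
        raise ValueError("max_percentage must be between 0 and 1")
    # Tokens of the model's context window that this request may use.
    usable = int(context_length * max_percentage) - prompt_token_len
    usable = max(usable, 0)
    # A short document leaves its unused share to the RAG context.
    document_tokens = min(document_content_token_len, usable // 2)
    rag_tokens = usable - document_tokens
    return document_tokens, rag_tokens
```

Under this policy a short document automatically frees budget for retrieved chunks, and a prompt that already exceeds the allowed share yields zero for both.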

Private Members

Variables

  • _cultureHelper: An instance of CultureLocalizationHelper for handling localized strings.
  • _fileToolTip: A ToolTip control for displaying file paths.
  • _removalQueue: A queue for tracking documents scheduled for removal during indexing.
  • _indexFileCount: A concurrent dictionary tracking the number of files being indexed.
  • _fileList: A binding list of key-value pairs representing file labels and paths, bound to the file list box.
  • _db: An instance of HyperVectorDB.HyperVectorDB for managing the vector database.
  • _isIndexing: A boolean indicating whether the indexing process is ongoing.
  • preciseProgressBar: A float to track the progress bar's value with higher precision.
  • progressBarLock: An object used to synchronize access to the progress bar.
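
One plausible reason for pairing a float with a lock is that a progress bar control stores integer values, so small per-chunk increments reported from several indexing tasks would be lost to rounding or to races. The Python sketch below illustrates that pattern under those assumptions; the class PreciseProgress and its methods are hypothetical and do not reproduce the control's code.

```python
import threading


class PreciseProgress:
    """Accumulate fractional progress and expose an integer bar value."""

    def __init__(self, bar_maximum: int = 100):
        self._lock = threading.Lock()   # plays the role of progressBarLock
        self._precise = 0.0             # plays the role of preciseProgressBar
        self._bar_maximum = bar_maximum

    def add(self, fraction: float) -> int:
        """Add a fraction (0..1) of the total work; return the bar value."""
        with self._lock:
            # Clamp so that overshooting never exceeds a full bar.
            self._precise = min(1.0, self._precise + fraction)
            return int(self._precise * self._bar_maximum)

    def reset(self) -> None:
        """Return the accumulated progress to zero."""
        with self._lock:
            self._precise = 0.0
```

Keeping the running total as a float means that many increments smaller than one bar step still add up, while the lock makes each read-modify-write atomic across threads.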

Functions

  • InitializeComponent(): Automatically generated to initialize the control's components.
  • InitializeRAGControl(): Initializes bindings and UI components, including setting up the file list box and tooltip.
  • IndexDocumentAsync(string filePath): Asynchronously indexes the content of a PDF file into the vector database.
  • AddDocument(string filePath, string content): Adds a document's content to the vector database.
  • DeleteDocument(string filePath): Deletes a document's index from the vector database.
  • AutoHideRemoveButton(): Enables or disables the remove button based on whether there are files in the list.
  • ChangeProgressBarVisibility(bool val): Shows or hides the progress bar.
  • ResetProgressBar(): Resets the progress bar's value to zero.
  • GetProgressBarValue(): Gets the current value of the progress bar as a float between 0 and 1.
  • SetProgressBarValue(float val): Sets the progress bar's value based on a float between 0 and 1.
  • UpdateProgressBar(float val): Updates the progress bar's value, considering the number of files being indexed.
  • GetIndexFileCount(): Calculates the total number of files being indexed by summing up counts from _indexFileCount.
  • ProcessRemovalQueue(): Processes queued removal requests for documents, attempting to delete them from the database.
  • ChangeProgressBarVisibilityAfterSleep(int seconds, bool val): Asynchronously changes the visibility of the progress bar after a specified delay.
  • RemoveSelectedDocument(): Removes the selected document from the file list and the database if not currently indexing.
  • ReadPdfFileAsync(string filePath, int chunkLen): Reads and splits the content of a PDF file into chunks of specified length, handling encryption prompts if necessary.
  • IteratePdfFile(ref PdfDocument document, ref List chunks, int chunkLen): Recursively iterates through a PDF document and its embedded PDF files to extract and chunk their text content.
  • IterateInnerPdfFile(ref PdfDocument doc, ref List chunks, int chunkLen): Extracts text from the pages of a PDF document, splits it into blocks, and further into chunks.
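
The recursive traversal described for IteratePdfFile and IterateInnerPdfFile can be sketched in Python with a stand-in document type. The Doc class, the blank-line rule for splitting a page into blocks, and the function names are all assumptions made for illustration; the real methods operate on PdfDocument objects and may split text differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a PDF: page texts plus embedded documents."""
    pages: List[str]
    embedded: List["Doc"] = field(default_factory=list)


def chunk_pages(doc: Doc, chunks: List[str], chunk_len: int) -> None:
    """Split each page into blank-line-separated blocks, then into chunks."""
    for page in doc.pages:
        for block in page.split("\n\n"):
            block = block.strip()
            # An empty block produces no chunks because the range is empty.
            for i in range(0, len(block), chunk_len):
                chunks.append(block[i:i + chunk_len])


def iterate_doc(doc: Doc, chunks: List[str], chunk_len: int) -> None:
    """Chunk this document, then recurse into every embedded document."""
    chunk_pages(doc, chunks, chunk_len)
    for inner in doc.embedded:
        iterate_doc(inner, chunks, chunk_len)
```

Splitting into blocks before chunking keeps a chunk from spanning two unrelated paragraphs, and the recursion ensures that attachments nested at any depth are indexed along with their parent.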

Events

  • Load: Triggered when the user control is loaded, initiating the setup of bindings and UI components.
  • MouseMove on FileListBox: Updates the tooltip to display the full path of the PDF file when the mouse hovers over an item in the list box.
  • KeyDown on FileListBox: Allows deletion of selected documents using the Delete key.