RAGControl.cs
suncloudsmoon edited this page Dec 8, 2024
This file defines a custom Windows Forms user control named `RAGControl` within the `TextForge` namespace. The control indexes PDF files and manages their content for retrieval-augmented generation (RAG) tasks. It integrates with libraries for PDF processing, vector database operations, and interaction with Microsoft Word documents.
- CHUNK_LEN: A constant integer giving the length of the chunks into which PDF content is split during indexing. It is derived from a token count using `CommonUtils.TokensToCharCount(256)`.
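To make the relationship between the token count and the chunk length concrete, here is a minimal sketch of fixed-length chunking. The 4-characters-per-token ratio and the `SplitIntoChunks` helper are assumptions for illustration only; the actual `CommonUtils.TokensToCharCount` may use a different ratio, and the control's real chunking logic may split on other boundaries.

```csharp
using System;
using System.Collections.Generic;

static class ChunkingSketch
{
    // Assumption: roughly 4 characters per token. The real
    // CommonUtils.TokensToCharCount may use a different ratio.
    const int AssumedCharsPerToken = 4;

    public static int TokensToCharCount(int tokens)
    {
        return tokens * AssumedCharsPerToken;
    }

    // Hypothetical helper: cuts text into consecutive pieces of at most
    // chunkLen characters. The last piece may be shorter.
    public static List<string> SplitIntoChunks(string text, int chunkLen)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        if (chunkLen <= 0) throw new ArgumentOutOfRangeException(nameof(chunkLen));

        var chunks = new List<string>();
        for (int i = 0; i < text.Length; i += chunkLen)
        {
            int length = Math.Min(chunkLen, text.Length - i);
            chunks.Add(text.Substring(i, length));
        }
        return chunks;
    }
}
```

Under the assumed ratio, `ChunkingSketch.TokensToCharCount(256)` yields a chunk length of 1024 characters, so a 2,500-character page would be split into three chunks.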
- RAGControl(): The constructor for the user control. It initializes components and sets up event handlers for loading the control and initializing its bindings.
- AddButton_Click(object sender, EventArgs e): Handles the click event for adding PDF files. It opens a file dialog to select PDFs, adds them to a list, and initiates the indexing process.
- RemoveButton_Click(object sender, EventArgs e): Handles the click event for removing selected documents from the list.
- FileListBox_KeyDown(object sender, KeyEventArgs e): Handles key down events on the file list box, allowing deletion of selected documents via the Delete key.
- FileListBox_MouseMove(object sender, MouseEventArgs e): Updates a tooltip to show the full path of the PDF file when the user hovers over an item in the file list box.
- GetRAGContext(string query, int maxTokens): Retrieves contextually relevant information from the indexed documents based on a given query, limiting the response to a specified number of tokens.
- AskQuestion(SystemChatMessage systemPrompt, IEnumerable messages, Word.Range context, float temperature, Word.Document doc = null): An asynchronous method that constructs a chat history and sends a question to a chat client, receiving a streaming response.
- AskQuestionForImage(SystemChatMessage systemPrompt, IEnumerable messages, Word.Range context, Word.Document doc = null): Similar to `AskQuestion`, but generates an image response instead of text.
- ProcessInformation(SystemChatMessage systemPrompt, IEnumerable messages, Word.Range context, Word.Document doc = null): Processes and constructs chat history including system prompts, user messages, and contextual information from a Word document range.
- GetUserPromptLen(IEnumerable messageList): Calculates the total length of user prompt messages in characters.
- OptimizeConstraint(float maxPercentage, int contextLength, int promptTokenLen, int documentContentTokenLen): Computes constraints for the amount of document content and RAG context to include based on token limits.
- GetWordDocumentAsRAG(string query, Word.Range context): Creates a RAG context from a Word document by splitting its content into chunks, indexing them, and retrieving relevant chunks based on a query.
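The budgeting that `OptimizeConstraint` performs can be illustrated with a small sketch. This is not the control's actual algorithm: the policy below (reserve the prompt, give the document at most half of what remains, give the rest to RAG context) and the `out` parameter names are assumptions chosen only to show how the four inputs could interact.

```csharp
using System;

static class TokenBudgetSketch
{
    // Hypothetical policy, not the real implementation:
    // 1. Only maxPercentage of the model's context window may be used.
    // 2. The prompt is reserved first.
    // 3. The document gets at most half of the remainder.
    // 4. Whatever is left goes to the RAG context.
    public static void OptimizeConstraint(
        float maxPercentage, int contextLength,
        int promptTokenLen, int documentContentTokenLen,
        out int documentTokens, out int ragTokens)
    {
        int budget = (int)(contextLength * maxPercentage) - promptTokenLen;
        if (budget <= 0)
        {
            // The prompt alone exhausts the allowance.
            documentTokens = 0;
            ragTokens = 0;
            return;
        }

        documentTokens = Math.Min(documentContentTokenLen, budget / 2);
        ragTokens = budget - documentTokens;
    }
}
```

With a 4096-token context, a 50% cap, and a 48-token prompt, 2000 tokens remain. A 300-token document then leaves 1700 tokens for RAG context, while a 5000-token document is capped at 1000 tokens and leaves 1000 for RAG context.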
- _cultureHelper: An instance of `CultureLocalizationHelper` for handling localized strings.
- _fileToolTip: A `ToolTip` control for displaying file paths.
- _removalQueue: A queue for tracking documents scheduled for removal during indexing.
- _indexFileCount: A concurrent dictionary tracking the number of files being indexed.
- _fileList: A binding list of key-value pairs representing file labels and paths, bound to the file list box.
- _db: An instance of `HyperVectorDB.HyperVectorDB` for managing the vector database.
- _isIndexing: A boolean indicating whether the indexing process is ongoing.
- preciseProgressBar: A float to track the progress bar's value with higher precision.
- progressBarLock: An object used to synchronize access to the progress bar.
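One reason to pair a float such as `preciseProgressBar` with a lock object is that a progress bar stores integer values, so small increments reported from several indexing tasks would otherwise be rounded away or lost to races. The sketch below shows that pattern in isolation. The `PreciseProgress` class and its members are hypothetical names, not part of `RAGControl`.

```csharp
using System;

sealed class PreciseProgress
{
    private readonly object _lock = new object();
    private readonly int _maximum;   // e.g. ProgressBar.Maximum
    private float _precise;          // fractional progress in [0, 1]

    public PreciseProgress(int maximum)
    {
        _maximum = maximum;
    }

    // Adds a fractional increment and returns the integer value a
    // progress bar could display. The lock keeps concurrent callers
    // from overwriting each other's updates.
    public int Add(float delta)
    {
        lock (_lock)
        {
            _precise = Math.Min(1f, Math.Max(0f, _precise + delta));
            return (int)Math.Round(_precise * _maximum);
        }
    }
}
```

In this sketch a single increment of 0.004 against a maximum of 100 still displays as 0, but the fraction is retained, so repeated increments eventually advance the bar instead of each being rounded down to nothing.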
- InitializeComponent(): Automatically generated to initialize the control's components.
- InitializeRAGControl(): Initializes bindings and UI components, including setting up the file list box and tooltip.
- IndexDocumentAsync(string filePath): Asynchronously indexes the content of a PDF file into the vector database.
- AddDocument(string filePath, string content): Adds a document's content to the vector database.
- DeleteDocument(string filePath): Deletes a document's index from the vector database.
- AutoHideRemoveButton(): Enables or disables the remove button based on whether there are files in the list.
- ChangeProgressBarVisibility(bool val): Shows or hides the progress bar.
- ResetProgressBar(): Resets the progress bar's value to zero.
- GetProgressBarValue(): Gets the current value of the progress bar as a float between 0 and 1.
- SetProgressBarValue(float val): Sets the progress bar's value based on a float between 0 and 1.
- UpdateProgressBar(float val): Updates the progress bar's value, considering the number of files being indexed.
- GetIndexFileCount(): Calculates the total number of files being indexed by summing up counts from `_indexFileCount`.
- ProcessRemovalQueue(): Processes queued removal requests for documents, attempting to delete them from the database.
- ChangeProgressBarVisibilityAfterSleep(int seconds, bool val): Asynchronously changes the visibility of the progress bar after a specified delay.
- RemoveSelectedDocument(): Removes the selected document from the file list and the database if not currently indexing.
- ReadPdfFileAsync(string filePath, int chunkLen): Reads and splits the content of a PDF file into chunks of specified length, handling encryption prompts if necessary.
- IteratePdfFile(ref PdfDocument document, ref List chunks, int chunkLen): Recursively iterates through a PDF document and its embedded PDF files to extract and chunk their text content.
- IterateInnerPdfFile(ref PdfDocument doc, ref List chunks, int chunkLen): Extracts text from the pages of a PDF document, splits it into blocks, and further into chunks.
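The interplay between `RemoveSelectedDocument()`, `_removalQueue`, and `ProcessRemovalQueue()` can be sketched as a deferred-removal pattern: removals requested while indexing is in progress are queued and applied afterwards. The code below is an illustration under that reading of the descriptions above. It uses a `HashSet<string>` as a stand-in for the vector database, and `RequestRemoval` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

sealed class RemovalQueueSketch
{
    private readonly Queue<string> _removalQueue = new Queue<string>();

    // Stand-in for the vector database index (assumption for this sketch).
    private readonly HashSet<string> _indexed = new HashSet<string>();

    public bool IsIndexing { get; set; }

    public void AddDocument(string filePath) { _indexed.Add(filePath); }

    public bool Contains(string filePath) { return _indexed.Contains(filePath); }

    // Hypothetical entry point: delete immediately when idle,
    // otherwise defer until indexing has finished.
    public void RequestRemoval(string filePath)
    {
        if (IsIndexing)
            _removalQueue.Enqueue(filePath);
        else
            _indexed.Remove(filePath);
    }

    // Drains the queue and returns how many documents were actually removed.
    public int ProcessRemovalQueue()
    {
        int removed = 0;
        while (_removalQueue.Count > 0)
        {
            if (_indexed.Remove(_removalQueue.Dequeue()))
                removed++;
        }
        return removed;
    }
}
```

A caller would set `IsIndexing` to false when the last indexing task completes and then call `ProcessRemovalQueue()` once, so that no document is deleted while its chunks are still being written.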
- Load: Triggered when the user control is loaded, initiating the setup of bindings and UI components.
- MouseMove on `FileListBox`: Updates the tooltip to display the full path of the PDF file when the mouse hovers over an item in the list box.
- KeyDown on `FileListBox`: Allows deletion of selected documents using the Delete key.
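The three events listed above can be wired up roughly as follows. This is a simplified sketch rather than the control's actual code: the `FileListDemo` class is hypothetical, it omits indexing and database cleanup, and it assumes a project that references `System.Windows.Forms`.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Windows.Forms;

public class FileListDemo : UserControl
{
    private readonly ListBox _fileListBox = new ListBox { Dock = DockStyle.Fill };
    private readonly ToolTip _fileToolTip = new ToolTip();

    // Key = label shown in the list, Value = full file path.
    private readonly BindingList<KeyValuePair<string, string>> _fileList =
        new BindingList<KeyValuePair<string, string>>();

    public FileListDemo()
    {
        Controls.Add(_fileListBox);

        // Load: bind the list once the control is ready.
        Load += (s, e) =>
        {
            _fileListBox.DisplayMember = "Key";
            _fileListBox.ValueMember = "Value";
            _fileListBox.DataSource = _fileList;
        };
        _fileListBox.MouseMove += FileListBox_MouseMove;
        _fileListBox.KeyDown += FileListBox_KeyDown;
    }

    private void FileListBox_MouseMove(object sender, MouseEventArgs e)
    {
        // IndexFromPoint returns ListBox.NoMatches when no item is under the cursor,
        // which the range check below treats as "no path".
        int index = _fileListBox.IndexFromPoint(e.Location);
        string path = index >= 0 && index < _fileList.Count
            ? _fileList[index].Value
            : string.Empty;

        // Only update when the text changes, to avoid tooltip flicker.
        if (_fileToolTip.GetToolTip(_fileListBox) != path)
            _fileToolTip.SetToolTip(_fileListBox, path);
    }

    private void FileListBox_KeyDown(object sender, KeyEventArgs e)
    {
        if (e.KeyCode == Keys.Delete && _fileListBox.SelectedIndex >= 0)
        {
            _fileList.RemoveAt(_fileListBox.SelectedIndex);
            e.Handled = true;
        }
    }
}
```

Because the list box is bound to a `BindingList`, removing an entry from `_fileList` updates the displayed items without any manual refresh.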