Adding a loader (tried web URL) adds 3 other duplicates #61

Closed
converseKarl opened this issue May 19, 2024 · 8 comments

@converseKarl

converseKarl commented May 19, 2024

Previously, adding a loader put a single entry in the loader list. Now, in v0.78, it adds 3 additional duplicates.

The debug console shows the following:
1|platform | Error adding URL: [Error: lance error: Commit conflict for version 1195: There was a concurrent commit that conflicts with this one and it cannot be automatically resolved. Please rerun the operation off the latest version of the table.
1|platform | Transaction: Transaction { read_version: 1194, uuid: "b317b32c-fa93-4c90-aec7-c26663444642", operation: Delete { updated_fragments: [], deleted_fragment_ids: [598], predicate: "uniqueLoaderId = "WebLoader_7e57a7538098320500365556c0c96ea1"" }, tag: None }
1|platform | Conflicting Transaction: Some(Transaction { read_version: 1193, uuid: "eb1a0a1d-176e-4e0d-840c-8e8c949b02c1", operation: Append { fragments: [Fragment { id: 0, files: [DataFile { path: "8cccacb1-74c0-4190-bec8-d425d2736c52.lance", fields: [0, 1, 2, 3, 4, 5], column_indices: [], file_major_version: 0, file_minor_version: 0 }], deletion_file: None, physical_rows: Some(4) }] }, tag: None }), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-0.10.16/src/io/commit.rs:107:23]
1|platform | Error adding URL: [Error: lance error: Commit conflict for version 1336: There was a concurrent commit that conflicts with this one and it cannot be automatically resolved. Please rerun the operation off the latest version of the table.
1|platform | Transaction: Transaction { read_version: 1335, uuid: "036ad30e-4c63-4768-8984-4d74cf92f546", operation: Delete { updated_fragments: [], deleted_fragment_ids: [669], predicate: "uniqueLoaderId = "WebLoader_7e57a7538098320500365556c0c96ea1"" }, tag: None }
1|platform | Conflicting Transaction: Some(Transaction { read_version: 1334, uuid: "5ce59ff0-fc6d-4824-9068-d5131f6e6f36", operation: Append { fragments: [Fragment { id: 0, files: [DataFile { path: "fb294721-dfd9-4732-848f-92efcb7db637.lance", fields: [0, 1, 2, 3, 4, 5], column_indices: [], file_major_version: 0, file_minor_version: 0 }], deletion_file: None, physical_rows: Some(4) }] }, tag: None }), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-0.10.16/src/io/commit.rs:107:23]
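
The error text asks the caller to rerun the operation off the latest version of the table. A minimal sketch of that idea follows, assuming the operation is wrapped in a callback that starts from fresh table state each time it is called. `retryOnCommitConflict` and its options are hypothetical names for illustration; they are not part of embedJs or LanceDB.

```javascript
// Hypothetical helper: retry an async operation when it fails with a
// Lance "Commit conflict" error. Not part of embedJs or LanceDB.
async function retryOnCommitConflict(operation, { retries = 3, baseDelayMs = 50 } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            // Re-invoke the callback so it can start from the latest table state
            return await operation();
        } catch (error) {
            const isConflict = String(error && error.message).includes('Commit conflict');
            // Give up on unrelated errors or once the retry budget is spent
            if (!isConflict || attempt >= retries) throw error;
            // Exponential backoff before the next attempt
            await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
        }
    }
}
```

Retrying only hides the conflict. If the retried operation is an append that already partly succeeded, a retry could itself produce duplicates, so this is a stopgap rather than a fix for the underlying race.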

@adhityan
Collaborator

This looks like a race condition with LanceDb. I will investigate and address it. It is not a new issue, though, just a rare one.

In the meantime, could you check if this issue is recurring? A few runs would help.
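
One common way to rule out this kind of race on the calling side is to make sure a delete and an append against the same table never run at the same time. The following is a minimal sketch of that, assuming all writes go through one process. `SerialQueue` is a hypothetical name for illustration, not an embedJs or LanceDB API, and this is not necessarily how the library addresses the problem.

```javascript
// Hypothetical sketch: run async tasks one at a time, in call order.
class SerialQueue {
    constructor() {
        // Tail of the promise chain; every new task is appended after it
        this.tail = Promise.resolve();
    }

    // Schedule `task` to start only after all previously scheduled tasks settle
    run(task) {
        const result = this.tail.then(() => task());
        // Swallow rejections on the chain so one failed task does not block the next
        this.tail = result.catch(() => {});
        return result;
    }
}
```

Each write is then wrapped as `queue.run(() => someWrite())`, where `someWrite` stands for whatever delete or append call is being made. The caller still receives that task's own result or error.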

@converseKarl
Author

Will do and let you know.

@converseKarl
Author

converseKarl commented May 19, 2024

I cleared everything in the cache (including the vector DB, all clean), removed everything under it, and ran the same code in v0.78 that worked in v0.77. I can confirm the issue recurs. I am using LanceDb, LmdbCache, and OpenAi3LargeEmbeddings.

When you add a URL via the web loader:

  1. Retrieving the list through the loaders method returns 4 entries (only 1 appeared in v0.77 and earlier).
  2. These entries all share the same web loader ID, which causes conflicts.
  3. I notice a great many documents (hundreds) created in Lance, though this might come from my attempt to delete one key from the loaders list.

When I try to delete one with deleteLoader(webloaderid, true), I now get conflicts:
1|platform | Error removing embedding: [Error: lance error: Commit conflict for version 2: There was a concurrent commit that conflicts with this one and it cannot be automatically resolved. Please rerun the operation off the latest version of the table.
1|platform | Transaction: Transaction { read_version: 1, uuid: "bd1fa996-606c-415e-806a-11857e9ba3da", operation: Delete { updated_fragments: [], deleted_fragment_ids: [], predicate: "uniqueLoaderId = "WebLoader_1d0f5cba74525f0006ef1b1ae043b010"" }, tag: None }
1|platform | Conflicting Transaction: Some(Transaction { read_version: 1, uuid: "db01c718-4391-4bee-866a-506dd56bfd71", operation: Append { fragments: [Fragment { id: 0, files: [DataFile { path: "d809dd5c-be76-4172-bd88-9e59a2536f5a.lance", fields: [0, 1, 2, 3, 4, 5], column_indices: [], file_major_version: 0, file_minor_version: 0 }], deletion_file: None, physical_rows: Some(4) }] }, tag: None }), /home/build_user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/lance-0.10.16/src/io/commit.rs:107:23]

When I refresh the loaders list, hundreds of entries now appear, all duplicates, which is really bad. None of this behaviour happened in v0.77 or earlier.

Here is an idea of the code that's running:

let ragApplicationBuilder, ragApplication;

// Code - API setup
// `async` is required because the body uses `await`
async function setup() {
    ragApplicationBuilder = new RAGApplicationBuilder().setQueryTemplate(prompt);

    // Add other loaders and configurations as needed

    ragApplication = await ragApplicationBuilder
        .setTemperature(0.2)
        .setEmbeddingModel(new OpenAi3LargeEmbeddings())
        .setVectorDb(new LanceDb({ path: './db' }))
        .setCache(new LmdbCache({ path: './llmcache' }))
        .build();
}

// Add URL: registers a WebLoader on the shared builder, then rebuilds the application
async function addURL(url) {
    ragApplicationBuilder.addLoader(new WebLoader({ url: url }));
    ragApplication = await ragApplicationBuilder.build();
}

@adhityan
Collaborator

I am looking at the changes between versions 0.77 and 0.78 to identify what the issue could be.

@converseKarl
Author

Thanks kindly, I really need a fix for this. As I understand it, when resources are added they should appear in the "loaders" list from the ragApplication object and not be duplicated. When one is deleted, the ID for that loader should be removed from the list and also from the vector index and LLM cache. The ragApplication.loaders list should reflect that state. Many thanks!
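
The expected behaviour described above can be modelled as follows. This is a minimal in-memory sketch, assuming each loader is identified by a unique ID string. `LoaderRegistry` is a hypothetical name for illustration and does not reflect how embedJs stores its loaders.

```javascript
// Hypothetical in-memory model of the expected behaviour; not embedJs code.
class LoaderRegistry {
    constructor() {
        // Maps uniqueLoaderId -> loader metadata
        this.loaders = new Map();
    }

    // Adding the same ID again overwrites the entry instead of duplicating it
    add(uniqueLoaderId, metadata) {
        this.loaders.set(uniqueLoaderId, metadata);
    }

    // Returns true if an entry with this ID existed and was removed
    delete(uniqueLoaderId) {
        return this.loaders.delete(uniqueLoaderId);
    }

    // Current list of loader IDs, one per loader
    list() {
        return [...this.loaders.keys()];
    }
}
```

Under this model, adding the same web loader four times leaves one entry in the list, and deleting it leaves none.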

@adhityan
Collaborator

I couldn't find anything in this single commit 3090f76 that could have caused an issue like this. Are you sure this issue does not occur in version 0.77?

@adhityan
Collaborator

Can you test the latest version 0.79 and let me know if you still see this issue? Thank you

@converseKarl
Author

converseKarl commented May 24, 2024

I've moved to v0.79 and no longer see duplicates. Also, when I delete an item from the loader list, it gets removed, and the squillions of entries in the loader list or RAG no longer appear.

Great job! Thank you kindly!

2 participants