Add embedding search #17
Logged an issue over in the SQLite.swift repo, but that doesn't mean we can't fork it, add support, and open a PR there to fulfill the issue! stephencelis/SQLite.swift#1232
Exploring embedding generation from Swift. A good candidate seems to be using candle (Rust) with a sentence transformer and building a binary that takes in text and outputs embeddings. Alternatively, explore CoreML and look into transformer or ONNX conversion.
I'm really bad at C bindings stuff, but I tried to put together a candle text-to-embeddings binary that we can talk to via FFI.
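For anyone curious what "talk to via FFI" could look like, here's a minimal sketch of the kind of C ABI a Rust embedding library might expose to Swift. Everything here is hypothetical (`embed_text`, `fake_embed`, the 384 dimension are placeholders, not the actual library); the real implementation would run a sentence transformer through candle instead of the stub.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Hypothetical embedding dimension (gte-small-class models use 384).
const DIM: usize = 384;

// Stub standing in for the real model: deterministic values derived from the
// input bytes, NOT real embeddings.
fn fake_embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    for (i, b) in text.bytes().enumerate() {
        v[i % DIM] += b as f32 / 255.0;
    }
    v
}

/// C-ABI entry point Swift could call: caller passes a NUL-terminated UTF-8
/// string and a buffer of at least DIM floats; returns the number of floats
/// written, or -1 on error. (In a real cdylib this would also carry
/// `#[no_mangle]` so the symbol is visible to the Swift side.)
pub extern "C" fn embed_text(text: *const c_char, out: *mut f32, out_len: usize) -> isize {
    if text.is_null() || out.is_null() || out_len < DIM {
        return -1;
    }
    let s = unsafe { CStr::from_ptr(text) };
    let Ok(s) = s.to_str() else { return -1 };
    let emb = fake_embed(s);
    unsafe { std::ptr::copy_nonoverlapping(emb.as_ptr(), out, DIM) };
    DIM as isize
}

fn main() {
    // Exercise the FFI entry point from Rust itself.
    let text = std::ffi::CString::new("hello rem").unwrap();
    let mut buf = vec![0.0f32; DIM];
    let n = embed_text(text.as_ptr(), buf.as_mut_ptr(), buf.len());
    println!("wrote {} floats", n);
}
```

On the Swift side this surface maps naturally onto `UnsafePointer<CChar>` / `UnsafeMutablePointer<Float>` through a bridging header or a C module map.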
from rust_embedding_lib README.md
You might be. :) (edit: although it doesn't seem like there's a ton actually present in that library right now)

I was noodling on this and was prepared to try embedding a Python interpreter into this binary to get access to the whole ecosystem of Python modules; I didn't realize Swift was an option there. (Also, the idea of embedding a Python interpreter into something seems kind of insane, so I just wanted to try it.)

Do you have an idea of which model embeddings you want to use for search? I've played with a couple of other projects that defaulted to bge-small-en-v1.5 (#15) or all-mpnet-base-v2 (#45) on the HF leaderboard: https://huggingface.co/spaces/mteb/leaderboard

Both are pretty small, and "seem" good for RAG based on the limited poking I've done with them. I've never tried to use them outside of Python, though.

edit: n/m, I see
gte-small feels like a good balance between quality and size from manual experimentation, but I'm totally open to suggestions and/or to making it so people can use whatever they want.
It looks like somebody already posted a CoreML conversion. I have no experience w/ this, so I don't know if that's a format we can use, but I found it while researching conversion options. I also found https://github.com/huggingface/exporters, but they don't appear to support embedding models (plus I tried to do the conversion using their tool, and it fails a validation step because some of the math is coming up with unexpected values).
Theoretically, what I built should work; we just need to build the Swift framework.
I guess that's a question I should have asked initially: is the FFI bridge + Rust lib the way you'd prefer to go? Or something more native, like CoreML?
😅 The Rust embeddings approach means any safetensors model with a config and tokenizer should work, which feels like a very good thing. But if you can get CoreML working, that's awesome. I did notice the CoreML models were strangely large, like double the size for gte-small.
Agreed. The "run anything on the internet" property was one of the reasons I felt like my awful embed-Python approach could almost be justifiable. I'm agnostic either way re: Rust lib vs CoreML; I'm just having fun soaking all this stuff up. For my own entertainment I'll probably throw up a branch on my fork illustrating the CoreML approach, but I've got no attachment to it. I've just never played w/ CoreML before.
Please! That would be awesome! Thank you, I can't wait.
Not having great luck with the prebuilt CoreML model; will post more on that later. Re: Rust/candle, I did notice that candle doesn't support Metal acceleration yet, only the Accelerate framework. I'm not sure if that's a concern for the embedding part, but I can imagine it will be for local LLMs.
You got this!
Problem for another day. We don't need the best solution, just one that works for now.
Hi, @jasonjmcghee |
Update (repo here: https://github.com/jasonjmcghee/ragpipe). This script:
$ ./askRem "Which GitHub issues have I read recently?" <(sqlite3 db 'select text from allText order by frameId desc limit 1000')
Batches: 100%|███████████████████████████████| 19/19 [00:11<00:00, 1.65it/s]

You have recently read issues: #3 (dark mode icons), #9 (login item - Rem will run on boot), and #11 (icon looks kinda weird when active in dark mode).

total duration: 26.622822625s
load duration: 5.327591125s
prompt eval count: 1933 token(s)
prompt eval duration: 17.73078s
prompt eval rate: 109.02 tokens/s
eval count: 41 token(s)
eval duration: 3.554184s
eval rate: 11.54 tokens/s
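For anyone reading the stats above: the two "rate" lines are just the corresponding token count divided by the corresponding duration, which is a quick way to sanity-check any run. A tiny check of that arithmetic:

```rust
fn main() {
    // tokens/s = token count / duration, using the numbers from the run above.
    let prompt_rate = 1933.0 / 17.73078; // ≈ 109.02, matching "prompt eval rate"
    let eval_rate = 41.0 / 3.554184; // ≈ 11.54, matching "eval rate"
    println!("{:.2} tokens/s, {:.2} tokens/s", prompt_rate, eval_rate);
}
```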
@vkehfdl1 - definitely want to make it easy to ingest from rem. You can query the SQLite file right now, which will also give you the path to the ffmpeg file + frame offset, so you can get both the text and the image. I'd love to simplify this, though, and make it easy to just ask.
@jasonjmcghee Great! I'd love to make a data loader from rem.
@jasonjmcghee Now I'll try to make some kind of demo using rem.
@vkehfdl1 that looks very cool! Not knowing too much about
@seletz I just made a simple example running
@jasonjmcghee @seletz
Cool!
Did you try writing a custom prompt for the use case?
Could be reading into this the wrong way, but I'd want to make sure it's a client-agnostic approach, and ideally,

One of my concerns right now, though, is network-access-related stuff. It seems like the smart way (from an engineering architecture perspective) is to have an API for providing access to data and for talking to agents, but that unlocks "network access" stuff in the App Sandbox, which... I don't know, I feel many folks would feel better with an "absolutely no network access" approach. Maybe there could be two builds: one with network access entitlements and one without?
@jasonjmcghee @vkehfdl1 I think a "no network connection" policy is very cool. We could use triggers, as mentioned in #14, for this. Maybe it would be OK for now to just call a user-provided script which gets the path to the SQLite DB as an argument? The DB tables would be the API, then ...
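A minimal sketch of that hook idea: the app shells out to a user-supplied script, passing the SQLite path as the only argument, so the script can query the tables directly and no network entitlement is needed. The name `run_hook` is hypothetical, and `echo` stands in here for the user's script.

```rust
use std::process::Command;

// Hypothetical hook runner: invokes a user-supplied script with the DB path
// as its single argument and captures whatever the script prints.
fn run_hook(script: &str, db_path: &str) -> std::io::Result<String> {
    let out = Command::new(script).arg(db_path).output()?;
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

fn main() -> std::io::Result<()> {
    // `echo` as a stand-in: a real user script would run e.g.
    // `sqlite3 "$1" 'select ...'` against the rem database.
    let out = run_hook("echo", "/path/to/rem.sqlite")?;
    print!("{}", out); // prints the path echo was given
    Ok(())
}
```

The nice property is that the DB schema really is the whole contract; the app never needs to know what the script does with it.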
I will try your great prompt! Plus, I will run some experiments on improving answer quality.
@seletz It will be cool! I agree
@jasonjmcghee
But I tried this with only about 2 minutes of recording. I'm now recording a few hours for real use cases.
Update. However, the raw passage (OCR result) is pretty unprocessed, so the LLM can't recognize and extract information easily.
I think this looks super promising:
rem should index all text via an embedding store.
We could use something like https://github.com/asg017/sqlite-vss
If we go this route we should fork / open a PR to add the extension https://github.com/stephencelis/SQLite.swift/tree/3d25271a74098d30f3936d84ec1004d6b785d6cd/Sources/SQLite/Extensions
This way we can search without needing verbatim matches.
We'll need to see what the RAM footprint and insertion time is.
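The "search without verbatim matches" part boils down to nearest-neighbour search over embedding vectors, which is what sqlite-vss indexes efficiently. A brute-force sketch of the scoring itself (toy 3-d vectors standing in for real model output; `top_match` is a hypothetical helper, not part of sqlite-vss):

```rust
// Brute-force embedding search: rank stored texts by cosine similarity to a
// query vector. A vector index like sqlite-vss makes this fast at scale, but
// the underlying comparison is just this.

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

// Return the stored text whose embedding is closest to the query.
fn top_match<'a>(query: &[f32], docs: &'a [(&'a str, Vec<f32>)]) -> Option<&'a str> {
    docs.iter()
        .max_by(|a, b| cosine(query, &a.1).partial_cmp(&cosine(query, &b.1)).unwrap())
        .map(|(text, _)| *text)
}

fn main() {
    // Toy 3-d "embeddings"; a real model (e.g. gte-small) would produce
    // hundreds of dimensions per OCR'd frame.
    let docs = vec![
        ("dark mode icons", vec![0.9, 0.1, 0.0]),
        ("login item runs on boot", vec![0.0, 0.8, 0.6]),
    ];
    let query = vec![1.0, 0.0, 0.1]; // query embedding, nothing verbatim-matching
    println!("{}", top_match(&query, &docs).unwrap()); // prints "dark mode icons"
}
```

The RAM/insertion-time question above is exactly about avoiding this linear scan: an index trades memory and insert cost for sublinear lookups.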
More out-of-the-box solutions appear to be available now:
https://github.com/ashvardanian/SwiftSemanticSearch
We'd need to see how long insertion / index updates take, but seems super promising.