Updated llama to the latest GGML commit #21
Open
polkaulfield wants to merge 8 commits into Bip-Rep:main from polkaulfield:main
Conversation
Set the main default prompt to chat-with-bob from llama.cpp. This seems to produce much more useful conversations with the llama-7b and orca-mini-3b models I have tested. Also make the reverse prompt consistently "User:" in both default prompt options, and set the default reverse-prompt detection to the same value.
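For reference, here is a minimal Dart sketch of the defaults described above. The constant names are hypothetical, not the actual Sherpa source; the prompt text is the chat-with-bob example shipped with llama.cpp:

```dart
// Hypothetical constant names; a sketch of the defaults, not the
// actual Sherpa source. The text is llama.cpp's chat-with-bob prompt.
const String defaultPrePrompt =
    'Transcript of a dialog, where the User interacts with an Assistant '
    'named Bob. Bob is helpful, kind, honest, good at writing, and never '
    'fails to answer the User\'s requests immediately and with precision.\n'
    '\n'
    'User: Hello, Bob.\n'
    'Bob: Hello. How may I help you today?\n'
    'User: Please tell me the largest city in Europe.\n'
    'Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.\n'
    'User:';

// The same marker is used as the reverse prompt in both default prompt
// options and as the default for reverse-prompt detection, so generation
// stops as soon as the model starts a new "User:" turn.
const String defaultReversePrompt = 'User:';
```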
llama.cpp doesn't build for ARM32 because it calls into 64-bit NEON intrinsics. It's not worth fixing that; let's just not offer this app on ARM32.
Rather than using prebuilt libraries, build the llama.cpp git submodule during the regular app build process. The library is now installed in a standard location, which simplifies the logic needed to load it at runtime; there is no need to ship it as an asset. This works on Android and also enables the app to build and run on Linux. The Windows build is untested. One unfortunate side effect: when the app is built in Flutter's debug mode, the llama library is built unoptimized and runs so slowly that you might suspect the app is broken. Release mode, however, seems as fast as before.
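As a rough illustration of the simplified load logic (library name is an assumption, not the exact Sherpa code), the library can now be opened by soname alone:

```dart
import 'dart:ffi';

// Sketch only: the build now installs the library into the platform's
// standard location (the per-ABI jniLibs directory on Android, the
// bundle's lib/ directory on Linux), so the dynamic linker finds it by
// soname. There is no longer any need to copy a bundled asset to a
// writable path before loading it.
final DynamicLibrary llamaLib = DynamicLibrary.open('libllama.so');
```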
Update llama.cpp to the latest version as part of an effort to make this app usable on my Samsung Galaxy S10 smartphone. The newer llama.cpp includes a fix for a double-close bug (llama.cpp commit 47f61aaa5f76d04) that was causing the app to crash immediately upon starting the AI conversation. It also adds support for 3B models, which are considerably smaller. The llama-7B models were causing Android's low-memory killer to terminate Sherpa after just a few words of conversation, whereas newer models such as orca-mini-3b.ggmlv3.q4_0.bin work on this device without quickly exhausting all available memory.

llama.cpp's model compatibility has changed with this update, so ggml files that worked in the previous version are unlikely to work now; they need converting. However, the orca-mini offering is already in the new format and works out of the box.

llama.cpp's API has also changed. Rather than rework the Dart code, I opted to keep this logic in C++, using llama.cpp's example code as a base. It lives in a new "llamasherpa" library that calls into llama.cpp. Since lots of data is passed around in large arrays, running this in Dart likely had considerable overhead, and the native approach should perform noticeably faster. It also eliminates the need for Sherpa's Dart code to call llama.cpp directly, so there is no need to maintain a separately modified llama.cpp and we can use the official upstream.
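A hedged sketch of what the Dart side of such a boundary can look like; the function name llamasherpa_run and its signature are assumptions for illustration, and the real llamasherpa API may differ. The point is that only the prompt string and the decoded text pieces cross the FFI boundary, while the token arrays and the sampling loop stay in C++:

```dart
import 'dart:ffi';
import 'package:ffi/ffi.dart';

// Assumed native signature, for illustration only:
//   void llamasherpa_run(const char *prompt, void (*on_token)(const char *));
typedef _RunC = Void Function(
    Pointer<Utf8>, Pointer<NativeFunction<Void Function(Pointer<Utf8>)>>);
typedef _RunDart = void Function(
    Pointer<Utf8>, Pointer<NativeFunction<Void Function(Pointer<Utf8>)>>);

final _lib = DynamicLibrary.open('libllamasherpa.so');
final _run = _lib.lookupFunction<_RunC, _RunDart>('llamasherpa_run');

// Receives each decoded piece of text; the large token and logit
// arrays never leave the native library.
void _onToken(Pointer<Utf8> piece) {
  print(piece.toDartString());
}

void generate(String prompt) {
  final p = prompt.toNativeUtf8();
  _run(p, Pointer.fromFunction<Void Function(Pointer<Utf8>)>(_onToken));
  malloc.free(p);
}
```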
On first run on my Android device, the pre-prompt is empty; it never gets initialized to any value. This is because SharedPreferences performs asynchronous disk I/O, and initDefaultPrompts() uses a different SharedPreferences instance from getPrePrompts(); there is no guarantee that an update made through one instance is immediately visible through another. Tweak the logic so it does not depend on synchronization between two SharedPreferences instances.
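A minimal sketch of the shape of that fix, assuming a hypothetical key name and helper (the actual Sherpa code differs): read and write the defaults through one and the same SharedPreferences instance, so the code never has to rely on one instance observing another's writes:

```dart
import 'package:shared_preferences/shared_preferences.dart';

Future<List<String>> loadPrePrompts() async {
  final prefs = await SharedPreferences.getInstance();
  // First run: seed the defaults through the same instance we are about
  // to read from, instead of relying on a write made through a separate
  // instance having already become visible.
  if (!prefs.containsKey('prePrompts')) {
    await prefs.setStringList('prePrompts', <String>[defaultPrePrompt]);
  }
  return prefs.getStringList('prePrompts') ?? <String>[defaultPrePrompt];
}
```

(defaultPrePrompt refers to the sketch constant shown earlier.)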
The llama.cpp logic is built around the prompt ending with the reverse-prompt and the actual user input being passed separately. Adjust Sherpa to do the same, rather than appending the first line of user input to the prompt.
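Illustrated as a sketch with hypothetical names, the context handed to llama.cpp now ends with the reverse prompt, and the user's text travels separately:

```dart
// Ensure the context ends with the reverse prompt, e.g. "...\nUser:",
// so generation resumes exactly where the model expects user input.
String buildContext(String prePrompt, String reversePrompt) {
  return prePrompt.endsWith(reversePrompt)
      ? prePrompt
      : '$prePrompt\n$reversePrompt';
}

// The first line of user input is no longer appended to the prompt
// string; it is handed to the native side as its own argument, as in
// llama.cpp's interactive example. runLlama is hypothetical plumbing.
void startConversation(String prePrompt, String userInput) {
  final context = buildContext(prePrompt, defaultReversePrompt);
  runLlama(prompt: context, input: userInput);
}
```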
I think this repo has been abandoned. I've taken over development and updated the app with a new UI and added support for GGUF models.