
setup of ChatRWKV #29
Closed
bello7777 opened this issue Mar 9, 2023 · 6 comments

Comments

bello7777 commented Mar 9, 2023

Hey guys, great stuff. Could we have an easy step-by-step process for installing ChatRWKV, for example on an Ubuntu server?

BlinkDL (Owner) commented Mar 9, 2023

python 3.8/3.9/3.10

pip install numpy tokenizers prompt_toolkit ninja
pip install torch --extra-index-url https://download.pytorch.org/whl/cu117 --upgrade (use 1.13.1)
pip install rwkv --upgrade

:)
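
For a quick sanity check after those installs, a minimal sketch along these lines should work (the checkpoint path, the strategy, and the local copy of 20B_tokenizer.json from this repo are assumptions to adapt):

    import os
    os.environ['RWKV_JIT_ON'] = '1'   # optional: enable the JIT-compiled kernels
    from rwkv.model import RWKV
    from rwkv.utils import PIPELINE, PIPELINE_ARGS

    # Hypothetical paths: point these at a downloaded RWKV-4 checkpoint and the
    # 20B_tokenizer.json file shipped with ChatRWKV.
    model = RWKV(model='/path/to/RWKV-4-Pile-7B-20230109-ctx4096', strategy='cuda fp16')
    pipeline = PIPELINE(model, '/path/to/20B_tokenizer.json')

    print(pipeline.generate('\nHello, my name is', token_count=50,
                            args=PIPELINE_ARGS(temperature=1.0, top_p=0.7)))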

@soulteary

@bello7777 There's no need to go through the hassle of setting up the environment yourself, just use the container:

#58

bello7777 (Author) commented Mar 25, 2023

Thanks mate, I will do it. For the moment I just launched it on AWS Ubuntu 18.04 and got:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 200.00 MiB (GPU 0; 14.62 GiB total capacity; 13.77 GiB already allocated; 163.94 MiB free; 13.97 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Can I try reducing the batch sizes to smaller values? If yes, where are they?
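
(As an aside, the suggestion in the error text itself can be tried before changing any code; a minimal sketch, assuming the variable is set before anything touches the GPU:)

    import os
    # Allocator hint quoted in the error message; it must be set before the CUDA
    # caching allocator is initialized, i.e. before the first CUDA tensor is created.
    os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:128'
    import torch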

@bello7777 (Author)

@soulteary I tried to access your blog and guidelines but could not. Could you share the steps and the version of the Docker container so I can deploy it on an EC2 server? I still have a problem with memory.

@KerfuffleV2 (Contributor)

@bello7777 You probably need to adjust the strategy. If you're using the pull request:

    model = RWKV(model=model_path, strategy='cuda fp16i8 *20 -> cuda fp16')

That's around line 28 in webui.py from that pull. You didn't say what you're actually doing, so there's no way to know whether it failed while loading the model for inference, while converting, or somewhere else.

But the most likely solution is to find whatever is running, see how it sets the strategy, and reduce the number of layers it sends to the GPU. For example, in the line above you could try cuda fp16i8 *10 -> cuda fp16 instead, which should roughly halve the required GPU memory.

After you get it going, you can use other tools to see how much GPU memory you have available and adjust the setting accordingly.
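
As a concrete sketch of that adjustment (the checkpoint path here is a placeholder, and the strategy string is the knob to tune against the ~14 GiB card from the error above):

    from rwkv.model import RWKV

    # Quantize the first 10 layers on the GPU instead of 20, as suggested above.
    # If VRAM is still tight, the tail of the strategy can be moved to the CPU
    # instead, e.g. 'cuda fp16i8 *10 -> cpu fp32', trading speed for memory.
    model = RWKV(
        model='/path/to/your-rwkv-checkpoint',      # hypothetical path
        strategy='cuda fp16i8 *10 -> cuda fp16',    # was 'cuda fp16i8 *20 -> cuda fp16'
    )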

@bello7777 (Author)

Thanks, solved and working.
[screenshot: trstchat]
Now moving on to training the model.
