
[refactor] Make llm.c modular #1

Merged · 1 commit · Aug 6, 2024
Conversation

@junaire commented Jul 15, 2024

@junaire (Author) commented Jul 22, 2024

After spending some time messing around with llm.c, I feel the author doesn't intend it to be good for engineering; it's more an example demonstrating how a large, complex modern LLM works under the hood. Thus embedding llm.c into WasmEdge is not as easy as ggml. There are at least four problems I can think of:

  1. llm.c uses a plain Makefile as its build system and does a poor job of managing dependencies. For example, llm.c depends on cudnn-frontend, and you have to tell it manually where to find it (by setting the CUDNN_FRONTEND_PATH environment variable; otherwise it simply looks in your home directory):

    llm.c/Makefile

    Lines 103 to 105 in bdb0fb5

    # Check and include cudnn if available
    # You can override the path to cudnn frontend by setting CUDNN_FRONTEND_PATH on the make command line
    # By default, we look for it in HOME/cudnn-frontend/include and ./cudnn-frontend/include

  2. llm.c is just two executables, train_gpt2 and train_gpt2cu. In this PR I separated some functions and made it a shared library. That probably works, but llm.c changes every day; it will take a lot of effort to rebase if we can't upstream our changes, or we have to pin the version (losing the ability to pick up future improvements!).

  3. llm.c supports training on the CPU and the GPU. HOWEVER, the two paths seem to be copy-pasted from each other with some logic rewritten, so there are no abstractions and a lot of code doing very similar things, even with the SAME symbols. That means we can't link train_gpt2_cpu.so and train_gpt2_gpu.so together, because that would cause symbol conflicts. My current plan is to use dynamic library loading (dlopen) and load only the library matching the backend the user has chosen. The drawbacks are a performance penalty and that we have to ship these libraries along with WasmEdge. Alternatively, we only support CPU training for now.

  4. llm.c can't train GPT-2 from scratch; it needs an existing checkpoint. If you look at their examples, they first train GPT-2 with PyTorch for a few epochs, and then llm.c takes over the process. That means even if we successfully expose these interfaces to Wasm or Rust, we still need Python.

@hydai (Member) commented Jul 22, 2024

> 1. llm.c uses a plain Makefile as its build system and does a poor job of managing dependencies. For example, llm.c depends on cudnn-frontend, and you have to tell it manually where to find it (by setting the CUDNN_FRONTEND_PATH environment variable; otherwise it simply looks in your home directory).

Let's create a CMakeLists.txt for it, so we can reduce the dependency-searching issues.

> 2. llm.c is just two executables, train_gpt2 and train_gpt2cu. In this PR I separated some functions and made it a shared library. That probably works, but llm.c changes every day; it will take a lot of effort to rebase if we can't upstream our changes, or we have to pin the version (losing the ability to pick up future improvements!).

Since there is already an issue discussing CMake support, I don't think upstream would want to merge our changes.
Instead, let's pin the version. Then, if we need new features or improvements from upstream, we can pull them in manually by rebasing.

> 3. llm.c supports training on the CPU and the GPU. HOWEVER, the two paths seem to be copy-pasted from each other with some logic rewritten, so there are no abstractions and a lot of code doing very similar things, even with the SAME symbols. That means we can't link train_gpt2_cpu.so and train_gpt2_gpu.so together, because that would cause symbol conflicts. My current plan is to use dynamic library loading (dlopen) and load only the library matching the backend the user has chosen. The drawbacks are a performance penalty and that we have to ship these libraries along with WasmEdge. Alternatively, we only support CPU training for now.

IMO, we should have a flag in the CMakeLists.txt so users can choose the backend they actually need. I expect there will be two plugins, one for CPU and one for GPU (CUDA). This should make things easier for the installer and let users select the version they want.
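The flag idea could be sketched in CMake roughly like this — the option names and targets are hypothetical, not the ones this PR actually uses:

```cmake
# Hypothetical backend-selection options; names are illustrative only.
option(LLMC_BUILD_CPU  "Build the CPU training plugin"        ON)
option(LLMC_BUILD_CUDA "Build the CUDA training plugin"       OFF)

if(LLMC_BUILD_CPU)
  add_library(train_gpt2_cpu SHARED train_gpt2.c)
endif()

if(LLMC_BUILD_CUDA)
  enable_language(CUDA)
  add_library(train_gpt2_gpu SHARED train_gpt2.cu)
endif()
```

Users would then pick a backend at configure time, e.g. `cmake -DLLMC_BUILD_CUDA=ON ..`, and the installer only ships the plugin that was built.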

> 4. llm.c can't train GPT-2 from scratch; it needs an existing checkpoint. If you look at their examples, they first train GPT-2 with PyTorch for a few epochs, and then llm.c takes over the process. That means even if we successfully expose these interfaces to Wasm or Rust, we still need Python.

Oh, this is a big issue. However, we can write a tutorial telling users how to get that first step done. We can also provide an existing checkpoint in the example and share it with users.

@junaire (Author) commented Jul 29, 2024

There's a bug in cudnn-frontend (NVIDIA/cudnn-frontend#90) that blocks us from supporting CUDA training. Maybe we can just land CPU support in this PR? @hydai

@hydai (Member) commented Jul 30, 2024

Hi @junaire,
Let's focus on the CPU version first. The cuDNN issue you mentioned seems small, and enabling CUDA should be easy once it is resolved.

@junaire force-pushed the jun/wasmedge branch 3 times, most recently from 4016420 to b52413a on July 31, 2024, 02:29
@junaire (Author) commented Jul 31, 2024

I did some local testing and it works; I feel good about landing this now. @hydai

@hydai (Member) commented Jul 31, 2024

@junaire
Could you add a workflow file to build it and run some simple tests? That way we can ensure the build process and runtime behavior are correct.

@junaire (Author) commented Aug 1, 2024

> @junaire Could you add a workflow file to build it and run some simple tests? That way we can ensure the build process and runtime behavior are correct.

I added a workflow but it didn't run as expected, possibly due to permission issues. Can you please check? @hydai

@hydai (Member) commented Aug 1, 2024

Hi @junaire,
I just granted the workflow permission; please try again.

@junaire (Author) commented Aug 1, 2024

> Hi @junaire, I just granted the workflow permission; please try again.

Can you approve the workflow?

@junaire force-pushed the jun/wasmedge branch 7 times, most recently from 146d90a to d709e38 on August 1, 2024, 14:37
@junaire (Author) commented Aug 1, 2024

I may need some help here... any idea why the Windows job keeps failing? @hydai

@junaire force-pushed the jun/wasmedge branch 2 times, most recently from 33c932f to 399e56c on August 1, 2024, 15:02
@hydai (Member) commented Aug 1, 2024

Hi @junaire,
We don't use mingw64 to build WasmEdge, so removing the Windows-related workflows for now is fine. I don't want you to spend a lot of time supporting multiple platforms; let's take care of Linux first.
Also, when writing an open-source contribution, please avoid using dirty words in the commit history; they will be kept on this PR, as you can see now.

@junaire (Author) commented Aug 5, 2024

The GPU test failures seem unrelated; those jobs are required to run on self-hosted runners.

(Review threads on CMakeLists.txt and llmc/dataloader.h were resolved.)
@hydai (Member) commented Aug 5, 2024

> The GPU test failures seem unrelated; those jobs are required to run on self-hosted runners.

Let's remove these self-hosted workflow jobs for now, since we are not able to deploy such runners inside this repo.

@junaire (Author) commented Aug 5, 2024

> Let's remove these self-hosted workflow jobs for now, since we are not able to deploy such runners inside this repo.

Done.

@hydai hydai merged commit ddb5b4f into WasmEdge:master Aug 6, 2024
16 checks passed