
[refactor] Make llm.c modular #1

Merged · 1 commit · Aug 6, 2024
Conversation

@junaire commented Jul 15, 2024

@junaire (Author) commented Jul 22, 2024

After spending some time messing around with llm.c, I feel the author doesn't intend it to be good for engineering; it's more an example demonstrating how a large, complex modern LLM works under the hood. Thus embedding llm.c into WasmEdge is not as easy as ggml. There are at least four problems I can think of:

  1. llm.c uses a plain Makefile as its build system and does a poor job of managing dependencies. For example, llm.c depends on cudnn-frontend, and you have to tell it manually where to find it (by setting the CUDNN_FRONTEND_PATH environment variable; otherwise it simply looks in your home directory):

    llm.c/Makefile

    Lines 103 to 105 in bdb0fb5

    # Check and include cudnn if available
    # You can override the path to cudnn frontend by setting CUDNN_FRONTEND_PATH on the make command line
    # By default, we look for it in HOME/cudnn-frontend/include and ./cudnn-frontend/include

  2. llm.c is just two executables, train_gpt2 and train_gpt2cu. In this PR I separated some functions and made it a shared library. That probably works, but llm.c changes every day; it will take a lot of effort to rebase if we can't upstream our changes, or we have to pin the version (losing the ability to pick up future improvements!).

  3. llm.c supports training on the CPU and the GPU. HOWEVER, the two paths seem to be copy-pasted from each other with some logic rewritten, so there are no abstractions and a lot of code doing very similar things, even with the SAME symbols. That means we can't link train_gpt2_cpu.so and train_gpt2_gpu.so together, because that would cause symbol conflicts. My current plan is to use dynamic library loading (dlopen) and load only the library matching the backend the user has chosen. The drawbacks are a performance penalty and that we have to ship these libraries along with WasmEdge. Alternatively, we only support CPU training for now.

  4. llm.c can't train GPT-2 from scratch; it needs an existing checkpoint. If you look at their examples, they first train GPT-2 with PyTorch for a few epochs, and then llm.c takes over the process. That means even if we successfully expose these interfaces to Wasm or Rust, we still need Python.

@hydai (Member) commented Jul 22, 2024

> 1. llm.c uses a plain Makefile as its build system and does a poor job of managing dependencies. For example, llm.c depends on cudnn-frontend, and you have to tell it manually where to find it (by setting the CUDNN_FRONTEND_PATH environment variable; otherwise it simply looks in your home directory).

Let's create a CMakeLists.txt for it, so we can reduce the dependency-searching issues.

> 2. llm.c is just two executables, train_gpt2 and train_gpt2cu. In this PR I separated some functions and made it a shared library. That probably works, but llm.c changes every day; it will take a lot of effort to rebase if we can't upstream our changes, or we have to pin the version (losing the ability to pick up future improvements!).

Since there is already an issue discussing CMake support, I don't think upstream would want to merge our changes.
Instead, let's pin the version. Then, if we need new features or improvements from upstream, we can pull them in manually by rebasing.

> 3. llm.c supports training on the CPU and the GPU. HOWEVER, the two paths seem to be copy-pasted from each other with some logic rewritten, so there are no abstractions and a lot of code doing very similar things, even with the SAME symbols. That means we can't link train_gpt2_cpu.so and train_gpt2_gpu.so together, because that would cause symbol conflicts. My current plan is to use dynamic library loading (dlopen) and load only the library matching the backend the user has chosen. The drawbacks are a performance penalty and that we have to ship these libraries along with WasmEdge. Alternatively, we only support CPU training for now.

IMO, we should have a flag in the CMakeLists.txt so users can choose the backend they actually need. I expect there will be two plugins, one for CPU and one for GPU (CUDA). This should make things easier for the installer and let users select the version they want.
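The flag idea could be sketched in CMake roughly like this — the option names and targets are hypothetical, not the ones this PR actually uses:

```cmake
# Hypothetical backend-selection options; names are illustrative only.
option(LLMC_BUILD_CPU  "Build the CPU training plugin"        ON)
option(LLMC_BUILD_CUDA "Build the CUDA training plugin"       OFF)

if(LLMC_BUILD_CPU)
  add_library(train_gpt2_cpu SHARED train_gpt2.c)
endif()

if(LLMC_BUILD_CUDA)
  enable_language(CUDA)
  add_library(train_gpt2_gpu SHARED train_gpt2.cu)
endif()
```

Users would then pick a backend at configure time, e.g. `cmake -DLLMC_BUILD_CUDA=ON ..`, and the installer only ships the plugin that was built.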

> 4. llm.c can't train GPT-2 from scratch; it needs an existing checkpoint. If you look at their examples, they first train GPT-2 with PyTorch for a few epochs, and then llm.c takes over the process. That means even if we successfully expose these interfaces to Wasm or Rust, we still need Python.

Oh, this is a big issue. However, we can write a tutorial telling users how to get that first step done. We can also provide an existing checkpoint in the example and share it with users.

@junaire (Author) commented Jul 29, 2024

There's a bug in cudnn-frontend (NVIDIA/cudnn-frontend#90) that blocks us from supporting CUDA training. Maybe we can just land CPU support in this PR? @hydai

@hydai (Member) commented Jul 30, 2024

Hi @junaire,
Let's focus on the CPU version first. The cuDNN issue you mentioned seems small, and enabling CUDA should be easy once it is resolved.

@junaire force-pushed the jun/wasmedge branch 3 times, most recently from 4016420 to b52413a on July 31, 2024, 02:29
@junaire (Author) commented Jul 31, 2024

I did some local testing and it works; I feel good about landing this now. @hydai

@hydai (Member) commented Jul 31, 2024

@junaire
Could you add a workflow file to build it and run some simple tests? That way we can ensure the build process and runtime behavior are correct.

@junaire (Author) commented Aug 1, 2024

> @junaire Could you add a workflow file to build it and run some simple tests? That way we can ensure the build process and runtime behavior are correct.

I added a workflow but it didn't run as expected, possibly due to permission issues. Can you please check? @hydai

@hydai (Member) commented Aug 1, 2024

Hi @junaire,
I just granted the workflow permission; please try again.

@junaire (Author) commented Aug 1, 2024

> Hi @junaire, I just granted the workflow permission; please try again.

Can you approve the workflow?

@junaire force-pushed the jun/wasmedge branch 7 times, most recently from 146d90a to d709e38 on August 1, 2024, 14:37
@junaire (Author) commented Aug 1, 2024

I may need some help here... any idea why the Windows job keeps failing? @hydai

@junaire force-pushed the jun/wasmedge branch 2 times, most recently from 33c932f to 399e56c on August 1, 2024, 15:02
@hydai (Member) commented Aug 1, 2024

Hi @junaire,
We don't use mingw64 to build WasmEdge, so removing the Windows-related workflows for now is fine. I don't want you to spend a lot of time supporting multiple platforms; let's take care of Linux first.
Also, when writing an open-source contribution, please avoid using dirty words in the commit history; they will be kept on this PR, as you can see now.

@junaire (Author) commented Aug 5, 2024

The GPU test failures seem unrelated; those jobs are required to run on self-hosted runners.

(Review threads on CMakeLists.txt and llmc/dataloader.h were resolved.)
@hydai (Member) commented Aug 5, 2024

> The GPU test failures seem unrelated; those jobs are required to run on self-hosted runners.

Let's remove these self-hosted workflow jobs for now, since we are not able to deploy such runners inside this repo.

@junaire (Author) commented Aug 5, 2024

> Let's remove these self-hosted workflow jobs for now, since we are not able to deploy such runners inside this repo.

Done.

@hydai hydai merged commit ddb5b4f into WasmEdge:master Aug 6, 2024
16 checks passed