In this example, we use NeMo's PEFT methods to show how to adapt a large language model (LLM) to a downstream task, such as financial sentiment prediction.
With a one-line configuration change, you can try different PEFT techniques, such as p-tuning, adapters, or LoRA, which add a small number of trainable parameters to the LLM. These parameters condition the model to produce the desired output for the downstream task.
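As a minimal sketch of such a one-line change, assuming the config file name and the model.peft.peft_scheme key from NeMo's PEFT tuning example (verify both against your NeMo version):

```python
# Illustrative sketch only: the file name and key below are assumptions based on
# NeMo's PEFT tuning example config; check your NeMo version for the exact layout.
from omegaconf import OmegaConf

cfg = OmegaConf.load("megatron_gpt_peft_tuning_config.yaml")  # assumed config file
cfg.model.peft.peft_scheme = "lora"  # the one-line change: "lora", "ptuning", or "adapter"
```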
For more details, see the PEFT script in NeMo, which we adapt using NVFlare's Lightning client API to run in a federated scenario.
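The core pattern of the Lightning client API is sketched below; `trainer` and `model` stand in for the objects constructed in NeMo's PEFT script:

```python
# Minimal sketch of NVFlare's Lightning client API; trainer/model construction
# is elided and assumed to follow NeMo's PEFT script.
import nvflare.client.lightning as flare

flare.patch(trainer)  # patch the PyTorch Lightning trainer for federated runs

while flare.is_running():
    input_model = flare.receive()  # FLModel carrying the current global weights
    print(f"current round: {input_model.current_round}")
    trainer.fit(model)  # local PEFT training; updates are sent back via the patched trainer
```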
The example was tested with the NeMo 23.10 container.
In the following, we assume this example folder is mounted to /workspace inside the container,
and all downloads and other operations use this as the root path.
Note that the command below mounts both the current directory and the job_templates directory to locations inside the Docker container. Please make sure you have cloned the full NVFlare repository.
Start the Docker container from this directory using:
# cd NVFlare/integration/nemo/examples/peft
DOCKER_IMAGE="nvcr.io/nvidia/nemo:23.10"
docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 -p 6006:6006 --ulimit memlock=-1 --ulimit stack=67108864 \
-v ${PWD}/../../../../job_templates:/job_templates -v ${PWD}:/workspace -w /workspace ${DOCKER_IMAGE}
Next, install NVFlare.
pip install nvflare~=2.5.0rc
We use JupyterLab for this example. To start JupyterLab, run
jupyter lab .
and open peft.ipynb.
This example requires a GPU with at least 24 GB of memory to run three clients in parallel on the same GPU.
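As a quick sanity check (a sketch assuming a CUDA-capable GPU is visible to PyTorch):

```python
# Report the memory of GPU 0; it should be at least 24 GB to run three clients.
import torch

total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU 0 memory: {total_gb:.1f} GB")
```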