Update documentation to reflect CPU-only execution mode (#1924)

* Documents writing a stage that supports CPU execution mode
* Updates `docs/source/developer_guide/contributing.md` cleaning up build and troubleshooting sections.

Requires PRs #1851 & #1906 to be merged first

Closes [#1737](#1737)

## By Submitting this PR I confirm:
- I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
- When the PR is ready for review, new or existing tests cover these changes.
- When the PR is ready for review, the documentation is up to date with these changes.

Authors:
- David Gardner (https://github.com/dagardner-nv)
- Yuchen Zhang (https://github.com/yczhang-nv)

Approvers:
- Michael Demoret (https://github.com/mdemoret-nv)

URL: #1924
nv-morpheus · Oct 18, 2024 · 47841d6
1 parent 85d5ad4
commit 47841d6
Showing 31 changed files with 426 additions and 169 deletions.
diff --git a/conda/environments/examples_cuda-125_arch-x86_64.yaml b/conda/environments/examples_cuda-125_arch-x86_64.yaml
@@ -42,6 +42,7 @@ dependencies:
 - pip
 - pluggy=1.3
 - pydantic
+- pynvml=11.4
 - pypdf=3.17.4
 - pypdfium2=4.30
 - python-confluent-kafka>=1.9.2,<1.10.0a0

diff --git a/dependencies.yaml b/dependencies.yaml
@@ -150,6 +150,7 @@ files:
       arch: [x86_64]
     includes:
       - cve-mitigation
+      - example-abp-nvsmi
       - example-dfp-prod
       - example-gnn
       - example-llms
@@ -442,6 +443,12 @@ dependencies:
             - dgl==2.0.0
             - dglgo
 
+  example-abp-nvsmi:
+    common:
+      - output_types: [conda]
+        packages:
+          - pynvml=11.4
+
   example-llms:
     common:
       - output_types: [conda]

diff --git a/docs/README.md b/docs/README.md
@@ -17,18 +17,10 @@
 
 # Building Documentation
 
-Additional packages required for building the documentation are defined in `./conda_docs.yml`.
-
-## Install Additional Dependencies
-From the root of the Morpheus repo:
-```bash
-conda env update --solver=libmamba -n morpheus --file conda/environments/dev_cuda-125_arch-x86_64.yaml --prune
-```
-
 ## Build Morpheus and Documentation
 ```
 CMAKE_CONFIGURE_EXTRA_ARGS="-DMORPHEUS_BUILD_DOCS=ON" ./scripts/compile.sh --target morpheus_docs
 ```
 Outputs to `build/docs/html`
-  
+
 If the documentation build is unsuccessful, refer to the **Out of Date Build Cache** section in [Troubleshooting](./source/extra_info/troubleshooting.md) to troubleshoot.
diff --git a/docs/source/basics/building_a_pipeline.md b/docs/source/basics/building_a_pipeline.md
@@ -107,7 +107,7 @@ morpheus --log_level=DEBUG run pipeline-other \
 
 Then the following error displays:
 ```
-RuntimeError: The to-file stage cannot handle input of <class 'morpheus._lib.messages.ControlMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
+RuntimeError: The to-file stage cannot handle input of <class 'morpheus.messages.control_message.ControlMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
 ```
 
 This indicates that the ``to-file`` stage cannot accept the input type of `morpheus.messages.ControlMessage`. This is because the ``to-file`` stage has no idea how to write that class to a file; it only knows how to write instances of `morpheus.messages.message_meta.MessageMeta`. To ensure you have a valid pipeline, examine the `Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)` portion of the message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `ControlMessage`, to `MessageMeta`, which is exactly what the `serialize` stage does.
@@ -207,7 +207,7 @@ This example shows an NLP Pipeline which uses several stages available in Morphe
 #### Launching Triton
 Run the following to launch Triton and load the `sid-minibert` model:
 ```bash
-docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 nvcr.io/nvidia/morpheus/morpheus-tritonserver-models:24.10 --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 nvcr.io/nvidia/morpheus/morpheus-tritonserver-models:24.10 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
 ```
 
 #### Launching Kafka
@@ -216,15 +216,15 @@ Follow steps 1-8 in [Quick Launch Kafka Cluster](../developer_guide/contributing
 ![../img/nlp_kitchen_sink.png](../img/nlp_kitchen_sink.png)
 
 ```bash
-morpheus  --log_level=INFO run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
+morpheus  --log_level=INFO run  --pipeline_batch_size=1024 --model_max_batch_size=32 \
    pipeline-nlp --viz_file=.tmp/nlp_kitchen_sink.png  \
    from-file --filename examples/data/pcap_dump.jsonlines \
    deserialize \
    preprocess \
-   inf-triton --model_name=sid-minibert-onnx --server_url=localhost:8001 \
+   inf-triton --model_name=sid-minibert-onnx --server_url=localhost:8000 \
    monitor --description "Inference Rate" --smoothing=0.001 --unit "inf" \
    add-class \
-   filter --threshold=0.8 \
+   filter --filter_source=TENSOR --threshold=0.8 \
    serialize --include 'timestamp' --exclude '^_ts_' \
    to-kafka --bootstrap_servers localhost:9092 --output_topic "inference_output" \
    monitor --description "ToKafka Rate" --smoothing=0.001 --unit "msg"
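
The `building_a_pipeline.md` hunk above explains that `to-file` rejects `ControlMessage` because it only accepts `MessageMeta`. The following is a minimal sketch of that kind of accepted-type check, not Morpheus's actual implementation: the two classes are empty stand-ins for the real message types, and `check_input_type` is a hypothetical helper introduced only for illustration.

```python
class MessageMeta:
    """Stand-in for morpheus.messages.MessageMeta (illustration only)."""


class ControlMessage:
    """Stand-in for morpheus.messages.ControlMessage (illustration only)."""


def check_input_type(stage_name, accepted_types, incoming_type):
    """Hypothetical helper: raise RuntimeError when the upstream output type
    is not a subclass of any type the stage accepts."""
    if not issubclass(incoming_type, tuple(accepted_types)):
        raise RuntimeError(
            f"The {stage_name} stage cannot handle input of {incoming_type}. "
            f"Accepted input types: {tuple(accepted_types)}")


# to-file accepts only MessageMeta, so a ControlMessage input is rejected.
try:
    check_input_type("to-file", (MessageMeta,), ControlMessage)
except RuntimeError as err:
    error_text = str(err)
    print(error_text)
```

In the documented pipeline, placing a `serialize` stage before `to-file` resolves the mismatch because it converts `ControlMessage` to `MessageMeta`.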

diff --git a/docs/source/basics/overview.rst b/docs/source/basics/overview.rst
@@ -39,16 +39,22 @@ run:
    $ morpheus run --help
    Usage: morpheus run [OPTIONS] COMMAND [ARGS]...
 
+   Run subcommand, used for running a pipeline
+
    Options:
-   --num_threads INTEGER RANGE     Number of internal pipeline threads to use  [default: 12; x>=1]
+   --num_threads INTEGER RANGE     Number of internal pipeline threads to use  [default: 64; x>=1]
    --pipeline_batch_size INTEGER RANGE
                                     Internal batch size for the pipeline. Can be much larger than the model batch size. Also used for Kafka consumers  [default: 256; x>=1]
    --model_max_batch_size INTEGER RANGE
                                     Max batch size to use for the model  [default: 8; x>=1]
    --edge_buffer_size INTEGER RANGE
-                                    The size of buffered channels to use between nodes in a pipeline. Larger values reduce backpressure at the cost of memory. Smaller values will push
-                                    messages through the pipeline quicker. Must be greater than 1 and a power of 2 (i.e. 2, 4, 8, 16, etc.)  [default: 128; x>=2]
-   --use_cpp BOOLEAN               Whether or not to use C++ node and message types or to prefer python. Only use as a last resort if bugs are encountered  [default: True]
+                                    The size of buffered channels to use between nodes in a pipeline. Larger values reduce backpressure at the cost of memory. Smaller
+                                    values will push messages through the pipeline quicker. Must be greater than 1 and a power of 2 (i.e. 2, 4, 8, 16, etc.)  [default:
+                                    128; x>=2]
+   --use_cpp BOOLEAN               [Deprecated] Whether or not to use C++ node and message types or to prefer python. Only use as a last resort if bugs are encountered.
+                                    Cannot be used with --use_cpu_only  [default: True]
+   --use_cpu_only                  Whether or not to run in CPU only mode, setting this to True will disable C++ mode. Cannot be used with --use_cpp
+   --manual_seed INTEGER RANGE     Manually seed the random number generators used by Morpheus, useful for testing.  [x>=1]
    --help                          Show this message and exit.
 
    Commands:
@@ -57,6 +63,7 @@ run:
    pipeline-nlp    Run the inference pipeline with a NLP model
    pipeline-other  Run a custom inference pipeline without a specific model type
 
+
 Currently, Morpheus pipeline can be operated in four different modes.
 
  * ``pipeline-ae``
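
The `--edge_buffer_size` help text in the `overview.rst` hunk above requires a value greater than 1 that is a power of 2. As a rough sketch of that constraint (the function name is an assumption, and this is not the CLI's own validation code), the check can be written with a single bitwise test:

```python
def is_valid_edge_buffer_size(size: int) -> bool:
    """Hypothetical helper: True when size is greater than 1 and a power of two.

    A power of two has exactly one bit set, so clearing the lowest set bit
    with `size & (size - 1)` leaves zero.
    """
    return size >= 2 and (size & (size - 1)) == 0


# Keep only the candidate sizes that satisfy the documented constraint.
candidates = (1, 2, 3, 4, 100, 128)
valid_sizes = [n for n in candidates if is_valid_edge_buffer_size(n)]
print(valid_sizes)
```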

diff --git a/docs/source/cloud_deployment_guide.md b/docs/source/cloud_deployment_guide.md
@@ -434,11 +434,9 @@ Inference and training based on a user ID (`user123`). The model is trained once
 ```bash
 helm install --set ngc.apiKey="$API_KEY" \
     --set sdk.args="morpheus --log_level=DEBUG run \
-      --num_threads=2 \
       --edge_buffer_size=4 \
       --pipeline_batch_size=1024 \
       --model_max_batch_size=1024 \
-      --use_cpp=False \
       pipeline-ae \
         --columns_file=data/columns_ae_cloudtrail.txt \
         --userid_filter=user123 \
@@ -480,11 +478,9 @@ Pipeline example to read data from a file, run inference using a `phishing-bert-
 ```bash
 helm install --set ngc.apiKey="$API_KEY" \
     --set sdk.args="morpheus --log_level=DEBUG run \
-      --num_threads=2 \
       --edge_buffer_size=4 \
       --pipeline_batch_size=1024 \
       --model_max_batch_size=32 \
-      --use_cpp=True \
       pipeline-nlp \
         --model_seq_length=128 \
         --labels_file=data/labels_phishing.txt \
@@ -510,11 +506,9 @@ Pipeline example to read messages from an input Kafka topic, run inference using
 ```bash
 helm install --set ngc.apiKey="$API_KEY" \
     --set sdk.args="morpheus --log_level=DEBUG run \
-      --num_threads=2 \
       --edge_buffer_size=4 \
       --pipeline_batch_size=1024 \
       --model_max_batch_size=32 \
-      --use_cpp=True \
       pipeline-nlp \
         --model_seq_length=128 \
         --labels_file=data/labels_phishing.txt \
@@ -557,9 +551,7 @@ Pipeline example to read data from a file, run inference using a `sid-minibert-o
 ```bash
 helm install --set ngc.apiKey="$API_KEY" \
     --set sdk.args="morpheus --log_level=DEBUG run \
-      --num_threads=3 \
       --edge_buffer_size=4 \
-      --use_cpp=True \
       --pipeline_batch_size=1024 \
       --model_max_batch_size=32 \
       pipeline-nlp \
@@ -586,9 +578,7 @@ Pipeline example to read messages from an input Kafka topic, run inference using
 ```bash
 helm install --set ngc.apiKey="$API_KEY" \
     --set sdk.args="morpheus --log_level=DEBUG run \
-        --num_threads=3 \
         --edge_buffer_size=4 \
-        --use_cpp=True \
         --pipeline_batch_size=1024 \
         --model_max_batch_size=32 \
         pipeline-nlp \
@@ -631,11 +621,9 @@ Pipeline example to read data from a file, run inference using an `abp-nvsmi-xgb
 ```bash
 helm install --set ngc.apiKey="$API_KEY" \
     --set sdk.args="morpheus --log_level=DEBUG run \
-        --num_threads=3 \
         --edge_buffer_size=4 \
         --pipeline_batch_size=1024 \
         --model_max_batch_size=64 \
-        --use_cpp=True \
         pipeline-fil --columns_file=data/columns_fil.txt \
           from-file --filename=./examples/data/nvsmi.jsonlines \
           monitor --description 'FromFile Rate' --smoothing=0.001 \
@@ -657,10 +645,8 @@ Pipeline example to read messages from an input Kafka topic, run inference using
 ```bash
 helm install --set ngc.apiKey="$API_KEY" \
     --set sdk.args="morpheus --log_level=DEBUG run \
-        --num_threads=3 \
         --pipeline_batch_size=1024 \
         --model_max_batch_size=64 \
-        --use_cpp=True \
         pipeline-fil --columns_file=data/columns_fil.txt \
           from-kafka --input_topic <YOUR_INPUT_KAFKA_TOPIC> --bootstrap_servers broker:9092 \
           monitor --description 'FromKafka Rate' --smoothing=0.001 \

diff --git a/docs/source/developer_guide/contributing.md b/docs/source/developer_guide/contributing.md
@@ -153,48 +153,42 @@ This workflow utilizes a Docker container to set up most dependencies ensuring a
 
 If a Conda environment on the host machine is preferred over Docker, it is relatively easy to install the necessary dependencies (In reality, the Docker workflow creates a Conda environment inside the container).
 
-Note: These instructions assume the user is using `mamba` instead of `conda` since its improved solver speed is very helpful when working with a large number of dependencies. If you are not familiar with `mamba` you can install it with `conda install -n base -c conda-forge mamba` (Make sure to only install into the base environment). `mamba` is a drop in replacement for `conda` and all Conda commands are compatible between the two.
-
 #### Prerequisites
 
 - Volta architecture GPU or better
 - [CUDA 12.1](https://developer.nvidia.com/cuda-12-1-0-download-archive)
-- `conda` and `mamba`
-  - If `conda` and `mamba` are not installed, we recommend using the MiniForge install guide which is located [here](https://github.com/conda-forge/miniforge). This will install both `conda` and `mamba` and set the channel default to use `conda-forge`.
+- `conda`
+  - If `conda` is not installed, we recommend using the [MiniForge install guide](https://github.com/conda-forge/miniforge). This will install `conda` and set the channel default to use `conda-forge`.
 
 1. Set up environment variables and clone the repo:
    ```bash
    export MORPHEUS_ROOT=$(pwd)/morpheus
    git clone https://github.com/nv-morpheus/Morpheus.git $MORPHEUS_ROOT
    cd $MORPHEUS_ROOT
    ```
-
-2. Ensure all submodules are checked out:
-
-```bash
-git submodule update --init --recursive
-```
-
+1. Ensure all submodules are checked out:
+   ```bash
+   git submodule update --init --recursive
+   ```
 1. Create the Morpheus Conda environment
    ```bash
    conda env create --solver=libmamba -n morpheus --file conda/environments/dev_cuda-125_arch-x86_64.yaml
    conda activate morpheus
    ```
 
    This creates a new environment named `morpheus`, and activates that environment.
-1. Build Morpheus
+
+   > **Note**: The `dev_cuda-121_arch-x86_64.yaml` Conda environment file specifies all of the dependencies required to build Morpheus and run Morpheus. However many of the examples, and optional packages such as `morpheus_llm` require additional dependencies. Alternately the following command can be used to create the Conda environment:
    ```bash
-   ./scripts/compile.sh
+   conda env create --solver=libmamba -n morpheus --file conda/environments/all_cuda-121_arch-x86_64.yaml
+   conda activate morpheus
    ```
-   This script will run both CMake Configure with default options and CMake build.
-1. Install Morpheus
+1. Build Morpheus
    ```bash
-   pip install -e ${MORPHEUS_ROOT}/python/morpheus
-   pip install -e ${MORPHEUS_ROOT}/python/morpheus_llm
-   pip install -e ${MORPHEUS_ROOT}/python/morpheus_dfp
+   ./scripts/compile.sh
    ```
-   Once Morpheus has been built, it can be installed into the current virtual environment.
-1. Test the build (Note: some tests will be skipped)\
+   This script will build and install Morpheus into the Conda environment.
+1. Test the build (Note: some tests will be skipped)
    Some of the tests will rely on external data sets.
    ```bash
    MORPHEUS_ROOT=${PWD}
@@ -213,15 +207,26 @@ git submodule update --init --recursive
       npm install -g camouflage-server@0.15
       ```
 
-   Run all tests:
-   ```bash
-   pytest --run_slow
-   ```
-1. Optional: Install cuML
-   - Many users may wish to install cuML. Due to the complex dependency structure and versioning requirements, we need to specify exact versions of each package. The command to accomplish this is:
+   - Run end-to-end (aka slow) tests:
+      ```bash
+      pytest --run_slow
+      ```
+1. Optional: Run Kafka and Milvus tests
+   - Download Kafka:
       ```bash
-      mamba install -c rapidsai -c nvidia -c conda-forge cuml=23.06
+      python ./ci/scripts/download_kafka.py
       ```
+
+   - Run all tests (this will skip over tests that require optional dependencies which are not installed):
+      ```bash
+      pytest --run_slow --run_kafka --run_milvus
+      ```
+
+   - Run all tests including those that require optional dependencies:
+      ```bash
+      pytest --fail_missing --run_slow --run_kafka --run_milvus
+      ```
+
 1. Run Morpheus
    ```bash
    morpheus run pipeline-nlp ...
@@ -372,6 +377,36 @@ Due to the large number of dependencies, it's common to run into build issues. T
  - Message indicating `git apply ...` failed
    - Many of the dependencies require small patches to make them work. These patches must be applied once and only once. If this error displays, try deleting the offending package from the `build/_deps/<offending_package>` directory or from `.cache/cpm/<offending_package>`.
    - If all else fails, delete the entire `build/` directory and `.cache/` directory.
+ - Older build artifacts when performing an in-place build.
+   - When built with `MORPHEUS_PYTHON_INPLACE_BUILD=ON` compiled libraries will be deployed in-place in the source tree, and older build artifacts exist in the source tree. Remove these with:
+       ```bash
+       find ./python -name "*.so" -delete
+       find ./examples -name "*.so" -delete
+       ```
+ - Issues building documentation
+   - Intermediate documentation build artifacts can cause errors for Sphinx. To remove these, run:
+       ```bash
+       rm -rf build/docs/ docs/source/_modules docs/source/_lib
+       ```
+ - CI Issues
+   - To run CI locally, the `ci/scripts/run_ci_local.sh` script can be used. For example to run a local CI build:
+      ```bash
+      ci/scripts/run_ci_local.sh build
+      ```
+      - Build artifacts resulting from a local CI run can be found in the `.tmp/local_ci_tmp/` directory.
+   - To troubleshoot a particular CI stage it can be helpful to run:
+      ```bash
+      ci/scripts/run_ci_local.sh bash
+      ```
+
+      This will open a bash shell inside the CI container with all of the environment variables typically set during a CI run. From here you can run the commands that would typically be run by one of the CI scripts in `ci/scripts/github`.
+
+      To run a CI stage requiring a GPU (ex: `test`), set the `USE_GPU` environment variable to `1`:
+      ```bash
+      USE_GPU=1 ci/scripts/run_ci_local.sh bash
+      ```
+
+Refer to the [troubleshooting guide](../extra_info/troubleshooting.md) for more information on common issues and how to resolve them.
 
 ## Licensing
 Morpheus is licensed under the Apache v2.0 license. All new source files including CMake and other build scripts should contain the Apache v2.0 license header. Any edits to existing source code should update the date range of the copyright to the current year. The format for the license header is:
@@ -401,7 +436,7 @@ Third-party code included in the source tree (that is not pulled in as an extern
 Ex:
 ```
 /**
- * SPDX-FileCopyrightText: Copyright (c) 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) <year>, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
  * SPDX-License-Identifier: Apache-2.0
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
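
The troubleshooting entries added to `contributing.md` above remove stale in-place build artifacts with `find ... -name "*.so" -delete`. The sketch below is a Python equivalent with a dry-run option, for previewing what would be deleted first. The function names and the `dry_run` parameter are assumptions made for this illustration and are not part of the Morpheus repository.

```python
from pathlib import Path


def find_stale_libraries(roots, pattern="*.so"):
    """Return every regular file under the given roots matching pattern, sorted.

    Roots that do not exist are skipped rather than treated as errors.
    """
    found = []
    for root in roots:
        root_path = Path(root)
        if root_path.is_dir():
            found.extend(p for p in root_path.rglob(pattern) if p.is_file())
    return sorted(found)


def remove_stale_libraries(roots, dry_run=True):
    """Delete matching files unless dry_run is set; return the paths found."""
    stale = find_stale_libraries(roots)
    if not dry_run:
        for path in stale:
            path.unlink()
    return stale


# Example (dry run, lists candidates without deleting anything):
#   remove_stale_libraries(["./python", "./examples"], dry_run=True)
```

Defaulting to a dry run is a deliberate choice here, since deleting compiled libraries from the source tree forces a rebuild.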

diff --git a/docs/source/developer_guide/guides/10_modular_pipeline_digital_fingerprinting.md b/docs/source/developer_guide/guides/10_modular_pipeline_digital_fingerprinting.md
@@ -539,7 +539,6 @@ To run the DFP pipelines with the example datasets within the container, run the
     ```bash
     python dfp_integrated_training_batch_pipeline.py \
         --log_level DEBUG \
-        --use_cpp=true \
         --source duo \
         --start_time "2022-08-01" \
         --duration "60d" \
@@ -551,7 +550,6 @@ To run the DFP pipelines with the example datasets within the container, run the
     ```bash
     python dfp_integrated_training_batch_pipeline.py \
         --log_level DEBUG \
-        --use_cpp=true \
         --source duo \
         --start_time "2022-08-30" \
         --input_file "./control_messages/duo_payload_inference.json"
@@ -561,7 +559,6 @@ To run the DFP pipelines with the example datasets within the container, run the
     ```bash
     python dfp_integrated_training_batch_pipeline.py \
         --log_level DEBUG \
-        --use_cpp=true \
         --source duo \
         --start_time "2022-08-01" \
         --duration "60d" \
@@ -573,7 +570,6 @@ To run the DFP pipelines with the example datasets within the container, run the
     ```bash
     python dfp_integrated_training_batch_pipeline.py \
         --log_level DEBUG \
-        --use_cpp=true \
         --source azure \
         --start_time "2022-08-01" \
         --duration "60d" \
@@ -585,7 +581,6 @@ To run the DFP pipelines with the example datasets within the container, run the
     ```bash
     python dfp_integrated_training_batch_pipeline.py \
         --log_level DEBUG \
-        --use_cpp=true \
         --source azure \
         --start_time "2022-08-30" \
         --input_file "./control_messages/azure_payload_inference.json"
@@ -595,7 +590,6 @@ To run the DFP pipelines with the example datasets within the container, run the
     ```bash
     python dfp_integrated_training_batch_pipeline.py \
         --log_level DEBUG \
-        --use_cpp=true \
         --source azure \
         --start_time "2022-08-01" \
         --duration "60d" \