
Morpheus docs update post compartmentalization #1964

Merged: 7 commits, Oct 22, 2024
1 change: 1 addition & 0 deletions ci/vale/styles/config/vocabularies/morpheus/accept.txt
@@ -46,6 +46,7 @@ LLM(s?)
# https://github.com/logpai/loghub/
Loghub
Milvus
PyPI
[Mm]ixin
MLflow
Morpheus
4 changes: 2 additions & 2 deletions docs/CMakeLists.txt
@@ -30,15 +30,15 @@ add_custom_target(${PROJECT_NAME}_docs
BUILD_DIR=${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_EXECUTABLE} ${SPHINX_HTML_ARGS} ${SPHINX_SOURCE} ${SPHINX_BUILD}
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMENT "Generating documentation with Sphinx"
DEPENDS morpheus-package-outputs morpheus_llm-package-outputs
DEPENDS morpheus-package-outputs morpheus_llm-package-outputs morpheus_dfp-package-outputs
)

add_custom_target(${PROJECT_NAME}_docs_linkcheck
COMMAND
BUILD_DIR=${CMAKE_CURRENT_BINARY_DIR} ${SPHINX_EXECUTABLE} ${SPHINX_LINKCHECK_ARGS} ${SPHINX_SOURCE} ${SPHINX_LINKCHECK_OUT}
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMENT "Checking documentation links with Sphinx"
DEPENDS morpheus-package-outputs morpheus_llm-package-outputs
DEPENDS morpheus-package-outputs morpheus_llm-package-outputs morpheus_dfp-package-outputs
)

list(POP_BACK CMAKE_MESSAGE_CONTEXT)
128 changes: 128 additions & 0 deletions docs/source/conda_packages.md
@@ -0,0 +1,128 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Morpheus Conda Packages
The Morpheus stages are the building blocks for creating pipelines. The stages are organized into libraries by use case. The current libraries are:
- `morpheus-core`
- `morpheus-dfp`
- `morpheus-llm`

The libraries are hosted as Conda packages on the [`nvidia`](https://anaconda.org/nvidia/) channel.

The split into multiple libraries allows for a more modular approach to using the Morpheus stages. For example, if you are building an application for Digital Fingerprinting, you can install just the `morpheus-dfp` library. This reduces the size of the installed packages and limits the dependencies, eliminating unnecessary version conflicts.


## Morpheus Core
The `morpheus-core` library contains the core stages that are common across all use cases. It is built from the source code in the `python/morpheus` directory of the Morpheus repository and is installed as a dependency when you install any of the other Morpheus libraries.
To set up a Conda environment with the [`morpheus-core`](https://anaconda.org/nvidia/morpheus-core) library, run the following commands:
### Create a Conda environment
```bash
export CONDA_ENV_NAME=morpheus
conda create -n ${CONDA_ENV_NAME} python=3.10
conda activate ${CONDA_ENV_NAME}
```
### Add Conda channels
These channels are required for installing the runtime dependencies:
```bash
conda config --env --add channels conda-forge &&\
conda config --env --add channels nvidia &&\
conda config --env --add channels rapidsai &&\
conda config --env --add channels pytorch
```
### Install the `morpheus-core` library
```bash
conda install -c nvidia morpheus-core
```
The `morpheus-core` Conda package installs the `morpheus` Python package. It also pulls down all the necessary Conda runtime dependencies for the core stages, including [`mrc`](https://anaconda.org/nvidia/mrc) and [`libmrc`](https://anaconda.org/nvidia/libmrc).
### Install additional PyPI dependencies
Some of the stages in the core library require additional dependencies that are hosted on PyPI. These dependencies are included as a requirements file in the `morpheus` Python package, which can be located and installed by running the following commands:
```bash
MORPHEUS_CORE_PKG_DIR=$(dirname $(python -c "import morpheus; print(morpheus.__file__)"))
pip install -r ${MORPHEUS_CORE_PKG_DIR}/requirements_morpheus_core.txt
```
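The shell one-liner above can also be expressed in pure Python. The sketch below is illustrative only (the `requirements_path` helper is ours, not a Morpheus API) and assumes the package in question is already installed:

```python
# Illustrative sketch: resolve a requirements file bundled inside an
# installed package, mirroring the dirname / python -c one-liner above.
# The `requirements_path` helper is hypothetical, not part of Morpheus.
from importlib.util import find_spec
from pathlib import Path


def requirements_path(package: str, filename: str) -> Path:
    """Return the path to `filename` inside the installed `package`."""
    spec = find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"{package} is not installed")
    return Path(spec.origin).parent / filename


# The pip step above then becomes, for example:
# pip install -r "$(python -c 'print(requirements_path("morpheus", "requirements_morpheus_core.txt"))')"
```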

## Morpheus DFP
Digital Fingerprinting (DFP) is a technique used to identify anomalous behavior and uncover potential threats in an environment. The `morpheus-dfp` library contains stages for DFP. It is built from the source code in the `python/morpheus_dfp` directory of the Morpheus repository. To set up a Conda environment with the [`morpheus-dfp`](https://anaconda.org/nvidia/morpheus-dfp) library, run the following commands:
### Create a Conda environment
```bash
export CONDA_ENV_NAME=morpheus-dfp
conda create -n ${CONDA_ENV_NAME} python=3.10
conda activate ${CONDA_ENV_NAME}
```
### Add Conda channels
These channels are required for installing the runtime dependencies:
```bash
conda config --env --add channels conda-forge &&\
conda config --env --add channels nvidia &&\
conda config --env --add channels rapidsai &&\
conda config --env --add channels pytorch
```
### Install the `morpheus-dfp` library
```bash
conda install -c nvidia morpheus-dfp
```
The `morpheus-dfp` Conda package installs the `morpheus_dfp` Python package. It also pulls down all the necessary Conda runtime dependencies, including [`morpheus-core`](https://anaconda.org/nvidia/morpheus-core).
### Install additional PyPI dependencies
Some of the DFP stages in the library require additional dependencies that are hosted on PyPI. These dependencies are included as a requirements file in the `morpheus_dfp` Python package and can be installed by running the following commands:
```bash
MORPHEUS_DFP_PKG_DIR=$(dirname $(python -c "import morpheus_dfp; print(morpheus_dfp.__file__)"))
pip install -r ${MORPHEUS_DFP_PKG_DIR}/requirements_morpheus_dfp.txt
```

## Morpheus LLM
The `morpheus-llm` library contains stages for Large Language Models (LLM) and Vector Databases. These stages are used for setting up Retrieval Augmented Generation (RAG) pipelines. The `morpheus-llm` library is built from the source code in the `python/morpheus_llm` directory of the Morpheus repository.
To set up a Conda environment with the [`morpheus-llm`](https://anaconda.org/nvidia/morpheus-llm) library, run the following commands:
### Create a Conda environment
```bash
export CONDA_ENV_NAME=morpheus-llm
conda create -n ${CONDA_ENV_NAME} python=3.10
conda activate ${CONDA_ENV_NAME}
```
### Add Conda channels
These channels are required for installing the runtime dependencies:
```bash
conda config --env --add channels conda-forge &&\
conda config --env --add channels nvidia &&\
conda config --env --add channels rapidsai &&\
conda config --env --add channels pytorch
```
### Install the `morpheus-llm` library
```bash
conda install -c nvidia morpheus-llm
```
The `morpheus-llm` Conda package installs the `morpheus_llm` Python package. It also pulls down all the necessary Conda packages, including [`morpheus-core`](https://anaconda.org/nvidia/morpheus-core).
### Install additional PyPI dependencies
Some of the stages in the library require additional dependencies that are hosted on PyPI. These dependencies are included as a requirements file in the `morpheus_llm` Python package and can be installed by running the following commands:
```bash
MORPHEUS_LLM_PKG_DIR=$(dirname $(python -c "import morpheus_llm; print(morpheus_llm.__file__)"))
pip install -r ${MORPHEUS_LLM_PKG_DIR}/requirements_morpheus_llm.txt
```

## Miscellaneous
### Morpheus Examples
The Morpheus examples are not included in the Morpheus Conda packages. To use them, clone the Morpheus repository and run the examples from source. For details, refer to the [Morpheus Examples](./examples.md) guide.

### Namespace Updates
If you were using a Morpheus release prior to 24.10, you may need to update the namespace of the DFP, LLM, and vector database stages.

The `scripts/morpheus_namespace_update.py` script has been provided to help with this and can be run as follows:
```bash
python scripts/morpheus_namespace_update.py --directory <directory> --dfp
```
```bash
python scripts/morpheus_namespace_update.py --directory <directory> --llm
```
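To illustrate what such a namespace migration involves, here is a simplified sketch. The mapping table is an assumption for illustration, covering only the `morpheus.stages.llm` to `morpheus_llm.stages.llm` rename visible elsewhere in this PR; it is not the actual logic of the script:

```python
# Simplified sketch of a namespace rewrite; NOT the real script's logic.
# The mapping below is illustrative and covers only the LLM stage rename.
import re

NAMESPACE_MAP = {
    r"\bmorpheus\.stages\.llm\b": "morpheus_llm.stages.llm",
}


def update_namespaces(source: str) -> str:
    """Rewrite old Morpheus import paths to the new per-library packages."""
    for old_pattern, new_path in NAMESPACE_MAP.items():
        source = re.sub(old_pattern, new_path, source)
    return source


old_line = "from morpheus.stages.llm.llm_engine_stage import LLMEngineStage"
print(update_namespaces(old_line))
# from morpheus_llm.stages.llm.llm_engine_stage import LLMEngineStage
```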
7 changes: 7 additions & 0 deletions docs/source/getting_started.md
@@ -19,6 +19,7 @@ limitations under the License.

There are three ways to get started with Morpheus:
- [Using pre-built Docker containers](#using-pre-built-docker-containers)
- [Using the Morpheus Conda packages](#using-morpheus-conda-packages)
- [Building the Morpheus Docker container](#building-the-morpheus-container)
- [Building Morpheus from source](./developer_guide/contributing.md#building-from-source)

@@ -78,6 +79,12 @@ Once launched, users wishing to launch Triton using the included Morpheus models

Skip ahead to the [Acquiring the Morpheus Models Container](#acquiring-the-morpheus-models-container) section.

## Using Morpheus Conda Packages
The Morpheus stages are available as libraries hosted on the [`nvidia`](https://anaconda.org/nvidia) Conda channel. The Morpheus Conda packages are
[`morpheus-core`](https://anaconda.org/nvidia/morpheus-core), [`morpheus-dfp`](https://anaconda.org/nvidia/morpheus-dfp), and [`morpheus-llm`](https://anaconda.org/nvidia/morpheus-llm).

For details on these libraries and how to use them, refer to the [Morpheus Conda Packages](./conda_packages.md) guide.

## Building the Morpheus Container
### Clone the Repository

2 changes: 2 additions & 0 deletions docs/source/index.rst
@@ -54,6 +54,7 @@ Getting Started
Using Morpheus
^^^^^^^^^^^^^^
* :doc:`getting_started` - Using pre-built Docker containers, building Docker containers from source, and fetching models and datasets
* :doc:`Morpheus Conda Packages <conda_packages>` - Using Morpheus libraries via the pre-built Conda packages
* :doc:`basics/overview` - Brief overview of the command line interface
* :doc:`basics/building_a_pipeline` - Introduction to building a pipeline using the command line interface
* :doc:`Morpheus Examples <examples>` - Example pipelines using both the Python API and command line interface
@@ -76,6 +77,7 @@ Deploying Morpheus
:hidden:

getting_started
conda_packages
basics/overview
basics/building_a_pipeline
models_and_datasets
1 change: 1 addition & 0 deletions docs/source/py_api.rst
@@ -22,4 +22,5 @@ Python API
:recursive:

morpheus
morpheus_dfp
morpheus_llm
2 changes: 1 addition & 1 deletion docs/source/stages/morpheus_stages.md
@@ -66,7 +66,7 @@ Stages are the building blocks of Morpheus pipelines. Below is a list of the mos

## LLM

- LLM Engine Stage {py:class}`~morpheus.stages.llm.llm_engine_stage.LLMEngineStage` Execute an LLM engine within a Morpheus pipeline.
- LLM Engine Stage {py:class}`~morpheus_llm.stages.llm.llm_engine_stage.LLMEngineStage` Execute an LLM engine within a Morpheus pipeline.

## Output
- HTTP Client Sink Stage {py:class}`~morpheus.stages.output.http_client_sink_stage.HttpClientSinkStage` Write all messages to an HTTP endpoint.
50 changes: 25 additions & 25 deletions python/morpheus_dfp/morpheus_dfp/modules/dfp_deployment.py
@@ -50,8 +50,8 @@ def dfp_deployment(builder: mrc.Builder):
- mlflow_writer_options (dict): Options for the MLflow model writer; Example: See Below
- preprocessing_options (dict): Options for preprocessing the data; Example: See Below
- stream_aggregation_options (dict): Options for aggregating the data by stream; Example: See Below
- timestamp_column_name (str): Name of the timestamp column used in the data; Example: "my_timestamp"; Default:
"timestamp"
- timestamp_column_name (str): Name of the timestamp column used in the data; Example: "my_timestamp";
Default: "timestamp"
- user_splitting_options (dict): Options for splitting the data by user; Example: See Below

Inference Options Parameters:
@@ -61,18 +61,18 @@
- fallback_username (str): User ID to use if user ID not found; Example: "generic_user"; Default: "generic_user"
- inference_options (dict): Options for the inference module; Example: See Below
- model_name_formatter (str): Format string for the model name; Example: "model_{timestamp}";
Default: `[Required]`
Default: `[Required]`
- num_output_ports (int): Number of output ports for the module; Example: 3
- timestamp_column_name (str): Name of the timestamp column in the input data; Example: "timestamp";
Default: "timestamp"
Default: "timestamp"
- stream_aggregation_options (dict): Options for aggregating the data by stream; Example: See Below
- user_splitting_options (dict): Options for splitting the data by user; Example: See Below
- write_to_file_options (dict): Options for writing the detections to a file; Example: See Below

batching_options:
- end_time (datetime/str): End time of the time window; Example: "2023-03-14T23:59:59"; Default: None
- iso_date_regex_pattern (str): Regex pattern for ISO date matching;
Example: "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}"; Default: <iso_date_regex_pattern>
Example: "\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}"; Default: <iso_date_regex_pattern>
- parser_kwargs (dict): Additional arguments for the parser; Example: {}; Default: {}
- period (str): Time period for grouping files; Example: "1d"; Default: "1d"
- sampling_rate_s (int):: Sampling rate in seconds; Example: 0; Default: None
@@ -82,43 +82,43 @@
- feature_columns (list): List of feature columns to train on; Example: ["column1", "column2", "column3"]
- epochs (int): Number of epochs to train for; Example: 50
- model_kwargs (dict): Keyword arguments to pass to the model; Example: {"encoder_layers": [64, 32],
"decoder_layers": [32, 64], "activation": "relu", "swap_p": 0.1, "lr": 0.001, "lr_decay": 0.9,
"batch_size": 32, "verbose": 1, "optimizer": "adam", "scalar": "min_max", "min_cats": 10,
"progress_bar": false, "device": "cpu"}
"decoder_layers": [32, 64], "activation": "relu", "swap_p": 0.1, "lr": 0.001, "lr_decay": 0.9,
"batch_size": 32, "verbose": 1, "optimizer": "adam", "scalar": "min_max", "min_cats": 10,
"progress_bar": false, "device": "cpu"}
- validation_size (float): Size of the validation set; Example: 0.1

mlflow_writer_options:
- conda_env (str): Conda environment for the model; Example: `path/to/conda_env.yml`; Default: `[Required]`
- databricks_permissions (dict): Permissions for the model; Example: See Below; Default: None
- experiment_name_formatter (str): Formatter for the experiment name; Example: `experiment_name_{timestamp}`;
Default: `[Required]`
Default: `[Required]`
- model_name_formatter (str): Formatter for the model name; Example: `model_name_{timestamp}`;
Default: `[Required]`
Default: `[Required]`
- timestamp_column_name (str): Name of the timestamp column; Example: `timestamp`; Default: timestamp

stream_aggregation_options:
- cache_mode (str): Mode for managing user cache. Setting to `batch` flushes cache once trigger conditions are
met. Otherwise, continue to aggregate user's history.; Example: 'batch'; Default: 'batch'
- trigger_on_min_history (int): Minimum history to trigger a new training event; Example: 1; Default: 1
- trigger_on_min_increment (int): Minimum increment from the last trained to new training event;
Example: 0; Default: 0
Example: 0; Default: 0
- timestamp_column_name (str): Name of the column containing timestamps; Example: 'timestamp';
Default: 'timestamp'
Default: 'timestamp'
- aggregation_span (str): Lookback timespan for training data in a new training event; Example: '60d';
Default: '60d'
Default: '60d'
- cache_to_disk (bool): Whether to cache streaming data to disk; Example: false; Default: false
- cache_dir (str): Directory to use for caching streaming data; Example: './.cache'; Default: './.cache'

user_splitting_options:
- fallback_username (str): The user ID to use if the user ID is not found; Example: "generic_user";
Default: 'generic_user'
Default: 'generic_user'
- include_generic (bool): Whether to include a generic user ID in the output; Example: false; Default: False
- include_individual (bool): Whether to include individual user IDs in the output; Example: true; Default: False
- only_users (list): List of user IDs to include; others will be excluded; Example: ["user1", "user2", "user3"];
Default: []
Default: []
- skip_users (list): List of user IDs to exclude from the output; Example: ["user4", "user5"]; Default: []
- timestamp_column_name (str): Name of the column containing timestamps; Example: "timestamp";
Default: 'timestamp'
Default: 'timestamp'
- userid_column_name (str): Name of the column containing user IDs; Example: "username"; Default: 'username'

detection_criteria:
@@ -127,9 +127,9 @@

inference_options:
- model_name_formatter (str): Formatter for model names; Example: "user_{username}_model";
Default: `[Required]`
Default: `[Required]`
- fallback_username (str): Fallback user to use if no model is found for a user; Example: "generic_user";
Default: generic_user
Default: generic_user
- timestamp_column_name (str): Name of the timestamp column; Example: "timestamp"; Default: timestamp

write_to_file_options:
@@ -141,19 +141,19 @@

monitoring_options:
- description (str): Name to show for this Monitor Stage in the console window; Example: 'Progress';
Default: 'Progress'
Default: 'Progress'
- silence_monitors (bool): Silence the monitors on the console; Example: True; Default: False
- smoothing (float): Smoothing parameter to determine how much the throughput should be averaged.
0 = Instantaneous, 1 = Average.; Example: 0.01; Default: 0.05
0 = Instantaneous, 1 = Average.; Example: 0.01; Default: 0.05
- unit (str): Units to show in the rate value.; Example: 'messages'; Default: 'messages'
- delayed_start (bool): When delayed_start is enabled, the progress bar will not be shown until the first
message is received. Otherwise, the progress bar is shown on pipeline startup and will begin timing
immediately. In large pipelines, this option may be desired to give a more accurate timing;
Example: True; Default: False
message is received. Otherwise, the progress bar is shown on pipeline startup and will begin timing
immediately. In large pipelines, this option may be desired to give a more accurate timing;
Example: True; Default: False
- determine_count_fn_schema (str): Custom function for determining the count in a message. Gets called for
each message. Allows for correct counting of batched and sliced messages.; Example: func_str; Default: None
each message. Allows for correct counting of batched and sliced messages.; Example: func_str; Default: None
- log_level (str): Enable this stage when the configured log level is at `log_level` or lower;
Example: 'DEBUG'; Default: INFO
Example: 'DEBUG'; Default: INFO
"""

# MODULE_INPUT_PORT
4 changes: 2 additions & 2 deletions python/morpheus_dfp/morpheus_dfp/modules/dfp_inference.py
@@ -46,9 +46,9 @@ def dfp_inference(builder: mrc.Builder):
----------
Configurable parameters:
- model_name_formatter (str): Formatter for model names; Example: "user_{username}_model";
Default: `[Required]`
Default: `[Required]`
- fallback_username (str): Fallback user to use if no model is found for a user; Example: "generic_user";
Default: generic_user
Default: generic_user
- timestamp_column_name (str): Name of the timestamp column; Example: "timestamp"; Default: timestamp
"""
