Add qa model and new settings in ml-commons #6749

Merged 7 commits on Mar 25, 2024
144 changes: 143 additions & 1 deletion _ml-commons-plugin/cluster-settings.md
@@ -239,6 +239,33 @@ plugins.ml_commons.native_memory_threshold: 90
- Default value: 90
- Value range: [0, 100]

## Set JVM heap memory threshold

Sets a circuit breaker that checks JVM heap memory usage before running an ML task. If the heap usage exceeds the threshold, OpenSearch triggers a circuit breaker and throws an exception to maintain optimal performance.

Values are based on the percentage of JVM heap memory available. When set to `0`, no ML tasks will run. When set to `100`, the circuit breaker closes and no threshold exists.

### Setting

```
plugins.ml_commons.jvm_heap_memory_threshold: 85
```

### Values

- Default value: 85
- Value range: [0, 100]
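
The threshold semantics described in this section can be sketched as follows. This is an illustrative Python model only, not ML Commons code: the function name `heap_breaker_is_open` and its parameters are hypothetical, and the strict greater-than comparison is an assumption based on the phrase "exceeds the threshold."

```python
def heap_breaker_is_open(heap_used_percent: float, threshold: int = 85) -> bool:
    """Return True when an ML task would be rejected (hypothetical helper)."""
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be in the range [0, 100]")
    if threshold == 100:
        # Documented special case: the breaker is closed and no threshold applies.
        return False
    if threshold == 0:
        # Documented special case: no ML tasks run.
        return True
    # Assumption: "exceeds" means strictly greater than the threshold.
    return heap_used_percent > threshold
```

With the default of `85`, a node at 90% heap usage would reject the task under this model, while a node at 80% would accept it.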

## Exclude node names

Use this setting to specify the names of nodes on which you don't want to run ML tasks. The value should be a valid node name or a comma-separated node name list.

### Setting

```
plugins.ml_commons.exclude_nodes._name: node1, node2
```
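
As a rough illustration of how a comma-separated node name list might be interpreted, the following Python sketch splits the value and filters a list of candidate nodes. The helper names are hypothetical, and trimming whitespace around each name is an assumption; the section above states only that the value is a node name or a comma-separated list of names.

```python
from typing import List, Set


def parse_excluded_nodes(value: str) -> Set[str]:
    """Split a comma-separated setting value into a set of node names."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return {name.strip() for name in value.split(",") if name.strip()}


def nodes_allowed_for_ml(all_nodes: List[str], excluded_value: str) -> List[str]:
    """Return the nodes that remain eligible after applying the exclusion list."""
    excluded = parse_excluded_nodes(excluded_value)
    return [node for node in all_nodes if node not in excluded]
```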

## Allow custom deployment plans

When enabled, this setting grants users the ability to deploy models to specific ML nodes according to that user's permissions.
@@ -254,6 +281,21 @@ plugins.ml_commons.allow_custom_deployment_plan: false
- Default value: false
- Valid values: `false`, `true`

## Enable auto deploy

This setting is applicable when you send a prediction request for an externally hosted model that has not been deployed. When set to `true`, this setting automatically deploys the model to the cluster if the model has not been deployed already.

### Setting

```
plugins.ml_commons.model_auto_deploy.enable: false
```

### Values

- Default value: `true`
- Valid values: `false`, `true`

## Enable auto redeploy

This setting automatically redeploys deployed or partially deployed models upon cluster failure. If all ML nodes inside a cluster crash, the model switches to the `DEPLOYED_FAILED` state, and the model must be deployed manually.
@@ -326,10 +368,110 @@ plugins.ml_commons.connector_access_control_enabled: true

### Values

- Default value: false
- Default value: `false`
- Valid values: `false`, `true`

## Enable a local model

This setting allows a cluster admin to enable running local models on the cluster. When this setting is `false`, users will not be able to run register, deploy, or predict operations on any local model.

### Setting

```
plugins.ml_commons.local_model.enabled: true
```

### Values

- Default value: `true`
- Valid values: `false`, `true`

## Node roles that can run externally hosted models

This setting allows a cluster admin to control the types of nodes on which externally hosted models can run.

### Setting

```
plugins.ml_commons.task_dispatcher.eligible_node_role.remote_model: ["ml"]
```

### Values

- Default value: `["data", "ml"]`, which allows externally hosted models to run on data nodes and ML nodes.


## Node roles that can run local models

This setting allows a cluster admin to control the types of nodes on which local models can run. It works together with the `plugins.ml_commons.only_run_on_ml_node` setting. If `plugins.ml_commons.only_run_on_ml_node` is set to `true`, a local model always runs on ML nodes only. If it is set to `false`, the model runs on nodes whose roles are listed in the `plugins.ml_commons.task_dispatcher.eligible_node_role.local_model` setting.

### Setting

```
plugins.ml_commons.task_dispatcher.eligible_node_role.local_model: ["ml"]
```

### Values

- Default value: `["data", "ml"]`
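
The interaction between `plugins.ml_commons.only_run_on_ml_node` and the eligible roles for local models can be sketched as a small decision function. This is a simplified Python model under stated assumptions, not the ML Commons task dispatcher: the function name and parameters are hypothetical, and a node is assumed to be eligible when it holds at least one of the listed roles.

```python
from typing import Iterable


def can_run_local_model(
    node_roles: Iterable[str],
    only_run_on_ml_node: bool,
    eligible_roles: Iterable[str] = ("data", "ml"),
) -> bool:
    """Decide whether a node may run a local model (hypothetical helper)."""
    roles = set(node_roles)
    if only_run_on_ml_node:
        # When the flag is true, only ML nodes qualify, regardless of eligible_roles.
        return "ml" in roles
    # Otherwise, assume any overlap with the eligible roles is sufficient.
    return bool(roles & set(eligible_roles))
```

Under this model, a data-only node can run a local model with the default roles only when `only_run_on_ml_node` is `false`.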

## Enable remote inference

This setting allows a cluster admin to enable remote inference on the cluster. If this setting is `false`, users will not be able to run register, deploy, or predict operations on any externally hosted model or create a connector for remote inference.

### Setting

```
plugins.ml_commons.remote_inference.enabled: true
```

### Values

- Default value: `true`
- Valid values: `false`, `true`

## Enable agent framework

When set to `true`, this setting enables the agent framework (including agents and tools) on the cluster and allows users to run register, execute, delete, get, and search operations on an agent.

### Setting

```
plugins.ml_commons.agent_framework_enabled: true
```

### Values

- Default value: `true`
- Valid values: `false`, `true`

## Enable memory

When set to `true`, this setting enables conversational memory, which stores all messages from a conversation for conversational search.

### Setting

```
plugins.ml_commons.memory_feature_enabled: true
```

### Values

- Default value: `true`
- Valid values: `false`, `true`


## Enable RAG pipeline

When set to `true`, this setting enables the search processors for retrieval-augmented generation (RAG). RAG enhances query results by generating responses using relevant information from memory and previous conversations.

### Setting

```
plugins.ml_commons.rag_pipeline_feature_enabled: true
```

### Values

- Default value: `true`
- Valid values: `false`, `true`
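
The following Python sketch shows one way to assemble a request body for several of the settings on this page at once. It assumes the settings are applied through the standard cluster settings API (`PUT _cluster/settings`) under the `persistent` or `transient` key; this page does not state that explicitly, nor whether every setting can be updated dynamically. The helper name is hypothetical.

```python
import json
from typing import Any, Dict


def build_settings_body(settings: Dict[str, Any], scope: str = "persistent") -> str:
    """Serialize settings into a cluster settings request body (hypothetical helper)."""
    if scope not in ("persistent", "transient"):
        raise ValueError("scope must be 'persistent' or 'transient'")
    return json.dumps({scope: settings}, indent=2)


# Example values taken from the setting blocks on this page.
body = build_settings_body(
    {
        "plugins.ml_commons.jvm_heap_memory_threshold": 85,
        "plugins.ml_commons.local_model.enabled": True,
    }
)
```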
63 changes: 61 additions & 2 deletions _ml-commons-plugin/custom-local-models.md
@@ -20,12 +20,14 @@ As of OpenSearch 2.11, OpenSearch supports local sparse encoding models.

As of OpenSearch 2.12, OpenSearch supports local cross-encoder models.

As of OpenSearch 2.13, OpenSearch supports local question answering models.

Running local models on the CentOS 7 operating system is not supported. Moreover, not all local models can run on all hardware and operating systems.
{: .important}

## Preparing a model

For both text embedding and sparse encoding models, you must provide a tokenizer JSON file within the model zip file.
For all models, you must provide a tokenizer JSON file within the model zip file.
Collaborator: ZIP file?

Collaborator: I don't think zip file should be capitalized.


For sparse encoding models, make sure your output format is `{"output":<sparse_vector>}` so that ML Commons can post-process the sparse vector.

@@ -157,7 +159,7 @@ POST /_plugins/_ml/models/_register
```
{% include copy.html %}

For descriptions of Register API parameters, see [Register a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/). The `model_task_type` corresponds to the model type. For text embedding models, set this parameter to `TEXT_EMBEDDING`. For sparse encoding models, set this parameter to `SPARSE_ENCODING` or `SPARSE_TOKENIZE`. For cross-encoder models, set this parameter to `TEXT_SIMILARITY`.
For descriptions of Register API parameters, see [Register a model]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/register-model/). The `model_task_type` corresponds to the model type. For text embedding models, set this parameter to `TEXT_EMBEDDING`. For sparse encoding models, set this parameter to `SPARSE_ENCODING` or `SPARSE_TOKENIZE`. For cross-encoder models, set this parameter to `TEXT_SIMILARITY`. For question answering models, set this parameter to `QUESTION_ANSWERING`.

OpenSearch returns the task ID of the register operation:

@@ -321,3 +323,60 @@ The response contains the tokens and weights:
## Step 5: Use the model for search

To learn how to use the model for vector search, see [Using an ML model for neural search]({{site.url}}{{site.baseurl}}/search-plugins/neural-search/#using-an-ml-model-for-neural-search).

## Question answering models

A question answering model extracts the answer to a question from a given context. ML Commons supports context in `text` format.

To register a question answering model, send a request in the following format. Specify the `function_name` as `QUESTION_ANSWERING`:

```json
POST /_plugins/_ml/models/_register
{
    "name": "question_answering",
    "version": "1.0.0",
    "function_name": "QUESTION_ANSWERING",
    "description": "test model",
    "model_format": "TORCH_SCRIPT",
    "model_group_id": "lN4AP40BKolAMNtR4KJ5",
    "model_content_hash_value": "e837c8fc05fd58a6e2e8383b319257f9c3859dfb3edc89b26badfaf8a4405ff6",
    "model_config": {
        "model_type": "bert",
        "framework_type": "huggingface_transformers"
    },
    "url": "https://github.com/opensearch-project/ml-commons/blob/main/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/question_answering/question_answering_pt.zip?raw=true"
}
```
{% include copy-curl.html %}

Then send a request to deploy the model:

```json
POST /_plugins/_ml/models/<model_id>/_deploy
```
{% include copy-curl.html %}

To test a question answering model, send the following request. It requires a `question` and the relevant `context` from which the answer will be generated:

```json
POST /_plugins/_ml/_predict/question_answering/<model_id>
{
    "question": "Where do I live?",
    "context": "My name is John. I live in New York"
}
```
{% include copy-curl.html %}

The response provides the answer based on the context:

```json
{
    "inference_results": [
        {
            "output": [
                {
                    "result": "New York"
                }
            ]
        }
    ]
}
```
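
A client might build the predict request body and read the answer out of the response along the following lines. This Python sketch is illustrative only: the helper names are hypothetical, it performs no HTTP calls, and it assumes the response always has the `inference_results` / `output` / `result` nesting shown in the example response.

```python
from typing import Any, Dict, List


def build_predict_body(question: str, context: str) -> Dict[str, str]:
    """Build the body for a question answering predict request."""
    return {"question": question, "context": context}


def extract_answers(response: Dict[str, Any]) -> List[str]:
    """Collect every `result` string from a predict response."""
    answers = []
    for inference in response.get("inference_results", []):
        for item in inference.get("output", []):
            if "result" in item:
                answers.append(item["result"])
    return answers


# Sample response copied from the example in this section.
sample_response = {"inference_results": [{"output": [{"result": "New York"}]}]}
answers = extract_answers(sample_response)
```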