
Redhat OpenShift AI demo #233

Merged: 31 commits, Dec 16, 2024

Commits:
78f9e57
Added Red Hat folders
kesarwam Oct 31, 2024
632944e
Initial commit
fOO223Fr Nov 1, 2024
764a19f
Merge pull request #229 from fOO223Fr/redhat-openshift-ai-demo-v1.0
kesarwam Nov 1, 2024
40a9505
Update README.md
kesarwam Nov 7, 2024
e9a3744
Update README.md
kesarwam Nov 7, 2024
86874b3
Update lakefs-minio.yaml
kesarwam Nov 7, 2024
cf2e4b8
Update README.md
kesarwam Nov 7, 2024
7849d6f
Update README.md
kesarwam Nov 7, 2024
47be2b6
Updated Readme and added images
fOO223Fr Nov 8, 2024
8876550
updated minio endpoint
fOO223Fr Nov 8, 2024
bfaa873
Update lakefs-minio.yaml
kesarwam Nov 8, 2024
ddd88b1
Update README.md
kesarwam Nov 8, 2024
29fc938
Update README.md
kesarwam Nov 8, 2024
44876ee
Delete README.md
kesarwam Nov 8, 2024
87f34ea
Create README.md
kesarwam Nov 8, 2024
686793f
Merge pull request #230 from fOO223Fr/redhat-openshift-ai-demo-v1.0
kesarwam Nov 8, 2024
edaaf2f
Automatedmin import lakefs repo creation
fOO223Fr Nov 13, 2024
0abba72
Updated Readme and added quickstart repo
fOO223Fr Nov 13, 2024
97a3efe
Update lakefs-minio.yaml
kesarwam Nov 27, 2024
55b6bc8
Update README.md
kesarwam Nov 27, 2024
a081559
Added instructions to access lakeFS UI.
kesarwam Nov 28, 2024
b0deefd
Update README.md
kesarwam Nov 28, 2024
1fe0ed1
Merge pull request #231 from fOO223Fr/redhat-openshift-ai-demo-v1.0
kesarwam Nov 28, 2024
468da5b
Moved Readme to top level
kesarwam Nov 28, 2024
707c20c
Merge branch 'redhat-openshift-ai-demo-v1.0' of https://github.com/tr…
kesarwam Nov 28, 2024
35f7007
Automated minio password import and lakefs repo creation
kesarwam Dec 10, 2024
96ac8b8
Uploaded changed notebooks and Readme
kesarwam Dec 12, 2024
a788b70
Update README.md
kesarwam Dec 13, 2024
2ff713c
No need to create a Data Science project
kesarwam Dec 13, 2024
af3a229
Update 8_distributed_training_lakefs.ipynb
kesarwam Dec 13, 2024
766f635
Updated Readme
kesarwam Dec 16, 2024
1 change: 1 addition & 0 deletions 00_notebooks/00_index.ipynb
@@ -91,6 +91,7 @@
"* [*Labelbox* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/labelbox-integration/)\n",
"* [*Kafka* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/kafka/)\n",
"* [*Flink* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/flink/)\n",
"* [*Red Hat OpenShift AI* integration](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/red-hat-openshift-ai/)\n",
"* [How to **migrate or clone** a repo](https://github.com/treeverse/lakeFS-samples/blob/main/01_standalone_examples/migrate-or-clone-repo/)"
]
},
209 changes: 209 additions & 0 deletions 01_standalone_examples/red-hat-openshift-ai/README.md
@@ -0,0 +1,209 @@
# Overview

[lakeFS](https://lakefs.io/) is a data versioning application that brings Git-like versioning to object storage. It can interface with many object storage systems on the backend, and it provides an S3 API gateway for object storage clients to connect to. In this demo, we'll configure OpenShift AI to connect to lakeFS over its S3 interface, with lakeFS versioning the data in a backend [MinIO](https://min.io/docs/minio/kubernetes/openshift/index.html) instance.
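
Because lakeFS exposes an S3-compatible endpoint, any S3 client can address objects as `s3://<repository>/<branch>/<path>`. As a minimal sketch (assuming the `my-storage` repository created later in this demo, a `main` branch, the AWS CLI available, and a shell running inside the `lakefs` project so the in-cluster `http://my-lakefs` endpoint resolves):

```
# lakeFS credentials are supplied through the usual AWS variables
export AWS_ACCESS_KEY_ID=something
export AWS_SECRET_ACCESS_KEY=simple

# List objects on the main branch of the my-storage repository,
# going through the lakeFS S3 gateway rather than MinIO directly
aws s3 ls s3://my-storage/main/ --endpoint-url http://my-lakefs
```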

![lakefs](img/lakefsv3.png)

# lakeFS with OpenShift AI Demo

Follow the steps below to perform the [Fraud Detection demo](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2-latest/html/openshift_ai_tutorial_-_fraud_detection_example/index) on OpenShift AI, with lakeFS used for object storage management.

## Prerequisites

1. Bring up an [OpenShift cluster](https://docs.redhat.com/en/documentation/openshift_container_platform/4.17#Install)
2. Install [OpenShift Service Mesh](https://docs.openshift.com/container-platform/4.16/service_mesh/v2x/installing-ossm.html#ossm-install-ossm-operator_installing-ossm), [OpenShift Serverless](https://docs.openshift.com/serverless/1.34/install/install-serverless-operator.html) and [OpenShift Pipelines](https://docs.openshift.com/pipelines/1.16/install_config/installing-pipelines.html) on the OpenShift cluster
3. Install [OpenShift AI](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2.13/html/installing_and_uninstalling_openshift_ai_self-managed/index) on the OpenShift cluster
4. Install the `oc` OpenShift [CLI client](https://docs.openshift.com/container-platform/4.16/cli_reference/openshift_cli/getting-started-cli.html) on a machine that has access to the cluster

## Deploy and Configure the Environment
From the client machine, authenticate the `oc` client.

```
oc login <cluster_api_url> -u kubeadmin -p <admin_pw>
```

### Create a `lakefs` project in OpenShift

```
oc new-project lakefs
```

### Clone the lakeFS samples repo
Clone the [lakeFS-samples.git](https://github.com/treeverse/lakeFS-samples.git) repository and change into the newly created directory.

```
git clone https://github.com/treeverse/lakeFS-samples.git

cd lakeFS-samples/01_standalone_examples/red-hat-openshift-ai/cluster-configuration
```

### Deploy MinIO
Deploy MinIO in the `lakefs` project using the `minio-via-lakefs.yaml` file.

```
oc apply -f minio-via-lakefs.yaml
```
A random MinIO root user and password will be generated, stored in a `secret`, and used to populate MinIO with three storage buckets:
* **my-storage**
* **pipeline-artifacts**
* **quickstart**
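
To confirm the deployment, you can optionally check that the pods are running and inspect the generated root credentials from the CLI (a sketch assuming the secret is named `minio-root-user`, as described in the notes further below):

```
# Watch MinIO and its bucket-setup pods come up in the lakefs project
oc get pods -n lakefs

# Print the generated MinIO root user and password
oc extract secret/minio-root-user -n lakefs --to=-
```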


### Deploy lakeFS
Deploy lakeFS in the **lakefs** project using the `lakefs-minio.yaml` file. This YAML not only deploys lakeFS but also:
* connects it to the MinIO buckets created earlier
* creates two lakeFS repositories:
  * **quickstart:** a sample data repository
  * **my-storage:** connected to the backend `my-storage` S3 bucket created earlier

```
oc apply -f lakefs-minio.yaml
```
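
As with MinIO, you can optionally verify from the CLI that lakeFS is up and externally reachable (a sketch assuming the route is named `lakefs-route`, matching the bundled YAML, and probing the same `/_health` path used by the pod's readiness probe):

```
# Check the lakeFS pod
oc get pods -n lakefs -l app=lakefs

# Retrieve the external hostname of the lakeFS route and probe its health endpoint
LAKEFS_HOST=$(oc get route lakefs-route -n lakefs -o jsonpath='{.spec.host}')
curl -s "http://${LAKEFS_HOST}/_health"
```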

### Access lakeFS UI
You can now log in to the OpenShift cluster's web console as a regular user (e.g. `developer`). Follow the arrows in the screenshot below to find the lakeFS `route`, which provides external access to lakeFS. Use the lakeFS route to access the lakeFS UI.

For this demo, you will use the following credentials to access the lakeFS UI.

* **Access Key**: something
* **Secret Access Key**: simple

![lakefs](img/lakefs-route.png)

NOTES:
- You can also follow the steps above, but click on MinIO in the topology, to find the `route` for accessing MinIO's console or S3 interface. The MinIO access credentials are stored in the `minio-root-user` secret, which you can reveal in the OpenShift web console when logged in as an admin user (e.g. `kubeadmin`):
  - Switch to the **Administrator** persona using the drop-down at the top left
  - Expand the **Workloads** navigation
  - Click on **Secrets**
  - Filter by the name 'minio'
  - Click on the **minio-root-user** secret
  - Scroll down and click on **Reveal values** to see the MinIO root user and password
- If you don't see the visual layout shown in the screenshot, click on the icon highlighted below to change the view.

![lakefs](img/topology.png)

### Access OpenShift AI Console
From the OpenShift web console, you can now open the OpenShift AI web console as shown below.

![lakefs](img/oai-console.png)

## Fraud Detection Demo

You may now run through the [Fraud Detection demo](https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2-latest/html/openshift_ai_tutorial_-_fraud_detection_example/index) in the new **lakefs** data science project. Refer to the following notes for the corresponding sections of the demo:

2.2. Setting up your data science project:
* Use the `lakefs` data science project for the demo. You do not need to create a new project.

2.3. Storing data with data connections:
* When going through the demo, follow the steps to manually configure the storage data connections. **Do not** follow the steps that use a script to automate the MinIO storage deployment, configuration, and data connections.

2.3.1. Creating data connections to your own S3-compatible object storage:
* When creating "My Storage" data connection, use lakeFS access key ("something"), secret key ("simple"), endpoint ("http://my-lakefs"), region ("us-east-1") and bucket ("my-storage") instead of MinIO access key and endpoint:

![My Storage data connection](img/data-connection-my-storage.png)

* When creating "Pipeline Artifacts" data connection, use MinIO access key, secret key, endpoint (the route to access MinIO's S3 interface), region ("us-east-1") and bucket ("pipeline-artifacts"):

![Pipeline Artifacts data connection](img/data-connection-pipeline-artifacts.png)

3.1. Creating a workbench and selecting a notebook image:
* While creating the workbench, add the following environment variables to access lakeFS:
  * `LAKECTL_SERVER_ENDPOINT_URL` = `http://my-lakefs`
  * `LAKEFS_REPO_NAME` = `my-storage`
  * `LAKEFS_DEFAULT_REGION` = `us-east-1`
  * `LAKECTL_CREDENTIALS_ACCESS_KEY_ID` = `something`
  * `LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY` = `simple`

![Workbench lakeFS Environment Variables](img/workbench-lakefs-env-variables.png)
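
Once the workbench is running, these variables can be sanity-checked from a workbench terminal (a sketch assuming the `lakectl` CLI is available in the workbench image; it reads the `LAKECTL_*` variables automatically):

```
# Should list the quickstart and my-storage repositories if the
# endpoint and credentials are correct
lakectl repo list
```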

3.2. Importing the tutorial files into the Jupyter environment:
* After cloning and selecting the latest branch of the Fraud Detection tutorial repository (https://github.com/rh-aiservices-bu/fraud-detection.git), double-click the newly created `fraud-detection` folder in the file browser and click the "Upload Files" icon:

![Fraud Detection Tutorial fraud-detection folder](img/fraud-detection-tutorial-image1.png)

* Select and upload the tutorial notebooks modified for the lakeFS tutorial (names ending with `lakefs`), which are saved in the `lakeFS-samples/red-hat-openshift-ai/fraud-detection` folder of the `lakeFS-samples` repo (https://github.com/treeverse/lakeFS-samples.git):

![Fraud Detection Tutorial upload lakeFS Notebooks](img/fraud-detection-tutorial-image2.png)

* Double-click the `ray-scripts` subfolder inside the `fraud-detection` folder in the file browser and click the "Upload Files" icon:

![Fraud Detection Tutorial ray-scripts subfolder](img/fraud-detection-tutorial-image3.png)

* Select and upload `train_tf_cpu_lakefs.py`, modified for the lakeFS tutorial, which is saved in the `lakeFS-samples/red-hat-openshift-ai/fraud-detection/ray-scripts` folder of the `lakeFS-samples` repo:

![Fraud Detection Tutorial upload ray script](img/fraud-detection-tutorial-image4.png)

* After uploading the `train_tf_cpu_lakefs.py` file, the file browser will show two Python programs:

![Fraud Detection Tutorial ray-scripts subfolder after uploading script](img/fraud-detection-tutorial-image5.png)

* Double-click the `pipeline` subfolder inside the `fraud-detection` folder in the file browser and click the "Upload Files" icon:

![Fraud Detection Tutorial pipeline subfolder](img/fraud-detection-tutorial-image11.png)

* Select and upload `7_get_data_train_upload_lakefs.py` and `build_lakefs.sh`, modified for the lakeFS tutorial, which are saved in the `lakeFS-samples/red-hat-openshift-ai/fraud-detection/pipeline` folder of the `lakeFS-samples` repo:

![Fraud Detection Tutorial upload pipeline](img/fraud-detection-tutorial-image12.png)

3.4. Training a model:
* In your notebook environment, open the `1_experiment_train_lakefs.ipynb` file instead of `1_experiment_train.ipynb` and follow the instructions directly in the notebook. The instructions guide you through some simple data exploration, experimentation, and model training tasks.

4.1. Preparing a model for deployment:
* In your notebook environment, open the `2_save_model_lakefs.ipynb` file instead of `2_save_model.ipynb` and follow the instructions directly in the notebook.
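
After running the save-model notebook, you can optionally confirm from a workbench terminal that the model landed in lakeFS (a sketch assuming `lakectl` is available and the model was saved to a `train01` branch, as referenced in section 4.2 below):

```
# List the uploaded model artifacts on the train01 branch
lakectl fs ls lakefs://my-storage/train01/models/fraud/
```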

4.2. Deploying a model:
* Use the lakeFS branch name in the path that leads to the version folder that contains your model file: `train01/models/fraud`:

![Fraud Detection Tutorial Deploy Model](img/fraud-detection-tutorial-image6.png)

4.3. Testing the model API:
* In your notebook environment, open the `3_rest_requests_multi_model_lakefs.ipynb` file instead of `3_rest_requests_multi_model.ipynb` and follow the instructions directly in the notebook.
* In your notebook environment, open the `4_grpc_requests_multi_model_lakefs.ipynb` file instead of `4_grpc_requests_multi_model.ipynb` and follow the instructions directly in the notebook.
* In your notebook environment, open the `5_rest_requests_single_model_lakefs.ipynb` file instead of `5_rest_requests_single_model.ipynb` and follow the instructions directly in the notebook.

5.1. Automating workflows with data science pipelines:
* Instead of creating the Red Hat OpenShift AI pipeline from scratch, you can run the already-created pipeline called `6 Train Save lakefs.pipeline`. In your notebook environment, open `6 Train Save lakefs.pipeline` and click the play button in the toolbar of the pipeline editor to run the pipeline. If you want to create the pipeline from scratch, follow the tutorial instructions but make the following changes in section 5.1.5:

5.1.5. Configure the data connection to the S3 storage bucket:
* Under Kubernetes Secrets, use the secret name for the `pipeline-artifacts` data connection for the following environment variables in **both nodes** of the pipeline:
  * `AWS_ACCESS_KEY_ID`
  * `AWS_SECRET_ACCESS_KEY`
  * `AWS_S3_ENDPOINT`
  * `AWS_DEFAULT_REGION`
  * `AWS_S3_BUCKET`

![Fraud Detection Tutorial Pipeline Kubernetes Secrets 1](img/fraud-detection-tutorial-image7.png)

![Fraud Detection Tutorial Pipeline Kubernetes Secrets 2](img/fraud-detection-tutorial-image8.png)

* Under Kubernetes Secrets, use the secret name for the `my-storage` data connection when adding the following lakeFS environment variables in **both nodes** of the pipeline (each lakeFS variable maps to the corresponding key in that secret):
  * `LAKECTL_SERVER_ENDPOINT_URL` = `AWS_S3_ENDPOINT`
  * `LAKECTL_CREDENTIALS_ACCESS_KEY_ID` = `AWS_ACCESS_KEY_ID`
  * `LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY` = `AWS_SECRET_ACCESS_KEY`
  * `LAKEFS_REPO_NAME` = `AWS_S3_BUCKET`
  * `LAKEFS_DEFAULT_REGION` = `AWS_DEFAULT_REGION`

![Fraud Detection Tutorial Pipeline Kubernetes Secrets 3](img/fraud-detection-tutorial-image9.png)

![Fraud Detection Tutorial Pipeline Kubernetes Secrets 4](img/fraud-detection-tutorial-image10.png)

5.2. Running a data science pipeline generated from Python code:
* Use `7_get_data_train_upload_lakefs.yaml` instead of `7_get_data_train_upload.yaml` when importing the pipeline in OpenShift AI.

6.1. Distributing training jobs with Ray:
* In your notebook environment, open the `8_distributed_training_lakefs.ipynb` file instead of `8_distributed_training.ipynb`. Change the MinIO access and secret keys in the 2nd code cell of the notebook, then run the notebook.

Optionally, if you want to view the Python code for this section, you can find it in the `ray-scripts/train_tf_cpu_lakefs.py` file.

See [lakeFS documentation](https://docs.lakefs.io/) and [MinIO documentation for OpenShift](https://min.io/docs/minio/kubernetes/openshift/index.html) for details.

# File Descriptions

- [lakefs-local.yaml](./lakefs-local.yaml): Brings up lakeFS using local object storage. Useful for a quick demo that doesn't include MinIO.
- [lakefs-minio.yaml](./lakefs-minio.yaml): Brings up lakeFS configured to use MinIO as the backend object storage. This is used in the lakeFS demo.
- [minio-direct.yaml](./minio-direct.yaml): Only used if lakeFS is not in the picture and OpenShift AI communicates directly with MinIO. Brings up MinIO as in the default Fraud Detection demo, complete with the MinIO storage buckets and the OpenShift AI data connections. It may prove useful when debugging an issue.
- [minio-via-lakefs.yaml](./minio-via-lakefs.yaml): Brings up MinIO for the modified Fraud Detection demo that includes lakeFS, complete with the MinIO storage buckets, but does **not** configure the OpenShift AI data connections. This is used in the lakeFS demo.
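
For example, the local-storage variant can stand in for the full demo when you just want a lakeFS instance to explore (a minimal sketch, assuming the `lakefs` project already exists):

```
oc apply -f lakefs-local.yaml
```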
@@ -0,0 +1,173 @@
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: my-lakefs
  namespace: lakefs
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: my-lakefs
    meta.helm.sh/release-namespace: lakefs
data:
  config.yaml: |
    database:
      type: local
    blockstore:
      type: local
---
kind: Deployment
apiVersion: apps/v1
metadata:
  annotations:
    deployment.kubernetes.io/revision: '2'
    meta.helm.sh/release-name: my-lakefs
    meta.helm.sh/release-namespace: lakefs
  resourceVersion: '102204'
  name: my-lakefs
  namespace: lakefs
  labels:
    app: lakefs
    app.kubernetes.io/instance: my-lakefs
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: lakefs
    app.kubernetes.io/version: 1.38.0
    helm.sh/chart: lakefs-1.3.14
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lakefs
      app.kubernetes.io/instance: my-lakefs
      app.kubernetes.io/name: lakefs
  template:
    metadata:
      labels:
        app: lakefs
        app.kubernetes.io/instance: my-lakefs
        app.kubernetes.io/name: lakefs
      annotations:
        checksum/config: 2dde95d5a2b50bddc89371d1692db1005db9407701085531ea77ce14b56c6ec1
    spec:
      restartPolicy: Always
      serviceAccountName: default
      schedulerName: default-scheduler
      terminationGracePeriodSeconds: 30
      securityContext: {}
      containers:
        - resources: {}
          readinessProbe:
            httpGet:
              path: /_health
              port: http
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: lakefs
          livenessProbe:
            httpGet:
              path: /_health
              port: http
              scheme: HTTP
            timeoutSeconds: 1
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
          env:
            - name: LAKEFS_AUTH_ENCRYPT_SECRET_KEY
              value: asdjfhjaskdhuioaweyuiorasdsjbaskcbkj
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: config-volume
              mountPath: /etc/lakefs
            - name: lakefs-volume
              mountPath: /lakefs
          terminationMessagePolicy: File
          image: 'treeverse/lakefs:1.38.0'
          args:
            - run
            - '--config'
            - /etc/lakefs/config.yaml
      serviceAccount: default
      volumes:
        - name: config-volume
          configMap:
            name: my-lakefs
            items:
              - key: config.yaml
                path: config.yaml
            defaultMode: 420
        - name: lakefs-volume
          emptyDir:
            sizeLimit: 100Mi
      dnsPolicy: ClusterFirst
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
---
kind: Service
apiVersion: v1
metadata:
  name: my-lakefs
  namespace: lakefs
  labels:
    app: lakefs
    app.kubernetes.io/instance: my-lakefs
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: lakefs
    app.kubernetes.io/version: 1.38.0
    helm.sh/chart: lakefs-1.3.14
  annotations:
    meta.helm.sh/release-name: my-lakefs
    meta.helm.sh/release-namespace: lakefs
spec:
  ipFamilies:
    - IPv4
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: http
  internalTrafficPolicy: Cluster
  type: ClusterIP
  ipFamilyPolicy: SingleStack
  sessionAffinity: None
  selector:
    app: lakefs
    app.kubernetes.io/instance: my-lakefs
    app.kubernetes.io/name: lakefs
---
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: lakefs-route
  namespace: lakefs
  labels:
    app: lakefs
    app.kubernetes.io/instance: my-lakefs
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: lakefs
    app.kubernetes.io/version: 1.38.0
    helm.sh/chart: lakefs-1.3.14
  annotations:
    openshift.io/host.generated: 'true'
spec:
  host: lakefs-route-lakefs.apps-crc.testing
  to:
    kind: Service
    name: my-lakefs
    weight: 100
  port:
    targetPort: http
  wildcardPolicy: None