Merge pull request #11 from rghosh9/main
WMS ID #11685 recreate the folder
klazarz authored Sep 4, 2024
2 parents 79210f3 + 7599fc0 commit 6f1b8e2
Showing 143 changed files with 1,283 additions and 0 deletions.
@@ -0,0 +1,161 @@
# OCI Data Science setup

## Introduction

In this lab, we will set up the OCI Data Science environment needed for developing code, making API calls, and customizing and automating the generation of compliance documents.

Estimated Lab Time: 10 minutes

### Oracle Cloud Data Science

OCI Data Science is a managed, Python-based JupyterLab notebook environment for developing and deploying machine learning and AI models, including Generative AI services. The service supports both CPU and GPU infrastructure and can access OCI lakehouse storage and services such as Object Storage, Autonomous Database, Data Flow, and Data Catalog.

### Objectives

In this lab, you will:

* Deploy a pre-built langchain based conda environment
* Test connectivity to OCI Generative AI services
* Test connectivity with OCI Opensearch services
* Configure OCI CLI connectivity with OCI Object store
* Download and install required pip libraries
* Install Compliance Document Generation Notebooks

### Prerequisites

This lab assumes you have:

* An Oracle Cloud account with admin privileges in the Chicago region
* A running Data science notebook session environment
* A running OCI Opensearch service

## Task 1: Deploy a pre-built langchain conda environment

1. From the Launcher (File-->New Launcher if needed), click on the Environment explorer to view the list of conda environments
![Install pre-built conda](images/lab3-ds-cnd-1.png)

2. Filter the conda environment to view the ones containing the langchain libraries and select the one marked below
![Install pre-built conda](images/lab3-ds-cnd-2.png)

3. Copy the command below to run in a terminal session
![Install pre-built conda](images/lab3-ds-cnd-2-1.png)

4. Open up a Terminal session as shown from the Launcher
![Install pre-built conda](images/lab3-ds-cnd-3.png)

5. Paste and run the *odsc conda install -s pytorch21_p39_gpu_v1* command as shown. Installing the conda environment may take a few minutes. Confirm that the installation completes successfully, as shown
![Install pre-built conda](images/lab3-ds-cnd-4.png)

## Task 2: Download and install required pip libraries

1. Locate the notebooks in the /home/datascience/conda directory. This directory will be used for creating and running all notebooks for the workshop
![Install pip libraries](images/lab3-ds-note-1.png)

2. Create a new notebook
![Install pip libraries](images/lab3-ds-note-2.png)

3. Change the kernel to the installed conda environment
![Install pip libraries](images/lab3-ds-note-3.png)

4. Copy the commands below into a notebook cell and run them to install the pip libraries. Press *Shift+Enter* to execute the notebook cell

```text
<copy>
!pip install langchain
!pip install langchain_community
!pip install opensearch-py
!pip install sentence-transformers
!pip install tabulate
!pip install pypdf
!pip install fillpdf
</copy>
```

![Install pip libraries](images/lab3-ds-note-4.png)

NOTE: Some of the libraries may already be installed in the environment; if so, no action is needed. pip may also report incompatibilities with other libraries in the pre-built conda environment; these can be ignored. Comment out the affected lines as shown below

![Install pip libraries](images/lab3-ds-note-5.png)
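To see which of the libraries above are already present before (or after) running the pip commands, a notebook cell can query the installed distributions. This is a minimal sketch using only the standard library; the helper name `installed_versions` is illustrative and not part of the workshop notebooks.

```python
from importlib import metadata

# Distribution names taken from the pip commands in this task.
REQUIRED = [
    "langchain",
    "langchain_community",
    "opensearch-py",
    "sentence-transformers",
    "tabulate",
    "pypdf",
    "fillpdf",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:25s} {version or 'NOT INSTALLED'}")
```

Any library reported as not installed can then be installed individually, which avoids re-running commands that would only trigger dependency warnings.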

## Task 3: Install Workshop Compliance Document Generation code

1. Download [LAB-3 Conda zip](https://orasenatdpltintegration03.objectstorage.us-chicago-1.oci.customer-oci.com/p/SfhRh7OEvLj9yR0hAIM3BwT7bCpi3jALfP6NqoCODU7mFe51nl1PeBPWcJj2El9K/n/orasenatdpltintegration03/b/clinical-trials/o/conda.zip) and upload it to the home directory */home/datascience* in the notebook session as shown below. Alternatively, download it directly into your environment by running *wget <download link>* from a Data Science terminal session.
![Install lab notebooks](images/lab3-ds-note-6.png)

2. Open up a terminal session and run *unzip conda.zip* as shown below.
![Install lab notebooks](images/lab3-ds-note-7.png)
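If you prefer to stay inside a notebook, the same extraction can be done from a Python cell. The sketch below uses the standard `zipfile` module; the helper name `extract_archive` is illustrative, and the archive path assumes the zip was uploaded to */home/datascience* as described in step 1.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sorted(archive.namelist())


if __name__ == "__main__":
    # Assumed location of the uploaded archive (see step 1).
    archive_path = Path("/home/datascience/conda.zip")
    if archive_path.exists():
        for name in extract_archive(archive_path, archive_path.parent)[:10]:
            print(name)
    else:
        print(f"{archive_path} not found; download it first")
```

Listing the first few member names is a quick way to confirm that the notebooks, scripts and data folders landed where the later labs expect them.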

## Task 4: Test connectivity with OCI Opensearch services

1. Copy the Opensearch API URL from the console
![Test Opensearch Access](images/lab3-ds-os-1.png)

2. In a Data Science terminal window, change to the */home/datascience/conda/scripts* directory and run the command below. A successful connection displays JSON output as shown below

```text
<copy>
curl -k -u (os_userid):(os_password) (os_api_endpoint):9200
</copy>
```

![Test Opensearch Access](images/lab3-ds-os-2.png)
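The same connectivity check can be run from a notebook cell. The sketch below mirrors the curl command with the standard library only: HTTP basic authentication against port 9200, with certificate verification disabled as `-k` does. The function names are illustrative, and the endpoint is assumed to include the `https://` scheme.

```python
import base64
import json
import ssl
import urllib.request


def build_request(endpoint, user, password, port=9200):
    """Build a GET request for the cluster root with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(f"{endpoint}:{port}")
    request.add_header("Authorization", f"Basic {token}")
    return request


def cluster_info(endpoint, user, password, timeout=10):
    """Return the JSON document served at the cluster root."""
    # Mirrors curl -k: certificate verification is disabled (lab use only).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    request = build_request(endpoint, user, password)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
        return json.load(resp)
```

Calling `cluster_info("https://<os_api_endpoint>", "<os_userid>", "<os_password>")` with your own values should return the same JSON that curl prints. Disabling certificate checks is acceptable for this lab but should not be carried into production code.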

## Task 5: Configure OCI CLI Connectivity to Object store and Generative AI

1. Get your user OCID and your Tenancy ID from console as shown below
![Test Opensearch Access](images/lab3-ds-cli-1.png)
![Test Opensearch Access](images/lab3-ds-cli-2.png)
![Test Opensearch Access](images/lab3-ds-cli-3.png)

2. Open up a terminal window and enter *oci os ns get*. When prompted to create a configuration, enter values as follows

```text
<copy>
Do you want to create a new config file ? Y
Create your config file by logging in through a browser? n
Location of your config: Enter
Enter user OCID : <copied from console in previous step>
Enter Tenancy OCID : <copied from console in previous step>
Region by index or name : us-chicago-1
Do you want to generate a new RSA key pair? Y
Enter directory for keys created : Enter
Enter name of your key : Enter
Enter passphrase: N/A
Re-enter passphrase : N/A
</copy>
```

![Test Opensearch Access](images/lab3-ds-cli-4.png)
![Test Opensearch Access](images/lab3-ds-cli-5.png)

3. Move and download your generated public key pem file
![Test Opensearch Access](images/lab3-ds-cli-7.png)

4. Upload the downloaded public API key to OCI Console
![Test Opensearch Access](images/lab3-ds-cli-8.png)
![Test Opensearch Access](images/lab3-ds-cli-9.png)
![Test Opensearch Access](images/lab3-ds-cli-10.png)

5. Test the OCI CLI access from the Data Science notebook session.
![Test Opensearch Access](images/lab3-ds-cli-11.png)
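If the CLI test fails, a common cause is an incomplete configuration file. The sketch below checks that a profile contains the keys an API-key based profile normally needs. It assumes the default config location `~/.oci/config` and the INI layout the CLI writes; the helper name `missing_config_keys` is illustrative.

```python
import configparser
from pathlib import Path

# Keys an API-key based profile is expected to contain.
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")


def missing_config_keys(config_path, profile="DEFAULT"):
    """Return the required keys that are absent or empty in the given profile."""
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(Path(config_path).expanduser()):
        raise FileNotFoundError(f"Cannot read {config_path}")
    if profile != "DEFAULT" and not parser.has_section(profile):
        raise KeyError(f"Profile {profile!r} not found")
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if not section.get(key)]


if __name__ == "__main__":
    try:
        print("Missing keys:", missing_config_keys("~/.oci/config") or "none")
    except FileNotFoundError as err:
        print(err)
```

An empty result only means the keys are present; the public key must still be uploaded to the console (step 4) for requests to be authorized.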

## Task 6: Test connectivity to OCI Generative AI services

1. Open up the Generative AI Generation Interface for API code testing. Note that the *command R* chat interface may not work yet and is not required for this workshop. You can test with the Cohere chat interface available in the generation interface
![Test Opensearch Access](images/lab3-ds-gai-1.png)

2. Generate a query and click on the *View Code* button and select *python* as the Language
![Test Opensearch Access](images/lab3-ds-gai-2.png)

3. Copy the generated code to a notebook cell. You should be able to generate output as shown below
![Test Opensearch Access](images/lab3-ds-gai-3.png)
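For orientation, code produced by *View Code* generally follows the shape sketched below: load the CLI configuration, create an inference client for the Chicago endpoint, and send a chat request. This is a hedged outline built on the OCI Python SDK, not the exact code the console generates; the function name `chat_once` and the default parameter values are assumptions, and the compartment and model OCIDs must come from your own tenancy.

```python
# Inference endpoint for the Chicago region used in this workshop.
ENDPOINT = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"


def chat_once(prompt, compartment_id, model_id, *, max_tokens=600,
              temperature=0.25, profile="DEFAULT"):
    """Send one prompt to an on-demand Cohere chat model and return the response data."""
    import oci  # imported lazily so this cell loads even before the SDK is available

    config = oci.config.from_file("~/.oci/config", profile)
    client = oci.generative_ai_inference.GenerativeAiInferenceClient(
        config=config, service_endpoint=ENDPOINT
    )
    models = oci.generative_ai_inference.models
    details = models.ChatDetails(
        compartment_id=compartment_id,
        serving_mode=models.OnDemandServingMode(model_id=model_id),
        chat_request=models.CohereChatRequest(
            message=prompt, max_tokens=max_tokens, temperature=temperature
        ),
    )
    return client.chat(details).data
```

Treat the console-generated code as the authoritative version; this outline is mainly useful for recognizing which parts (profile, endpoint, model OCID, prompt) you are expected to change.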

## Learn More

* [Generative AI made easy with OCI Datascience](https://www.oracle.com/artificial-intelligence/generative-ai/generative-ai-service/)
* [Data science github repository](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/ai-quick-actions)

## Acknowledgements

* **Author** - Rajib Ghosh, Master Principal Cloud Architect, OCI AI and Gen AI Center of Excellence
* **Last Updated By/Date** - Aug 2024
@@ -0,0 +1,4 @@
rm -Rf ~/conda/data
rm -Rf ~/conda/scripts
rm ~/conda/notebooks/demo*.ipynb
rm -f conda.zip
@@ -0,0 +1,64 @@
# OCI Generative AI Playground and Trial Generation

## Introduction

OCI Generative AI Playground lets you chat, generate and summarize content, and view numerical vector embeddings for textual data. It is REST API enabled for programmatic access and lets you tune the outputs of your query through a set of parameters. It currently offers the Cohere Command R+ and Meta Llama models. In this workshop, all clinical trial data is generated with the Cohere Command R+ playground. Clinical trial datasets are generated for cancer, diabetes and liver diseases.

Estimated Lab Time: 5 minutes

### Objectives

In this lab, you will:

* Learn how to use OCI Generative AI Playground interactively
* Prompt and generate a few clinical trial documents yourself
* Verify the generated document has no personal information
* Prompt to try out some summarization examples on your text
* View the generated API code (Python)

### Prerequisites (Optional)

This lab assumes you have:

* An Oracle Cloud account in the Chicago region
* Completed the required policy setup for this workshop
* Membership in the administrator group in the tenancy

## Task 1: Accessing OCI Generative AI Playground

In this section, you get familiar with the OCI Generative AI playground console.

1. Log in to your Oracle Cloud tenancy and change your region to US Midwest (Chicago)
![Connect to US-Midwest Chicago Tenancy](images/lab-11.png)

2. From the hamburger menu (top left corner), select Analytics & AI --> AI Services --> Generative AI
![Connect to OCI Gen AI](images/lab-12.png)

## Task 2: Generate a clinical trial in OCI Generative AI Playground

1. Click on Generative AI -> Overview -> Playground -> Chat and run the example "Generate a job description" with the cohere-command-r-16k model
![Test OCI Gen AI Example](images/lab-13.png)

2. Copy the following text into the chat window: *Generate a clinical trial report on drug evaluation on Advanced Non-Small Cell Lung Cancer*. Change the *Maximum output settings* and the *Temperature* settings as shown. Press Submit to generate a clinical trial report.
![Generate trial document](images/lab-14.png)

3. Note how the Personally Identifiable Information (PII) is redacted and substituted
![PII Redaction](images/lab-15.png)
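Beyond a visual check, a generated document can be scanned for obvious PII leftovers. The sketch below is illustrative only: the three regular expressions (email, US-style phone number, US Social Security number) are assumptions chosen for the example and fall far short of a complete PII detector.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return {pattern_name: [matches]} for every pattern that matched."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits
```

Paste the generated report into a string and call `find_pii(report)`; an empty dictionary means none of these particular patterns were found, not that the text is guaranteed free of personal data.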

## Task 3: Generate a summary and view generated code

1. Copy the generated clinical trial to Playground -> Summarization and generate a summary.
![Summary](images/lab-16.png)

2. Click on the view code button to see the generated code
![Generated python code](images/lab-17.png)

## Learn More

* [OCI Generative AI](https://www.oracle.com/artificial-intelligence/generative-ai/generative-ai-service/)
* [Realize business value by transforming data into action with Generative AI](https://blogs.oracle.com/ai-and-datascience/post/generative-ai-use-cases/)

## Acknowledgements

* **Author** - Rajib Ghosh, Master Principal Cloud Architect, OCI AI and GenAI Center of Excellence
* **Last Updated By/Date** - Aug, 2024
@@ -0,0 +1,93 @@
# Generate Compliance submission documents

## Introduction

In this lab, we will generate a sample compliance submission form that summarizes sections of a sample clinical trial document into a pre-canned PDF template. Prompting techniques and format instructions are used to guide the OCI Generative AI large language model to produce condensed sectional summaries. The process can be extended to summarize across multiple retrieved chunks of data with langchain and vector search.

Estimated Lab Time: 10 minutes

### Langchain output parsers and OCI Generative AI

[LangChain](https://python.langchain.com/v0.2/docs/introduction/) is a Python library that simplifies the development, productionization, and deployment of applications powered by large language models (LLMs). It is a development framework that chains LLMs, agents and retrieval strategies to effectively make up an end-to-end application's cognitive architecture. Chains can be defined declaratively for convenience. Langchain output parsers offer various custom formatting options for sectional prompting to extract summarized textual data into a templated format.

### Objectives

In this lab, you will:

* Load clinical-trial PDF documents and embeddings into an Opensearch index
* Use langchain output parser to produce sectional summaries for a document template
* Generate a sample compliance submission form from OCI Generative AI LLM

### Prerequisites

This lab assumes you have:

* Working knowledge of Python and Notebooks
* Working knowledge of OCI Data science and conda packs
* Some knowledge of the langchain framework (helpful but not required)

## Task 1: Load clinical-trials documents and metadata

1. Get the following information into a notepad or a script

* Compartment OCID for *clinical-trials* compartment. (Search on OCI console for compartments, click your compartment and copy the OCID)
* Opensearch username - The username entered while provisioning the Opensearch cluster (e.g., *osmaster*)
* Opensearch password - The password entered while provisioning the Opensearch cluster
* API endpoint private IP from the OCI Opensearch service console

2. Double-click to open the notebook *demo-generate-document.ipynb*. Run each of the cells one by one from the top by using *Shift+Enter* or the play button at the top

3. Substitute the following definitions in the cell as shown below
![Image alt text](images/lab5-note-os-1.png)

4. Load all PDF documents into a pandas data frame using the PyPDFDirectory loader

5. Generate page_content and document metadata embeddings using OCI Generative AI

6. Check Opensearch client connectivity. It should show the *OpenSearch([{'host': 'hostname', 'port': 9200}])* as output

7. Load both text and embeddings data into the *idx-oci-genai-clinical-trials* index

8. Paste the title retrieved from the previous lab *demo-vector-search-ext.ipynb* to query based on page_content embeddings

9. Report file metadata and the score.
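The query in steps 8 and 9 boils down to a k-NN search against the vector field of the index, followed by reading the score and metadata from each hit. The sketch below builds such a request body and summarizes a response. The field name `page_content_vector` is a hypothetical placeholder; use whatever vector field the notebook actually defines.

```python
def build_knn_query(vector, k=3, field="page_content_vector", source_fields=None):
    """Build an OpenSearch k-NN search body for the given query embedding."""
    body = {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }
    if source_fields:
        # Limit the returned _source to the metadata you want to report.
        body["_source"] = list(source_fields)
    return body


def summarize_hits(response):
    """Return (score, source) pairs from a search response dictionary."""
    return [(hit["_score"], hit.get("_source", {})) for hit in response["hits"]["hits"]]
```

With an opensearch-py client, the body would typically be passed as `client.search(index="idx-oci-genai-clinical-trials", body=build_knn_query(embedding))`, and `summarize_hits` applied to the result to list each document's score alongside its metadata.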

## Task 2: Generate Compliance document form

1. Select the top retrieved document from the query search above and enter it in your query.

2. Run the rest of the cells to generate the compliance form for the trial.

3. View the generated compliance form in */home/datascience/conda/data/outputs* directory
![Image alt text](images/lab5-comp-doc.png)

Generating the form involves:

1. Defining a pydantic base model class called *TrialInfo* to structure document sections and their description instructions. These are format instructions that are passed to the OCI Generative AI LLM at runtime.
2. This *TrialInfo* class is a superset representing sectional headers for all clinical-trial documents in this workshop.
3. Defining a langchain pydantic output parser object and passing the format instructions.
4. Defining a chat prompt template with specific instructions to use the format instructions
5. Using OCI Generative AI chat llm to perform sectional summarization based on the format instructions.
6. Mapping the summarized results to the form template fields
7. Generating the compliance document with a PDF filler
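The idea behind steps 1 to 5 can be illustrated without pydantic or langchain. The sketch below is a standard-library stand-in, not the notebook's code: a dataclass plays the role of *TrialInfo*, field metadata carries the section descriptions, one function renders them as format instructions, and another parses the model's JSON reply back into the class. The three fields shown are hypothetical; the notebook's class defines its own sections.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class TrialInfo:
    """Hypothetical subset of sections; the notebook's class has its own fields."""
    title: str = field(default="", metadata={"description": "Title of the clinical trial"})
    objective: str = field(default="", metadata={"description": "One-sentence summary of the objective"})
    results: str = field(default="", metadata={"description": "Condensed summary of the results"})


def format_instructions(model=TrialInfo):
    """Describe the expected JSON object, one line per section."""
    lines = [f'- "{f.name}": {f.metadata["description"]}' for f in fields(model)]
    return "Reply with a single JSON object containing these keys:\n" + "\n".join(lines)


def parse_reply(reply, model=TrialInfo):
    """Parse the JSON object embedded in the reply and map it onto the model."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in reply")
    data = json.loads(reply[start:end + 1])
    known = {f.name for f in fields(model)}
    return model(**{k: str(v) for k, v in data.items() if k in known})
```

In the notebook, the langchain pydantic output parser takes care of both halves: it produces the format instructions that go into the prompt and validates the reply against the model class. The parsed fields are then what gets mapped onto the PDF form template.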

## Task 3: Various other ways to customize this notebook

Ways to extend and customize:

1. Use langchain chunking classes to split documents, embed them and load them into an index
2. Perform embedding search on the chunked documents index
3. Compare chunked retrievals vs full document retrievals and evaluate scores
4. Use a different template or use multiple clinical trial templates by disease
5. Use other prompting techniques with different format specifications
6. Use a better PDF form filler.
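To make the chunking idea in items 1 to 3 concrete, here is a minimal fixed-size splitter with overlap. It is a plain-Python stand-in for the langchain text splitter classes, which offer more refined strategies; the default sizes are arbitrary assumptions.

```python
def split_text(text, chunk_size=1000, overlap=100):
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous chunk.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would be embedded and indexed as its own document, so that retrieval scores can be compared against the full-document approach used in this lab.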

## Learn More

* [Deploy Langchain applications as OCI Model Deployments](https://blogs.oracle.com/ai-and-datascience/post/deploy-langchain-application-as-model-deployment)
* [OCI AI Quick actions](https://docs.oracle.com/en-us/iaas/data-science/using/ai-quick-actions.htm)

## Acknowledgements

* **Author** - Rajib Ghosh, Master Principal Cloud Architect, OCI AI and Gen AI Center of Excellence
* **Last Updated By/Date** - Aug 2024
46 changes: 46 additions & 0 deletions compliance-submission-oci-genai/getting-started/getting-started.md
@@ -0,0 +1,46 @@
# Get started

Generating templated compliance documents is a complex process that requires an understanding of the various component services, their interactions and their setup.
Although this workshop does not reflect an actual form, the principles outlined here can be applied effectively for that purpose. So, let's get started.

## Objectives

In this workshop, you will learn how to:

* Use OCI Generative AI and OCI DataScience to vectorize clinical trial data
* Embed and load data into a vector store like OCI Opensearch
* Implement a Retrieval Augmented Generation (RAG) solution
* Prompt effectively to generate query outputs with the Cohere chat LLM model
* Structure LLM generated outputs with langchain
* Use a pre-canned template to generate a compliance form

## Task 1: Prerequisites

This lab assumes you have:

* Basic familiarity with Generative AI concepts, RAG and Industry
* Some familiarity with OCI Generative AI Services and Tool sets
* Familiarity with Python programming language.
* Basic understanding of large language models (LLM)
* Some familiarity with OCI Opensearch service
* Some familiarity with open source langchain framework
* Familiarity with clinical trial and compliance submission process would be helpful but not required

## Task 2: Downloads

The zipped notebooks and scripts can be downloaded from [here](https://orasenatdpltintegration03.objectstorage.us-chicago-1.oci.customer-oci.com/p/SfhRh7OEvLj9yR0hAIM3BwT7bCpi3jALfP6NqoCODU7mFe51nl1PeBPWcJj2El9K/n/orasenatdpltintegration03/b/clinical-trials/o/conda.zip). Installation instructions are described in LAB-3 Developing with OCI Data science.

## Task 3: Provision Oracle cloud tenancy and login

Use the live-lab link to provision a cloud tenancy and test your login
[provision new cloud tenancy account](https://github.com/oracle-livelabs/common/blob/main/labs/cloud-login/event-register-free-tier-account.md)

## Learn More

* [Oracle Generative AI Capabilities](https://www.oracle.com/artificial-intelligence/generative-ai/)
* [Oracle Clinical Digital Assistant](https://www.oracle.com/health/clinical-suite/clinical-digital-assistant/)

## Acknowledgements

* **Author** - Rajib Ghosh, Master Principal Cloud Architect, OCI GenAI Center of excellence
* **Last Updated By/Date** - Aug, 2024