Documentation (#50)
* Updates to README.md to improve documentation

* Significant expansion/updates
mattprintz authored Dec 26, 2023
1 parent 77dc939 commit 07efb5e
Showing 24 changed files with 862 additions and 38 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -64,3 +64,4 @@ create.sql
**/.venv/
.openai.toml
shell.nix
**/.jekyll-cache
126 changes: 88 additions & 38 deletions README.md
@@ -1,20 +1,95 @@
# Jupyter package
# Beaker Kernel: Contextually-aware notebooks with built-in AI assistant

This package provides a custom Jupyter kernel for the ASKEM project that allows extra communication beyond the usual Jupyter message types to allow for the kernel to interact with an LLM (GPT-4) and answer questions and generate code that is runnable in native notebook code cells.
Beaker lets you not only work in notebooks but also integrate notebooks into any web application. By leveraging LLMs, you can augment your application and/or notebook with a [ReAct](https://www.promptingguide.ai/techniques/react) agent powered by [Archytas](https://github.com/jataware/archytas).

The kernel connects to the Terarium Data Service to allow contextual queries about Terarium assets such as visualizing or modifying datasets.
The Beaker agent can generate code to populate an existing notebook, or run code in the notebook environment in the background and pass the updated state or task response to the front-end for display. For example, you can ask Beaker to create a certain document: Beaker will not only generate the text of the document but also create, fill, and save the document, and then notify the front-end of the document's id so it can be displayed to the user once it is complete.


This package contains 4 different products:
## Components of Beaker

This package contains the following components:

* The Beaker Jupyter kernel (`beaker`)
* A stand-alone Jupyter service (`service`)
  * Contains both a production-ready custom server and a standalone development interface
* A library of contexts (`contexts`) that can be extended to add functionality to Beaker


## How Beaker works

The Beaker kernel acts as a custom Python kernel that sits between the user interface and the execution environment (subkernel) and proxies messages to the subkernel as needed. The Beaker kernel inspects the messages that pass through it and may take extra actions. This allows you to define custom message types that result in custom behavior, trigger extra behavior from normal actions, or modify the request and/or response messages on the fly.

When it is first initialized, the Beaker kernel starts a subkernel using its defaults (usually Python3). If you check the existing kernels in the Jupyter service, you will see both kernels listed. At this point, you can use Beaker as a naive kernel, and all regular messages will be sent to the subkernel as if you were connected directly to the subkernel itself. To really get started with Beaker, you need to set a context.
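
As a rough illustration of the proxying behaviour described in this section, the sketch below shows a kernel-like object that inspects each message, handles registered custom message types itself, and forwards everything else to a subkernel. The names `ProxyKernel`, `register_handler`, `FakeSubkernel`, and the `context_setup_response` reply type are hypothetical and are not Beaker's actual API; real Jupyter messages are also richer than the plain dictionaries used here.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Handler = Callable[[Message], Message]


class FakeSubkernel:
    """Stand-in for the execution environment; it records what it receives."""

    def __init__(self) -> None:
        self.received: List[Message] = []

    def handle(self, msg: Message) -> Message:
        self.received.append(msg)
        reply_type = msg["msg_type"].replace("_request", "_reply")
        return {"msg_type": reply_type, "content": {"status": "ok"}}


class ProxyKernel:
    """Hypothetical proxy: intercepts custom message types, forwards the rest."""

    def __init__(self, subkernel: FakeSubkernel) -> None:
        self.subkernel = subkernel
        self.handlers: Dict[str, Handler] = {}

    def register_handler(self, msg_type: str, handler: Handler) -> None:
        self.handlers[msg_type] = handler

    def dispatch(self, msg: Message) -> Message:
        handler = self.handlers.get(msg["msg_type"])
        if handler is not None:
            # Custom message type: handled by the proxy, never seen by the subkernel.
            return handler(msg)
        # Regular message: passed through unchanged.
        return self.subkernel.handle(msg)


subkernel = FakeSubkernel()
proxy = ProxyKernel(subkernel)
proxy.register_handler(
    "context_setup_request",
    # The reply type name is an assumption made for this sketch.
    lambda msg: {"msg_type": "context_setup_response", "content": {"status": "ok"}},
)

custom_reply = proxy.dispatch({"msg_type": "context_setup_request", "content": {}})
forwarded_reply = proxy.dispatch({"msg_type": "execute_request", "content": {"code": "1 + 1"}})
```

In this sketch only the `execute_request` message reaches the subkernel, which mirrors the idea that regular messages behave as if the client were connected to the subkernel directly.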

### Contexts

Beaker works best when used within a particular context. At a high level, a context has three parts: the language you are working in, the problem space you are working within, and any particular items/objects you are working on.

Usually, the first action after connecting to Beaker is to set the context.

#### Setting a Beaker context will do the following:

* Change the subkernel language if needed (destroying the current subkernel and creating a new one)
* Set the LLM prompt for the context
* Run initialization code in the subkernel to pre-import libraries and load objects/instances (optional)
* Register any context-specific custom message handlers (optional)
* Register any "post-execute" actions to run after a notebook cell is executed (optional)

** Currently, setting a context is the only way to change the language of the subkernel.

#### Setting the context

You set a context by sending a custom message to the Beaker kernel. The message should have the following format:

`msg_type`: `context_setup_request`<br/>
`msg_payload`:<br/>
```json
{
  "language": "<subkernel name>",
  "context": "<context name>",
  "context_info": {"json payload of any settings/info required to start the context"}
}
```

The list of available languages and contexts depends on what has been installed. The `context_info` payload is dependent on the particular context chosen.
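
As a minimal sketch of assembling such a message in Python, the helper below builds the message type and payload described above and serializes the payload to JSON. The helper name `build_context_setup_request`, the values `python3` and `my_context`, and the `context_info` keys are placeholders chosen for illustration; how the message is actually sent (for example, over the Jupyter websocket channel) is not shown and depends on your client.

```python
import json
from typing import Any, Dict


def build_context_setup_request(
    language: str, context: str, context_info: Dict[str, Any]
) -> Dict[str, Any]:
    """Return the message type and payload for a context setup request."""
    return {
        "msg_type": "context_setup_request",
        "msg_payload": {
            "language": language,
            "context": context,
            "context_info": context_info,
        },
    }


# "python3", "my_context", and "example_setting" are placeholder values.
request = build_context_setup_request(
    "python3", "my_context", {"example_setting": "value"}
)
serialized = json.dumps(request["msg_payload"], indent=2)
print(serialized)
```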

The Beaker service provides an end-point that will return a JSON payload listing the installed contexts and the languages available for each context to allow discovery.

This endpoint is found at `http://{jupyter_url}/contexts`, usually `http://localhost:8888/contexts`.
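
A discovery request could look roughly like the following sketch, which uses only the Python standard library. The function names `contexts_url` and `fetch_contexts` are hypothetical, the exact shape of the returned JSON is not specified here, and the sketch assumes the endpoint is reachable without extra authentication headers, which may not hold for your deployment.

```python
import json
from urllib.request import urlopen


def contexts_url(jupyter_url: str) -> str:
    """Build the contexts endpoint URL from the base URL of the Jupyter service."""
    return jupyter_url.rstrip("/") + "/contexts"


def fetch_contexts(jupyter_url: str = "http://localhost:8888", timeout: float = 5.0):
    """Fetch and decode the JSON payload listing installed contexts."""
    with urlopen(contexts_url(jupyter_url), timeout=timeout) as response:
        return json.load(response)
```

With a Beaker service running locally, calling `fetch_contexts()` should return the decoded payload, which you can inspect to choose a `context` and `language` for the setup message.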

### Components of contexts

#### Toolsets

Toolsets are primarily sets of tools provided to the LLM that allow it to interact with both the subkernel environment and the front-end using the [Archytas ReAct framework](https://github.com/jataware/archytas). Details on how toolsets work and how they should be defined can be found in the [Archytas documentation](https://github.com/jataware/archytas).


#### Codesets

Codesets define snippets of code that run in a subkernel. They are often analogous to functions, but keep in mind that they execute directly within the subkernel environment, exactly as if the codeset content were run in a notebook code cell.

Codesets are separated by context and language, allowing for analogous behavior across different subkernel languages for each configured context.

Each codeset is defined using the [Jinja templating language](https://jinja.palletsprojects.com/en/3.1.x/), allowing for dynamic variation of the code based on the current state of the context/subkernel environment.

Code from a codeset can be rendered using the `get_code()` method on a context object.
Once the template is rendered into properly formatted code, it can be executed in the subkernel using the `execute()` or `evaluate()` methods on the context.


#### Subkernels

The subkernel files within the context directory are required: they define some common behaviours and provide the functions that let Beaker behave consistently and properly parse the responses from subkernel executions.

#TODO: Move subkernels out of context?

* A Python module named `beaker_kernel` (`pyproject.toml`)
* A Jupyter kernel (`beaker`)
* A Jupyter service (`main.py`)
* A standalone development interface (`dev_ui`)

## Install / setup

### beaker_kernel Python module
### Docker



### python (local)

Normal installation:
```bash
@@ -28,20 +103,6 @@ $ poetry config virtualenvs.create false
$ poetry install --no-dev
```

### Jupyter kernel

To install the kernel, simply copy or symlink the `beaker` directory in to one of the directories defined in the following document:

https://jupyter-client.readthedocs.io/en/stable/kernels.html#kernel-specs


For example:
```bash
$ cp -r beaker /usr/share/jupyter/kernels/beaker
```

Once the directory exists and the jupyter service is restarted the kernel should be available for selection.

For development, the kernel is automatically installed in the proper location in your development virtual environment when you run `make dev-install` as explained in the Dev setup section.


@@ -78,25 +139,14 @@ available to Beaker, it must be added to `askem-julia`'s `Project.toml`.
This setup uses stock Jupyter services as provided
in the Jupyter Python packages.

The entry point of the docker file runs the file main.py which
starts a JupyterLab Server App. The only differences here
are:
1. This service does not run any front-end and only provides
   API and websocket access as Terarium is meant to be the
   interface when deployed.
2. Some settings are changed to allow access through the
   Terarium interface and be accessed by the proxy kernel:
   1. allow_origin rule
   2. disable_check_xsrf security issue to allow the proxy
      kernel to make API calls

The main engineering on the back end goes in to the writing of
the LLM/Proxy kernel.

This custom kernel manages a "sub-kernel" that can be for any
language, etc as long as it is a good kernel and installed. The
proxy passes messages from the client/jupyter server back and
forth with the sub-kernel, but sometimes intercepts the
messages or performs extra messages. This message interception
allows for the client to request LLM queries and to generate
code cells back to the client based on the LLM response.
1 change: 1 addition & 0 deletions docs/.gitignore
@@ -0,0 +1 @@
_site
14 changes: 14 additions & 0 deletions docs/Dockerfile
@@ -0,0 +1,14 @@
FROM ruby:3.2

ENV LC_ALL C.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8

WORKDIR /usr/src/app

COPY Gemfile ./
RUN gem install bundler
RUN bundle install

EXPOSE 4000
CMD jekyll serve -d /_site --watch --force_polling -H 0.0.0.0 -P 4000
11 changes: 11 additions & 0 deletions docs/Gemfile
@@ -0,0 +1,11 @@
source "https://rubygems.org"

gem "jekyll"

gem "just-the-docs"

gem "jekyll-github-metadata", ">= 2.15"

gem "jekyll-include-cache", group: :jekyll_plugins

gem "html-proofer", "~> 5.0", :group => :development
142 changes: 142 additions & 0 deletions docs/Gemfile.lock
@@ -0,0 +1,142 @@
GEM
  remote: https://rubygems.org/
  specs:
    Ascii85 (1.1.0)
    addressable (2.8.5)
      public_suffix (>= 2.0.2, < 6.0)
    afm (0.2.2)
    async (2.6.5)
      console (~> 1.10)
      fiber-annotation
      io-event (~> 1.1)
      timers (~> 4.1)
    base64 (0.2.0)
    colorator (1.1.0)
    concurrent-ruby (1.2.2)
    console (1.23.3)
      fiber-annotation
      fiber-local
    em-websocket (0.5.3)
      eventmachine (>= 0.12.9)
      http_parser.rb (~> 0)
    ethon (0.16.0)
      ffi (>= 1.15.0)
    eventmachine (1.2.7)
    faraday (2.7.12)
      base64
      faraday-net_http (>= 2.0, < 3.1)
      ruby2_keywords (>= 0.0.4)
    faraday-net_http (3.0.2)
    ffi (1.16.3)
    fiber-annotation (0.2.0)
    fiber-local (1.0.0)
    forwardable-extended (2.6.0)
    google-protobuf (3.25.1-x86_64-linux)
    hashery (2.1.2)
    html-proofer (5.0.8)
      addressable (~> 2.3)
      async (~> 2.1)
      nokogiri (~> 1.13)
      pdf-reader (~> 2.11)
      rainbow (~> 3.0)
      typhoeus (~> 1.3)
      yell (~> 2.0)
      zeitwerk (~> 2.5)
    http_parser.rb (0.8.0)
    i18n (1.14.1)
      concurrent-ruby (~> 1.0)
    io-event (1.3.3)
    jekyll (4.3.2)
      addressable (~> 2.4)
      colorator (~> 1.0)
      em-websocket (~> 0.5)
      i18n (~> 1.0)
      jekyll-sass-converter (>= 2.0, < 4.0)
      jekyll-watch (~> 2.0)
      kramdown (~> 2.3, >= 2.3.1)
      kramdown-parser-gfm (~> 1.0)
      liquid (~> 4.0)
      mercenary (>= 0.3.6, < 0.5)
      pathutil (~> 0.9)
      rouge (>= 3.0, < 5.0)
      safe_yaml (~> 1.0)
      terminal-table (>= 1.8, < 4.0)
      webrick (~> 1.7)
    jekyll-github-metadata (2.16.0)
      jekyll (>= 3.4, < 5.0)
      octokit (>= 4, < 7, != 4.4.0)
    jekyll-include-cache (0.2.1)
      jekyll (>= 3.7, < 5.0)
    jekyll-sass-converter (3.0.0)
      sass-embedded (~> 1.54)
    jekyll-seo-tag (2.8.0)
      jekyll (>= 3.8, < 5.0)
    jekyll-watch (2.2.1)
      listen (~> 3.0)
    just-the-docs (0.7.0)
      jekyll (>= 3.8.5)
      jekyll-include-cache
      jekyll-seo-tag (>= 2.0)
      rake (>= 12.3.1)
    kramdown (2.4.0)
      rexml
    kramdown-parser-gfm (1.1.0)
      kramdown (~> 2.0)
    liquid (4.0.4)
    listen (3.8.0)
      rb-fsevent (~> 0.10, >= 0.10.3)
      rb-inotify (~> 0.9, >= 0.9.10)
    mercenary (0.4.0)
    nokogiri (1.15.5-x86_64-linux)
      racc (~> 1.4)
    octokit (6.1.1)
      faraday (>= 1, < 3)
      sawyer (~> 0.9)
    pathutil (0.16.2)
      forwardable-extended (~> 2.6)
    pdf-reader (2.11.0)
      Ascii85 (~> 1.0)
      afm (~> 0.2.1)
      hashery (~> 2.0)
      ruby-rc4
      ttfunk
    public_suffix (5.0.4)
    racc (1.7.3)
    rainbow (3.1.1)
    rake (13.1.0)
    rb-fsevent (0.11.2)
    rb-inotify (0.10.1)
      ffi (~> 1.0)
    rexml (3.2.6)
    rouge (4.2.0)
    ruby-rc4 (0.1.5)
    ruby2_keywords (0.0.5)
    safe_yaml (1.0.5)
    sass-embedded (1.69.5-x86_64-linux-gnu)
      google-protobuf (~> 3.23)
    sawyer (0.9.2)
      addressable (>= 2.3.5)
      faraday (>= 0.17.3, < 3)
    terminal-table (3.0.2)
      unicode-display_width (>= 1.1.1, < 3)
    timers (4.3.5)
    ttfunk (1.7.0)
    typhoeus (1.4.1)
      ethon (>= 0.9.0)
    unicode-display_width (2.5.0)
    webrick (1.8.1)
    yell (2.2.2)
    zeitwerk (2.6.12)

PLATFORMS
  x86_64-linux

DEPENDENCIES
  html-proofer (~> 5.0)
  jekyll
  jekyll-github-metadata (>= 2.15)
  jekyll-include-cache
  just-the-docs

BUNDLED WITH
   2.4.22
23 changes: 23 additions & 0 deletions docs/README.md
@@ -0,0 +1,23 @@
# Updating the documentation

The documentation is automatically rebuilt and deployed upon merge to main on GitHub.

For local development, you can use the provided Docker Compose setup, which monitors the files for
changes and automatically rebuilds the documentation as you modify the local files.

You can access the documentation preview by starting the service (see below) and opening
[http://localhost:4000](http://localhost:4000) in your browser.


To start:
```bash
cd docs
docker compose up -d --build
```

To stop:
```bash
cd docs
docker compose down
```

9 changes: 9 additions & 0 deletions docs/_base.md
@@ -0,0 +1,9 @@
---
layout: default
title: Home
nav_order: 1
has_toc: true
---

# Page

15 changes: 15 additions & 0 deletions docs/_config.yaml
@@ -0,0 +1,15 @@
title: "Beaker Kernel"
search_enabled: true
remote_theme: pmarsceill/just-the-docs

gems:
- just-the-docs
- jekyll-github-metadata
- jekyll-mentions
- jekyll-redirect-from
- jekyll-sitemap
- jemoji

aux_links:
  "Beaker on Github":
    - "//github.com/DARPA-ASKEM/beaker-kernel"
15 changes: 15 additions & 0 deletions docs/_config_local.yaml
@@ -0,0 +1,15 @@
title: "Beaker Kernel"
search_enabled: true
theme: "just-the-docs"

gems:
- just-the-docs
- jekyll-github-metadata
- jekyll-mentions
- jekyll-redirect-from
- jekyll-sitemap
- jemoji

aux_links:
  "Beaker on Github":
    - "//github.com/DARPA-ASKEM/beaker-kernel"