Integrating native-proxy #501

1 change: 1 addition & 0 deletions docs/source/index.md
@@ -30,6 +30,7 @@ install
server-process
launchers
arbitrary-ports-hosts
standalone
```

## Convenience packages for popular applications
171 changes: 171 additions & 0 deletions docs/source/standalone.md
@@ -0,0 +1,171 @@
(standalone)=

# Spawning and proxying a web service from JupyterHub

The `standalone` feature of Jupyter Server Proxy enables JupyterHub Admins to launch and proxy arbitrary web services
directly, instead of JupyterLab or Notebook. You can use Jupyter Server Proxy to spawn a single proxy,
without it being attached to a Jupyter server. The proxy securely authenticates and restricts access to authorized
users through JupyterHub, providing a unified way to access arbitrary applications securely.

This works similarly to {ref}`proxying Server Processes <server-process>`, where a server process is started and proxied.
The proxy is usually started from the command line, often by modifying the `Spawner.cmd` in your
[JupyterHub Configuration](https://jupyterhub.readthedocs.io/en/stable/tutorial/getting-started/spawners-basics.html).

This feature builds upon the work of [Dan Lester](https://github.com/danlester), who originally developed it in the
[jhsingle-native-proxy](https://github.com/ideonate/jhsingle-native-proxy) package.

## Installation

This feature has a dependency on JupyterHub and must be explicitly installed via an optional dependency:

```shell
pip install jupyter-server-proxy[standalone]
```

## Usage

The standalone proxy is controlled with the `jupyter standaloneproxy` command. You always need to specify the
{ref}`command <server-process:cmd>` of the web service that will be launched and proxied. Let's use
[voilà](https://github.com/voila-dashboards/voila) as an example here:

```shell
jupyter standaloneproxy -- voila --no-browser --port={port} /path/to/some/Notebook.ipynb
```

Executing this command will spawn a new HTTP server, which launches the voilà dashboard and renders the notebook.
Any template strings inside the command (such as `{port}` in `--port={port}`) will be automatically replaced when
the command is executed.
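
As a rough illustration of the template replacement described above, the following sketch fills `{port}` placeholders using `str.format`. The helper name `render_command` and the port value are assumptions made for this example and are not part of the jupyter-server-proxy API; the real implementation may differ.

```python
def render_command(command, **values):
    """Fill `{name}` placeholders in every argument of a command.

    Hypothetical helper for illustration only; it assumes the
    placeholders follow Python's `str.format` syntax.
    """
    return [arg.format(**values) for arg in command]


command = ["voila", "--no-browser", "--port={port}", "/path/to/some/Notebook.ipynb"]

# The port would normally be chosen by the proxy; 45123 is an arbitrary example.
rendered = render_command(command, port=45123)
print(rendered)
```

Arguments without placeholders pass through unchanged, so the same helper can be applied to the whole command list.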

The CLI has multiple advanced options to customize the proxy behavior. Execute `jupyter standaloneproxy --help`
to get a complete list of all arguments.

### Specify the address and port

The proxy will try to extract the address and port from the `JUPYTERHUB_SERVICE_URL` environment variable, which
is set by JupyterHub. If the variable is not set, the server will be launched on `127.0.0.1:8888`.
You can also explicitly override these values:

```shell
jupyter standaloneproxy --address=localhost --port=8000 ...
```
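
The lookup order described in this section could be sketched as follows. This is only an illustration using the standard library; the function name `resolve_address_and_port` is hypothetical, and the actual code in the standalone proxy may handle more cases.

```python
import os
from urllib.parse import urlparse


def resolve_address_and_port(environ=None, default_address="127.0.0.1", default_port=8888):
    """Derive (address, port) from JUPYTERHUB_SERVICE_URL, falling back to defaults.

    Hypothetical helper: it only mirrors the behaviour described in the text.
    """
    environ = os.environ if environ is None else environ
    service_url = environ.get("JUPYTERHUB_SERVICE_URL", "")
    if not service_url:
        return default_address, default_port

    parsed = urlparse(service_url)
    # Fall back to the defaults when the URL omits the host or the port.
    return parsed.hostname or default_address, parsed.port or default_port


print(resolve_address_and_port({"JUPYTERHUB_SERVICE_URL": "http://127.0.0.1:5555"}))
print(resolve_address_and_port({}))
```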

### Disable Authentication

For testing, it can be useful to disable the authentication with JupyterHub. When `--skip-authentication` is passed,
accessing the application does not trigger the login process.

```{warning} Disabling authentication will leave the application open to anyone! Be careful with it,
especially on multi-user systems.
```

## Usage with JupyterHub

To launch a standalone proxy with JupyterHub, you need to customize the `Spawner` inside the configuration
using `traitlets`:

```python
c.Spawner.cmd = "jupyter-standaloneproxy"
c.Spawner.args = ["--", "voila", "--no-browser", "--port={port}", "/path/to/some/Notebook.ipynb"]
```

This will hard-code JupyterHub to launch voilà instead of `jupyterhub-singleuser`. In case you want to give the users
of JupyterHub the ability to select which application to launch (like selecting either JupyterLab or voilà),
you will want to make this configuration optional:

```python
# Let users select which application to start
c.Spawner.options_form = """
<label for="select-application">Choose Application: </label>
<select name="application" required>
<option value="lab">JupyterLab</option>
<option value="voila">voila</option>
</select>
"""

def select_application(spawner):
    application = spawner.user_options.get("application", ["lab"])[0]
    if application == "voila":
        spawner.cmd = "jupyter-standaloneproxy"
        spawner.args = ["--", "voila", "--no-browser", "--port={port}", "/path/to/some/Notebook.ipynb"]

c.Spawner.pre_spawn_hook = select_application
```

```{note} This is only a very basic implementation to show a possible approach. For a production setup, you can create
a more rigorous implementation by creating a custom `Spawner` and overwriting the appropriate functions and/or
creating a custom `spawner.html` page.
```
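
One small step towards a more rigorous hook is to keep the selectable applications in a lookup table and reject unknown values. The sketch below assumes the same form field as above; the `APPLICATIONS` table is hypothetical, and a `SimpleNamespace` stands in for a real `Spawner` so that the snippet runs outside of JupyterHub.

```python
from types import SimpleNamespace

# Hypothetical table of selectable applications and their proxy arguments.
APPLICATIONS = {
    "voila": ["--", "voila", "--no-browser", "--port={port}", "/path/to/some/Notebook.ipynb"],
}


def select_application(spawner):
    """Switch the spawner to the standalone proxy for known applications."""
    choice = spawner.user_options.get("application", ["lab"])[0]
    if choice == "lab":
        return  # keep the default single-user server
    if choice not in APPLICATIONS:
        raise ValueError(f"Unknown application: {choice!r}")
    spawner.cmd = "jupyter-standaloneproxy"
    spawner.args = list(APPLICATIONS[choice])


# Stand-in object for demonstration; JupyterHub passes a real Spawner instance.
demo = SimpleNamespace(user_options={"application": ["voila"]}, cmd=None, args=[])
select_application(demo)
print(demo.cmd, demo.args)
```

In a real configuration the function would still be registered with `c.Spawner.pre_spawn_hook = select_application`.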

## Technical Overview

The following section should serve as an explanation to developers of the standalone feature of jupyter-server-proxy.
It outlines the basic functionality and will explain the different components of the code in more depth.

### JupyterHub and jupyterhub-singleuser

By default, JupyterHub will use the `jupyterhub-singleuser` executable when launching a new instance for a user.
This executable is usually a wrapper around the `JupyterLab` or `Notebook` application, with some
additions regarding authentication and multi-user systems.
In the standalone feature, we try to mimic these additions, but instead of using `JupyterLab` or `Notebook`, we
will wrap them around an arbitrary web application.
This will ensure direct, authenticated access to the application, without needing a Jupyter server to be running
in the background. The different additions will be discussed in more detail below.

### Structure

The standalone feature is built on top of the `SuperviseAndProxyHandler`, which spawns a process and proxies
requests to it. While this process is called _Server_ in the documentation, the term _Application_ is
used here to avoid confusion with the other server, to which the `SuperviseAndProxyHandler` is attached.
When using jupyter-server-proxy, the proxies are attached to the Jupyter server and will proxy requests
to the application.
Since we do not want to use the Jupyter server here, we instead require an alternative server, which will be used
to attach the `SuperviseAndProxyHandler` and all the required additions from `jupyterhub-singleuser`.
For that, we use tornado's `HTTPServer`.

### Login and Authentication

One central component is the authentication with the JupyterHub Server.
Any client accessing the application will need to authenticate with the JupyterHub API, which will ensure only
users themselves (or otherwise allowed users, e.g., admins) can access the application.
The login process is started by deriving our `StandaloneProxyHandler` from
[jupyterhub.services.auth.HubOAuthenticated](https://github.com/jupyterhub/jupyterhub/blob/5.0.0/jupyterhub/services/auth.py#L1541)
and decorating any methods we want to authenticate with `tornado.web.authenticated`.
For the proxy, we just decorate the `proxy` method with `web.authenticated`, which will authenticate all routes on all HTTP Methods.
`HubOAuthenticated` will automatically provide the login URL for the authentication process and any
client accessing any path of our server will be redirected to the JupyterHub API.

After a client has been authenticated with the JupyterHub API, they will be redirected back to our server.
This redirect will be received on the `/oauth_callback` path, from where we need to redirect the client back to the
root of the application.
We use the [HubOAuthCallbackHandler](https://github.com/jupyterhub/jupyterhub/blob/5.0.0/jupyterhub/services/auth.py#L1547),
another handler from the JupyterHub package, for this.
It will also cache the received OAuth state from the login so that we can skip authentication for the next requests
and do not need to go through the whole login process for each request.

### SSL certificates

In some JupyterHub configurations, the launched application will be configured to use an SSL certificate for requests
between the JupyterLab / Notebook and the JupyterHub API. The path of the certificate is given in the
`JUPYTERHUB_SSL_*` environment variables. We use these variables to create a new SSL Context for both
the `AsyncHTTPClient` (used for Activity Notification, see below) and the `HTTPServer`.
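
A minimal sketch of how such a context could be built from the environment with the standard `ssl` module is shown below. The function name `make_ssl_context` and the choice to return `None` when no certificate is configured are assumptions for this example, not a description of the actual implementation.

```python
import os
import ssl


def make_ssl_context(environ=None, purpose=ssl.Purpose.SERVER_AUTH):
    """Build an SSL context from the JUPYTERHUB_SSL_* variables, if present.

    Hypothetical helper. Use `ssl.Purpose.SERVER_AUTH` for an HTTP client and
    `ssl.Purpose.CLIENT_AUTH` for the HTTP server side.
    """
    environ = os.environ if environ is None else environ
    keyfile = environ.get("JUPYTERHUB_SSL_KEYFILE")
    certfile = environ.get("JUPYTERHUB_SSL_CERTFILE")
    client_ca = environ.get("JUPYTERHUB_SSL_CLIENT_CA")

    if not (keyfile and certfile):
        # Internal SSL is not configured: plain HTTP is used.
        return None

    context = ssl.create_default_context(purpose, cafile=client_ca)
    context.load_cert_chain(certfile, keyfile)
    return context


# Without the variables set, no context is created.
print(make_ssl_context({}))
```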

### Activity Notifications

`jupyterhub-singleuser` periodically sends an activity notification to the JupyterHub API to inform it that
the running application is still active. Whether this information is used depends on the specific
configuration of the JupyterHub deployment.
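
To make the notification more concrete, the sketch below builds (but does not send) such a request. The payload shape and the `token` authorization scheme are assumptions based on the JupyterHub REST API for user activity, and `build_activity_request` is a hypothetical helper name; check the JupyterHub documentation before relying on the details.

```python
import json
from datetime import datetime, timezone


def build_activity_request(activity_url, api_token, server_name="", last_activity=None):
    """Return (url, headers, body) for an activity notification.

    Hypothetical helper: the values would come from JUPYTERHUB_ACTIVITY_URL,
    JUPYTERHUB_API_TOKEN and JUPYTERHUB_SERVER_NAME.
    """
    last_activity = last_activity or datetime.now(timezone.utc)
    timestamp = last_activity.isoformat()

    headers = {
        "Authorization": f"token {api_token}",
        "Content-Type": "application/json",
    }
    body = json.dumps(
        {
            "servers": {server_name: {"last_activity": timestamp}},
            "last_activity": timestamp,
        }
    )
    return activity_url, headers, body


url, headers, body = build_activity_request(
    "http://127.0.0.1:8081/hub/api/users/name/activity",  # example URL only
    "example-token",
    last_activity=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(url, headers, body)
```

The resulting triple could then be passed as a `POST` request to an HTTP client such as tornado's `AsyncHTTPClient`.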

### Environment Variables

JupyterHub uses many environment variables to specify how the launched application should be run.
The table below gives a short overview of the variables used, what they contain, and what they are used for.

| Variable | Explanation | Typical Value |
| ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------ |
| `JUPYTERHUB_SERVICE_URL` | URL where the server should be listening. Used to find the Address and Port to start the server on. | `http://127.0.0.1:5555` |
| `JUPYTERHUB_SERVICE_PREFIX`                                                     | A URL prefix where the root of the launched application should be hosted. E.g., when set to `/user/name/`, then the root of the proxied application should be `/user/name/index.html` | `/services/service-name/` or `/user/name/` |
| `JUPYTERHUB_ACTIVITY_URL` | URL where to send activity notifications to. | `$JUPYTERHUB_API_URL/user/name/activity` |
| `JUPYTERHUB_API_TOKEN` | Authorization Token for requests to the JupyterHub API. | |
| `JUPYTERHUB_SERVER_NAME` | A name given to all apps launched by the JupyterHub. | |
| `JUPYTERHUB_SSL_KEYFILE`, `JUPYTERHUB_SSL_CERTFILE`, `JUPYTERHUB_SSL_CLIENT_CA` | Paths to keyfile, certfile and client CA for the SSL configuration | |
| `JUPYTERHUB_USER`, `JUPYTERHUB_GROUP`                                           | Name and group of the user for this application. Required for authentication.                                                                                                         |                                            |
143 changes: 84 additions & 59 deletions jupyter_server_proxy/config.py
@@ -2,6 +2,8 @@
Traitlets based configuration for jupyter_server_proxy
"""

from __future__ import annotations

import sys
from textwrap import dedent, indent
from warnings import warn
@@ -263,60 +265,83 @@ def cats_only(response, path):
""",
).tag(config=True)

def get_proxy_base_class(self) -> tuple[type | None, dict]:
"""
Return the appropriate ProxyHandler Subclass and its kwargs
"""
if self.command:
return (
SuperviseAndRawSocketHandler
if self.raw_socket_proxy
else SuperviseAndProxyHandler
), dict(state={})

if not (self.port or isinstance(self.unix_socket, str)):
warn(
f"""Server proxy {self.name} does not have a command, port number or unix_socket path.
At least one of these is required."""
)
return None, dict()

return (
RawSocketHandler if self.raw_socket_proxy else NamedLocalProxyHandler
), dict()

def _make_proxy_handler(sp: ServerProcess):
"""
Create an appropriate handler with given parameters
"""
if sp.command:
cls = (
SuperviseAndRawSocketHandler
if sp.raw_socket_proxy
else SuperviseAndProxyHandler
)
args = dict(state={})
elif not (sp.port or isinstance(sp.unix_socket, str)):
warn(
f"Server proxy {sp.name} does not have a command, port "
f"number or unix_socket path. At least one of these is "
f"required."
)
return
else:
cls = RawSocketHandler if sp.raw_socket_proxy else NamedLocalProxyHandler
args = {}

# FIXME: Set 'name' properly
class _Proxy(cls):
kwargs = args

def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.name = sp.name
self.command = sp.command
self.proxy_base = sp.name
self.absolute_url = sp.absolute_url
if sp.command:
self.requested_port = sp.port
self.requested_unix_socket = sp.unix_socket
else:
self.port = sp.port
self.unix_socket = sp.unix_socket
self.mappath = sp.mappath
self.rewrite_response = sp.rewrite_response
self.update_last_activity = sp.update_last_activity

def get_request_headers_override(self):
return self._realize_rendered_template(sp.request_headers_override)

# these two methods are only used in supervise classes, but do no harm otherwise
def get_env(self):
return self._realize_rendered_template(sp.environment)

def get_timeout(self):
return sp.timeout

return _Proxy
def get_proxy_attributes(self) -> dict:
"""
Return the required attributes, which will be set on the proxy handler
"""
attributes = {
"name": self.name,
"command": self.command,
"proxy_base": self.name,
"absolute_url": self.absolute_url,
"mappath": self.mappath,
"rewrite_response": self.rewrite_response,
"update_last_activity": self.update_last_activity,
"request_headers_override": self.request_headers_override,
}

if self.command:
attributes["requested_port"] = self.port
attributes["requested_unix_socket"] = self.unix_socket
attributes["environment"] = self.environment
attributes["timeout"] = self.timeout
else:
attributes["port"] = self.port
attributes["unix_socket"] = self.unix_socket

return attributes

def make_proxy_handler(self) -> tuple[type | None, dict]:
"""
Create an appropriate handler for this ServerProxy Configuration
"""
cls, proxy_kwargs = self.get_proxy_base_class()
if cls is None:
return None, proxy_kwargs

# FIXME: Set 'name' properly
attributes = self.get_proxy_attributes()

class _Proxy(cls):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)

for name, value in attributes.items():
setattr(self, name, value)

def get_request_headers_override(self):
return self._realize_rendered_template(self.request_headers_override)

# these two methods are only used in supervise classes, but do no harm otherwise
def get_env(self):
return self._realize_rendered_template(self.environment)

def get_timeout(self):
return self.timeout

return _Proxy, proxy_kwargs


def get_entrypoint_server_processes(serverproxy_config):
@@ -332,21 +357,21 @@ def get_entrypoint_server_processes(serverproxy_config):
return sps


def make_handlers(base_url, server_processes):
def make_handlers(base_url: str, server_processes: list[ServerProcess]):
"""
Get tornado handlers for registered server_processes
"""
handlers = []
for sp in server_processes:
handler = _make_proxy_handler(sp)
for server in server_processes:
handler, kwargs = server.make_proxy_handler()
if not handler:
continue
handlers.append((ujoin(base_url, sp.name, r"(.*)"), handler, handler.kwargs))
handlers.append((ujoin(base_url, sp.name), AddSlashHandler))
handlers.append((ujoin(base_url, server.name, r"(.*)"), handler, kwargs))
handlers.append((ujoin(base_url, server.name), AddSlashHandler))
return handlers


def make_server_process(name, server_process_config, serverproxy_config):
def make_server_process(name: str, server_process_config: dict, serverproxy_config):
return ServerProcess(name=name, **server_process_config)


9 changes: 9 additions & 0 deletions jupyter_server_proxy/standalone/__init__.py
@@ -0,0 +1,9 @@
from .app import StandaloneProxyServer


def main():
    StandaloneProxyServer.launch_instance()


if __name__ == "__main__":
    main()