jupyterhub · jwindgassen · Sep 10, 2024 · Sep 11, 2024 · Sep 12, 2024 · Sep 13, 2024
diff --git a/docs/source/index.md b/docs/source/index.md
@@ -30,6 +30,7 @@ install
 server-process
 launchers
 arbitrary-ports-hosts
+standalone
 ```
 
 ## Convenience packages for popular applications

diff --git a/docs/source/standalone.md b/docs/source/standalone.md
@@ -0,0 +1,171 @@
+(standalone)=
+
+# Spawning and proxying a web service from JupyterHub
+
+The `standalone` feature of Jupyter Server Proxy enables JupyterHub Admins to launch and proxy arbitrary web services
+directly, instead of JupyterLab or Notebook. You can use Jupyter Server Proxy to spawn a single proxy,
+without it being attached to a Jupyter server. The proxy securely authenticates and restricts access to authorized
+users through JupyterHub, providing a unified way to access arbitrary applications securely.
+
+This works similarly to {ref}`proxying Server Processes <server-process>`, where a server process is started and proxied.
+The Proxy is usually started from the command line, often by modifying the `Spawner.cmd` in your
+[JupyterHub Configuration](https://jupyterhub.readthedocs.io/en/stable/tutorial/getting-started/spawners-basics.html).
+
+This feature builds upon the work of [Dan Lester](https://github.com/danlester), who originally developed it in the
+[jhsingle-native-proxy](https://github.com/ideonate/jhsingle-native-proxy) package.
+
+## Installation
+
+This feature has a dependency on JupyterHub and must be explicitly installed via an optional dependency:
+
+```shell
+pip install "jupyter-server-proxy[standalone]"
+```
+
+## Usage
+
+The standalone proxy is controlled with the `jupyter standaloneproxy` command. You always need to specify the
+{ref}`command <server-process:cmd>` of the web service that will be launched and proxied. Let's use
+[voilà](https://github.com/voila-dashboards/voila) as an example here:
+
+```shell
+jupyter standaloneproxy -- voila --no-browser --port={port} /path/to/some/Notebook.ipynb
+```
+
+Executing this command will spawn a new HTTP Server, creating the voilà dashboard and rendering the notebook.
+Any template strings (like the `--port={port}`) inside the command will be automatically replaced when the command is
+executed.
+
+The CLI has multiple advanced options to customize the proxy behavior. Execute `jupyter standaloneproxy --help`
+to get a complete list of all arguments.
+
+### Specify the address and port
+
+The proxy will try to extract the address and port from the `JUPYTERHUB_SERVICE_URL` environment variable. This variable
+will be set by JupyterHub. Otherwise, the server will be launched on `127.0.0.1:8888`.
+You can also explicitly override these values:
+
+```shell
+jupyter standaloneproxy --address=localhost --port=8000 ...
+```
+
+### Disable Authentication
+
+For testing, it can be useful to disable the authentication with JupyterHub. Passing `--skip-authentication` will
+not trigger the login process when accessing the application.
+
+```{warning} Disabling authentication will leave the application open to anyone! Be careful with it,
+especially on multi-user systems.
+```
+
+## Usage with JupyterHub
+
+To launch a standalone proxy with JupyterHub, you need to customize the `Spawner` inside the configuration
+using `traitlets`:
+
+```python
+c.Spawner.cmd = "jupyter-standaloneproxy"
+c.Spawner.args = ["--", "voila", "--no-browser", "--port={port}", "/path/to/some/Notebook.ipynb"]
+```
+
+This will hard-code JupyterHub to launch voilà instead of `jupyterhub-singleuser`. In case you want to give the users
+of JupyterHub the ability to select which application to launch (like selecting either JupyterLab or voilà),
+you will want to make this configuration optional:
+
+```python
+# Let users select which application to start
+c.Spawner.options_form = """
+        <label for="select-application">Choose Application: </label>
+        <select name="application" required>
+            <option value="lab">JupyterLab</option>
+            <option value="voila">voila</option>
+        </select>
+    """
+
+def select_application(spawner):
+    application = spawner.user_options.get("application", ["lab"])[0]
+    if application == "voila":
+        spawner.cmd = "jupyter-standaloneproxy"
+        spawner.args = ["--", "voila", "--no-browser", "--port={port}", "/path/to/some/Notebook.ipynb"]
+
+c.Spawner.pre_spawn_hook = select_application
+```
+
+```{note} This is only a very basic implementation to show a possible approach. For a production setup, you can create
+a more rigorous implementation by creating a custom `Spawner` and overriding the appropriate functions and/or
+creating a custom `spawner.html` page.
+```
+
+## Technical Overview
+
+The following section should serve as an explanation to developers of the standalone feature of jupyter-server-proxy.
+It outlines the basic functionality and will explain the different components of the code in more depth.
+
+### JupyterHub and jupyterhub-singleuser
+
+By default, JupyterHub will use the `jupyterhub-singleuser` executable when launching a new instance for a user.
+This executable is usually a wrapper around the `JupyterLab` or `Notebook` application, with some
+additions regarding authentication and multi-user systems.
+In the standalone feature, we try to mimic these additions, but instead of using `JupyterLab` or `Notebook`, we
+will wrap them around an arbitrary web application.
+This will ensure direct, authenticated access to the application, without needing a Jupyter server to be running
+in the background. The different additions will be discussed in more detail below.
+
+### Structure
+
+The standalone feature is built on top of the `SuperviseAndProxyHandler`, which will spawn a process and proxy
+requests to this server. While this process is called _Server_ in the documentation, the term _Application_ will be
+used here, to avoid confusion with the other server to which the `SuperviseAndProxyHandler` is attached.
+When using jupyter-server-proxy, the proxies are attached to the Jupyter server and will proxy requests
+to the application.
+Since we do not want to use the Jupyter server here, we instead require an alternative server, which will be used
+to attach the `SuperviseAndProxyHandler` and all the required additions from `jupyterhub-singleuser`.
+For that, we use Tornado's `HTTPServer`.
+
+### Login and Authentication
+
+One central component is the authentication with the JupyterHub Server.
+Any client accessing the application will need to authenticate with the JupyterHub API, which will ensure only
+users themselves (or otherwise allowed users, e.g., admins) can access the application.
+The Login process is started by deriving our `StandaloneProxyHandler` from
+[jupyterhub.services.auth.HubOAuthenticated](https://github.com/jupyterhub/jupyterhub/blob/5.0.0/jupyterhub/services/auth.py#L1541)
+and decorating any methods we want to authenticate with `tornado.web.authenticated`.
+For the proxy, we just decorate the `proxy` method with `web.authenticated`, which will authenticate all routes on all HTTP Methods.
+`HubOAuthenticated` will automatically provide the login URL for the authentication process and any
+client accessing any path of our server will be redirected to the JupyterHub API.
+
+After a client has been authenticated with the JupyterHub API, they will be redirected back to our server.
+This redirect will be received on the `/oauth_callback` path, from where we need to redirect the client back to the
+root of the application.
+We use the [HubOAuthCallbackHandler](https://github.com/jupyterhub/jupyterhub/blob/5.0.0/jupyterhub/services/auth.py#L1547),
+another handler from the JupyterHub package, for this.
+It will also cache the received OAuth state from the login so that we can skip authentication for the next requests
+and do not need to go through the whole login process for each request.
+
+### SSL certificates
+
+In some JupyterHub configurations, the launched application will be configured to use an SSL certificate for requests
+between the JupyterLab / Notebook and the JupyterHub API. The path of the certificate is given in the
+`JUPYTERHUB_SSL_*` environment variables. We use these variables to create a new SSL Context for both
+the `AsyncHTTPClient` (used for Activity Notification, see below) and the `HTTPServer`.
+
+### Activity Notifications
+
+The `jupyterhub-singleuser` will periodically send an activity notification to the JupyterHub API and inform it that
+the currently running application is still active. Whether this information is used or not depends on the specific
+configuration of this JupyterHub.
+
+### Environment Variables
+
+JupyterHub uses several environment variables to specify how the launched app should be run.
+The table below gives a short overview of the variables used, what they contain, and what they are used for.
+
+| Variable                                                                        | Explanation                                                                                                                                                                           | Typical Value                              |
+| ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------ |
+| `JUPYTERHUB_SERVICE_URL`                                                        | URL where the server should be listening. Used to find the Address and Port to start the server on.                                                                                   | `http://127.0.0.1:5555`                    |
+| `JUPYTERHUB_SERVICE_PREFIX`                                                     | A URL prefix where the root of the launched application should be hosted. E.g., when set to `/user/name/`, then the root of the proxied application should be `/user/name/index.html` | `/services/service-name/` or `/user/name/` |
+| `JUPYTERHUB_ACTIVITY_URL`                                                       | URL where to send activity notifications to.                                                                                                                                          | `$JUPYTERHUB_API_URL/user/name/activity`   |
+| `JUPYTERHUB_API_TOKEN`                                                          | Authorization Token for requests to the JupyterHub API.                                                                                                                               |                                            |
+| `JUPYTERHUB_SERVER_NAME`                                                        | A name given to all apps launched by the JupyterHub.                                                                                                                                  |                                            |
+| `JUPYTERHUB_SSL_KEYFILE`, `JUPYTERHUB_SSL_CERTFILE`, `JUPYTERHUB_SSL_CLIENT_CA` | Paths to keyfile, certfile and client CA for the SSL configuration                                                                                                                    |                                            |
+| `JUPYTERHUB_USER`, `JUPYTERHUB_GROUP`                                           | Name and Group of the user for this application. Required for Authentication                                                                                                          |                                            |
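
A note on the lookup described under "Specify the address and port" in `standalone.md`: the following is a minimal sketch, assuming the address and port are parsed from `JUPYTERHUB_SERVICE_URL` with the standard library and fall back to the documented `127.0.0.1:8888`. The function name `resolve_listen_address` is hypothetical and not part of jupyter-server-proxy, and the handling of a URL without an explicit host or port is an assumption, not confirmed behaviour.

```python
from urllib.parse import urlparse

# Defaults taken from the documentation above.
DEFAULT_ADDRESS = "127.0.0.1"
DEFAULT_PORT = 8888


def resolve_listen_address(env):
    """Return (address, port) from JUPYTERHUB_SERVICE_URL, or the defaults.

    `env` is any mapping of environment variables, e.g. `os.environ`.
    """
    service_url = env.get("JUPYTERHUB_SERVICE_URL", "")
    if not service_url:
        # Not launched by JupyterHub: use the documented defaults.
        return DEFAULT_ADDRESS, DEFAULT_PORT

    parsed = urlparse(service_url)
    # Assumption: a missing host or port falls back to the default value.
    address = parsed.hostname or DEFAULT_ADDRESS
    port = parsed.port or DEFAULT_PORT
    return address, port


inside_hub = resolve_listen_address({"JUPYTERHUB_SERVICE_URL": "http://127.0.0.1:5555"})
outside_hub = resolve_listen_address({})
```

In practice the mapping passed in would be `os.environ`, and explicit `--address` / `--port` arguments would take precedence over the result, as the documentation states.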
diff --git a/jupyter_server_proxy/config.py b/jupyter_server_proxy/config.py
@@ -2,6 +2,8 @@
 Traitlets based configuration for jupyter_server_proxy
 """
 
+from __future__ import annotations
+
 import sys
 from textwrap import dedent, indent
 from warnings import warn
@@ -263,60 +265,83 @@ def cats_only(response, path):
     """,
     ).tag(config=True)
 
+    def get_proxy_base_class(self) -> tuple[type | None, dict]:
+        """
+        Return the appropriate ProxyHandler Subclass and its kwargs
+        """
+        if self.command:
+            return (
+                SuperviseAndRawSocketHandler
+                if self.raw_socket_proxy
+                else SuperviseAndProxyHandler
+            ), dict(state={})
+
+        if not (self.port or isinstance(self.unix_socket, str)):
+            warn(
+                f"Server proxy {self.name} does not have a command, port number or unix_socket path. "
+                "At least one of these is required."
+            )
+            return None, dict()
+
+        return (
+            RawSocketHandler if self.raw_socket_proxy else NamedLocalProxyHandler
+        ), dict()
 
-def _make_proxy_handler(sp: ServerProcess):
-    """
-    Create an appropriate handler with given parameters
-    """
-    if sp.command:
-        cls = (
-            SuperviseAndRawSocketHandler
-            if sp.raw_socket_proxy
-            else SuperviseAndProxyHandler
-        )
-        args = dict(state={})
-    elif not (sp.port or isinstance(sp.unix_socket, str)):
-        warn(
-            f"Server proxy {sp.name} does not have a command, port "
-            f"number or unix_socket path. At least one of these is "
-            f"required."
-        )
-        return
-    else:
-        cls = RawSocketHandler if sp.raw_socket_proxy else NamedLocalProxyHandler
-        args = {}
-
-    # FIXME: Set 'name' properly
-    class _Proxy(cls):
-        kwargs = args
-
-        def __init__(self, *args, **kwargs):
-            super().__init__(*args, **kwargs)
-            self.name = sp.name
-            self.command = sp.command
-            self.proxy_base = sp.name
-            self.absolute_url = sp.absolute_url
-            if sp.command:
-                self.requested_port = sp.port
-                self.requested_unix_socket = sp.unix_socket
-            else:
-                self.port = sp.port
-                self.unix_socket = sp.unix_socket
-            self.mappath = sp.mappath
-            self.rewrite_response = sp.rewrite_response
-            self.update_last_activity = sp.update_last_activity
-
-        def get_request_headers_override(self):
-            return self._realize_rendered_template(sp.request_headers_override)
-
-        # these two methods are only used in supervise classes, but do no harm otherwise
-        def get_env(self):
-            return self._realize_rendered_template(sp.environment)
-
-        def get_timeout(self):
-            return sp.timeout
-
-    return _Proxy
+    def get_proxy_attributes(self) -> dict:
+        """
+        Return the required attributes, which will be set on the proxy handler
+        """
+        attributes = {
+            "name": self.name,
+            "command": self.command,
+            "proxy_base": self.name,
+            "absolute_url": self.absolute_url,
+            "mappath": self.mappath,
+            "rewrite_response": self.rewrite_response,
+            "update_last_activity": self.update_last_activity,
+            "request_headers_override": self.request_headers_override,
+        }
+
+        if self.command:
+            attributes["requested_port"] = self.port
+            attributes["requested_unix_socket"] = self.unix_socket
+            attributes["environment"] = self.environment
+            attributes["timeout"] = self.timeout
+        else:
+            attributes["port"] = self.port
+            attributes["unix_socket"] = self.unix_socket
+
+        return attributes
+
+    def make_proxy_handler(self) -> tuple[type | None, dict]:
+        """
+        Create an appropriate handler for this ServerProxy Configuration
+        """
+        cls, proxy_kwargs = self.get_proxy_base_class()
+        if cls is None:
+            return None, proxy_kwargs
+
+        # FIXME: Set 'name' properly
+        attributes = self.get_proxy_attributes()
+
+        class _Proxy(cls):
+            def __init__(self, *args, **kwargs):
+                super().__init__(*args, **kwargs)
+
+                for name, value in attributes.items():
+                    setattr(self, name, value)
+
+            def get_request_headers_override(self):
+                return self._realize_rendered_template(self.request_headers_override)
+
+            # these two methods are only used in supervise classes, but do no harm otherwise
+            def get_env(self):
+                return self._realize_rendered_template(self.environment)
+
+            def get_timeout(self):
+                return self.timeout
+
+        return _Proxy, proxy_kwargs
 
 
 def get_entrypoint_server_processes(serverproxy_config):
@@ -332,21 +357,21 @@ def get_entrypoint_server_processes(serverproxy_config):
     return sps
 
 
-def make_handlers(base_url, server_processes):
+def make_handlers(base_url: str, server_processes: list[ServerProcess]):
     """
     Get tornado handlers for registered server_processes
     """
     handlers = []
-    for sp in server_processes:
-        handler = _make_proxy_handler(sp)
+    for server in server_processes:
+        handler, kwargs = server.make_proxy_handler()
         if not handler:
             continue
-        handlers.append((ujoin(base_url, sp.name, r"(.*)"), handler, handler.kwargs))
-        handlers.append((ujoin(base_url, sp.name), AddSlashHandler))
+        handlers.append((ujoin(base_url, server.name, r"(.*)"), handler, kwargs))
+        handlers.append((ujoin(base_url, server.name), AddSlashHandler))
     return handlers
 
 
-def make_server_process(name, server_process_config, serverproxy_config):
+def make_server_process(name: str, server_process_config: dict, serverproxy_config):
     return ServerProcess(name=name, **server_process_config)
 
 

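A note on the refactored `make_proxy_handler` in `config.py`: the pattern it uses, a class factory that closes over an attribute dictionary and copies it onto each instance, can be sketched in isolation as follows. This is a simplified illustration and not the project's code; `BaseHandler` and `make_handler` are hypothetical stand-ins for the real Tornado handler classes and the `ServerProcess` method.

```python
class BaseHandler:
    """Hypothetical stand-in for a proxy handler base class."""

    def __init__(self, *args, **kwargs):
        self.init_kwargs = kwargs


def make_handler(base, attributes):
    """Build a subclass of `base` that sets `attributes` on every instance."""

    class _Proxy(base):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Copy the configured values onto the instance after the base
            # class has finished its own initialisation.
            for name, value in attributes.items():
                setattr(self, name, value)

        def get_timeout(self):
            # Reads the value set in __init__, not the closure variable.
            return self.timeout

    return _Proxy


handler_cls = make_handler(BaseHandler, {"name": "voila", "timeout": 30})
handler = handler_cls(state={})
```

One consequence of this design is that every instance receives references to the same attribute values, so a mutable value such as a dictionary of environment variables is shared between handler instances rather than copied.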
diff --git a/jupyter_server_proxy/standalone/__init__.py b/jupyter_server_proxy/standalone/__init__.py
@@ -0,0 +1,9 @@
+from .app import StandaloneProxyServer
+
+
+def main():
+    StandaloneProxyServer.launch_instance()
+
+
+if __name__ == "__main__":
+    main()