Merge pull request #94 from Zarquan/20241126-zrq-metadata-roles
Improving the metadata roles section
Zarquan authored Nov 26, 2024
2 parents c6be6cb + 6af673e commit 73d3489
Showing 1 changed file with 153 additions and 63 deletions.
216 changes: 153 additions & 63 deletions ExecutionBroker.tex
@@ -62,6 +62,7 @@

\newcommand{\python} {Python}
\newcommand{\pythonprogram} {Python program}
\newcommand{\pythonruntime} {Python runtime}

\newcommand{\apache} {Apache}
\newcommand{\spark} {Spark}
@@ -675,7 +676,7 @@ \subsubsection{Update options}
....
....
options:
- type: "urn:enum-value-option"
- type: "uri:enum-value-option"
path: "state"
values:
- "ACCEPTED"
@@ -694,7 +695,7 @@ \subsubsection{Update options}
....
....
options:
- type: "urn:enum-value-option"
- type: "uri:enum-value-option"
path: "state"
values:
- "CANCELLED"
@@ -770,7 +771,7 @@ \subsection{Session lifecycle}
\section{The data model}
\label{data-model}

\subsection{Data curation roles}
\subsection{Metadata roles}
\label{metadata-roles}

The full description of an \executablething{} will include several layers of metadata
@@ -790,15 +791,16 @@ \subsection{Data curation roles}
\subsubsection{The developer}
\label{software-developer}

The first layer of metadata comes from the person who wrote the \pythonprogram{}.
The first layer of metadata comes from the person who wrote the software.
They have detailed knowledge of what the software does, what execution environment it needs,
and what the inputs and outputs are.

For the square root example, it is a \pythonprogram{} which needs a platform with the \python{} runtime installed,
For our square root example, it is a \pythonprogram{} which needs a platform with the \python{} runtime installed,
and a list of the \python{} libraries that the program relies on.

\begin{lstlisting}[]
executable:
name: Newton-Raphson example
type: uri:python-program
requirements:
- numpy: ""
@@ -811,27 +813,30 @@ \subsubsection{The developer}
\begin{lstlisting}[]
resources:
compute:
- type: uri:generic-compute
cores:
requested:
min: 4
memory:
requested:
min: 16
units: GiB
....
- type: uri:generic-compute
cores:
requested:
min: 4
memory:
requested:
min: 16
units: GiB
....
\end{lstlisting}
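To illustrate how these minimum values might be used, the following is a minimal \python{} sketch, not part of the specification, of a check an \execbrokerservice{} could make before offering a platform.
It assumes the metadata has been loaded into nested dictionaries that mirror the YAML above, that \codeword{units} sits alongside \codeword{min}, and that the offered memory is given as a value with units.
The helper names \codeword{tobytes} and \codeword{satisfies} are hypothetical.

\begin{lstlisting}[language=Python]
# Hypothetical sketch: do the cores and memory offered by a platform meet the
# developer's minimum requirements? Field names mirror the YAML above; the
# unit table and the function names are illustrative assumptions.

UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def tobytes(value, units):
    """Convert a value in binary units (e.g. GiB) into bytes."""
    return value * UNITS[units]


def satisfies(compute, offered):
    """Return True if the offer meets the requested minimum cores and memory."""
    if offered["cores"] < compute["cores"]["requested"]["min"]:
        return False
    memory = compute["memory"]["requested"]
    needed = tobytes(memory["min"], memory["units"])
    available = tobytes(offered["memory"]["value"], offered["memory"]["units"])
    return available >= needed


# The developer's request, as in the YAML listing above.
compute = {
    "type": "uri:generic-compute",
    "cores": {"requested": {"min": 4}},
    "memory": {"requested": {"min": 16, "units": "GiB"}},
}

print(satisfies(compute, {"cores": 8, "memory": {"value": 32, "units": "GiB"}}))    # True
print(satisfies(compute, {"cores": 8, "memory": {"value": 8192, "units": "MiB"}}))  # False
\end{lstlisting}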

The developers also know what inputs and outputs the program expects and what file
formats it can handle.
% needs work
%https://github.com/ivoa-std/ExecutionBroker/issues/89

\begin{lstlisting}[]
executable:
name: Newton-Raphson example
type: uri:python-program
....
parameters:
- type: uri:param-file
name: "input data"
- name: input data
type: uri:param-file
mode: readonly
description:
A table containing a list of numbers to be processed, formatted as
@@ -841,9 +846,8 @@ \subsubsection{The developer}
....
- type: uri:votable
....
- type: uri:param-value
name: "input column name"
type: string
- name: input column
type: uri:param-value
description:
The column name within the 'input data' to use.
\end{lstlisting}
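A client could use these parameter descriptions to check a request before submitting it.
The minimal \python{} sketch below assumes that every declared parameter is required, which the text does not state, and that the user's values are supplied as a dictionary keyed by parameter name.
The helper \codeword{missing} and the file name \codeword{numbers.vot} are hypothetical.

\begin{lstlisting}[language=Python]
# Hypothetical sketch: find declared parameters that the user has not given
# a value for. Assumes every parameter is required (not stated by the text).

declared = [
    {"name": "input data", "type": "uri:param-file", "mode": "readonly"},
    {"name": "input column", "type": "uri:param-value"},
]


def missing(parameters, supplied):
    """Return the names of declared parameters with no supplied value."""
    return [p["name"] for p in parameters if p["name"] not in supplied]


print(missing(declared, {"input data": "numbers.vot"}))  # ['input column']
\end{lstlisting}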
@@ -862,31 +866,40 @@ \subsubsection{The packager}
step that could be implemented by a different person.
To make this distinction clear we can refer to this person, or role, as 'the packager'.

In terms of the \metadoc{}, the packager changes the description of the \executablething{}
from a \pythonprogram{} to a \dockercontainer{}.
This step packages the \pythonprogram{} along with any \python{} modules it requires,
the \pythonruntime{}, and any operating system components it needs, into a single
container image in a standard format, making it much easier to deploy.

To represent the new type of \executablething{} in the \metadoc{}, the packager
would change the description of the \executablething{} from a \pythonprogram{}
to a \dockercontainer{}.

\begin{lstlisting}[]
executable:
type: uri:docker-container
name: Newton-Raphson example
type: uri:docker-container-1.0
repository: ghcr.io
image: ivoa/calycopis/java-builder
image: ivoa/analytics/Newton-Raphson-albert
tag: 2024.08.30
....
\end{lstlisting}
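The packager's change can be thought of as a transformation of the developer's description.
In the minimal \python{} sketch below, \codeword{package} is a hypothetical helper that copies the description, replaces the type with \codeword{uri:docker-container-1.0}, and adds the image location.
Dropping the \codeword{requirements} list on the grounds that the \python{} libraries are now built into the image is an assumption made for illustration.

\begin{lstlisting}[language=Python]
# Hypothetical sketch of the packager's step. Field names follow the YAML
# above; removing "requirements" is an illustrative assumption.
import copy


def package(executable, repository, image, tag):
    """Return a new description for the packaged container image."""
    packaged = copy.deepcopy(executable)   # leave the developer's layer untouched
    packaged.pop("requirements", None)     # assumed to be built into the image
    packaged.update({
        "type": "uri:docker-container-1.0",
        "repository": repository,
        "image": image,
        "tag": tag,
    })
    return packaged


developer = {
    "name": "Newton-Raphson example",
    "type": "uri:python-program",
    "requirements": [{"numpy": ""}],
}
packaged = package(developer, "ghcr.io", "ivoa/analytics/Newton-Raphson-albert", "2024.08.30")
print(packaged["type"])  # uri:docker-container-1.0
\end{lstlisting}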

Depending on how the software is packaged in the container they may also need to update
the description of the inputs and outputs,
and link them to specific locations in the filesystem.
the description of the inputs and outputs, and link them to specific locations in the
filesystem.
% needs work
%https://github.com/ivoa-std/ExecutionBroker/issues/89

\begin{lstlisting}[]
executable:
type: uri:docker-container
name: Newton-Raphson example
type: uri:docker-container-1.0
....
parameters:
- type: uri:data-file
name: "input data"
- name: input-data
type: uri:data-file
format:
- type: urn:ivoa-votable
- type: uri:ivoa-votable
filename: input-data.vot
....

@@ -895,7 +908,7 @@ \subsubsection{The packager}
- type: uri:generic-compute
volumes:
- type: uri:file-mount
parameter: "input data"
parameter: input-data
filepath: /data
mode: readonly
....
@@ -916,52 +929,66 @@ \subsubsection{The publisher}
\item A project specific discovery service that only includes software vetted by the project.
Execution platforms within the project would only accept curated \metadoc{s}
from that discovery service.
\item A domain specific discovery service that modifies the execution environment, optimising
the software for analysing a particular type of data.
\item A domain specific discovery service that modifies the execution environment, configuring
the software to analyse a particular type of data.
\item A catalog of \metadoc{s} maintained as part of a university teaching course, modifying the
execution environment to integrate the software into the university system and setting
parameters to configure the software to match the course notes.
parameters to configure it to match the course notes.
\end{itemize}
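These changes can be pictured as a layer applied on top of the packager's \metadoc{}.
The minimal \python{} sketch below merges nested mappings key by key and lets any other value in the new layer replace the old one, so a list such as the parameters would be replaced as a whole.
The text does not define how layers are combined, so both this merge rule and the \codeword{overlay} helper are assumptions made for illustration.

\begin{lstlisting}[language=Python]
# Hypothetical sketch: apply a publisher's layer on top of an existing
# description. The merge rule is an illustrative assumption.


def overlay(base, layer):
    """Return a new dict with layer applied on top of base (base is not modified)."""
    merged = dict(base)
    for key, value in layer.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)  # merge nested mappings
        else:
            merged[key] = value                        # replace everything else
    return merged


packaged = {
    "executable": {
        "name": "Newton-Raphson example",
        "type": "uri:docker-container-1.0",
        "tag": "2024.08.30",
    }
}
course = {"executable": {"name": "Newton-Raphson (course version)"}}  # hypothetical

published = overlay(packaged, course)
print(published["executable"]["name"])  # Newton-Raphson (course version)
print(published["executable"]["tag"])   # 2024.08.30
\end{lstlisting}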

\subsubsection{The user}
\label{software-user}

The user, or the user's client agent, starts with an initial \metadoc{} from the
software discovery service and adds additional information describing how the user
wants to use the software.
The user starts with an initial \metadoc{} from the
software discovery service and adds additional information describing how they
want to use the software.

Adding details of the data resources the user wants to use enables the \execbrokerservice{}
to transfer the data to local storage before the \execsession{} is started.

Including a value for the filesize enables the \execbrokerservice{} to estimate
how much local storage it will need to allocate
and how much time will be needed to transfer the data.
The \execbrokerservice{} can take this into account when calculating the start time of
the \execoffer{s} it makes, allowing enough time for the data transfers to complete
before the \execsession{} starts.
This would include selecting the data resources that they want to use
and adding them to the metadata.

\begin{lstlisting}[]
executable:
....

resources:

data:
- type: uri:simple-data-resource
name: "input data"
- name: input data
type: uri:simple-data-resource
location: http://data.example.org/....
filesize:
value: 145
units: MiB
....

compute:
....
\end{lstlisting}

Including details of the data resources in the \metadoc{} means the \execbrokerservice{}
can allow for the time needed to transfer the data to local storage before the
\execsession{} begins.

Including a value for the data size enables the \execbrokerservice{} to estimate
how much local storage it will need to allocate
and how much time will be needed to transfer the data.
The \execbrokerservice{} can take this into account when calculating the start time of
the \execoffer{s} it makes, allowing enough time for the data transfers to complete
before the \execsession{} starts.
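As a worked illustration of this estimate, the minimal \python{} sketch below converts each \codeword{filesize} into bytes, divides the total by an assumed transfer rate, and adds the result to the current time.
With the 145 MiB file from the listing above and an assumed rate of 10 MiB per second, the transfer takes 14.5 seconds.
The rate, the assumption that transfers run one after another, and the helper names are all hypothetical; a real \execbrokerservice{} would use its own figures.

\begin{lstlisting}[language=Python]
# Hypothetical sketch: estimate local storage, transfer time and the earliest
# start time for an execution session. The transfer rate, the sequential
# transfers and the helper names are illustrative assumptions.
from datetime import datetime, timedelta, timezone

UNITS = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def nbytes(filesize):
    """Convert a {value, units} filesize into bytes."""
    return filesize["value"] * UNITS[filesize["units"]]


def storage(resources):
    """Total local storage needed for all the data resources, in bytes."""
    return sum(nbytes(r["filesize"]) for r in resources)


def transfertime(resources, rate):
    """Time to copy all the data resources, one after another, at rate bytes/s."""
    return timedelta(seconds=storage(resources) / rate)


def earliest(now, resources, rate):
    """Earliest start time that leaves enough time for the transfers."""
    return now + transfertime(resources, rate)


data = [{"name": "input data", "filesize": {"value": 145, "units": "MiB"}}]
rate = 10 * 2**20  # assumed 10 MiB per second
now = datetime(2024, 11, 26, 12, 0, tzinfo=timezone.utc)

print(transfertime(data, rate))   # 0:00:14.500000
print(earliest(now, data, rate))  # 2024-11-26 12:00:14.500000+00:00
\end{lstlisting}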

Linking the data resources with volumes on the corresponding compute resources enables
the \execbrokerservice{} to mount the data resources at the correct location in
the compute resource's filesystem.

\begin{lstlisting}[]
resources:

data:
- type: uri:simple-data-resource
name: input-data
- name: input-data
type: uri:simple-data-resource
location: http://data.example.org/....
....

compute:
- type: uri:generic-compute
....
@@ -996,7 +1023,7 @@ \subsubsection{The user}
units: GiB
\end{lstlisting}
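The minimal \python{} sketch below shows one way this linking step could work.
The listing above does not show the field that connects a volume to a data resource.
By analogy with the packager's \codeword{parameter} field, this sketch assumes a hypothetical \codeword{resource} field that holds the data resource's name.
The \codeword{mounts} helper is also hypothetical.

\begin{lstlisting}[language=Python]
# Hypothetical sketch: resolve each file-mount volume to the data resource it
# refers to. The "resource" linking field is an illustrative assumption.


def mounts(resources):
    """Return (location, filepath, mode) for every file-mount volume."""
    data = {d["name"]: d for d in resources["data"]}
    result = []
    for compute in resources["compute"]:
        for volume in compute.get("volumes", []):
            if volume["type"] != "uri:file-mount":
                continue
            source = data[volume["resource"]]  # KeyError if the link is broken
            result.append((source["location"], volume["filepath"], volume.get("mode")))
    return result


resources = {
    "data": [
        {"name": "input-data", "type": "uri:simple-data-resource",
         "location": "http://data.example.org/...."},
    ],
    "compute": [
        {"type": "uri:generic-compute",
         "volumes": [{"type": "uri:file-mount", "resource": "input-data",
                      "filepath": "/data", "mode": "readonly"}]},
    ],
}

print(mounts(resources))  # [('http://data.example.org/....', '/data', 'readonly')]
\end{lstlisting}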

TODO user provides the schedule ... when they want to run it.
TODO user provides the schedule to describe when they want to run it.

\subsection{The \executable{}}
\label{executable}
@@ -1012,26 +1039,89 @@ \subsection{The \executable{}}
Rather than try to model every possible type of \executable{} in one large \datamodel{},
the \datamodel{} for each type is described in an extension to the core \datamodel{}.

To support this, the core \datamodel{} defines two fields:
\begin{itemize}
\item \codeword{type} - a URI identifying the type of \executable{}.
\item \codeword{spec} - a placeholder for type-specific details.
\end{itemize}
The \datamodel{} uses a common pattern for polymorphic types based on a discriminator
value to indicate the type of thing it is describing, followed by the specific
details for that type.

This is implemented in the \openapi{} specification as an abstract base class
containing the common fields, such as a name and a UUID identifier, together with a
discriminator mapping from each type identifier to the corresponding derived type.

% Type URLs
% https://www.purl.org/ivoa.net/executable-types/example
% https://github.com/ivoa-std/ExecutionBroker/blob/main/types/executable-types/example-executable.md
\begin{lstlisting}[]
# ExecutionBroker client request.
AbstractExecutable:
type: object
discriminator:
propertyName: type
mapping:
"uri:docker-container-1.0": 'DockerContainer'
"uri:jupyter-notebook-1.0": 'JupyterNotebook'
....
properties:
name:
description: >
A human readable name, assigned by the client.
type: string
uuid:
description: >
A machine readable UUID, assigned by the server.
type: string
format: uuid
type:
description: >
The type identifier.
type: string
\end{lstlisting}

The derived types extend this abstract base class to include the details needed to
describe this type of \executablething{}.
For example, the derived type for a \dockercontainer{} includes properties
describing where to get the \docker{} image from: the repository endpoint URL,
and the name and version tag of the image to download.

\begin{lstlisting}[]
DockerContainer:
description: |
A Docker or OCI container.
See https://opencontainers.org/
type: object
title: DockerContainer
allOf:
- $ref: 'AbstractExecutable'
- type: object
properties:
repository:
type: string
description: >
The image repository URL.
image:
type: string
description: >
The image name within the repository.
tag:
type: string
description: The image tag.
....
\end{lstlisting}
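To show how a receiver might use the discriminator, the minimal \python{} sketch below reads the \codeword{type} value and uses it to choose which class to build from the rest of the message.
The classes mirror the two schemas above. The \codeword{MAPPING} table and the \codeword{parse} function are hypothetical, and only the \dockercontainer{} type is registered.

\begin{lstlisting}[language=Python]
# Hypothetical sketch of discriminator-based dispatch. Field names come from
# the schemas above; the class layout and registry are illustrative.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AbstractExecutable:
    type: str
    name: Optional[str] = None
    uuid: Optional[str] = None


@dataclass
class DockerContainer(AbstractExecutable):
    repository: Optional[str] = None
    image: Optional[str] = None
    tag: Optional[str] = None


# Maps each type identifier to the class that describes it.
MAPPING = {"uri:docker-container-1.0": DockerContainer}


def parse(message):
    """Build the derived type selected by the message's type field."""
    cls = MAPPING.get(message["type"])
    if cls is None:
        raise ValueError(f"unknown executable type: {message['type']}")
    return cls(**message)  # unexpected fields raise TypeError


executable = parse({
    "type": "uri:docker-container-1.0",
    "name": "Experiment one",
    "repository": "ghcr.io",
    "image": "ivoa/analytics/Newton-Raphson-example",
    "tag": "2024.08.30",
})
print(type(executable).__name__)  # DockerContainer
\end{lstlisting}

With the mapping kept in a single table, supporting a new type of \executablething{} would only require a new class and one new entry.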

This results in the following message being sent to request the execution
of a \dockercontainer{}.

\begin{lstlisting}[]
# ExecutionBroker request.
request:

# Details of the executable.
executable:

# A URI identifying the type of executable.
type: "https://www.purl.org/ivoa.net/executable-types/example"
# Common fields from the AbstractExecutable
name: Experiment one
type: uri:docker-container-1.0

# The details, specific to a Docker container executable.
repository: ghcr.io
image: ivoa/analytics/Newton-Raphson-example
tag: 2024.08.30

# The details, specific to the type of executable.
spec: {}
\end{lstlisting}

\subsubsection{\jupyternotebook{}}