Commit 73d3489

Merge pull request #94 from Zarquan/20241126-zrq-metadata-roles
Improving the metadata roles section
2 parents c6be6cb + 6af673e commit 73d3489

1 file changed

+153
-63
lines changed

ExecutionBroker.tex

@@ -62,6 +62,7 @@
6262

6363
\newcommand{\python} {Python}
6464
\newcommand{\pythonprogram} {Python program}
65+
\newcommand{\pythonruntime} {Python runtime}
6566

6667
\newcommand{\apache} {Apache}
6768
\newcommand{\spark} {Spark}
@@ -675,7 +676,7 @@ \subsubsection{Update options}
675676
....
676677
....
677678
options:
678-
- type: "urn:enum-value-option"
679+
- type: "uri:enum-value-option"
679680
path: "state"
680681
values:
681682
- "ACCEPTED"
@@ -694,7 +695,7 @@ \subsubsection{Update options}
694695
....
695696
....
696697
options:
697-
- type: "urn:enum-value-option"
698+
- type: "uri:enum-value-option"
698699
path: "state"
699700
values:
700701
- "CANCELLED"
@@ -770,7 +771,7 @@ \subsection{Session lifecycle}
770771
\section{The data model}
771772
\label{data-model}
772773

773-
\subsection{Data curation roles}
774+
\subsection{Metadata roles}
774775
\label{metadata-roles}
775776

776777
The full description of an \executablething{} will include several layers of metadata
@@ -790,15 +791,16 @@ \subsection{Data curation roles}
790791
\subsubsection{The developer}
791792
\label{software-developer}
792793

793-
The first layer of metadata comes from the person who wrote the \pythonprogram{}.
794+
The first layer of metadata comes from the person who wrote the software.
794795
They have detailed knowledge of what the software does, what execution environment it needs,
795796
and what the inputs and outputs are.
796797

797-
For the square root example, it is a \pythonprogram{} which needs a platform with the \python{} runtime installed,
798+
For our square root example, it is a \pythonprogram{} which needs a platform with the \python{} runtime installed,
798799
and a list of the \python{} libraries that the program relies on.
799800

800801
\begin{lstlisting}[]
801802
executable:
803+
name: Newton-Raphson example
802804
type: uri:python-program
803805
requirements:
804806
- numpy: ""
@@ -811,27 +813,30 @@ \subsubsection{The developer}
811813
\begin{lstlisting}[]
812814
resources:
813815
compute:
814-
- type: uri:generic-compute
815-
cores:
816-
requested:
817-
min: 4
818-
memory:
819-
requested:
820-
min: 16
821-
units: GiB
822-
....
816+
- type: uri:generic-compute
817+
cores:
818+
requested:
819+
min: 4
820+
memory:
821+
requested:
822+
min: 16
823+
units: GiB
824+
....
823825
\end{lstlisting}
824826

825827
The developers also know about what inputs and outputs the program expects and what file
826828
formats it can handle.
829+
% needs work
830+
%https://github.com/ivoa-std/ExecutionBroker/issues/89
827831

828832
\begin{lstlisting}[]
829833
executable:
834+
name: Newton-Raphson example
830835
type: uri:python-program
831836
....
832837
parameters:
833-
- type: uri:param-file
834-
name: "input data"
838+
- name: input data
839+
type: uri:param-file
835840
mode: readonly
836841
description:
837842
A table containing a list of numbers to be processed, formatted as
@@ -841,9 +846,8 @@ \subsubsection{The developer}
841846
....
842847
- type: uri:votable
843848
....
844-
- type: uri:param-value
845-
name: "input column name"
846-
type: string
849+
- name: input column
850+
type: uri:param-value
847851
description:
848852
The column name within the 'input data' to use.
849853
\end{lstlisting}
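
To make the square root example concrete, the following is a minimal sketch of what such a \pythonprogram{} might look like. It is not the program described by the metadata above. The function names \codeword{newton\_sqrt} and \codeword{process\_column} are hypothetical, and the input rows are assumed to have already been read from the 'input data' table into a list of dictionaries.

```python
def newton_sqrt(value, tolerance=1e-12, max_iterations=50):
    """Approximate the square root of a non-negative number (Newton-Raphson)."""
    if value < 0:
        raise ValueError("cannot take the square root of a negative number")
    if value == 0:
        return 0.0
    # Start from a guess that is at least as large as the true root.
    estimate = value if value >= 1 else 1.0
    for _ in range(max_iterations):
        # Newton-Raphson step for f(x) = x^2 - value.
        next_estimate = 0.5 * (estimate + value / estimate)
        if abs(next_estimate - estimate) <= tolerance * next_estimate:
            return next_estimate
        estimate = next_estimate
    return estimate


def process_column(rows, column):
    """Apply newton_sqrt to one named column of an in-memory table.

    'rows' is assumed to be a list of dictionaries, one per table row;
    reading the table from a file is outside the scope of this sketch.
    """
    return [newton_sqrt(float(row[column])) for row in rows]
```

The 'input column' parameter in the metadata corresponds to the \codeword{column} argument here, which is why the developer is the person best placed to describe it.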
@@ -862,31 +866,40 @@ \subsubsection{The packager}
862866
step that could be implemented by a different person.
863867
To make this distinction clear we can refer to this person, or role, as 'the packager'.
864868

865-
In terms of the \metadoc{}, the packager changes the description of the \executablething{}
866-
from a \pythonprogram{} to a \dockercontainer{}.
869+
This step packages the \pythonprogram{} along with any \python{} modules it requires,
870+
the \pythonruntime{}, and any operating system components it requires, into a single
871+
standard format binary file, making it much easier to deploy.
872+
873+
To represent the new type of \executablething{} in the \metadoc{}, the packager
874+
would change the description of the \executablething{} from a \pythonprogram{}
875+
to a \dockercontainer{}.
867876

868877
\begin{lstlisting}[]
869878
executable:
870-
type: uri:docker-container
879+
name: Newton-Raphson example
880+
type: uri:docker-container-1.0
871881
repository: ghcr.io
872-
image: ivoa/calycopis/java-builder
882+
image: ivoa/analytics/Newton-Raphson-albert
873883
tag: 2024.08.30
874884
....
875885
\end{lstlisting}
876886

877887
Depending on how the software is packaged in the container they may also need to update
878-
the description of the inputs and outputs,
879-
and link them to specific locations in the filesystem.
888+
the description of the inputs and outputs, and link them to specific locations in the
889+
filesystem.
890+
% needs work
891+
%https://github.com/ivoa-std/ExecutionBroker/issues/89
880892

881893
\begin{lstlisting}[]
882894
executable:
883-
type: uri:docker-container
895+
name: Newton-Raphson example
896+
type: uri:docker-container-1.0
884897
....
885898
parameters:
886-
- type: uri:data-file
887-
name: "input data"
899+
- name: input-data
900+
type: uri:data-file
888901
format:
889-
- type: urn:ivoa-votable
902+
- type: uri:ivoa-votable
890903
filename: input-data.vot
891904
....
892905

@@ -895,7 +908,7 @@ \subsubsection{The packager}
895908
- type: uri:generic-compute
896909
volumes:
897910
- type: uri:file-mount
898-
parameter: "input data"
911+
parameter: input-data
899912
filepath: /data
900913
mode: readonly
901914
....
@@ -916,52 +929,66 @@ \subsubsection{The publisher}
916929
\item A project specific discovery service that only includes software vetted by the project.
917930
Execution platforms within the project would only accept curated \metadoc{s}
918931
from that discovery service.
919-
\item A domain specific discovery service that modifies the execution environment, optimising
920-
the software for analysing a particular type of data.
932+
\item A domain specific discovery service that modifies the execution environment, configuring
933+
the software to analyse a particular type of data.
921934
\item A catalog of \metadoc{s} maintained as part of a university teaching course, modifying the
922935
execution environment to integrate the software into the university system and setting
923-
parameters to configure the software to match the course notes.
936+
parameters to configure it to match the course notes.
924937
\end{itemize}
925938

926939
\subsubsection{The user}
927940
\label{software-user}
928941

929-
The user, or the user's client agent, starts with an initial \metadoc{} from the
930-
software discovery service and adds additional information describing how the user
931-
wants to use the software.
942+
The user starts with an initial \metadoc{} from the
943+
software discovery service and adds additional information describing how they
944+
want to use the software.
932945

933-
Adding details of the data resources the user wants to use enables the \execbrokerservice{}
934-
to transfer the data to local storage before the \execsession{} is started.
935-
936-
Including a value for the filesize enables the \execbrokerservice{} to estimate
937-
how much local storage it will need to allocate
938-
and how much time will be needed to transfer the data.
939-
The \execbrokerservice{} can take this into account when calculating the start time of
940-
the \execoffer{s} it makes, allowing enough time for the data transfers to complete
941-
before the \execsession{} starts.
946+
This would include selecting the data resources that they want to use
947+
and adding them to the metadata.
942948

943949
\begin{lstlisting}[]
950+
executable:
951+
....
952+
944953
resources:
954+
945955
data:
946-
- type: uri:simple-data-resource
947-
name: "input data"
956+
- name: input data
957+
type: uri:simple-data-resource
948958
location: http://data.example.org/....
949959
filesize:
950960
value: 145
951961
units: MiB
952962
....
963+
964+
compute:
965+
....
953966
\end{lstlisting}
954967

968+
Including details of the data resources in the \metadoc{} means the \execbrokerservice{}
969+
will include the time needed to transfer the data to local storage before the
970+
\execsession{} begins.
971+
972+
Including a value for the data size enables the \execbrokerservice{} to estimate
973+
how much local storage it will need to allocate
974+
and how much time will be needed to transfer the data.
975+
The \execbrokerservice{} can take this into account when calculating the start time of
976+
the \execoffer{s} it makes, allowing enough time for the data transfers to complete
977+
before the \execsession{} starts.
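
As an illustration of this estimate, the sketch below adds up the declared sizes of the data resources and converts the total into a transfer time and an earliest start time. It is only a sketch under stated assumptions: the bandwidth figure, the function names and the simple size divided by bandwidth model are all hypothetical, and a real \execbrokerservice{} is free to use any estimation method it likes.

```python
from datetime import datetime, timedelta

# Binary unit multipliers for the 'units' field used in the examples.
UNIT_BYTES = {"KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}


def size_in_bytes(filesize):
    """Convert a {'value': ..., 'units': ...} block into a byte count."""
    return filesize["value"] * UNIT_BYTES[filesize["units"]]


def estimate_transfer_seconds(data_resources, bandwidth_bytes_per_second):
    """Estimate the time to stage all data resources (assumed linear model)."""
    total = sum(size_in_bytes(res["filesize"]) for res in data_resources)
    return total / bandwidth_bytes_per_second


def earliest_session_start(now, data_resources, bandwidth_bytes_per_second):
    """Earliest start time that leaves room for the data transfers."""
    seconds = estimate_transfer_seconds(data_resources, bandwidth_bytes_per_second)
    return now + timedelta(seconds=seconds)


# The 145 MiB resource from the example, with an assumed 10 MiB/s link.
resources = [{"name": "input data", "filesize": {"value": 145, "units": "MiB"}}]
assumed_bandwidth = 10 * UNIT_BYTES["MiB"]
start = earliest_session_start(
    datetime(2024, 11, 26, 12, 0, 0), resources, assumed_bandwidth
)
```

The same total byte count also gives a lower bound on the local storage that would need to be allocated for the session.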
978+
955979
Linking the data resources with volumes on the corresponding compute resources enables
956980
the \execbrokerservice{} to mount the data resources at the correct location in
957981
the compute resource's filesystem.
958982

959983
\begin{lstlisting}[]
960984
resources:
985+
961986
data:
962-
- type: uri:simple-data-resource
963-
name: input-data
987+
- name: input-data
988+
type: uri:simple-data-resource
989+
location: http://data.example.org/....
964990
....
991+
965992
compute:
966993
- type: uri:generic-compute
967994
....
@@ -996,7 +1023,7 @@ \subsubsection{The user}
9961023
units: GiB
9971024
\end{lstlisting}
9981025

999-
TODO user provides the schedule ... when they want to run it.
1026+
TODO user provides the schedule to describe when they want to run it.
10001027

10011028
\subsection{The \executable{}}
10021029
\label{executable}
@@ -1012,26 +1039,89 @@ \subsection{The \executable{}}
10121039
Rather than try to model every possible type of \executable{} in one large \datamodel{},
10131040
the \datamodel{} for each type is described in an extension to the core \datamodel{}.
10141041

1015-
To support this, the core \datamodel{} defines two fields:
1016-
\begin{itemize}
1017-
\item \codeword{type} - a URI identifying the type of \executable{}.
1018-
\item \codeword{spec} - a place holder for type specific details.
1019-
\end{itemize}
1042+
The \datamodel{} uses a common pattern for polymorphic types based on a discriminator
1043+
value to indicate the type of thing it is describing, followed by the specific
1044+
details for that type.
1045+
1046+
This is implemented in the \openapi{} specification as an abstract base class
1047+
containing common fields like a name and uuid identifier, followed by a list
1048+
of derived types and their type identifiers.
10201049

1021-
% Type URLs
1022-
% https://www.purl.org/ivoa.net/executable-types/example
1023-
% https://github.com/ivoa-std/ExecutionBroker/blob/main/types/executable-types/example-executable.md
10241050
\begin{lstlisting}[]
1025-
# ExecutionBroker client request.
1051+
AbstractExecutable:
1052+
type: object
1053+
discriminator:
1054+
propertyName: type
1055+
mapping:
1056+
"uri:docker-container-1.0": 'DockerContainer'
1057+
"uri:jupyter-notebook-1.0": 'JupyterNotebook'
1058+
....
1059+
properties:
1060+
name:
1061+
description: >
1062+
A human readable name, assigned by the client.
1063+
type: string
1064+
uuid:
1065+
description: >
1066+
A machine readable UUID, assigned by the server.
1067+
type: string
1068+
format: uuid
1069+
type:
1070+
description: >
1071+
The type identifier.
1072+
type: string
1073+
\end{lstlisting}
1074+
1075+
The derived types extend this abstract base class to include the details needed to
1076+
describe each specific type of \executablething{}.
1077+
For example, the derived type for a \dockercontainer{} includes properties
1078+
to describe where to get the \docker{} image from, including the repository endpoint URL,
1079+
and the name and version tag of the \docker{} image to download.
1080+
1081+
\begin{lstlisting}[]
1082+
DockerContainer:
1083+
description: |
1084+
A Docker or OCI container.
1085+
See https://opencontainers.org/
1086+
type: object
1087+
title: DockerContainer
1088+
allOf:
1089+
- $ref: 'AbstractExecutable'
1090+
- type: object
1091+
properties:
1092+
repository:
1093+
type: string
1094+
description: >
1095+
The image repository URL.
1096+
image:
1097+
type: string
1098+
description: >
1099+
The image name within the repository.
1100+
tag:
1101+
type: string
1102+
description: The image tag.
1103+
....
1104+
\end{lstlisting}
1105+
1106+
This results in the following message being sent to request the execution
1107+
of a \dockercontainer{}.
1108+
1109+
\begin{lstlisting}[]
1110+
# ExecutionBroker request.
10261111
request:
1112+
10271113
# Details of the executable.
10281114
executable:
10291115

1030-
# A URI identifying the type of executable.
1031-
type: "https://www.purl.org/ivoa.net/executable-types/example"
1116+
# Common fields from the AbstractExecutable
1117+
name: Experiment one
1118+
type: uri:docker-container-1.0
1119+
1120+
# The details, specific to a Docker container executable.
1121+
repository: ghcr.io
1122+
image: ivoa/analytics/Newton-Raphson-example
1123+
tag: 2024.08.30
10321124

1033-
# The details, specific to the type of executable.
1034-
spec: {}
10351125
\end{lstlisting}
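
The discriminator pattern can be sketched from the receiving side as follows. This is a minimal illustration, not code generated from the \openapi{} specification. The Python class names mirror the schema names above, the \codeword{parse\_executable} helper is hypothetical, and the fields of the notebook type are left out because they are not described in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AbstractExecutable:
    # Common fields shared by every type of executable.
    type: str
    name: Optional[str] = None
    uuid: Optional[str] = None


@dataclass
class DockerContainer(AbstractExecutable):
    # Fields specific to a Docker or OCI container.
    repository: Optional[str] = None
    image: Optional[str] = None
    tag: Optional[str] = None


@dataclass
class JupyterNotebook(AbstractExecutable):
    # Type specific fields omitted; they are not shown in this sketch.
    pass


# Maps each discriminator value to the class that handles it.
DISCRIMINATOR_MAPPING = {
    "uri:docker-container-1.0": DockerContainer,
    "uri:jupyter-notebook-1.0": JupyterNotebook,
}


def parse_executable(document):
    """Pick the derived class from the 'type' field and build an instance."""
    type_uri = document.get("type")
    cls = DISCRIMINATOR_MAPPING.get(type_uri)
    if cls is None:
        raise ValueError(f"unknown executable type: {type_uri!r}")
    return cls(**document)


executable = parse_executable({
    "name": "Experiment one",
    "type": "uri:docker-container-1.0",
    "repository": "ghcr.io",
    "image": "ivoa/analytics/Newton-Raphson-example",
    "tag": "2024.08.30",
})
```

Because the type specific fields sit alongside the common ones, rather than inside a separate \codeword{spec} block, the receiver only needs the \codeword{type} value to decide how to interpret the rest of the document.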
10361126

10371127
\subsubsection{\jupyternotebook{}}
