# Dataflow #9
First idea (German only):

TODO @nbrinckm Make it better (maybe with the payloads, maybe with some pseudo code for the db lookup) & in English.
## Proposal for a possible data flow

2022-07-27 12:24 CEST

### The Problem

We want to have a data flow that allows us to run processes that depend on multiple complex inputs, which are themselves complex outputs of other processes. The overall workflow shouldn't need any further user interaction (except the creation of this order), and the data flow should be completely asynchronous - and without any unneeded sleeping time.

### The example use case aka a more concrete example

In our current flow in the riesgos demonstrator, such an example is the process to compute damages (output of the deus process) based on the shaking intensity of an earthquake (shakyground), an exposure model (assetmaster) and the fragility functions (modelprop).

### A possible data structure

To have some bookkeeping about what the user asks for (order) and the already calculated results, we have a database in the background. See #6. The current data model at the time of writing is the following:

```sql
create table processes (
id serial,
wps_url varchar(256),
wps_identifier varchar(256),
primary key (id)
);
create table users (
id serial,
email varchar(256),
primary key (id)
);
create table orders (
id serial,
user_id bigint,
order_constraints jsonb,
primary key (id)
);
create table jobs (
id serial,
process_id bigint,
status varchar(16),
primary key (id),
foreign key (process_id) references processes(id)
);
create table order_job_refs (
id serial,
job_id bigint,
order_id bigint,
primary key (id),
foreign key (order_id) references orders(id),
foreign key (job_id) references jobs(id)
);
create table complex_outputs (
id serial,
job_id bigint,
wps_identifier varchar(256),
link varchar(1024),
mime_type varchar(64),
xmlschema varchar(256),
primary key (id),
foreign key (job_id) references jobs(id)
);
create table complex_outputs_as_inputs(
id serial,
wps_identifier varchar(256),
job_id bigint,
-- To refer to an already existing complex output
-- that we are going to reuse.
complex_output_id bigint,
primary key (id),
foreign key (job_id) references jobs(id),
foreign key (complex_output_id) references complex_outputs(id)
);
create table complex_inputs (
id serial,
job_id bigint,
wps_identifier varchar(256),
link varchar(1024),
mime_type varchar(64),
xmlschema varchar(256),
primary key (id),
foreign key (job_id) references jobs(id)
);
create table literal_inputs (
id serial,
job_id bigint,
wps_identifier varchar(256),
input_value text,
primary key (id),
foreign key (job_id) references jobs(id)
);
create table bbox_inputs (
id serial,
job_id bigint,
wps_identifier varchar(256),
lower_corner_x real,
lower_corner_y real,
upper_corner_x real,
upper_corner_y real,
crs varchar(32),
primary key (id),
foreign key (job_id) references jobs(id)
);
```

### The flow

#### New Order

In the UI the user starts the request for a new order. Here it is defined what should be computed, what the computation is based on & what the possible input parameter constraints are. In this case the user selects an earthquake in the frontend (an explicit one that the frontend loaded from quakeledger before - our database catalog) & decides to calculate the later products with 2 ground motion prediction equations, 2 exposure models, and 1 schema. With this we define the following constraints:

```json
{
"shakyground": {
"quakeMLFile": "<link_to_geojson>",
"gmpe": ["Abrahamson", "Montalva"],
},
"assetmaster": {
"model": ["LimaCVT4_PD40_TI60_5000", "LimaCVT3_PD30_TI70_50000"]
},
"modelprop": {
"schema": "SARA_v1.0"
}
}
```

We put this order in the orders table.
After we stored that in the database (we need to store it before other processes try to read it), we can send a message to Pulsar.
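A minimal sketch of this step - assuming psycopg2 for the database access and the Apache Pulsar Python client; the connection strings, the user id and the topic name new-order are made up, only the orders table comes from the schema above:

```python
import json

import psycopg2
from psycopg2.extras import Json
import pulsar

# Assumed connection settings - adjust to the real deployment.
conn = psycopg2.connect("dbname=riesgos user=riesgos host=localhost")
client = pulsar.Client("pulsar://localhost:6650")

order_constraints = {
    "shakyground": {
        "quakeMLFile": "<link_to_geojson>",
        "gmpe": ["Abrahamson", "Montalva"],
    },
    "assetmaster": {"model": ["LimaCVT4_PD40_TI60_5000", "LimaCVT3_PD30_TI70_50000"]},
    "modelprop": {"schema": "SARA_v1.0"},
}

# Store the order first, so that the consumers of the message can already read it.
with conn, conn.cursor() as cur:
    cur.execute(
        "insert into orders (user_id, order_constraints) values (%s, %s) returning id",
        (1, Json(order_constraints)),
    )
    order_id = cur.fetchone()[0]

# Only after the commit do we notify the wrappers ("new-order" is a placeholder topic name).
producer = client.create_producer("new-order")
producer.send(json.dumps({"orderId": order_id}).encode("utf-8"))
```

The payload on the topic is then just the order id: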
```json
{
  "orderId": 1
}
```

#### Ground motion simulation

The shakyground service listens to the topic for new orders. It extracts the order id & queries the database:

```sql
select order_constraints from orders where id = :order_id
```

With those constraints it can extract the values for its gmpe parameter. It will create a loop & use each of them for the further processing. The second run will not be different from the first run of the wrapper-internal logic (except for the different parameter). As we also specified different gmpe values, there will be one job per parametrization.

With that the shakyground wrapper will write in the database that it starts the processing & will create new entries for jobs:
literal_inputs:
complex_inputs:
We also link the order with the job: order_job_refs:
Now that we have those in place we call the real wps that does the work.
And when the wps call is done, we can update the job entry again:
After that we can put the outputs in our complex_outputs table.
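A rough sketch of this per-parametrization loop inside the shakyground wrapper. The table and column names come from the schema above; the call_wps helper, the status values and the mime type are assumptions:

```python
import psycopg2

conn = psycopg2.connect("dbname=riesgos user=riesgos host=localhost")

def call_wps(process_identifier, **inputs):
    """Hypothetical helper that executes the wrapped WPS process and returns its complex outputs."""
    raise NotImplementedError

def run_shakyground_for_order(order_id, shakyground_process_id):
    with conn, conn.cursor() as cur:
        cur.execute("select order_constraints from orders where id = %s", (order_id,))
        constraints = cur.fetchone()[0]["shakyground"]

    for gmpe in constraints["gmpe"]:
        with conn, conn.cursor() as cur:
            # Bookkeeping: a new job marked as running, plus its inputs and the order link.
            cur.execute(
                "insert into jobs (process_id, status) values (%s, 'running') returning id",
                (shakyground_process_id,),
            )
            job_id = cur.fetchone()[0]
            cur.execute(
                "insert into literal_inputs (job_id, wps_identifier, input_value) values (%s, 'gmpe', %s)",
                (job_id, gmpe),
            )
            cur.execute(
                "insert into complex_inputs (job_id, wps_identifier, link, mime_type, xmlschema) "
                "values (%s, 'quakeMLFile', %s, 'application/json', '')",  # mime type only illustrative
                (job_id, constraints["quakeMLFile"]),
            )
            cur.execute(
                "insert into order_job_refs (job_id, order_id) values (%s, %s)",
                (job_id, order_id),
            )

        # The actual call to the wrapped WPS.
        outputs = call_wps("shakyground", gmpe=gmpe, quakeMLFile=constraints["quakeMLFile"])

        with conn, conn.cursor() as cur:
            cur.execute("update jobs set status = 'succeeded' where id = %s", (job_id,))
            for out in outputs:
                cur.execute(
                    "insert into complex_outputs (job_id, wps_identifier, link, mime_type, xmlschema) "
                    "values (%s, %s, %s, %s, %s)",
                    (job_id, out["wps_identifier"], out["link"], out["mime_type"], out["xmlschema"]),
                )
```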
With that we are done with this processing within the loop for the explicit parametrization of shakyground with Abrahamson. So now we publish a message on Pulsar - on the success topic of shakyground:

```json
{
"orderId": 1,
}
```

We also run the processing for the other parameter combinations of vsgrid and gmpe. For each parameter combination we create a new job, associate it with the order & put all the inputs & outputs in the database.

#### First try damage computation

The deus wrapper for the damage computation listens on multiple topics - one of them is the success topic of shakyground. It extracts the order id and starts to extract the inputs for itself:

```sql
select processes.wps_identifier as process_identifier, complex_outputs.*
from complex_outputs
join jobs on jobs.id = complex_outputs.job_id
join order_job_refs on order_job_refs.job_id = jobs.id
join processes on processes.id = jobs.process_id
where order_job_refs.order_id = :order_id
```

We then get the results from our shakyground service so far:
It can extract the shakeMapFile with text/xml - so deus has the input that it could use as intensity file. However, it doesn't find anything from assetmaster, nor from modelprop. As it has no data for those, it will stop processing right now.

#### Exposure model (Assetmaster)

Assetmaster listens to the topic for new orders as well. It extracts the order id & queries the constraints:

```sql
select order_constraints from orders where id = :order_id
```

With that we know that we have 2 model parameters that we can use. Similar to shakyground, we create a job with inputs & outputs in the database for each parameter combination, and publish on the assetmaster success topic once done.

#### Deus after assetmaster

Deus is triggered also by the assetmaster success topic. It still finds no modelprop data, so it stops again.

#### Fragility functions aka Modelprop

Here it is analogous to assetmaster & shakyground. We send messages to the modelprop success topic once the jobs are done.

#### Finally some deus processing

As we also listen to the modelprop success topic, deus gets triggered once more and extracts the inputs for the order again:

```sql
select processes.wps_identifier as process_identifier, complex_outputs.*
from complex_outputs
join jobs on jobs.id = complex_outputs.job_id
join order_job_refs on order_job_refs.job_id = jobs.id
join processes on processes.id = jobs.process_id
where order_job_refs.order_id = :order_id
```

We then get the results from all of our services so far:
With those we know that we can make a run of deus. We create a new entry in the jobs table and fill the literal_input for the schema (see the extra point below for the problem of how we could extract this). For the complex inputs we make it a little bit different - we store them in the complex_outputs_as_inputs table, referring to the already existing complex outputs.
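A sketch of how the deus wrapper could group the available outputs and register one job per combination. The fetch_outputs_for_order helper stands for the select statement above; the deus input identifiers (intensity, exposure, fragility) and the status value are placeholders:

```python
import itertools

import psycopg2

conn = psycopg2.connect("dbname=riesgos user=riesgos host=localhost")

def fetch_outputs_for_order(order_id):
    """Hypothetical wrapper around the select above; returns dict rows with
    process_identifier, id (the complex_output id), wps_identifier, link, ..."""
    raise NotImplementedError

def plan_deus_jobs(order_id, deus_process_id):
    by_process = {}
    for row in fetch_outputs_for_order(order_id):
        by_process.setdefault(row["process_identifier"], []).append(row)

    required = ["shakyground", "assetmaster", "modelprop"]
    if not all(p in by_process for p in required):
        # Not all inputs are there yet - stop and wait for the next trigger.
        return

    # One deus job per combination of shakemap, exposure model and fragility functions.
    for shakemap, exposure, fragility in itertools.product(*(by_process[p] for p in required)):
        with conn, conn.cursor() as cur:
            cur.execute(
                "insert into jobs (process_id, status) values (%s, 'pending') returning id",
                (deus_process_id,),
            )
            job_id = cur.fetchone()[0]
            cur.execute(
                "insert into order_job_refs (job_id, order_id) values (%s, %s)",
                (job_id, order_id),
            )
            # Reuse the existing complex outputs as inputs instead of copying their links.
            for wps_identifier, output_row in (
                ("intensity", shakemap),
                ("exposure", exposure),
                ("fragility", fragility),
            ):
                cur.execute(
                    "insert into complex_outputs_as_inputs (wps_identifier, job_id, complex_output_id) "
                    "values (%s, %s, %s)",
                    (wps_identifier, job_id, output_row["id"]),
                )
```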
Now we can start the deus process on the WPS, update the job data accordingly & set the complex outputs once we are done. Once we are done with the job, we send a message with the order id to the deus success topic.

#### When modelprop, assetmaster and shakyground provide new data

As deus listens to the success topics of all three services, it can get triggered several times for the same order. So the moment we check the input data of the processes & extract the parameter combinations that we can run for, we also need to extract the combinations for which we already computed the results (or started a job some seconds ago):

```sql
select processes.wps_identifier as process_identifier, complex_outputs.*
from complex_outputs_as_inputs
join complex_outputs on complex_outputs.id = complex_outputs_as_inputs.complex_output_id
join jobs on jobs.id = complex_outputs.job_id
join order_job_refs on order_job_refs.job_id = jobs.id
join processes on processes.id = jobs.process_id
where order_job_refs.order_id = :order_id So, with those we know for which parameter combinations we already run deus & can remove those from the sets for that deus is going to run. ProblemsHow to extract the schema for the deus call?It should be possible, as we know the job id that gave us the complex_output of assetmaster. select *
from literal_inputs
where job_id = :job_id
and wps_identifier = 'schema'
```

(But maybe this will require a bit more thinking.)
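A possible sketch of that lookup, assuming we still have the row of the assetmaster complex output (and therefore the job_id of the job that produced it) at hand; the helper name is made up:

```python
import psycopg2

conn = psycopg2.connect("dbname=riesgos user=riesgos host=localhost")

def schema_for_assetmaster_output(assetmaster_output_row):
    """Look up the 'schema' literal input of the job that produced the assetmaster output."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "select input_value from literal_inputs where job_id = %s and wps_identifier = 'schema'",
            (assetmaster_output_row["job_id"],),
        )
        row = cur.fetchone()
        return row[0] if row else None
```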
## Addition to re-use existing products

2022-07-28 13:34 CEST

### The Problem

We want to allow user requests that start with an existing product (aka job output).

### An example

We have a concrete earthquake near the Chilean coast (https://geofon.gfz-potsdam.de/eqinfo/event.php?id=gfz2022oqjb) with magnitude 8.1, depth 60 km (values as of the time of writing - the real-world geofon values may change as the scientists get more information about it). We already had an order that created this shakemap.

Orders:
Jobs:
We have its inputs & outputs: LiteralInputs:
ComplexInputs:
ComplexOutputs:
The old job is also still associated with an old order: order_job_refs:
### The order constraints

The main place to define that we want to reuse the existing shakyground result is the order constraints. We would put something like this in the database:

```json
{
"shakyground": {
"job_id": 42,
},
// ...
}
```

Now we put our new order in the database:

Orders:
And emit a new message on the topic for new orders:

```json
{
"orderId": 14
}
```

### The processing on shakyground

The shakyground wrapper listens to the topic for new orders and finds the job_id for shakyground in the order constraints. With that we don't need to start a new job, but instead just link the existing job to our new order.

order_job_refs:
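A sketch of this shortcut inside the wrapper; the table names come from the schema above, everything else (connection, fallback) is illustrative:

```python
import psycopg2

conn = psycopg2.connect("dbname=riesgos user=riesgos host=localhost")

def handle_new_order_for_shakyground(order_id):
    with conn, conn.cursor() as cur:
        cur.execute("select order_constraints from orders where id = %s", (order_id,))
        constraints = cur.fetchone()[0].get("shakyground", {})

    existing_job_id = constraints.get("job_id")
    if existing_job_id is not None:
        # Reuse: just associate the existing job (and thereby its outputs) with the new order.
        with conn, conn.cursor() as cur:
            cur.execute(
                "insert into order_job_refs (job_id, order_id) values (%s, %s)",
                (existing_job_id, order_id),
            )
        return

    # Otherwise fall back to the normal processing loop described further above.
    raise NotImplementedError
```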
The wrapper doesn't have to do anything more than emitting a message on its success topic:

```json
{
"orderId": 14
}
```

With this relationship the later processes can extract the output data of the old jobs for their own order id.

### The processing in deus

As before, deus listens to the success topics of the earlier services and extracts the inputs that are linked to the order. For the wrapped deus process there is no change.

### Just another kind of caching

In any case a process with its wrapper can decide to either start the main processing (calling the wps itself + doing all the bookkeeping in the database) or to associate an existing job with its new order id & just emit a message on the success topic. In any case the later processes are able to extract the needed input values from the database by checking the jobs that are linked to the order. There are two options to do this:
### Suggestion for the order constraints structure

With the idea to use the job id in the constraints, designing the structure of the constraints gets a little bit more complicated. My best idea so far is the following:

```json
{
"serviceA": {
"job_id": 123,
},
"serviceB": {
"literal_inputs": {
"parameter1": ["Possible value 1", "possible value 2"],
"parameter2": ["One value that needs to be used"],
},
"complex_inputs": {
"parameter3": ["https://url/to/first/parameter/option", "https://url/to/second/paramaeter/option",
"parameter4": ["https://some/parameter/that/needs/to/be/used"],
}
}
}
```
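A sketch of how a wrapper could interpret its own section of such a constraints document - either as a pointer to an existing job or as lists of values that get expanded into parameter combinations; the function is an illustration, not part of the proposal:

```python
import itertools

def interpret_constraints(service_constraints):
    """Return either ('reuse', job_id) or ('run', list_of_parameter_combinations)."""
    if "job_id" in service_constraints:
        return ("reuse", service_constraints["job_id"])

    # Merge literal and complex inputs; every parameter maps to a list of allowed values.
    options = {}
    options.update(service_constraints.get("literal_inputs", {}))
    options.update(service_constraints.get("complex_inputs", {}))

    names = sorted(options)
    combinations = [
        dict(zip(names, values))
        for values in itertools.product(*(options[n] for n in names))
    ]
    return ("run", combinations)

# Example with the serviceB structure from above: 2 x 1 x 2 x 1 = 4 combinations.
service_b = {
    "literal_inputs": {
        "parameter1": ["Possible value 1", "possible value 2"],
        "parameter2": ["One value that needs to be used"],
    },
    "complex_inputs": {
        "parameter3": ["https://url/to/first/parameter/option", "https://url/to/second/parameter/option"],
        "parameter4": ["https://some/parameter/that/needs/to/be/used"],
    },
}
print(interpret_constraints(service_b))
```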
### If there are more steps in between

#### The setting

In the shakyground & deus example, we have the case that we define a result for shakyground, which already listens to the topic for new orders itself. However, imagine we don't start with shakyground here, but have another service that needs to be called before. This process then directly listens to the topic for new orders instead. So instead of

shakyground → deus
we have

imaginary service → shakyground → deus
We still want to fix the shakemap, so we give a job_id in the order constraints.

#### A problem

When we commit an order, the imaginary service doesn't care about shakyground's constraints: it either starts anyway & computes everything (triggering shakyground later too), or it stops as it has neither enough constraints nor data. (And as the requesting user we don't care about it, as we already said that we want to start with the exact shakyground result.) In the latter case it doesn't even emit a message on its success topic. The shakyground wrapper will not run, nor will the deus wrapper.

#### Possible ideas

There are some ideas to address that:

##### 1. Add more constraints

We also put in the constraints for the jobs that delivered the shakemap. The earlier jobs can then just give success messages, as the data is already in the database:

```json
{
"imaginary": {
"job_id": 41,
},
"shakyground": {
"job_id": 42,
},
// ...
}
```

It looks like a tight coupling in the order constraints (we need to send all the job ids that gave us the shakemap), but it could be extracted from the database easily.

##### 2. Let all processes listen to new orders

We also let the later processes (in this case shakyground; in the real case it would be deus & the system reliability service) listen to the topic for new orders.
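A sketch of the guard each wrapper could apply when it receives a new order under this idea; the decision values are just illustrative:

```python
def on_new_order(order_constraints, service_name):
    """Decide what a wrapper does when every service listens to new orders."""
    constraints = order_constraints.get(service_name)

    if constraints is None:
        # No constraints at all for this service: see the caveat below -
        # without such a rule the service would simply start processing.
        return "ignore"
    if "job_id" in constraints:
        # Reuse an existing job: link it to the order and emit a success message.
        return "link_existing_job"
    # Enough explicit constraints to run the wrapped process itself.
    return "run_process"
```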
But this last idea would require that the process has some kind of constraints to run at all. If the process needed no constraints, it would still start processing (unless the job id for the process really is in the constraints as well).