Add origin_url field #94

spbnick · 2020-07-15T09:08:49Z

Add an "origin_url" field to every schema object, pointing to the object
within, and served by, the origin CI system.

Fixes #93
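
For illustration, a minimal sketch of what such a field could look like and how a submitter might sanity-check a value before sending it. The `"format"`, `"description"`, and example URL are taken from the diff in this PR; the `"type"` key, the dictionary name, and both helper functions are assumptions made for this example and are not part of kcidb. The check uses only the standard library and treats the field as optional.

```python
from urllib.parse import urlsplit

# Hypothetical property definition modeled on the diff in this PR.
# The "type" key and the constant name are assumptions.
ENVIRONMENT_ORIGIN_URL = {
    "type": "string",
    "format": "uri",
    "description": "The URL of the environment in the origin CI system",
    "examples": ["https://kernelci.org/soc/allwinner/"],
}


def looks_like_absolute_url(value):
    """Return True if value is a string with both a scheme and a host."""
    if not isinstance(value, str):
        return False
    parts = urlsplit(value)
    return bool(parts.scheme) and bool(parts.netloc)


def check_origin_url(obj):
    """Accept objects without "origin_url"; reject malformed values."""
    return "origin_url" not in obj or looks_like_absolute_url(obj["origin_url"])
```

This is only a rough pre-submission check, not a full URI validator; an object that omits the field passes, since a submitter may have nothing to link to.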

spbnick · 2020-07-15T09:09:26Z

@gctucker, here's the addition of the URL I was talking about at the meeting yesterday.

gctucker

I wonder what the value of adding this is, given that we don't have the basic data schema really used in production yet. It seems that having links to where the data came from might be useful, but should not really be necessary in order to make a functional database. Would the submitted data not speak for itself?

gctucker · 2020-07-16T12:14:27Z

kcidb/io/schema/v3.py

+                    "format": "uri",
+                    "description":
+                        "The URL of the environment in the origin CI system",
+                    "examples": ["https://kernelci.org/soc/allwinner/"],


This is not really an environment, it's a family of devices from the same vendor.

I don't think there is anything that matches the field description on the kernelci.org dashboard. A link to the test platform in a LAVA lab would probably be a bit more relevant, for example:
https://lava.collabora.co.uk/scheduler/device/bcm2836-rpi-2-b-cbg-0

although that's not stored in the kernelci-backend database. What is stored is the name of the test lab and the name of the platform, i.e. bcm2836-rpi-2-b for this Raspberry Pi 2b, which is more generic than a specific instance in a test lab.

Does CKI have some view to show details of a runtime environment or test platform?

I see the "environment" as a way to identify something where a test executed, with some precision. As much precision as the submitter can afford. Its only purpose for KCIDB itself is to determine which tests executed in a similar-enough environment, so e.g. we can say the results should be the same, and can group them in the report, the dashboard, or take that into account when locating the breaking commit. I still don't know how exactly I would implement or organize this, though.

If KernelCI doesn't expose the reported environment on the dashboard with similar precision, then it can choose not to provide a link here, or provide this link, even though it's of lower precision, just to have something. Or it can provide the LAVA link you posted.

For the purpose of example, I think the link here is OK. The LAVA link would be better, though. Would you mind me using it even though KernelCI wouldn't provide it?

CKI only has hostnames, I think. We can always link to Beaker, which has a very detailed description of the host. That might never be public, though, so we are probably not going to use it, instead providing as much information as we can in the environment object itself (once we have the fields described).

KernelCI uses device types, which are basically names for an "execution environment". At least that covers the immutable part of the environment, i.e. a hardware board with some firmware or a virtual device with a particular configuration. Then each test has some extra parts of the environment, such as a root file system or a Docker image with test suites, which change sometimes but are still part of the environment. The real moving part is the kernel.

So if different labs have the same Raspberry Pi, or a lab has several of them, results for any of them will appear as for the same device type. There just isn't a view on the current dashboard to show all the information specific to a particular device type, or any particular device instance.
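
As a rough sketch of the behaviour described above, results keyed only by device type collapse across labs and device instances. The record layout (`lab`, `device_type`, `status`), the lab names, and the `qemu-example` device type are hypothetical and chosen for this example; only `bcm2836-rpi-2-b` comes from the discussion.

```python
from collections import defaultdict


def group_by_device_type(results):
    """Group (lab, status) pairs by device type, ignoring which lab ran them."""
    groups = defaultdict(list)
    for result in results:
        groups[result["device_type"]].append((result["lab"], result["status"]))
    return dict(groups)


# Hypothetical results: two labs own the same Raspberry Pi 2b device type.
results = [
    {"lab": "lab-a", "device_type": "bcm2836-rpi-2-b", "status": "PASS"},
    {"lab": "lab-b", "device_type": "bcm2836-rpi-2-b", "status": "FAIL"},
    {"lab": "lab-a", "device_type": "qemu-example", "status": "PASS"},
]

grouped = group_by_device_type(results)
for device_type, entries in grouped.items():
    print(device_type, entries)
```

Both Raspberry Pi results end up under one key even though they came from different labs, which is why a per-instance or per-lab view would need extra information beyond the device type.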

I see the "environment" as a way to identify something where a test executed, with some precision. As much precision as the submitter can afford.

That's the lab name and device type name as far as we're concerned at the moment. I believe the actual instance name is also stored in the database although not shown on the dashboard, at least I think the field for it is still there.

I see the value of this kind of meta-data. But to me, that's rather different to a URL on a web interface.

So rather than using origin_url fields, maybe something like origin_metadata could be used with more arbitrary fields depending on the submitter? For LAVA labs it will be the lab name and the device type, and maybe device instance name. For your CKI/Beaker results, it will be the hostname or whatever works from your point of view.

I see. Thank you for explaining how KernelCI identifies the devices, it will help me come up with a schema to actually support it. I think we are more or less in agreement on what the essence of an environment is, and its importance.

Now, origin_url has nothing to do with identifying the environment. That's a job for yet-to-be-defined fields.

Regarding origin_metadata, we have misc exactly for that, in environments as well.
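
A minimal sketch of how submitter-specific details could travel in `misc`, assuming it accepts arbitrary JSON-compatible keys. The helper name, the `misc` key names (`lab_name`, `device_type`, `device_instance`, `hostname`), and the lab and host values are hypothetical; the LAVA URL and the device names are the ones quoted earlier in this thread.

```python
import json


def make_environment(origin_url=None, **origin_details):
    """Build an environment dict; both origin_url and misc are optional."""
    env = {}
    if origin_url is not None:
        env["origin_url"] = origin_url
    if origin_details:
        # Arbitrary, origin-specific details go under "misc".
        env["misc"] = dict(origin_details)
    return env


# A LAVA-based submitter: lab name, device type, and device instance.
lava_env = make_environment(
    origin_url="https://lava.collabora.co.uk/scheduler/device/bcm2836-rpi-2-b-cbg-0",
    lab_name="example-lab",  # hypothetical lab name
    device_type="bcm2836-rpi-2-b",
    device_instance="bcm2836-rpi-2-b-cbg-0",
)

# A submitter with only a hostname and no public page to link to.
host_env = make_environment(hostname="host.example.com")  # hypothetical host

print(json.dumps(lava_env, indent=2))
print(json.dumps(host_env, indent=2))
```

The two objects share no `misc` keys, which illustrates the trade-off raised in this thread: free-form fields are easy to submit but hard to correlate across origins.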

The origin_url is just an escape hatch, for humans to reach the origin's representation of the same object (if available), with more data and more features than the implementation-in-progress can afford. In this way, it is similar to misc, which actually worries me, because it would be easier to just plop the link to your own web UI instead of submitting the data we might need to store and correlate. That would be an argument against it, IMO, and one I'm starting to find more and more weighty. Hmm...

spbnick · 2020-07-16T12:53:05Z

I wonder what the value of adding this is, given that we don't have the basic data schema really used in production yet. It seems that having links to where the data came from might be useful, but should not really be necessary in order to make a functional database. Would the submitted data not speak for itself?

This is just to provide a way to reach more data and functionality at the origin while we ramp up support for it. Missing fields, fancy graphs, links to other data/objects, etc. We gotta start offering people our reports and dashboard before we have full parity, and these fields could smooth the transition.

I.e. this is not to make the database functional, but to make it easier for both report submitters and developers to make the decision to try our system, even though it might not have all the features they're used to yet. At least they'll be able to reach the original data, if something is missing.

schema: Add "origin_url" field to every object

3eafd3d

Add an "origin_url" field to every schema object, pointing to the object within, and served by, the origin CI system. Fixes #93

spbnick assigned gctucker Jul 15, 2020

gctucker reviewed Jul 16, 2020

Base automatically changed from master to main January 13, 2021 10:04
