add extra_slots metamodel slot #205
i am told that there is prior art here? https://mapping-commons.github.io/sssom/spec-model/
This makes me nervous but I agree 100% that many users want it, and if we are going to allow it, this seems like a step in the right direction. I never would have thought of specifying the maximum cardinality of extra slots, but I suppose it wouldn't hurt. I would be interested in seeing some examples of the limit solving a problem. I'm also interested to hear how SSSOM provides a solution for this that LinkML could follow. @sneakers-the-rat do you think your implementation could limit extra slots to scalar key/value pairs? Maybe constraining the range to string would do that, in the sense that the extra slot's value could never be an instance of some class.
Are
This says to me: "total of 3 extra slots, each with 5 parameters each"? |
Also - how does it play with the |
I get it. I often find myself of several minds working on linkml, and there's this recurring 3-way tension between how we think schemas should be modeled, what is possible to express, and what is feasible to implement. I think allowing arbitrary extra items probably scores low on the "should" scale (why not add those things to the model?), though that's not always true for e.g. property-centric frameworks, but very high on the "expressiveness" scale, and relatively high on the implementability scale for those frameworks where it's possible. In this case i'm modeling a schema/format/standard for which constraints on arbitrary extra fields are a core part of the format. I personally would have probably designed it differently, but it exists and it would be nice to be able to express it.
Don't know it, but yes as always let's integrate with prior art if possible. I am partly making this PR to try and get the ball rolling on this rather than trying to say "this is definitely what we should do and i have considered all options" because the other issues without a PR on the table were languishing.
I think that allowing the slot range is relatively important, and if we don't add it we'll want to do so later. the two examples i was giving in the prior issue were JSON Schema and Pydantic, both of which allow you to set what would be partial/anonymous slot schemas in linkml for extra properties. specifically allowing class ranges in the version in this PR doesn't get us all the way into abstractionland (eg. one can't set a pattern constraint on what additional fields
haha yes see this is the exact ambiguity i'm talking about. The So by itself,

MyClass:
  extra_slots:
    range: string
    multivalued: true
    maximum_cardinality: 3

would behave like

MyClass:
  attributes:
    extra_1:
      range: string
      multivalued: true
      maximum_cardinality: 3
    extra_2:
      range: string
      multivalued: true
      maximum_cardinality: 3
    # ...

which is why i added this example to clarify that explicitly. The other way would be mixing levels, where it would become ambiguous whether we were talking about all or each - eg if
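To make the "would behave like" reading concrete, here is a minimal sketch in plain Python. It is not LinkML's implementation: `expand_extra_slots` is a hypothetical helper, and the class definition is modeled as a plain dict. It copies the `extra_slots` expression onto each extra name found in the data, so `maximum_cardinality` applies to each extra slot and not to the number of extra slots.

```python
from copy import deepcopy
from typing import List


def expand_extra_slots(class_def: dict, extra_names: List[str]) -> dict:
    """Rewrite a class definition so each extra name becomes an explicit attribute.

    Hypothetical helper for illustration only; not part of LinkML.
    """
    expanded = deepcopy(class_def)
    expression = expanded.pop("extra_slots", None)
    if expression is None:
        return expanded
    attributes = expanded.setdefault("attributes", {})
    for name in extra_names:
        # Attributes that are already declared keep their own definition.
        attributes.setdefault(name, deepcopy(expression))
    return expanded


my_class = {
    "extra_slots": {"range": "string", "multivalued": True, "maximum_cardinality": 3}
}
expanded = expand_extra_slots(my_class, ["extra_1", "extra_2"])
```

Under this reading each of `extra_1` and `extra_2` is a list of at most three strings, and nothing limits how many extra slots may appear.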
i'm not finding this, can you link me to the definition?
I got it from the linked issue - linkml/linkml#1404 looks like it's a flag to a specific generator. but my question still is bugging me a bit - do we have to consider that flag and how it behaves with this change?
linkml_model/model/schema/meta.yaml
Outdated
domain: class_definition
ifabsent: false
any_of:
  - range: boolean
My preference is to avoid any_of in the metamodel (we also had this discussion in Berkeley :-). One reason is that it means the metamodel can't be mapped to a relational model without bespoke transformations (thus breaking e.g. Ben's Django editing workflow). It could also impede mapping of linkml to other target languages in future. I know this seems like favoring implementers over users, but in this case I think the user may be better served by a more explicit way to declare unrestrictedness?
updated OP with another possible example that avoids this :)
from LinkML dev call - @sneakers-the-rat - would you mind coming next week to discuss at our next dev call? (or we can schedule a one-off if this is urgent).
I would think that flag gets deprecated in favor of specifying it in the schema, with ample transition time where it basically behaves like |
Had a few minutes before needing to run, but updated OP with a class slot option that avoids |
I favor the 2nd option. It also makes it easier for frameworks to declaratively declare their conformance profile (e.g. json-schema can declare it supports

If you like we can proceed with

For constraining the expression part, I would like to make sure we are future-proofing. See my comments here linkml/linkml#2241 (comment) on the common use case of allowing additional slots if their names match certain patterns. I am not sure how obscure a use case this is (or how obscure the ability to semi-constrain additional slots is). It would be good to compile examples outside of NWB/HDMF where this is allowed.

We also need to decide on inheritance semantics. What if I want to declare at a base class level that any extra slots should be integers, but I will leave it to subclasses to decide whether this is switched on? I think it's easier to permit this kind of behavior than forbid it. The semantics here should be monotonic, i.e. you can progressively constrain, not relax.
This is related to the "level shifting" problem discussed above re: whether
This is related to the previous discussion in #2241 on the difference between specifying additional slots provided by the data and additional slots defined in inheriting classes. Re: monotonicity, since there isn't a general way this is enforced/declared in the metamodel or schemaview, I would think that it would take on the same conventional status as the rest of linkml: it is currently technically possible to override parent class declarations for all but min/max value last I checked, but the conventional expectation is monotonicity.
Updated OP with examples in JSON schema and pydantic
Just a general note: we have been discussing this proposal in relation to pydantic and json-schema, but shacl has an analogous mechanism. See https://www.w3.org/TR/shacl/#ClosedConstraintComponent (I don't know if we also want to have something like shacl's ignore here too)
Is there a way to express constraints for extra unspecified properties as well? Or would that just be adding a union onto the general property constraint? I see the ignoredProperties constraint, but that looks like it's just limiting to a (list of) URIs for slots/props
@sneakers-the-rat - thank you very much for the additional doc here! Option 2 is terrific - and now I understand it! :D
OK i have updated the impl here to be the class slot. I reused How does this look? |
Related to: extra metaslot to define behavior of extra values in classes. linkml#2241

Many modeling frameworks allow one to specify how to handle extra values provided to an instance of a class. By default linkml forbids all additional data in those frameworks that allow that. In addition to declaring whether or not extra data is allowed, it would also be nice to constrain what type that extra data can be. There are no other existing linkml metamodel items that could be used for this that i am aware of, but please let me know if we can repurpose something existing here.
Options
Option 1: Single slot
(current contents of PR, simplified)
Option 2: Class slot
If we want to avoid doing any_of in the metamodel, we could also do something like this:

I'm not sure if there is a general "allowed" slot, ctrl+f isn't finding one, but seems better than making a single-purpose "extra_slots_allowed" slot. Maybe slot_expression should be anonymous_slot_expression here but that seems like a lot to type when defining a schema lol, idk if that is incorrect semantically.

this makes one awkward case which is syntactically possible but semantically impossible

And also doesn't allow us to ifabsent allowed to False because a slot_expression should be able to be specified without explicitly setting allowed: true imo. So the "default allowed: False" behavior becomes "extra slots are not allowed if allowed: false or allowed: null && slot_expression: null"
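The default rule stated above is small enough to write out. A minimal sketch, assuming `allowed` is an optional boolean and `slot_expression` an optional mapping; `extra_slots_allowed` is a hypothetical name, not an existing LinkML function:

```python
from typing import Optional


def extra_slots_allowed(allowed: Optional[bool], slot_expression: Optional[dict]) -> bool:
    """Apply the rule: extras are rejected if `allowed` is false, or if both
    `allowed` and `slot_expression` are unset. Hypothetical helper."""
    if allowed is False:
        return False
    if allowed is None and slot_expression is None:
        return False
    return True


default_case = extra_slots_allowed(None, None)                       # nothing declared
expression_only = extra_slots_allowed(None, {"range": "string"})     # expression implies allowed
explicit_false = extra_slots_allowed(False, {"range": "string"})     # explicit false wins
```

Note that this sketch simply lets an explicit `allowed: false` take precedence over a slot expression; whether that combination should be an error instead is a design question the rule above leaves open.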
"Examples
the examples in the PR and the prior issue give examples of expected use, but for the sake of recordkeeping:
Allow all extra slots:
JSON Schema
(assuming "$schema": "https://json-schema.org/draft/2020-12/schema" for all these, and adding a dummy "foo" slot because it's just empty otherwise)
Pydantic
Allow no extra slots (default)
or undefined
or undefined
JSON Schema
Pydantic
(in reality we wouldn't set this, because it's both pydantic's default and also the default in the ConfiguredBaseClass, but for illustration...)
Constrain extra slots by slot expression
Simple Types
Only allow additional strings
JSON Schema
additionalProperties also accepts a schema object...
Pydantic
Unions
JSON Schema
Pydantic
Class Ranges
Allow extra slots if they are instances of the class SecondClass
JSON Schema
Pydantic
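
As a framework-neutral recap of the cases listed in this section (allow all, allow none, simple types, unions, class ranges), here is a minimal stdlib sketch. It is neither a JSON Schema nor a Pydantic snippet: `check_extras`, `ExtraPolicy`, and this stand-in `SecondClass` are hypothetical names, and Python types stand in for slot ranges.

```python
from typing import List, Set, Tuple, Union

# True = allow anything, False = allow nothing, a type or tuple of types = constrain.
ExtraPolicy = Union[bool, type, Tuple[type, ...]]


class SecondClass:
    """Stand-in for a class range (hypothetical, for illustration only)."""

    def __init__(self, value: str) -> None:
        self.value = value


def check_extras(data: dict, declared: Set[str], extra: ExtraPolicy) -> List[str]:
    """Return one error message per undeclared key that the policy rejects."""
    errors = []
    for key, value in data.items():
        if key in declared or extra is True:
            continue
        if extra is False:
            errors.append(f"{key}: extra slots are not allowed")
        elif not isinstance(value, extra):
            errors.append(f"{key}: value of type {type(value).__name__} is not permitted")
    return errors


declared = {"foo"}
data = {"foo": "a", "bar": "b"}
allow_all = check_extras(data, declared, True)
allow_none = check_extras(data, declared, False)
only_strings = check_extras(data, declared, str)
str_or_int = check_extras({"foo": "a", "bar": 1.5}, declared, (str, int))
class_range = check_extras({"foo": "a", "bar": SecondClass("x")}, declared, SecondClass)
```

Here `allow_none` and `str_or_int` each report a problem with `bar`, while the other three calls return an empty list.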
Discussion
Ambiguity
The two major points of ambiguity that i can see
maximum_cardinality to limit the number of extra slots that could be provided, or make providing extra slots required. Added examples to clarify these behaviors - I am not sure if there is a circumstance where we would want to support that, but if we did then we could turn this into a class so someone could specify something like this, which would be backwards compatible with any anonymous_slot_expression definitions that happen in the meantime:

to mean "there can be at most 5 extra slots that are lists of strings at most length 3"
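
The two-level reading quoted above ("at most 5 extra slots that are lists of strings at most length 3") can be sketched as a check with one limit per level. This is an illustration under assumptions, not the proposed metamodel: `validate_extras`, `max_extra_slots`, and `max_items` are hypothetical names.

```python
from typing import Dict, List


def validate_extras(
    extras: Dict[str, object], max_extra_slots: int = 5, max_items: int = 3
) -> List[str]:
    """Check extra slots at two levels: how many there are, and how long each is."""
    errors = []
    # Class level: limit on the number of extra slots.
    if len(extras) > max_extra_slots:
        errors.append(f"at most {max_extra_slots} extra slots allowed, got {len(extras)}")
    # Slot level: each extra slot must be a list of strings within the length limit.
    for name, value in extras.items():
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{name}: expected a list of strings")
        elif len(value) > max_items:
            errors.append(f"{name}: at most {max_items} values allowed, got {len(value)}")
    return errors


ok = validate_extras({"a": ["x", "y"], "b": ["z"]})
too_long = validate_extras({"a": ["1", "2", "3", "4"]})
too_many = validate_extras({f"extra_{i}": ["x"] for i in range(6)})
```

Keeping the two limits as separate parameters is what avoids the "all or each" ambiguity: neither one can be mistaken for the other.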
Naming
"Extra slots" might be a bit too specific, are there other cases where we would want to allow/deny extra things in a domain? @sierra-moxon brings up the name "closed" which has currency in closed-world/open-world parlance of RDF and formal logic circles, but is less obvious to your average data modeler.

This is high priority for me since it's blocking final implementation of nwb-linkml, so if we can make relatively short work of this then i'll implement it for pydanticgen, pythongen, and json schema gen as thanks for the quickness.