-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dtype_numpy
with medium-effort validation
#315
Conversation
f6667a8
to
76e53f7
Compare
Running the schema generation seems to remove $ git diff
diff --git a/event_model/schemas/event_descriptor.json b/event_model/schemas/event_descriptor.json
index 3549af5..5958023 100644
--- a/event_model/schemas/event_descriptor.json
+++ b/event_model/schemas/event_descriptor.json
@@ -112,12 +112,6 @@
"description": "The 'interesting' data keys for this device.",
"type": "object",
"properties": {
- "NX_class": {
- "title": "Nx Class",
- "description": "The NeXus class definition for this device.",
- "type": "string",
- "pattern": "^NX[A-Za-z_]+$"
- },
"fields": {
"title": "Fields",
"description": "The 'interesting' data keys for this device.", |
I'll take a look! |
Thank you, @evalott100! |
This is very confusing - everything is working as it should on my end (copying that diff --git a/event_model/documents/event_descriptor.py b/event_model/documents/event_descriptor.py
index 3fa799a..8e565d4 100644
--- a/event_model/documents/event_descriptor.py
+++ b/event_model/documents/event_descriptor.py
@@ -1,4 +1,4 @@
-from typing import Any, Dict, List, Optional
+from typing import Any, Dict, List, Optional, Tuple, Union
from typing_extensions import Annotated, Literal, NotRequired, TypedDict
@@ -23,6 +23,19 @@ class DataKey(TypedDict):
Dtype,
Field(description="The type of the data in the event."),
]
+ dtype_numpy: NotRequired[
+ Annotated[
+ Union[
+ str, # e.g. "<u4",
+ List[Tuple[str, str]], # e.g. [("a", "<u4"), ("b", "<f8")]
+ ],
+ Field(
+ description="The type of the data in the event, given as a numpy dtype string (or, for structured dtypes, array)",
+ pattern="[|<>][tbiufcmMOSUV][0-9]+",
+ ),
+ # TODO Enforce the regex pattern.
+ ]
+ ]
external: NotRequired[
Annotated[
str,
diff --git a/event_model/schemas/datum_page.json b/event_model/schemas/datum_page.json
index a09dc59..7d34f98 100644
--- a/event_model/schemas/datum_page.json
+++ b/event_model/schemas/datum_page.json
@@ -14,11 +14,7 @@
"properties": {
"datum_id": {
"description": "Array unique identifiers for each Datum (akin to 'uid' for other Document types), typically formatted as '<resource>/<integer>'",
- "allOf": [
- {
- "$ref": "#/$defs/DataFrameForDatumPage"
- }
- ]
+ "$ref": "#/$defs/DataFrameForDatumPage"
},
"datum_kwargs": {
"title": "Datum Kwargs",
diff --git a/event_model/schemas/event_descriptor.json b/event_model/schemas/event_descriptor.json
index 3549af5..5195a23 100644
--- a/event_model/schemas/event_descriptor.json
+++ b/event_model/schemas/event_descriptor.json
@@ -52,6 +52,31 @@
"integer"
]
},
+ "dtype_numpy": {
+ "title": "Dtype Numpy",
+ "description": "The type of the data in the event, given as a numpy dtype string (or, for structured dtypes, array)",
+ "anyOf": [
+ {
+ "type": "string"
+ },
+ {
+ "items": {
+ "maxItems": 2,
+ "minItems": 2,
+ "prefixItems": [
+ {
+ "type": "string"
+ },
+ {
+ "type": "string"
+ }
+ ],
+ "type": "array"
+ },
+ "type": "array"
+ }
+ ]
+ },
"external": {
"title": "External",
"description": "Where the data is stored if it is stored external to the events",
... FURTHER REMOVAL OF UNNECESSARY allOf |
And I get the same correct output as above on your branch directly (with python 3.10/3.11) Were you using an older venv? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the regex work? If would be interesting to see a couple of examples (just string and structured dtype) that fail validation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to guess the regex does nothing, as it doesn't appear in the schema. @evalott100 can you think of a way round this? We want it to be Annotated[str, something_that_checks_for_regex] | List[Tuple[str, Annotated[str, something_that_checks_for_regex]
, but I don't know if you can do that with pydantic...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pydantic 2 they want to use Field
instead of the constr
wrapper in the type itself... which is annoying. The following should work (I'll try tomorrow).
_ConstrainedDtype = Annotated[str, Field(pattern="[|<>][tbiufcmMOSUV][0-9]+")]
numpy_dtype: Union[
_ConstrainedDtype,
List[Tuple[str, _ConstrainedDtype]]
]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, constr
just does this directly:
return Annotated[ # pyright: ignore[reportReturnType]
str,
StringConstraints(
strip_whitespace=strip_whitespace,
to_upper=to_upper,
to_lower=to_lower,
strict=strict,
min_length=min_length,
max_length=max_length,
pattern=pattern,
),
]
so I'd be surprised if the above doesn't work (perhaps swapping Field
to StringConstraints
from pydantic.types
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup this works:
_ConstrainedDtype = Annotated[str, Field(pattern="[|<>][tbiufcmMOSUV][0-9]+")]
class MyModel(BaseModel):
numpy_dtype: Union[
_ConstrainedDtype,
List[Tuple[str, _ConstrainedDtype]]
]
errors = []
for dtype in (
"<u4", # Passes
">u4", # Passes
">S1000", # Passes
">uS4", # Fails
"u4", # Fails
"u", # Fails
">4u", # Fails
[("hi", "<U4"), ("hi", "<U4")], # Passes
[("hi", "<U4"), ("hi", ">U4")], # Passes
[("hi", "<U4"), ("hi", ">S1000")], # Passes
[("hi", "<U4"), ("hi", ">uS4")], # Fails
[("hi", "<U4"), ("hi", "u4")], # Fails
[("hi", "<U4"), ("hi", "u")], # Fails
[("hi", "<U4"), ("hi", ">4u")], # Fails
):
try:
model = MyModel(numpy_dtype=dtype)
print(model)
except ValueError as e:
errors.append(e)
[2 validation errors for MyModel
numpy_dtype.constrained-str
String should match pattern '[|<>][tbiufcmMOSUV][0-9]+' [type=string_pattern_mismatch, input_value='>uS4', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch
numpy_dtype.list[tuple[str, constrained-str]]
Input should be a valid list [type=list_type, input_value='>uS4', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/list_type,
2 validation errors for MyModel
numpy_dtype.constrained-str
String should match pattern '[|<>][tbiufcmMOSUV][0-9]+' [type=string_pattern_mismatch, input_value='u4', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch
numpy_dtype.list[tuple[str, constrained-str]]
Input should be a valid list [type=list_type, input_value='u4', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/list_type,
2 validation errors for MyModel
numpy_dtype.constrained-str
String should match pattern '[|<>][tbiufcmMOSUV][0-9]+' [type=string_pattern_mismatch, input_value='u', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch
numpy_dtype.list[tuple[str, constrained-str]]
Input should be a valid list [type=list_type, input_value='u', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/list_type,
2 validation errors for MyModel
numpy_dtype.constrained-str
String should match pattern '[|<>][tbiufcmMOSUV][0-9]+' [type=string_pattern_mismatch, input_value='>4u', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch
numpy_dtype.list[tuple[str, constrained-str]]
Input should be a valid list [type=list_type, input_value='>4u', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/list_type,
2 validation errors for MyModel
numpy_dtype.constrained-str
Input should be a valid string [type=string_type, input_value=[('hi', '<U4'), ('hi', '>uS4')], input_type=list]
For further information visit https://errors.pydantic.dev/2.5/v/string_type
numpy_dtype.list[tuple[str, constrained-str]].1.1
String should match pattern '[|<>][tbiufcmMOSUV][0-9]+' [type=string_pattern_mismatch, input_value='>uS4', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch,
2 validation errors for MyModel
numpy_dtype.constrained-str
Input should be a valid string [type=string_type, input_value=[('hi', '<U4'), ('hi', 'u4')], input_type=list]
For further information visit https://errors.pydantic.dev/2.5/v/string_type
numpy_dtype.list[tuple[str, constrained-str]].1.1
String should match pattern '[|<>][tbiufcmMOSUV][0-9]+' [type=string_pattern_mismatch, input_value='u4', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch,
2 validation errors for MyModel
numpy_dtype.constrained-str
Input should be a valid string [type=string_type, input_value=[('hi', '<U4'), ('hi', 'u')], input_type=list]
For further information visit https://errors.pydantic.dev/2.5/v/string_type
numpy_dtype.list[tuple[str, constrained-str]].1.1
String should match pattern '[|<>][tbiufcmMOSUV][0-9]+' [type=string_pattern_mismatch, input_value='u', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch,
2 validation errors for MyModel
numpy_dtype.constrained-str
Input should be a valid string [type=string_type, input_value=[('hi', '<U4'), ('hi', '>4u')], input_type=list]
For further information visit https://errors.pydantic.dev/2.5/v/string_type
numpy_dtype.list[tuple[str, constrained-str]].1.1
String should match pattern '[|<>][tbiufcmMOSUV][0-9]+' [type=string_pattern_mismatch, input_value='>4u', input_type=str]
For further information visit https://errors.pydantic.dev/2.5/v/string_pattern_mismatch]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@evalott100 do you have time to turn that into a test and finish off this PR? I'm guessing @danielballan is busy at NOBUGS...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, can do now. I'm curious why Dan's schema generation wasn't working.
Add
dtype_numpy
to Event Descriptor.Supersedes #215.