-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate jsonschema from pydantic v2 #159
base: main
Are you sure you want to change the base?
Conversation
….com/dbt-labs/dbt-jsonschema into generate-jsonschema-from-pydantic-v2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to move this to src/generate.py
?
These schemas are generated from [pydantic models](https://docs.pydantic.dev/latest/concepts/json_schema/). To make updates, the process is as follows: | ||
|
||
1. Create a virtual environment and install the dependencies: `pip install -r requirements.txt` | ||
2. Make changes to the corresponding pydantic models in `src/latest.py` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant to reference a directory, not a file
2. Make changes to the corresponding pydantic models in `src/latest.py` | |
2. Make changes to the corresponding pydantic models in `src/latest` |
@@ -0,0 +1 @@ | |||
datamodel-code-generator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be necessary, right? This package was only necessary to initially transform json-schema to Pydantic, right?
How concerning is this? Is this something we should dig deeper into? |
schema_: Optional[str] = Field(None, alias="schema") | ||
|
||
|
||
class SemanticModel(BaseModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we follow the same order as https://github.com/dbt-labs/ai-codegen-api/blob/main/src/codegen/llm_output/semantic_model.py#L163
agg_time_dimension: Optional[str] = Field( | ||
None, pattern="(?!.*__).*^[a-z][a-z0-9_]*[a-z0-9]$" | ||
) | ||
create_metric: Optional[bool] = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default should be None? https://github.com/dbt-labs/ai-codegen-api/blob/main/src/codegen/llm_output/semantic_model.py#L156
extra="forbid", | ||
) | ||
time_granularity: TimeGranularity | ||
validity_params: Optional[ValidityParams] = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This typing doesn't seem correct? Empty string is not None | ValidityParams
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is to prevent the autocomplete from outputting null. An alternative is to instead specify the default in the json_schema_extra of the Field
, which would mean this default could stay as None. Is the difference semantically important to Instructor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh got it, I don't have whatever extension that is so vs code didn't flag it to me. In that case if we keep this we'll have to put everything in the json_schema_extra space instead which is all good
type_params: DimensionTypeParams | ||
|
||
|
||
class Dimension(RootModel[Union[CategoricalDimension, TimeDimension]]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we write this as a union like this https://github.com/dbt-labs/ai-codegen-api/blob/main/src/codegen/llm_output/semantic_model.py#L66
Yeah holding off on moving this forward, pending feedback on the loom i made for drew/marco |
Picks up where #150 and #157 left off. The purpose of this PR is to add Pydantic representations of dbt's YAML files, which can then be used to automatically generate JSON Schema files.
This was instigated because the Pydantic representations can be used as inputs for AI-powered codegen experiences, but in doing so we've also seen that it's easier/less confusing to write Pydantic models than straight JSON Schema because it's less nested and you can achieve inheritance more simply. (As an example, consider the representations of unit testing - a complex
if then else
block in JSON Schema, but three simpler classes and a union in Pydantic.)As far as I can tell, the validation is as good as the original versions. Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing
[Object object]
being output in VS Code, which I think is a bug in the YAML extension.