[Fix] Add logger for compiler and marshal while comparing union #6034

Merged
merged 34 commits into from
Nov 20, 2024
f06cdc6
feat: fix Union type with dataclass ambiguous error
mao3267 Oct 18, 2024
8660db5
Merge branch 'master' of https://github.com/mao3267/flyte into fix/#5…
mao3267 Nov 1, 2024
47ccbd1
fix: direct json comparison for superset
mao3267 Nov 1, 2024
85489dc
fix: go.mod missing entry for error
mao3267 Nov 1, 2024
cc685bb
fix: update go module and sum
mao3267 Nov 1, 2024
3a629e1
refactor: gci format
mao3267 Nov 1, 2024
aa4d98e
test: add dataset casting tests for same (one/two levels) and supers…
mao3267 Nov 1, 2024
b282e5f
Merge branch 'master' of https://github.com/mao3267/flyte into fix/#5…
mao3267 Nov 8, 2024
818afb7
fix: support Pydantic BaseModel comparison
mao3267 Nov 8, 2024
d6468b6
fix: handle nested pydantic basemodel
mao3267 Nov 8, 2024
ada05ed
Reviews from Eduardo
Future-Outlier Nov 11, 2024
56623e3
fix: support strict subset match
mao3267 Nov 15, 2024
b8f38a7
test: update strict subset match test
mao3267 Nov 15, 2024
b698769
fix: missing go mod entry
mao3267 Nov 15, 2024
70ad767
fix: missing go mod entry
mao3267 Nov 15, 2024
9dc3fa6
fix: go mod entry
mao3267 Nov 15, 2024
b224a02
make go-tidy
Future-Outlier Nov 15, 2024
7ed9be2
comments
Future-Outlier Nov 15, 2024
6fe8871
Merge branch 'fix/#5489-dataclass-mismatch' of https://github.com/mao…
mao3267 Nov 15, 2024
64343c8
fix: strict subset match with draft 2020-12 mashumaro
mao3267 Nov 18, 2024
8ccced5
Merge branch 'master' of https://github.com/mao3267/flyte into fix/#5…
mao3267 Nov 18, 2024
0def0ad
refactor: make go-tidy
mao3267 Nov 18, 2024
81445a7
fix: support strict subset match with ambiguity
mao3267 Nov 19, 2024
9b19f04
fix: change test name and fix err
mao3267 Nov 19, 2024
30aa096
Add comments
Future-Outlier Nov 20, 2024
e62ba6e
nit
Future-Outlier Nov 20, 2024
c6ac729
add flytectl go-tidy in makefile
Future-Outlier Nov 20, 2024
ba4d6f1
nit
Future-Outlier Nov 20, 2024
86a395e
fix: add comment for error checking
mao3267 Nov 20, 2024
7f28c35
test: basemodel castable test, two level dataclass and ParentToChild …
mao3267 Nov 20, 2024
66104dd
fix: add logger for jsonschema compiler
mao3267 Nov 20, 2024
b768576
Merge branch 'master' of https://github.com/mao3267/flyte into fix/#5…
mao3267 Nov 20, 2024
22d4dca
fix: add logger for marshal and compiler
mao3267 Nov 20, 2024
1f92e5c
better error msg format
Future-Outlier Nov 20, 2024
fix: support strict subset match
Signed-off-by: mao3267 <[email protected]>
mao3267 committed Nov 15, 2024
commit 56623e3971faa5adcae192b0fbbfd43a82f89b6f
2 changes: 2 additions & 0 deletions flytepropeller/go.mod
@@ -22,12 +22,14 @@ require (
github.com/mitchellh/mapstructure v1.5.0
github.com/pkg/errors v0.9.1
github.com/prometheus/client_golang v1.19.1
github.com/santhosh-tekuri/jsonschema v1.2.4
github.com/shamaton/msgpack/v2 v2.2.2
github.com/sirupsen/logrus v1.9.3
github.com/spf13/cobra v1.7.0
github.com/spf13/pflag v1.0.5
github.com/stretchr/testify v1.9.0
github.com/wI2L/jsondiff v0.6.0
gitlab.com/yvesf/json-schema-compare v0.0.0-20190604192943-a900c04201f7
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.47.0
go.opentelemetry.io/otel v1.24.0
go.opentelemetry.io/otel/trace v1.24.0
4 changes: 4 additions & 0 deletions flytepropeller/go.sum
@@ -373,6 +373,8 @@ github.com/rogpeppe/go-internal v1.3.0/go.mod h1:M8bDsm7K2OlrFYOpmOWEs/qY81heoFR
github.com/rogpeppe/go-internal v1.12.0 h1:exVL4IDcn6na9z1rAb56Vxr+CgyK3nn3O+epU5NdKM8=
github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4=
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/santhosh-tekuri/jsonschema v1.2.4 h1:hNhW8e7t+H1vgY+1QeEQpveR6D4+OwKPXCfD2aieJis=
github.com/santhosh-tekuri/jsonschema v1.2.4/go.mod h1:TEAUOeZSmIxTTuHatJzrvARHiuO9LYd+cIxzgEHCQI4=
github.com/shamaton/msgpack/v2 v2.2.2 h1:GOIg0c9LV04VwzOOqZSrmsv/JzjNOOMxnS/HvOHGdgs=
github.com/shamaton/msgpack/v2 v2.2.2/go.mod h1:6khjYnkx73f7VQU7wjcFS9DFjs+59naVWJv1TB7qdOI=
github.com/sirupsen/logrus v1.4.2/go.mod h1:tLMulIdttU9McNUspp0xgXVQah82FyeX6MwdIuYE2rE=
@@ -427,6 +429,8 @@ github.com/yuin/goldmark v1.1.27/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9de
github.com/yuin/goldmark v1.1.32/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74=
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
gitlab.com/yvesf/json-schema-compare v0.0.0-20190604192943-a900c04201f7 h1:BAkxmYRc1ZPl6Gap4HWqwPT8yLZMrgaAwx12Ft408sg=
gitlab.com/yvesf/json-schema-compare v0.0.0-20190604192943-a900c04201f7/go.mod h1:X40Z1OU8o1oiXWzBmkuYOaruzYGv60l0AxGiB0E9keI=
go.opencensus.io v0.21.0/go.mod h1:mSImk1erAIZhrmZN+AvHh14ztQfjbGwt4TtuofqLduU=
go.opencensus.io v0.22.0/go.mod h1:+kGneAE2xo2IficOXnaByMWTGM9T73dGwxeWcUqIpI8=
go.opencensus.io v0.22.2/go.mod h1:yxeiOL68Rb0Xd1ddK5vPZ/oVn4vY4Ynel7k9FzqtOIw=
127 changes: 24 additions & 103 deletions flytepropeller/pkg/compiler/validators/typing.go
@@ -1,11 +1,14 @@
package validators

import (
"bytes"
"encoding/json"
"strings"

structpb "github.com/golang/protobuf/ptypes/struct"
"github.com/santhosh-tekuri/jsonschema"
"github.com/wI2L/jsondiff"
jscmp "gitlab.com/yvesf/json-schema-compare"

flyte "github.com/flyteorg/flyte/flyteidl/gen/pb-go/flyteidl/core"
)
@@ -18,125 +21,43 @@ type trivialChecker struct {
literalType *flyte.LiteralType
}

func removeTitleFieldFromProperties(schema map[string]*structpb.Value) {
// TODO: Explain why we need this
// TODO: give an example of dataclass vs. Pydantic BaseModel
properties, ok := schema["properties"]
if !ok {
return
}
func isSuperTypeInJSON(sourceMetaData, targetMetaData *structpb.Struct) bool {
compiler := jsonschema.NewCompiler()

for _, p := range properties.GetStructValue().Fields {
if _, ok := p.GetStructValue().Fields["properties"]; ok {
removeTitleFieldFromProperties(p.GetStructValue().Fields)
}
delete(p.GetStructValue().Fields, "title")
}
}
srcSchemaBytes, _ := json.Marshal(sourceMetaData.Fields)
tgtSchemaBytes, _ := json.Marshal(targetMetaData.Fields)

func resolveRef(schema, defs map[string]*structpb.Value) {
// The schema generated from a Pydantic BaseModel includes a $defs field, and nested properties refer to it via $ref.
// We need to resolve these references before comparing the schema with one generated by mashumaro.
// https://github.com/flyteorg/flytekit/blob/3475ddc41f2ba31d23dd072362be704d7c2470a0/flytekit/core/type_engine.py#L632-L641
for _, p := range schema["properties"].GetStructValue().Fields {
if _, ok := p.GetStructValue().Fields["$ref"]; ok {
propName := strings.TrimPrefix(p.GetStructValue().Fields["$ref"].GetStringValue(), "#/$defs/")
p.GetStructValue().Fields = defs[propName].GetStructValue().Fields
resolveRef(p.GetStructValue().Fields, defs)
delete(p.GetStructValue().Fields, "$ref")
}
err := compiler.AddResource("src", bytes.NewReader(srcSchemaBytes))
if err != nil {
return false
}

delete(schema, "$defs")
}

func isSuperTypeInJSON(sourceMetaData, targetMetaData *structpb.Struct) bool {
// Since there are lots of field differences between draft-07 and draft 2020-12,
// we only support json schema with 2020-12 draft, which is generated here: https://github.com/flyteorg/flytekit/blob/ff2d0da686c82266db4dbf764a009896cf062349/flytekit/core/type_engine.py#L630-L639
_, upstreamIsDraft7 := sourceMetaData.Fields["$schema"]
_, downstreamIsDraft7 := targetMetaData.Fields["$schema"]

// We only support super type check for draft 2020-12
if upstreamIsDraft7 || downstreamIsDraft7 {
err = compiler.AddResource("tgt", bytes.NewReader(tgtSchemaBytes))
if err != nil {
return false
}

copySrcSchema := make(map[string]*structpb.Value)
copyTgtSchema := make(map[string]*structpb.Value)
srcSchema, _ := compiler.Compile("src")
tgtSchema, _ := compiler.Compile("tgt")

for k, v := range sourceMetaData.Fields {
copySrcSchema[k] = v
}

for k, v := range targetMetaData.Fields {
copyTgtSchema[k] = v
}
// Compare the two schemas
errs := jscmp.Compare(tgtSchema, srcSchema)

// For nested Pydantic BaseModel, we need to resolve the reference to compare the schema.
if _, ok := copySrcSchema["$defs"]; ok {
resolveRef(copySrcSchema, copySrcSchema["$defs"].GetStructValue().Fields)
}
if _, ok := copyTgtSchema["$defs"]; ok {
resolveRef(copyTgtSchema, copyTgtSchema["$defs"].GetStructValue().Fields)
// json-schema-compare does not support additionalProperties
if len(errs) == 1 {
return strings.Contains(errs[0].Error(), "FIXME additionalProperties not implemented")
}
// The JSON schema generated by Pydantic.BaseModel includes a title field in its properties, repeatedly recording the property name.
// Since this title field is absent in the JSON schema generated for dataclass, we need to remove the title field from the properties to ensure equivalence.
removeTitleFieldFromProperties(copySrcSchema)
removeTitleFieldFromProperties(copyTgtSchema)

srcSchemaBytes, _ := json.Marshal(copySrcSchema)
tgtSchemaBytes, _ := json.Marshal(copyTgtSchema)

patch, _ := jsondiff.CompareJSON(srcSchemaBytes, tgtSchemaBytes)
for _, p := range patch {
// If additionalProperties is false, the field is not present in the schema from Pydantic.BaseModel.
// We handle this case by checking the relationships by ourselves.
if p.Type != jsondiff.OperationAdd && strings.Contains(p.Path, "additionalProperties") {
if p.Type == jsondiff.OperationRemove || p.Type == jsondiff.OperationReplace {
if p.OldValue != false {
return false
}
}
} else if p.Type != jsondiff.OperationAdd {
return false
} else if strings.Contains(p.Path, "required") {
return false
}
}
return true
return len(errs) == 0
}
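
Note on the new isSuperTypeInJSON above: in this commit the errors from json.Marshal and compiler.Compile are discarded with `_`, so a malformed schema could reach the comparison step unnoticed. The sketch below is not the code from this PR and does not use santhosh-tekuri/jsonschema or json-schema-compare; it is a stdlib-only illustration, under simplifying assumptions, of a strict subset check that returns false as soon as a schema fails to parse. The names objectSchema, parseSchema, covers and isSuperTypeSketch are hypothetical, only "type", "properties" and "required" are modelled, and the direction of the check (the source must provide everything the target requires) is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// objectSchema is a deliberately small, hypothetical view of a JSON schema.
// Only "type", "properties" and "required" are modelled.
type objectSchema struct {
	Type       string                   `json:"type"`
	Properties map[string]*objectSchema `json:"properties"`
	Required   []string                 `json:"required"`
}

// parseSchema decodes raw JSON and returns the error instead of discarding it.
func parseSchema(raw []byte) (*objectSchema, error) {
	var s objectSchema
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, fmt.Errorf("parse schema: %w", err)
	}
	return &s, nil
}

// covers reports whether every property and required field named by tgt
// is also present, with the same type, in src.
func covers(src, tgt *objectSchema) bool {
	if src == nil || tgt == nil {
		return false
	}
	if src.Type != tgt.Type {
		return false
	}
	srcRequired := make(map[string]bool, len(src.Required))
	for _, name := range src.Required {
		srcRequired[name] = true
	}
	for _, name := range tgt.Required {
		if !srcRequired[name] {
			return false
		}
	}
	for name, tgtProp := range tgt.Properties {
		srcProp, ok := src.Properties[name]
		if !ok || !covers(srcProp, tgtProp) {
			return false
		}
	}
	return true
}

// isSuperTypeSketch treats any parse failure as "not castable".
func isSuperTypeSketch(srcRaw, tgtRaw []byte) bool {
	src, err := parseSchema(srcRaw)
	if err != nil {
		return false
	}
	tgt, err := parseSchema(tgtRaw)
	if err != nil {
		return false
	}
	return covers(src, tgt)
}

func main() {
	child := []byte(`{"type":"object","required":["a","b"],"properties":{"a":{"type":"integer"},"b":{"type":"string"}}}`)
	parent := []byte(`{"type":"object","required":["a"],"properties":{"a":{"type":"integer"}}}`)
	fmt.Println(isSuperTypeSketch(child, parent))
	fmt.Println(isSuperTypeSketch(parent, child))
}
```

In this sketch a schema with fields a and b covers a schema that only needs a, but not the other way round. A real implementation would also have to account for additionalProperties, which the diff above special-cases because json-schema-compare does not support it.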

func isSameTypeInJSON(sourceMetaData, targetMetaData *structpb.Struct) bool {
// Since there are lots of field differences between draft-07 and draft 2020-12,
// we only support json schema with 2020-12 draft, which is generated here: https://github.com/flyteorg/flytekit/blob/ff2d0da686c82266db4dbf764a009896cf062349/flytekit/core/type_engine.py#L630-L639
_, upstreamIsDraft7 := sourceMetaData.Fields["$schema"]
_, downstreamIsDraft7 := targetMetaData.Fields["$schema"]
srcSchemaBytes, _ := json.Marshal(sourceMetaData.Fields)
tgtSchemaBytes, _ := json.Marshal(targetMetaData.Fields)

// If the schema version is different, we can't compare them.
if upstreamIsDraft7 != downstreamIsDraft7 {
patch, err := jsondiff.CompareJSON(srcSchemaBytes, tgtSchemaBytes)
if err != nil {
return false
}

copySrcSchema := make(map[string]*structpb.Value)
copyTgtSchema := make(map[string]*structpb.Value)

for k, v := range sourceMetaData.Fields {
copySrcSchema[k] = v
}

for k, v := range targetMetaData.Fields {
copyTgtSchema[k] = v
}

// The JSON schema generated by Pydantic.BaseModel includes a title field in its properties, repeatedly recording the property name.
// Since this title field is absent in the JSON schema generated for dataclass, we need to remove the title field from the properties to ensure equivalence.
removeTitleFieldFromProperties(copySrcSchema)
removeTitleFieldFromProperties(copyTgtSchema)

srcSchemaBytes, _ := json.Marshal(copySrcSchema)
tgtSchemaBytes, _ := json.Marshal(copyTgtSchema)

patch, _ := jsondiff.CompareJSON(srcSchemaBytes, tgtSchemaBytes)
return len(patch) == 0
}
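
Note on isSameTypeInJSON above: the new version treats two schemas as the same type when jsondiff.CompareJSON produces an empty patch, and it now checks the error from CompareJSON, although the two json.Marshal errors are still dropped. As a rough, stdlib-only analogue (an assumption for illustration, not the jsondiff algorithm), structural equality of two JSON documents can be checked by decoding both and comparing the results, with every error surfaced to the caller. The function name sameJSON is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// sameJSON reports whether two JSON documents decode to the same value.
// Object key order is ignored; array order is significant.
func sameJSON(a, b []byte) (bool, error) {
	var av, bv interface{}
	if err := json.Unmarshal(a, &av); err != nil {
		return false, fmt.Errorf("decode first document: %w", err)
	}
	if err := json.Unmarshal(b, &bv); err != nil {
		return false, fmt.Errorf("decode second document: %w", err)
	}
	return reflect.DeepEqual(av, bv), nil
}

func main() {
	x := []byte(`{"type":"object","properties":{"a":{"type":"integer"}}}`)
	y := []byte(`{"properties":{"a":{"type":"integer"}},"type":"object"}`)
	same, err := sameJSON(x, y)
	if err != nil {
		fmt.Println("comparison failed:", err)
		return
	}
	fmt.Println(same)
}
```

Returning the error separately from the boolean lets the caller decide whether to log it or to treat it as a type mismatch, which is the concern the PR title raises for the marshal and compiler steps.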

2 changes: 2 additions & 0 deletions go.mod
@@ -164,6 +164,7 @@ require (
github.com/rcrowley/go-metrics v0.0.0-20201227073835-cf1acfcdf475 // indirect
github.com/robfig/cron/v3 v3.0.0 // indirect
github.com/samber/lo v1.47.0 // indirect
github.com/santhosh-tekuri/jsonschema v1.2.4 // indirect
github.com/sendgrid/rest v2.6.9+incompatible // indirect
github.com/sendgrid/sendgrid-go v3.10.0+incompatible // indirect
github.com/shamaton/msgpack/v2 v2.2.2 // indirect
@@ -180,6 +181,7 @@ require (
github.com/tidwall/pretty v1.2.1 // indirect
github.com/tidwall/sjson v1.2.5 // indirect
github.com/wI2L/jsondiff v0.6.0 // indirect
gitlab.com/yvesf/json-schema-compare v0.0.0-20190604192943-a900c04201f7 // indirect
go.opencensus.io v0.24.0 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.47.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.46.1 // indirect
3 changes: 3 additions & 0 deletions go.sum
@@ -1198,6 +1198,7 @@ github.com/russross/blackfriday/v2 v2.0.1/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQD
github.com/russross/blackfriday/v2 v2.1.0/go.mod h1:+Rmxgy9KzJVeS9/2gXHxylqXiyQDYRxCVz55jmeOWTM=
github.com/samber/lo v1.47.0 h1:z7RynLwP5nbyRscyvcD043DWYoOcYRv3mV8lBeqOCLc=
github.com/samber/lo v1.47.0/go.mod h1:RmDH9Ct32Qy3gduHQuKJ3gW1fMHAnE/fAzQuf6He5cU=
github.com/santhosh-tekuri/jsonschema v1.2.4 h1:hNhW8e7t+H1vgY+1QeEQpveR6D4+OwKPXCfD2aieJis=
github.com/santhosh-tekuri/jsonschema v1.2.4/go.mod h1:TEAUOeZSmIxTTuHatJzrvARHiuO9LYd+cIxzgEHCQI4=
github.com/santhosh-tekuri/jsonschema/v2 v2.1.0/go.mod h1:yzJzKUGV4RbWqWIBBP4wSOBqavX5saE02yirLS0OTyg=
github.com/satori/go.uuid v1.2.0/go.mod h1:dA0hQrYB0VpLJoorglMZABFdXlWrHn1NEOzdhQKdks0=
@@ -1359,6 +1360,8 @@ github.com/yuin/goldmark v1.3.5/go.mod h1:mwnBkeHKe2W/ZEtQ+71ViKU8L12m81fl3OWwC1
github.com/yuin/goldmark v1.4.13/go.mod h1:6yULJ656Px+3vBD8DxQVa3kxgyrAnzto9xy5taEt/CY=
github.com/zenazn/goji v0.9.0/go.mod h1:7S9M489iMyHBNxwZnk9/EHS098H4/F6TATF2mIxtB1Q=
github.com/ziutek/mymysql v1.5.4/go.mod h1:LMSpPZ6DbqWFxNCHW77HeMg9I646SAhApZ/wKdgO/C0=
gitlab.com/yvesf/json-schema-compare v0.0.0-20190604192943-a900c04201f7 h1:BAkxmYRc1ZPl6Gap4HWqwPT8yLZMrgaAwx12Ft408sg=
gitlab.com/yvesf/json-schema-compare v0.0.0-20190604192943-a900c04201f7/go.mod h1:X40Z1OU8o1oiXWzBmkuYOaruzYGv60l0AxGiB0E9keI=
go.elastic.co/apm v1.8.0/go.mod h1:tCw6CkOJgkWnzEthFN9HUP1uL3Gjc/Ur6m7gRPLaoH0=
go.elastic.co/apm/module/apmhttp v1.8.0/go.mod h1:9LPFlEON51/lRbnWDfqAWErihIiAFDUMfMV27YjoWQ8=
go.elastic.co/apm/module/apmot v1.8.0/go.mod h1:Q5Xzabte8G/fkvDjr1jlDuOSUt9hkVWNZEHh6ZNaTjI=