Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjustment to the function that determines whether to import AvroTypeGen in the code to be generated #141

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

thiagodpf
Copy link

@thiagodpf thiagodpf commented Aug 31, 2024

The sort.Search() function is not producing the desired result for the logic inside the shouldImportAvroTypeGen() function.

Assuming that the definitions slice contains two elements, the first being an element that requires the inclusion of the import and the second as an enumerator (which does not require the import). When iterating through the elements of the namespace.Definitions map, when the code is iterating over the element that needs the import, the sort.Search() function will internally try to iterate through the slice from back to front, since the last element of the slice is not the element that needs the import and since the comparison is being made with ==, the sort.Search() function will return a new index (which should be used for a future ordered insertion in the slice).

In short, I think that the sort package is not the best option for what is intended to be done in the shouldImportAvroTypeGen() function.

This tweak seems to be much simpler to understand and makes more sense for the purpose of the shouldImportAvroTypeGen() function.

Fix to the problem reported in this issue: #130

The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next. Therefore, iterating through the namespace.Definitions map may sometimes not work with the sort.Search() function. This tweak seems to be much simpler to understand and makes more sense for the purpose of the shouldImportAvroTypeGen() function.
@rogpeppe
Copy link
Contributor

Have you found a case where this actually goes wrong? From a very brief look at the code (away from my laptop!) it looks like parseFile sorts the definitions slice before returning it so it looks ok to use sort.Search to me.

@thiagodpf
Copy link
Author

Hi @rogpeppe !
Here is my .avsc

{
    "type": "record",
    "name": "BusinessModelChangeAudit",
    "namespace": "br.com.ifd.content",
    "fields": [
        {
            "name": "context",
            "type": [
                "null",
                "string"
            ],
            "default": null,
            "doc": "information about the execution of the transaction"
        },
        {
            "name": "created_at",
            "doc": "The time that this event has created (epoch)",
            "type": "long",
            "logicalType": "timestamp-millis"
        },
        {
            "name": "current_providers",
            "doc": "current providers related to the event",
            "type": {
                "type": "array",
                "items": "string"
            }
        },
        {
            "name": "previous_providers",
            "doc": "previous providers related to the event",
            "type": {
                "type": "array",
                "items": "string"
            }
        },
        {
            "name": "product_external_id",
            "type": "string",
            "doc": "identifier product salesforce"
        },
        {
            "name": "response_transmitted",
            "type": "boolean",
            "default": false,
            "doc": "indicates whether the return was reported to salesforce"
        },
        {
            "name": "status",
            "type": {
                "type": "enum",
                "name": "status",
                "symbols": [
                    "PENDING",
                    "SUCCESS",
                    "FAILURE",
                    "ABORTED",
                    "REJECTED",
                    "INTERMEDIATE"
                ]
            },
            "doc": "transaction status"
        },
        {
            "name": "step",
            "type": "string",
            "doc": "This field indicates at which step the intermediate status occurs during a process or workflow"
        },
        {
            "name": "transaction_id",
            "doc": "transaction unique identifier",
            "type": "string",
            "logicalType": "UUID"
        },
        {
            "name": "updated_at",
            "doc": "The time that this event has updated (epoch)",
            "type": "long",
            "logicalType": "timestamp-millis"
        }
    ]
}

and here is the generated code:
(missing import "github.com/heetch/avro/avrotypegen")

// Code generated by avrogen. DO NOT EDIT.

package bmca

import (
	"fmt"
	"strconv"
)

type BusinessModelChangeAudit struct {
	// information about the execution of the transaction
	Context *string `json:"context"`

	// The time that this event has created (epoch)
	Created_at int64 `json:"created_at"`

	// current providers related to the event
	Current_providers []string `json:"current_providers"`

	// previous providers related to the event
	Previous_providers []string `json:"previous_providers"`

	// identifier product salesforce
	Product_external_id string `json:"product_external_id"`

	// indicates whether the return was reported to salesforce
	Response_transmitted bool `json:"response_transmitted"`

	// transaction status
	Status Status `json:"status"`

	// This field indicates at which step the intermediate status occurs during a process or workflow
	Step string `json:"step"`

	// transaction unique identifier
	Transaction_id string `json:"transaction_id"`

	// The time that this event has updated (epoch)
	Updated_at int64 `json:"updated_at"`
}

// AvroRecord implements the avro.AvroRecord interface.
func (BusinessModelChangeAudit) AvroRecord() avrotypegen.RecordInfo {
	return avrotypegen.RecordInfo{
		Schema: `{"fields":[{"default":null,"doc":"information about the execution of the transaction","name":"context","type":["null","string"]},{"doc":"The time that this event has created (epoch)","logicalType":"timestamp-millis","name":"created_at","type":"long"},{"doc":"current providers related to the event","name":"current_providers","type":{"items":"string","type":"array"}},{"doc":"previous providers related to the event","name":"previous_providers","type":{"items":"string","type":"array"}},{"doc":"identifier product sales force","name":"product_external_id","type":"string"},{"default":false,"doc":"indicates whether the return was reported to salesforce","name":"response_transmitted","type":"boolean"},{"doc":"transaction status","name":"status","type":{"name":"status","symbols":["PENDING","SUCCESS","FAILURE","ABORTED","REJECTED","INTERMEDIATE"],"type":"enum"}},{"doc":"This field indicates at which step the intermediate status occurs during a process or workflow","name":"step","type":"string"},{"doc":"transaction unique identifier","logicalType":"UUID","name":"transaction_id","type":"string"},{"doc":"The time that this event has updated (epoch)","logicalType":"timestamp-millis","name":"updated_at","type":"long"}],"name":"br.com.ifd.content.BusinessModelChangeAudit","type":"record"}`,
		Required: []bool{
			1: true,
			2: true,
			3: true,
			4: true,
			6: true,
			7: true,
			8: true,
			9: true,
		},
	}
}

type Status int

const (
	StatusPending Status = iota
	StatusSuccess
	StatusFailure
	StatusAborted
	StatusRejected
	StatusIntermediate
)

var _Status_strings = []string{
	"PENDING",
	"SUCCESS",
	"FAILURE",
	"ABORTED",
	"REJECTED",
	"INTERMEDIATE",
}

// String returns the textual representation of Status.
func (e Status) String() string {
	if e < 0 || int(e) >= len(_Status_strings) {
		return "Status(" + strconv.FormatInt(int64(e), 10) + ")"
	}
	return _Status_strings[e]
}

// MarshalText implements encoding.TextMarshaler
// by returning the textual representation of Status.
func (e Status) MarshalText() ([]byte, error) {
	if e < 0 || int(e) >= len(_Status_strings) {
		return nil, fmt.Errorf("Status value %d is out of bounds", e)
	}
	return []byte(_Status_strings[e]), nil
}

// UnmarshalText implements encoding.TextUnmarshaler
// by expecting the textual representation of Status.
func (e *Status) UnmarshalText(data []byte) error {
	// Note for future: this could be more efficient.
	for i, s := range _Status_strings {
		if string(data) == s {
			*e = Status(i)
			return nil
		}
	}
	return fmt.Errorf("unknown value %q for Status", data)
}

@thiagodpf
Copy link
Author

thiagodpf commented Sep 2, 2024

I believe the problem in the current algorithm is not related to the ordering of the definitions slice, but rather with the == comparator is being used within the sort.Search() function, when in fact (I believe) the comparator >= should be used if you want to keep using the sort.Search() function (although there would be side effects to think about 😅).

I don't know if it will help, but below is a screenshot where I managed to reproduce the problem, notice the order of the elements within the namespace.Definitions map and the if condition (since the sort.Search() function did not find the index and returned the size of the slice):

image

@thiagodpf
Copy link
Author

thiagodpf commented Sep 2, 2024

I was reviewing the description I left in the pull request and I saw that I gave too much emphasis to the iteration through namespace.Definitions map, when in fact the problem is in the condition within the sort.Search() function (my bad 😅) so I thought it best to edit it.

I think (just my opinion) that the sort.Search() function is not the most suitable for this case since it seems to be better used in "comparable" values ​​(less than, greater than, etc...) but not with raw strings.

I even thought that the sort.SearchStrings() could be useful to us, but the problem is that when it doesn't find the element in the slice, it will return the index where the element should be inserted, but it's not what we want at that point...

Still it wouldn't be a big problem given the current if conditions later on... 🤔
I can redo the PR to use sort.SearchStrings() if you prefer, what do you think?

@@ -29,10 +29,7 @@ const nullType = "avrotypegen.Null"
// schema.RecordDefinition by looking at their match within given parsed namespace
func shouldImportAvroTypeGen(namespace *parser.Namespace, definitions []schema.QualifiedName) bool {
Copy link
Contributor

@rogpeppe rogpeppe Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked in details but that == comparison is definitely wrong!

How about something like this instead, which keeps the binary search but (hopefully!) fixes the logic:

// shouldImportAvroTypeGen reports whether avrotypegen is required. It checks that the definitions given are of type
// schema.RecordDefinition by looking at their match within given parsed namespace
func shouldImportAvroTypeGen(namespace *parser.Namespace, definitions []schema.QualifiedName) bool {
	for _, def := range namespace.Definitions {
		searchName := def.AvroName().Name
		_, found := sort.Find(len(definitions), func(i int) int {
			return strings.Compare(searchName, definitions[i].Name)
		})
		if found {
			switch def.(type) {
			case *schema.RecordDefinition, *schema.FixedDefinition:
				return true
			}
		}
	}
	return false
}

Note: I haven't actually run the above code :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked perfectly! I'll adjust the PR.

Thanks for the help, I really don't have much experience with the sort package and I was able to learn some cool stuff! 🙇‍♂️

Adjustment in the function that determines whether or not the avrotypegen package should be imported into the file to be generated.
@thiagodpf thiagodpf changed the title Update generate.go Adjustment to the function that determines whether to import AvroTypeGen in the code to be generated Sep 3, 2024
@thiagodpf thiagodpf requested a review from rogpeppe September 3, 2024 01:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants