-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved error handling #70
Conversation
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Added unit tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope I shared the details in the comments quite clearly - feel free to ask about details if needed.
From my perspective, these are the "must change" parts:
- The error interface returning an
error
instead of a string - Distinguish type of errors and utilize errors wrapping (
errors.Wrapf
) - Rethink the packages organization taking into consideration the view of the consumer (including the minimization of changes required when we need to break compatibility)
I think the rest of my comments are of type "should" and "could" and I wrote them to share our point of view on the code especially focusing on future maintainability, extensibility, and testability.
I am not voting green/red, as I do not know the rules that your org is following to decide whether a PR can be merged.
pkg/common/error_types.go
Outdated
DocID int // the number of YAML document where the error originates from (0-based) | ||
} | ||
|
||
func (e *FileProcessingError) Error() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning string
here would not allow the consumer to unwrap errors. Having exported the methods File
, LineNo
, and DocumentID
allows the consumer to craft the string representation of the error message as long as we have the original error, so I would suggest to keep the returned type as error
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/controller/utils.go
Outdated
deployObjects, err := parseK8sYaml(mfp) | ||
fileScanErrors = append(fileScanErrors, err...) | ||
if len(deployObjects) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tricky part. When the error is returned, then we actually should assume that the deployObjects
may be in any state - this is how such situation is typically handled in go when we return a result and the error.
Here, the situation may make sense, as the yaml file may contain some documents that are fully okay, while the others are not. To distinguish this, we would need to have one error object per one yaml document. The issues here as as follows:
- We return
err
suggesting that this is a single error while it is in fact a sliceFileProcessingErrorList
- We cannot distinguish what kind of error is this without looking at the error message - by looking at the code, we see that there can be at least 4 types of errors: reading the file, splitting the yaml, marshalling the yaml doc, or not recognizing the resource as a k8s resource.
I would suggest to redesign here following an idea like this:
- Split yaml into docs, on error collect the error (cannot split, cannot marshal, or a wrap of both), move to the next file. When all files are done, go to the next step
- for each document parse the yaml doc into a k8s object, on error collect the error (cannot parse doc as a k8s object of type X), else collect the result.
These steps could be separated into 2 functions and covered with tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened issue #86 to handle this.
pkg/controller/utils.go
Outdated
} | ||
|
||
func splitByYamlDocuments(data []byte) []string { | ||
decoder := yaml.NewDecoder(bytes.NewBuffer(data)) | ||
func splitByYamlDocuments(mfp string) ([]string, common.FileProcessingErrorList) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could separate the parts of this function into separate functions:
(1) reading the file and
(2) splitting the string (yaml) into many strings.
This would allow us to benefit from better testability.
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
…-topology-analyzer into improved_error_handling Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As agreed, I reviewed only policies_synthesizer.go
, logger.go
and error_types.go
. I left many comments, but the blockers are marked with (MUST)
at the beginning of the comment.
I am happy to chat about the rest when we have some time. I would propose to address the MUSTs and merge this (if the tests are passing) and continue iterating on the code in smaller PRs afterwards.
pkg/controller/logger.go
Outdated
Debugf(format string, o ...interface{}) | ||
Infof(format string, o ...interface{}) | ||
Warnf(format string, o ...interface{}) | ||
Errorf(format string, o ...interface{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(MUST) This method misses the error
parameter.
Errorf(format string, o ...interface{}) | |
Errorf(err error, format string, o ...interface{}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
connections, errs := extractConnections(args, ps.stopOnError) | ||
policies := []*networking.NetworkPolicy{} | ||
if !returnOn1StError(ps.stopOnError, errs) { | ||
policies = synthNetpols(connections) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may work okay (and it fulfills the interface) but it is a code smell.
The extractConnections
seems to stop on the first error only in the first part of the processing. It does not stop when error happens in this block:
resources, links, parseErrors := parseResources(dObjs, args)
This is not a blocker for the integration but it may need some refactoring soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
connections, errs := extractConnections(args, ps.stopOnError) | ||
policies := []*networking.NetworkPolicy{} | ||
if !returnOn1StError(ps.stopOnError, errs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function returnOn1StError
checks only the last error for its type. While I suspect that this may be due to willingness to optimize things, I would like to highlight that we do not have any guarantee that a fatal error would be always the last one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
df.l.Printf(format, o...) | ||
} | ||
|
||
var activeLogger Logger = NewDefaultLogger() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need package-global variable? I would suggest to avoid such vars unless we really need them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Until we refactor the code and put more functions under PoliciesSynthesizer
the only alternative I see is to pass the logger to every function that uses it, which will uglify the code much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is what we want. If the code looks ugly with the logger passed to each function, then we know that we have a smell in the code that needs to be improved. So it is actually a good thing that it is ugly :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be handled in the context of Issue #81
pkg/controller/error_types.go
Outdated
func malformedYamlDoc(filePath string, lineNum, docID int, err error) *FileProcessingError { | ||
msg := fmt.Sprintf("YAML document is malformed: %v", err) | ||
return newFileProcessingError(msg, filePath, lineNum, docID, false, true) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, the way the errors are created in this section may cause some maintainability in the future. Let me try to share what I have in mind when reading this.
This function introduces an error type that is write-only. It means that you can use this function to produce this error, but there is no possibility to check whether a given error is of this type.
Consider the following alternative:
package main
import (
"fmt"
"github.com/pkg/errors"
)
// let's define what error types do we have
var (
ErrMalformedYamlDoc = errors.New("YAML document is malformed")
ErrNoK8sResourcesFound = errors.New("no relevant Kubernetes resources found")
ErrUnknown = errors.New("unknown error")
)
// ErrorSeverity - let's define what kind of severities do we have
// must be exported
type ErrorSeverity string
const (
ErrorSeverityFatal ErrorSeverity = "fatal"
ErrorSeverityCritical ErrorSeverity = "critical"
ErrorSeverityOther ErrorSeverity = "other"
)
// errorTypes maps error types to their severity
// should be unexported
var errorTypes = map[error]ErrorSeverity{
ErrMalformedYamlDoc: ErrorSeverityFatal,
ErrNoK8sResourcesFound: ErrorSeverityCritical,
ErrUnknown: ErrorSeverityOther,
}
type FileProcessingError struct {
cause error
errType error
filePath string
lineNum int
docID int
}
func (e *FileProcessingError) Error() error {
if e.cause != nil {
return errors.Wrap(e.cause, e.errType.Error())
}
return e.errType
}
// Cause is >>>optional<<< for us, but it is a chance to implement the 'causer' interface:
// https://pkg.go.dev/github.com/pkg/errors#readme-retrieving-the-cause-of-an-error
func (e *FileProcessingError) Cause() error {
return e.cause
}
// Severity is a method to check error severity without giving the user the possibility to change it
// can be exported
func (e *FileProcessingError) Severity() ErrorSeverity {
if severity, ok := errorTypes[e.errType]; ok {
return severity
}
return ErrorSeverityOther
}
func (e *FileProcessingError) Type() error {
return e.errType
}
func newFileProcessingError(cause, errType error, filePath string, lineNum, docID int) *FileProcessingError {
if errType == nil {
errType = ErrUnknown
}
return &FileProcessingError{
cause,
errType,
filePath,
lineNum,
docID,
}
}
// consumer
func main() {
cause := errors.New("illegal character xyz")
for _, err := range []*FileProcessingError{
newFileProcessingError(cause, ErrMalformedYamlDoc, "/etc/passwd", 999, 7),
newFileProcessingError(nil, ErrNoK8sResourcesFound, "/tmp", 0, -1),
} {
fmt.Printf("Error message: %s\n", err.Error())
if c := err.Cause(); c != nil {
fmt.Printf("Explicit cause: %s\n", c.Error())
}
fmt.Printf("Error Type: %s\n", err.Type())
fmt.Printf("Error Severity: %s\n", err.Severity())
}
}
I ignored a lot of boilerplate in this example, but I want to highlight the following:
- Instead of custom constructor for each type, have a single constructor and all possible types listed.
- Use
errors.Wrap
always when you think about usingsprintf + error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks nice. How would you handle parameterized error messages?
e.g., "configmap %s does not have key %s (referenced by %s)"
For now, let's leave it as is. We can introduce error types later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened Issue #85 to handle this
pkg/controller/error_types.go
Outdated
if fpe.IsSevere() || fpe.IsFatal() { | ||
activeLogger.Errorf(logMsg) | ||
} else { | ||
activeLogger.Warnf(logMsg) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(MUST) Please no logging in the custom error type. Additionally this uses a global var - we must not do this, because the caller would not be able to disable this or test this properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separated logging of errors from the custom error interface
pkg/controller/error_types.go
Outdated
} | ||
|
||
func notK8sResource(filePath string, docID int, err error) *FileProcessingError { | ||
msg := fmt.Sprintf("Yaml document is not a K8s resource: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please when working with errors always wrap the errors in a message, otherwise we loose information on such operations like this one.
Instead try this:
msg := fmt.Sprintf("Yaml document is not a K8s resource: %v", err) | |
err2 := errors.Wrap(err,"Yaml document is not a K8s resource") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ps.errors = errs | ||
for _, err := range errs { | ||
if err.IsFatal() { | ||
return nil, err.Error() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine and it implements the agreed interface. But it sparkles some thoughts about potential confusion source.
I though that we can guarantee by definition that there is always 0 or 1 fatal errors (max 1, as seeing one fatal error forces us to stop processing). What we see here is our classification of the severity fatal
which means that the lib could theoretically continue work till the end if the user has chosen so.
I think that there are the following type of errors here:
fatal
where the processing must stop now - e.g., file not found, not enough memory- The severity that we called fatal = the result is garbage
- The severity that we called critical = the result may be incomplete
Now, using word fatal for 1 and 2 makes it confusing. Maybe we could reiterate the naming?
In line 66, we return the second fatal, whereas when I proposed it I was thinking about the first fatal.
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't really have errors that are of type 1 but not type 2. i.e., whenever we have to stop immediately, the results cannot be used. I cannot really imagine a case where the results (NetworkPolicies) will be meaningful if we ran out of memory. I prefer to be on the safe side here.
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Thanks for the fixes @zivnevo ! I will give them a try by trying to integrate directly with this branch. I will let you know if there are any issues with anything that would be blocking. I am not sure I finish this today (Tuesday), but tomorrow the latest I should be back with success report or with a list of required corrections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to partially integrate with this branch and I see the library working correctly in simple scenarios (I haven't checked anything more complex yet).
As this PR is really big and almost impossible to review, I propose the following:
- merge it as it is,
- address potential issues in separate, smaller PRs if needed.
Does it sound ok?
Yes, this sounds good. I opened issues to address most of the comments. If you think I missed something, feel free to open a new issue. |
For now, without unit tests. These will be added later.