Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved error handling #70

Merged
merged 30 commits into from
Sep 7, 2022
Merged

Improved error handling #70

merged 30 commits into from
Sep 7, 2022

Conversation

zivnevo
Copy link
Member

@zivnevo zivnevo commented Aug 23, 2022

For now, without unit tests. These will be added later.

@zivnevo zivnevo requested a review from vikin91 August 23, 2022 12:41
Signed-off-by: Ziv Nevo <[email protected]>
@zivnevo
Copy link
Member Author

zivnevo commented Aug 23, 2022

Added unit tests

Copy link

@vikin91 vikin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope I shared the details in the comments quite clearly - feel free to ask about details if needed.

From my perspective, these are the "must change" parts:

  • The error interface returning an error instead of a string
  • Distinguish type of errors and utilize errors wrapping (errors.Wrapf)
  • Rethink the packages organization taking into consideration the view of the consumer (including the minimization of changes required when we need to break compatibility)

I think the rest of my comments are of type "should" and "could" and I wrote them to share our point of view on the code especially focusing on future maintainability, extensibility, and testability.

I am not voting green/red, as I do not know the rules that your org is following to decide whether a PR can be merged.

DocID int // the number of YAML document where the error originates from (0-based)
}

func (e *FileProcessingError) Error() string {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning string here would not allow the consumer to unwrap errors. Having exported the methods File, LineNo, and DocumentID allows the consumer to craft the string representation of the error message as long as we have the original error, so I would suggest to keep the returned type as error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pkg/common/error_types.go Outdated Show resolved Hide resolved
pkg/common/error_types.go Outdated Show resolved Hide resolved
pkg/controller/engine.go Outdated Show resolved Hide resolved
pkg/analyzer/scan.go Show resolved Hide resolved
pkg/controller/utils.go Outdated Show resolved Hide resolved
Comment on lines 75 to 77
deployObjects, err := parseK8sYaml(mfp)
fileScanErrors = append(fileScanErrors, err...)
if len(deployObjects) > 0 {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tricky part. When the error is returned, then we actually should assume that the deployObjects may be in any state - this is how such situation is typically handled in go when we return a result and the error.
Here, the situation may make sense, as the yaml file may contain some documents that are fully okay, while the others are not. To distinguish this, we would need to have one error object per one yaml document. The issues here as as follows:

  1. We return err suggesting that this is a single error while it is in fact a slice FileProcessingErrorList
  2. We cannot distinguish what kind of error is this without looking at the error message - by looking at the code, we see that there can be at least 4 types of errors: reading the file, splitting the yaml, marshalling the yaml doc, or not recognizing the resource as a k8s resource.

I would suggest to redesign here following an idea like this:

  • Split yaml into docs, on error collect the error (cannot split, cannot marshal, or a wrap of both), move to the next file. When all files are done, go to the next step
  • for each document parse the yaml doc into a k8s object, on error collect the error (cannot parse doc as a k8s object of type X), else collect the result.

These steps could be separated into 2 functions and covered with tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened issue #86 to handle this.

}

func splitByYamlDocuments(data []byte) []string {
decoder := yaml.NewDecoder(bytes.NewBuffer(data))
func splitByYamlDocuments(mfp string) ([]string, common.FileProcessingErrorList) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could separate the parts of this function into separate functions:
(1) reading the file and
(2) splitting the string (yaml) into many strings.

This would allow us to benefit from better testability.

pkg/controller/utils.go Outdated Show resolved Hide resolved
pkg/controller/utils_test.go Show resolved Hide resolved
zivnevo and others added 16 commits August 29, 2022 11:33
Signed-off-by: Ziv Nevo <[email protected]>
…-topology-analyzer into improved_error_handling

Signed-off-by: Ziv Nevo <[email protected]>
Signed-off-by: Ziv Nevo <[email protected]>
Copy link

@vikin91 vikin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As agreed, I reviewed only policies_synthesizer.go, logger.go and error_types.go. I left many comments, but the blockers are marked with (MUST) at the beginning of the comment.

I am happy to chat about the rest when we have some time. I would propose to address the MUSTs and merge this (if the tests are passing) and continue iterating on the code in smaller PRs afterwards.

Debugf(format string, o ...interface{})
Infof(format string, o ...interface{})
Warnf(format string, o ...interface{})
Errorf(format string, o ...interface{})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(MUST) This method misses the error parameter.

Suggested change
Errorf(format string, o ...interface{})
Errorf(err error, format string, o ...interface{}) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 57 to 61
connections, errs := extractConnections(args, ps.stopOnError)
policies := []*networking.NetworkPolicy{}
if !returnOn1StError(ps.stopOnError, errs) {
policies = synthNetpols(connections)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may work okay (and it fulfills the interface) but it is a code smell.

The extractConnections seems to stop on the first error only in the first part of the processing. It does not stop when error happens in this block:

resources, links, parseErrors := parseResources(dObjs, args)

This is not a blocker for the integration but it may need some refactoring soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


connections, errs := extractConnections(args, ps.stopOnError)
policies := []*networking.NetworkPolicy{}
if !returnOn1StError(ps.stopOnError, errs) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function returnOn1StError checks only the last error for its type. While I suspect that this may be due to willingness to optimize things, I would like to highlight that we do not have any guarantee that a fatal error would be always the last one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

df.l.Printf(format, o...)
}

var activeLogger Logger = NewDefaultLogger()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need package-global variable? I would suggest to avoid such vars unless we really need them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Until we refactor the code and put more functions under PoliciesSynthesizer the only alternative I see is to pass the logger to every function that uses it, which will uglify the code much.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is what we want. If the code looks ugly with the logger passed to each function, then we know that we have a smell in the code that needs to be improved. So it is actually a good thing that it is ugly :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be handled in the context of Issue #81

Comment on lines 107 to 110
func malformedYamlDoc(filePath string, lineNum, docID int, err error) *FileProcessingError {
msg := fmt.Sprintf("YAML document is malformed: %v", err)
return newFileProcessingError(msg, filePath, lineNum, docID, false, true)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, the way the errors are created in this section may cause some maintainability in the future. Let me try to share what I have in mind when reading this.

This function introduces an error type that is write-only. It means that you can use this function to produce this error, but there is no possibility to check whether a given error is of this type.
Consider the following alternative:

package main

import (
	"fmt"
	"github.com/pkg/errors"
)

// let's define what error types do we have
var (
	ErrMalformedYamlDoc    = errors.New("YAML document is malformed")
	ErrNoK8sResourcesFound = errors.New("no relevant Kubernetes resources found")
	ErrUnknown             = errors.New("unknown error")
)

// ErrorSeverity - let's define what kind of severities do we have
// must be exported
type ErrorSeverity string

const (
	ErrorSeverityFatal    ErrorSeverity = "fatal"
	ErrorSeverityCritical ErrorSeverity = "critical"
	ErrorSeverityOther    ErrorSeverity = "other"
)

// errorTypes maps error types to their severity
// should be unexported
var errorTypes = map[error]ErrorSeverity{
	ErrMalformedYamlDoc:    ErrorSeverityFatal,
	ErrNoK8sResourcesFound: ErrorSeverityCritical,
	ErrUnknown:             ErrorSeverityOther,
}

type FileProcessingError struct {
	cause    error
	errType  error
	filePath string
	lineNum  int
	docID    int
}

func (e *FileProcessingError) Error() error {
	if e.cause != nil {
		return errors.Wrap(e.cause, e.errType.Error())
	}
	return e.errType
}

// Cause is >>>optional<<< for us, but it is a chance to implement the 'causer' interface:
// https://pkg.go.dev/github.com/pkg/errors#readme-retrieving-the-cause-of-an-error
func (e *FileProcessingError) Cause() error {
	return e.cause
}

// Severity is a method to check error severity without giving the user the possibility to change it
// can be exported
func (e *FileProcessingError) Severity() ErrorSeverity {
	if severity, ok := errorTypes[e.errType]; ok {
		return severity
	}
	return ErrorSeverityOther
}

func (e *FileProcessingError) Type() error {
	return e.errType
}

func newFileProcessingError(cause, errType error, filePath string, lineNum, docID int) *FileProcessingError {
	if errType == nil {
		errType = ErrUnknown
	}
	return &FileProcessingError{
		cause,
		errType,
		filePath,
		lineNum,
		docID,
	}
}

// consumer

func main() {
	cause := errors.New("illegal character xyz")
	for _, err := range []*FileProcessingError{
		newFileProcessingError(cause, ErrMalformedYamlDoc, "/etc/passwd", 999, 7),
		newFileProcessingError(nil, ErrNoK8sResourcesFound, "/tmp", 0, -1),
	} {
		fmt.Printf("Error message: %s\n", err.Error())
		if c := err.Cause(); c != nil {
			fmt.Printf("Explicit cause: %s\n", c.Error())
		}
		fmt.Printf("Error Type: %s\n", err.Type())
		fmt.Printf("Error Severity: %s\n", err.Severity())
	}

}

I ignored a lot of boilerplate in this example, but I want to highlight the following:

  1. Instead of custom constructor for each type, have a single constructor and all possible types listed.
  2. Use errors.Wrap always when you think about using sprintf + error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks nice. How would you handle parameterized error messages?
e.g., "configmap %s does not have key %s (referenced by %s)"

For now, let's leave it as is. We can introduce error types later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened Issue #85 to handle this

Comment on lines 26 to 30
if fpe.IsSevere() || fpe.IsFatal() {
activeLogger.Errorf(logMsg)
} else {
activeLogger.Warnf(logMsg)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(MUST) Please no logging in the custom error type. Additionally this uses a global var - we must not do this, because the caller would not be able to disable this or test this properly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separated logging of errors from the custom error interface

}

func notK8sResource(filePath string, docID int, err error) *FileProcessingError {
msg := fmt.Sprintf("Yaml document is not a K8s resource: %v", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please when working with errors always wrap the errors in a message, otherwise we loose information on such operations like this one.

Instead try this:

Suggested change
msg := fmt.Sprintf("Yaml document is not a K8s resource: %v", err)
err2 := errors.Wrap(err,"Yaml document is not a K8s resource")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 63 to 68
ps.errors = errs
for _, err := range errs {
if err.IsFatal() {
return nil, err.Error()
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine and it implements the agreed interface. But it sparkles some thoughts about potential confusion source.

I though that we can guarantee by definition that there is always 0 or 1 fatal errors (max 1, as seeing one fatal error forces us to stop processing). What we see here is our classification of the severity fatal which means that the lib could theoretically continue work till the end if the user has chosen so.

I think that there are the following type of errors here:

  1. fatal where the processing must stop now - e.g., file not found, not enough memory
  2. The severity that we called fatal = the result is garbage
  3. The severity that we called critical = the result may be incomplete

Now, using word fatal for 1 and 2 makes it confusing. Maybe we could reiterate the naming?

In line 66, we return the second fatal, whereas when I proposed it I was thinking about the first fatal.
Let me know what you think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't really have errors that are of type 1 but not type 2. i.e., whenever we have to stop immediately, the results cannot be used. I cannot really imagine a case where the results (NetworkPolicies) will be meaningful if we ran out of memory. I prefer to be on the safe side here.

@zivnevo zivnevo requested a review from vikin91 September 5, 2022 14:38
@vikin91
Copy link

vikin91 commented Sep 6, 2022

Thanks for the fixes @zivnevo ! I will give them a try by trying to integrate directly with this branch. I will let you know if there are any issues with anything that would be blocking. I am not sure I finish this today (Tuesday), but tomorrow the latest I should be back with success report or with a list of required corrections.

Copy link

@vikin91 vikin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to partially integrate with this branch and I see the library working correctly in simple scenarios (I haven't checked anything more complex yet).

As this PR is really big and almost impossible to review, I propose the following:

  • merge it as it is,
  • address potential issues in separate, smaller PRs if needed.

Does it sound ok?

@zivnevo zivnevo merged commit 82fcc36 into main Sep 7, 2022
@zivnevo zivnevo deleted the improved_error_handling branch September 7, 2022 08:42
@zivnevo
Copy link
Member Author

zivnevo commented Sep 7, 2022

I was able to partially integrate with this branch and I see the library working correctly in simple scenarios (I haven't checked anything more complex yet).

As this PR is really big and almost impossible to review, I propose the following:

* merge it as it is,

* address potential issues in separate, smaller PRs if needed.

Does it sound ok?

Yes, this sounds good. I opened issues to address most of the comments. If you think I missed something, feel free to open a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Accumulate file-processing errors. Main entry point should return them all.
2 participants