Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Service to actor pattern #172

Merged
merged 9 commits into from
Dec 12, 2022
Merged

Conversation

poszu
Copy link
Contributor

@poszu poszu commented Dec 1, 2022

With an actor pattern, all state of Service is mutated from a single thread. The select in Service::loop is a synchronization point.

In order to prevent accidental access to the state of Service from the public methods that can be called on other goroutines, a ServiceClient was factored out. It can talk with Service via a channel, sending func(*Service) to access/mutate Service safely. The sent functions are executed serially, which prevents race conditions and should be kept short and must not block.

No goroutine shares a reference to Service hence there is no need for locking at all.

Closes #69
Closes #140
Closes #170

@poszu poszu self-assigned this Dec 1, 2022
@poszu poszu requested a review from fasmat December 1, 2022 12:17
service/service.go Show resolved Hide resolved
}

func (s *Service) Start(b Broadcaster, atxProvider challenge_verifier.Verifier) error {
func (s *Service) Start(b Broadcaster, verifier challenge_verifier.Verifier) error {
s.Lock()
defer s.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can get rid of the mutex using sync.Once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes. It would be explicit that Service cannot be restarted after Shutdown(). I will explore this idea in a separate PR as it is not really related to the scope of this one, OK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. up to you. it's used often in go-spacemesh.

It would be explicit that Service cannot be restarted after Shutdown()

i don't see there is any way to restart it after the Shutdown() RPC call. is there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there isn't even a shutdown RPC. Shutdown() is called on program exit.

service/service.go Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Outdated Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Outdated Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Outdated Show resolved Hide resolved
service/service.go Show resolved Hide resolved
}

func (s *Service) Start(b Broadcaster, atxProvider challenge_verifier.Verifier) error {
func (s *Service) Start(b Broadcaster, verifier challenge_verifier.Verifier) error {
s.Lock()
defer s.Unlock()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably yes. It would be explicit that Service cannot be restarted after Shutdown(). I will explore this idea in a separate PR as it is not really related to the scope of this one, OK?

service/service.go Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Outdated Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Show resolved Hide resolved
Copy link
Member

@fasmat fasmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still going through the code and trying to understand it so more comments might follow 🙂

service/service.go Show resolved Hide resolved
sync.Mutex
}

type Command func(*Service)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some documentation to Command, e.g. that the command must terminate in a timely manner and should not do file I/O.

Imo a command should also receive the context of the function that actually executes it, so it can return early if the work it does isn't required anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean the context of the function that creates and schedules the Command or the context of Service::loop? The former is already available as a variable captured by a closure. If I pass the context of Service::loop there will be two contexts available which will be confusing.

Copy link
Member

@fasmat fasmat Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The context of the caller of Command, not the creator of it.

type Command func(ctx context.Context, *Service)

// creator
cmdChan <- func(ctx context.Context, s *Service) {
	select {
	case <- ctx.Done():
		return
	default:
	}

	s.variable = newValue
}

// caller (b.loop)
cmd := <-cmdChan
cmd(ctx, s) // ctx here is the context that cancels loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be confusing because it will look like this:

func (s *ServiceClient) Info(ctx context.Context) (*InfoResponse, error) {
	s.command <- func(serviceCtx context.Context s *Service) {
		// THERE ARE 2 Contexts here, it's confusing
		select {
		case <-ctx.Done():
			return // Info's context is cancelled - GRPC is cancelled
		case <- serviceCtx.Done():
			return // Service's context is cancelled - shutting down
		default:
		}
		s.variable = newValue
	}
}

Relying on the requirement that Command is supposed to be short, I don't see a need to select on any of these. It should just assign newValue / run and return.

Copy link
Member

@fasmat fasmat Dec 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code you posted is wrong. A Command should never check the context of the function that created it. That doesn't make any logical sense.

With the same argument I could say that a command shouldn't have a Service parameter, because it already has access to the service from its parents scope.

That being sad: I think ServiceClient isn't required as a type. imo inlining it into Service makes the code much more straight forward than having an embedded service client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reading service.go after your change, the differentiation between Service and ServiceClient is still unclear to me.
i don't understand why ServiceClient cannot be inlined.
it seems the public APIs Info/Submit/SetBroadcaster/SetChallengeVerifier all can submit through the command channel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServiceClient could be inlined into Service. Not inlining it prevents a whole class of bugs as there is literally no way to access the Service state from ServiceClient in any other way than safely via the commands channel. Thanks to this, it is impossible to accidentally create a data race from the Submit and Info methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree with the general approach to move it to the actor pattern.

tho as is, Service still requires a lock that protects it from doing multiple Start() calls, where members of Service is mutated. more over, there is a ServiceClient that separates out data that need changed during the lifetime of this server.

Service and ServiceClient is not distinguishable from the user of this service. it serves the same purpose and have the same life-span.

i think the cleanest way to do this

  • just one struct Service (get rid of ServiceClient)
  • make Start() and SetChallengeVerifier() also submit the actual work to commands channel.
  • remove all mutex (as the only thing that will be mutated is the commands channel)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to start the loop when service is created too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @countvonzero. I'm going to move the start of the event loop to where the Service is created, which will allow me to remove the last mutex. I'm going to do this in a PR removing broadcasting as this is the only reason for delaying the Service start (it cannot operate w/o a broadcaster).

service/service.go Show resolved Hide resolved
service/service.go Show resolved Hide resolved
service/service.go Outdated Show resolved Hide resolved
service/service_test.go Show resolved Hide resolved
service/service.go Outdated Show resolved Hide resolved
service/service.go Outdated Show resolved Hide resolved
@poszu poszu requested review from fasmat and countvonzero December 9, 2022 09:39
service/service.go Outdated Show resolved Hide resolved
@poszu poszu requested a review from fasmat December 9, 2022 11:49
Copy link
Contributor

@countvonzero countvonzero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving as there are more upcoming work for the poet spam attack.

giving the author the choice of addressing feedback in this PR, or leave TODO/FIXME now and address them in the upcoming PRs.

@poszu poszu merged commit 208756f into develop Dec 12, 2022
@poszu poszu deleted the 170-refactor-service-to-actor branch December 12, 2022 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants