This repository has been archived by the owner on Dec 22, 2018. It is now read-only.

Remove race in checking for sigma #164

Closed

Conversation

@btracey (Member) commented Feb 23, 2017

Methods in Normal and StudentsT should either use sigma or not; they shouldn't conditionally use sigma depending on whether it has been set. The current code races with setSigma(). Fixing it with a mutex would avoid the strict data race, but the behavior of the program would still depend on execution order. Remove this behavior entirely by just computing the relevant entry of the covariance matrix.

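For illustration, a rough sketch of computing a single covariance entry directly from the lower triangular Cholesky factor L (where sigma = L Lᵀ); covAt and its signature are hypothetical, not the code in this commit:

// covAt computes sigma[i,j] from the lower triangular Cholesky factor L
// (sigma = L Lᵀ) without forming the full covariance matrix.
func covAt(L mat64.Matrix, i, j int) float64 {
	// L is lower triangular, so terms with k > min(i, j) are zero.
	kmax := i
	if j < i {
		kmax = j
	}
	var v float64
	for k := 0; k <= kmax; k++ {
		v += L.At(i, k) * L.At(j, k)
	}
	return v
}

Each entry costs O(n), rather than the O(n^3) of reconstructing the whole matrix.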
@btracey (Member, Author) commented Feb 23, 2017

Fixes #163

@btracey (Member, Author) commented Feb 23, 2017

PTAL @kortschak. This fixes the race while also removing complexity from the implementation.

@btracey (Member, Author) commented Feb 23, 2017

Opened #165 to address the other issue you brought up in the discussion of #163

@kortschak (Member) commented:

I'm not convinced (I am opposed); I don't believe there is a race if the type is used correctly - i.e. with a lock provided by the user. What is the performance penalty here in the case where n.sigma == nil?

@btracey (Member, Author) commented Feb 23, 2017

The performance penalty is O(n) if sigma has been set. The bigger picture, though, is that it seems agreed that the original code is bad in general. See the discussion in golang/go#19245 and golang/go#6123. Even though this isn't strictly a data race in the memory-corruption sense of the word, the code paths executed depend on the execution order, which isn't specified. It seems generally accepted that this kind of code is a bad idea.

If we want to keep that code path behavior, I strongly believe we should do it interior to the struct and not force the user to do the synchronization. It's difficult to know which methods need the covariance matrix and which methods do not. For example, it's likely somewhat surprising (to the average user) that you can find the marginal in the first place without needing to compute the covariance matrix. Having the synchronization within the struct means all methods can be called in parallel. Having the synchronization outside the struct means there is a complicated graph of which methods are okay to call in parallel with one another.

@kortschak (Member) commented:

The performance penalty is O(n) if sigma has been set.

Yes, but in practice, what does that measure as? The issue is whether the O(n) is, in practice, worse than locking or other checks.

I just did an experiment on removing the sync.Once (just using the state of the sigma field) and I see this:

MarginalNormalReset10-8  8.01µs ± 3%  7.66µs ± 2%  -4.37%  (p=0.000 n=18+19)

My position on the race behaviour is that it isn't a race unless you try hard for it to be one - I don't yet see an argument that changes my mind on this. If there were a significant benefit from being able to call all the methods in parallel, then maybe, but that does seem rather unlikely to me. Can you explain the magnitude of the benefit from being able to do that? At the moment, it is simple and fast. With locking it may be simple, but slower for most cases with (from a quick survey) a reasonably non-trivial locking scheme.

The scheme would use a sync.RWMutex with the following methods.

// setSigma computes, stores and returns the covariance matrix of the distribution.
func (n *Normal) setSigma() *mat64.SymDense {
	defer n.mu.Unlock()
	n.mu.Lock()
	if n.sigma == nil {
		n.sigma = mat64.NewSymDense(n.Dim(), nil)
		n.sigma.FromCholesky(&n.chol)
	}
	return n.sigma
}

// getSigma returns the current state of the covariance matrix of the distribution.
func (n *Normal) getSigma() *mat64.SymDense {
	defer n.mu.RUnlock()
	n.mu.RLock()
	return n.sigma
}

You can see here that most paths hit the slower lock path (ignoring contention, which I would guess is likely to be rare). We could put a call to getSigma at the start of setSigma and return early if the result is non-nil, only taking the slower write-locked path when it is nil, but that's getting quite baroque.
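For concreteness, that double-checked variant would look something like this (sigmaOrSet is a hypothetical name):

// sigmaOrSet returns sigma, computing and storing it only if it has not been
// set yet. The fast path takes only the read lock.
func (n *Normal) sigmaOrSet() *mat64.SymDense {
	if s := n.getSigma(); s != nil {
		return s
	}
	return n.setSigma()
}

Two goroutines can both fall through to setSigma, but the nil check under the write lock means sigma is still only computed once.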

@btracey (Member, Author) commented Feb 23, 2017

So, first of all, I discovered this race because it comes up in a real program of mine. I didn't try hard to see it; I was just calling the methods in parallel because we had designed the methods to be able to be called in parallel (or so I thought).

The actual code I have is complicated, but in short:
Loop over the dimensions of the Normal:

  1. Call MarginalSingle for that dimension.
  2. Use quad to integrate over the marginal distribution and:
    2a) Compute the Quantile based on a bunch of numbers
    2b) Use ConditionNormal to get an updated normal distribution based on a complicated function of the quantiles
    2c) Return some value based on the returned distribution

Today I changed my code to do each dimension in parallel. Each dimension is independent, so this should be a linear speedup (in particular, note that I'm not generating random numbers, so there's no use of src). The race happens because ConditionNormal sets sigma while MarginalSingle checks whether sigma == nil, which races when the two are called concurrently.
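Schematically, the parallel loop is the sketch below; parallelSweep and perDimension are hypothetical stand-ins for my code, with perDimension doing steps 1-2c for one dimension.

func parallelSweep(dim int, perDimension func(i int)) {
	var wg sync.WaitGroup
	for i := 0; i < dim; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// MarginalSingle (a read of sigma == nil) can run here while
			// ConditionNormal in another goroutine is setting sigma.
			perDimension(i)
		}(i)
	}
	wg.Wait()
}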

Assuming the sync.Once is removed, I have two options:

  1. I can copy the Normal distribution onto each of k <= n goroutines. This costs me O(k*n^2) extra memory, because each Normal individually takes O(n^2) memory. It also costs me O(k*n^3) in computation, because I need to compute sigma in each goroutine separately, whereas before it was computed only once.
  2. I can use a mutex to prevent any instance of MarginalSingle or ConditionNormal from running at the same time. Before there was no contention between the goroutines, but now there is contention at 2 out of the 3 steps. This significantly reduces the effective concurrency (from what had been linear), since ConditionNormal is a significant chunk of the computation time. There's no way to use the mutex judiciously.

If we really want the synchronization to go away, then I think we have two options:

  1. Just always store sigma. This is probably what people expect anyway, given other implementations. It is unfortunate, however, as it makes Normal impossible to use without an O(n^3) operation. Right now, one can use NewNormalChol if they already have the Cholesky. It's not super rare to know the Cholesky; for example, I believe one can do a Bayesian update to a Normal from one new sample with a rank-one update.

  2. Make SetSigma public and document all of the functions that need sigma to be set. This solves my issue above, as I could call SetSigma once before the concurrent loop (which was the only place of contention). I don't really like this, though, because first of all it means that users must be much more knowledgeable about how to use the methods. I suspect with this implementation scheme there will be frequent panics due to forgetting to call SetSigma, while at present all methods just work after construction. Secondly, at that point it kind of feels like we should add a storeSigma bool field to the NewNormal routines.

@kortschak (Member) commented:

The third option for your use case is to cause setSigma to be called once at the start since you know that there will be concurrent reads. Then you have no races and no need for synchronisation. This is essentially what you are suggesting in solution 2, but without exposing the knob for that. This is obviously not ideal in the longer term. I agree that exposing SetSigma opens up issues with incorrect use, and probably should be avoided - though note that we could expose it and still call it as happens now, with a note about why you might want to call it first.

I think there is another option: keep the behaviour we have now (but with the sync.Once removed as I have here, and possibly exposing SetSigma) and provide an example of how the type could be used concurrently. This highlights that it is not thread safe, and gives people a concrete example of why it might be useful to hint at pre-calculation.

@btracey (Member, Author) commented Feb 23, 2017

Why don't we just keep sigma? It removes all of these issues entirely, doesn't cost that much, and even saves cost if you're setting a Normal from a covariance matrix and then need to use one of the methods that sets sigma.

If you happen to be in the (quite specific) case where 1) you have the Cholesky and 2) you can update it without reconstructing the covariance, then you can just implement those particular cases yourself. I suspect the most common operation in that case is LogProb, and the implementation is 2 lines of code.
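As a sketch of what I mean for LogProb (logProbChol is a hypothetical helper using only the math package and mat64.Matrix, not the package's implementation); spelled out without the existing Cholesky helpers it is a bit longer, but the idea is just a triangular solve plus the log determinant:

// logProbChol computes log p(x) for N(mu, sigma) with sigma = L Lᵀ, using only
// the lower triangular factor L: solve L y = x - mu by forward substitution,
// then log p = -0.5*(d*log(2*pi) + 2*sum_i log(L[i,i]) + yᵀy).
func logProbChol(x, mu []float64, L mat64.Matrix) float64 {
	d := len(x)
	y := make([]float64, d)
	var logDet, quad float64
	for i := 0; i < d; i++ {
		s := x[i] - mu[i]
		for j := 0; j < i; j++ {
			s -= L.At(i, j) * y[j]
		}
		y[i] = s / L.At(i, i)
		logDet += 2 * math.Log(L.At(i, i))
		quad += y[i] * y[i]
	}
	return -0.5 * (float64(d)*math.Log(2*math.Pi) + logDet + quad)
}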

@kortschak (Member) commented:

What do you mean by keep sigma? Retain the *mat64.SymDense that has been handed in?

I don't have an example, but this seems pretty reasonable: https://github.com/gonum/stat/tree/setsigma. What do you think?

@btracey (Member, Author) commented Feb 23, 2017

Yes, I mean retaining the original *mat64.SymDense.

The main advantages are implementation simplification (i.e. avoiding this discussion), no concurrency "gotchas" (no need to call SetSigma first), saved computational cost (no need to recompute sigma), and no loss of precision from reconstructing sigma (for example, if the condition number of the original matrix was poor).

Exposing SetSigma is okay, but I don't like that it adds caveats to writing correct concurrent code as opposed to correct serial code. My past experience with methods like that is not positive, because I've found they tend to compose poorly with interfaces. At that point, you start calling SetSigma just in case, and then we may as well just store it in the first place.

@btracey (Member, Author) commented Feb 23, 2017

I can provide an example of it composing poorly with interfaces if you'd like

@kortschak (Member) commented:

I can provide an example of it composing poorly with interfaces if you'd like

Yes please.

While there are fewer gotchas with concurrency in one direction, there is the caveat that the sigma passed in must not be altered by the user. I don't know which is worse.

@btracey (Member, Author) commented Feb 23, 2017

Sorry, my mistake. I don't mean retaining the original Sigma, I mean retaining a copy of it.

@btracey (Member, Author) commented Feb 23, 2017

And I additionally mean constructing a new one in the case of NewNormalChol. That is, when a Normal is created, it has a valid sigma, and none of the methods change that.
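A rough sketch of that shape (normalSketch, fromSigma, and fromChol are hypothetical names, not the actual constructors); both construction paths leave sigma valid, and nothing mutates it afterwards:

type normalSketch struct {
	chol  mat64.Cholesky
	sigma *mat64.SymDense // always non-nil after construction
}

// fromSigma keeps a copy of the provided covariance, so the caller's matrix
// can later be modified without affecting the distribution.
func fromSigma(sigma *mat64.SymDense) (*normalSketch, bool) {
	var n normalSketch
	if ok := n.chol.Factorize(sigma); !ok {
		return nil, false
	}
	n.sigma = mat64.NewSymDense(sigma.Symmetric(), nil)
	n.sigma.CopySym(sigma)
	return &n, true
}

// fromChol reconstructs sigma once, at construction, from the Cholesky factor.
func fromChol(dim int, chol *mat64.Cholesky) *normalSketch {
	n := normalSketch{chol: *chol}
	n.sigma = mat64.NewSymDense(dim, nil)
	n.sigma.FromCholesky(&n.chol)
	return &n
}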

@btracey (Member, Author) commented Feb 23, 2017

The main issue with interfaces is that concurrency is built from the bottom up. If any piece of the puzzle cannot be called concurrently, then the whole routine cannot be called concurrently. In contrast, if one is careful to make sure that each piece works well in a concurrent environment, then it's easy to throw a loop around the last part and gain big parallel speedups. The lower-level the break in concurrency, the larger the gymnastics necessary to work around it. All of Gonum's components work well with concurrency, and so it's really easy to make Gonum code concurrent. As an extreme contrast, a Python package I interact with creates temporary files, so it's not even clear to me that I can launch multiple executions of it with different parameters.

In the particular case, my code consists of a number of packages which I call in different ways. The fact that there are these multiple packages is important for me to manage the complexity of everything.

Two of the fundamental interfaces are

type Marginal interface {
    Quantile(p float64) float64
    Rand() float64
}

// Belief represents a belief over functions at a finite set of xs.
type Belief interface {
	// Condition returns an updated belief over the functions conditioned on
	// the new marginal. The returned belief is n-1 dimensional.
	ConditionSingle(idx int, y float64) Belief
	SingleMarginal(idx int) Marginal
	// Card is the number of X locations
	Dim() int
}

There are shell types which easily wrap Normal and StudentsT, as well as some types I have for myself. The pseudo-code I have above was one step of the algorithm, but I really do that recursively. Based on the result from quad, I get a specific {idx, y} pair; this gives me a new Belief, and then I run the procedure again. In a typical distribution, these methods should not have any problems with being concurrent: ConditionSingle returns a new belief, so it can create all its own memory; SingleMarginal doesn't need to write any data to the belief; and Dim is just a getter method. With that assumption correct, I can then introduce parallelism at whatever level I need. If I'm only doing one evaluation, that can be done during the call to quad; if I'm doing a Monte Carlo over the entire procedure, I can just introduce the goroutines at the highest level.

When writing the wrapper type for Normal to implement Belief, the wrapper has no way of knowing that it will be called in parallel. If the type is implemented such that all of the methods can be called in parallel, then it doesn't matter. Furthermore, there's no way with the interface to communicate properties that come up here, such as "Don't call SingleMarginal and ConditionSingle concurrently". This restriction is not a property of the interface (there's no reason why they can't be called at the same time for some implementations); it is a property of a specific type. As a result, if I want to maintain the flexibility of a concurrent implementation, I will always call SetSigma after constructing a Normal.
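As a concrete illustration of composing against the interface (quantileSweep is hypothetical, not code from my packages): a driver like this can only run in parallel if every Belief method it calls is safe for concurrent use, and the driver has no way of knowing whether that is true for a particular implementation.

// quantileSweep fills out, which must have length b.Dim(), with the p-quantile
// of each single marginal of b, one goroutine per dimension.
func quantileSweep(b Belief, p float64, out []float64) {
	var wg sync.WaitGroup
	for i := 0; i < b.Dim(); i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// SingleMarginal is a concurrent read of b; whether that is safe
			// depends entirely on the concrete type behind the interface.
			out[i] = b.SingleMarginal(i).Quantile(p)
		}(i)
	}
	wg.Wait()
}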

I don't think this is specific to my code. I want to enable concurrency, and so concurrency must be kept in mind at each level. It seems, thus, that any time someone wants to generate a Normal to be used as a building block, they have to call SetSigma, or they are severely limiting the concurrent ability of the client.

At that point, we should just call SetSigma ourselves, to reduce future errors, or we should find a workaround (such as Once and RWMutex) so that all methods that should be callable in parallel actually can be.

@kortschak (Member) commented:

Furthermore, there's no way with the interface to communicate properties that come up here, such as "Don't call SingleMarginal and ConditionSingle concurrently". This restriction is not a property of the interface (there's no reason why they can't be called at the same time for some implementations); it is a property of a specific type. As a result, if I want to maintain the flexibility of a concurrent implementation, I will always call SetSigma after constructing a Normal.

Interfaces are defined not just by the methods that they implement, but by the documentation that goes with them. The implied documentation that you have for the interfaces above is that the types being used are safe for concurrent use. This is not true for Normal and StudentsT in one situation: where a marginal single call runs concurrently with a conditional single call (or another method that sets sigma). However, it can be trivially made safe by ensuring that the marginal single happens after the conditional single (or set sigma). This ordering isn't always necessary, and it is true that there are many non-concurrent idioms that need more than just prefixing a go keyword to code blocks in a for loop; concurrent read/write of maps is a good example.

It seems, thus, that any time someone wants to generate a Normal to be used as a building block, they have to call SetSigma, or they are severely limiting the concurrent ability of the client.

Only if they want to perform the operations above concurrently. The only requirement is that people think about the work that is being done and that we provide adequate documentation about what should be done in particular circumstances.

At that point, we should just call SetSigma ourselves, to reduce future errors, or we should find a workaround (such as Once and RWMutex) so that all methods that should be callable in parallel actually can be.

The synchronisation approaches are fraught or baroque; I would like them to go away here. I remain convinced that this is less of a problem than you make out, but I think you care about it more than I do, so I will relent and ask that we go with copying the passed-in sigma in the case that it is provided, and generating it at construction if given a Cholesky decomposition. The arguments that convince me are less the issue with concurrency and more the issues relating to numerical stability in the case that an exact sigma has been provided but may be degraded on reconstruction.

@btracey (Member, Author) commented Mar 7, 2017

As one last thing:

I agree with you that interfaces are defined by documentation as well as methods.

I think the real issue here is not about concurrency at all, but about side effects. In general, functions and methods should not have visible side effects unless it is necessary for them to have those side effects. Non-obvious side effects are hard to remember, and so easily cause bugs; they also increase the entanglement in the "dependency graph", which good code keeps as disentangled as possible.

My argument isn't* so much that all methods on all types should be able to be called in parallel (the types in optimize cannot, and there is no good way around that), but that there shouldn't be unnecessary side effects (as is currently the case with setSigma). If there are no side effects, then concurrency is enabled. I suspect this intuition is also partially why you feel the synchronization types are baroque.

Closing to open the pull request with the intended fix.

*isn't any more. It was, and I still think we should enable concurrency as much as practical, but I've come to a better understanding of the heart of the issue.

@btracey closed this Mar 7, 2017
@kortschak (Member) commented:

Fair enough. The side effects, though, are only visible in a concurrent-access regime.

The synchronisation types themselves are not baroque, but the complexity of the synchronisation actions required for this particular system would be. Synchronisation is just a thing; when you have multiple competing accesses, the collection of those things becomes difficult to reason about.

@btracey (Member, Author) commented Mar 7, 2017

Agreed on all counts, with the addition that synchronization becomes much more difficult when there are a lot of side effects, and effects that are only visible concurrently.
