None that I've found. Very probably this is just applying momentum in the right place(s). :)
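For concreteness, here's a minimal sketch of what I mean by "momentum in the right place" (this is not the PR's code; `stochastic_grad` and `lmo` are placeholders for whatever the caller supplies, and the momentum/step schedules are just common choices from the stochastic FW literature):

```python
import numpy as np

def stochastic_fw_momentum(x0, stochastic_grad, lmo, n_iter=100):
    """Stochastic Frank-Wolfe with a momentum-averaged gradient estimate."""
    x = x0.copy()
    d = np.zeros_like(x)  # running average of stochastic gradients
    for t in range(n_iter):
        rho = 4.0 / (t + 8) ** (2.0 / 3)  # momentum weight (a common schedule)
        step = 2.0 / (t + 8)              # Frank-Wolfe step size
        d = (1 - rho) * d + rho * stochastic_grad(x)
        s = lmo(d)                        # vertex minimizing <d, s> over C
        x = x + step * (s - x)            # standard Frank-Wolfe update
    return x
```

The only change relative to the deterministic method is that the LMO sees the averaged direction `d` rather than the raw minibatch gradient.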
The setting I'm looking at is \min_{X \in C, Y} f(X + Y) + g(Y), where f is an expectation of smooth functions, g is non-smooth, and we can compute an LMO over C. Ideally I want to avoid variance reduction in order to apply this to neural nets: typically f will be non-convex.
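To make the setting concrete, here is one purely illustrative instantiation (the least-squares loss, the l1 penalty, and the nuclear-norm ball are my own example choices, not something fixed by the problem):

```python
import numpy as np

def f_minibatch(Z, batch):
    """Stochastic estimate of the smooth term f at Z = X + Y:
    here a least-squares loss over a minibatch (A, b)."""
    A, b = batch
    return 0.5 * np.mean((A @ Z.ravel() - b) ** 2)

def g(Y, lam=1.0):
    """Non-smooth term on Y alone, e.g. an elementwise l1 penalty."""
    return lam * np.abs(Y).sum()

def lmo_nuclear_ball(grad, radius=1.0):
    """LMO over C = {X : ||X||_* <= radius}: the minimizer of <grad, S>
    is a scaled rank-one matrix built from the top singular pair of -grad."""
    U, _, Vt = np.linalg.svd(-grad, full_matrices=False)
    return radius * np.outer(U[:, 0], Vt[0, :])
```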
Thanks for the quick summary, @GeoffNN, very helpful =)
To be honest, I don't think this is relevant, but:
there is some work on stochastic FW for f + g where g is non-smooth and either f, or both f and g, are stochastic, e.g. Vladarean et al. 2020. However, these works generally use some kind of variance reduction and assume convexity. They also don't use split variables, so the argument of g is also assumed to lie in C. We will also have some work coming out soon on this topic, which I will link to as soon as possible.
I'm curious: why not variance reduction for neural nets?
Stochastic version of PR #52