Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix double std::move on objects in 'lenses::getset' on set call #165

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

dimula73
Copy link
Contributor

@dimula73 dimula73 commented Nov 21, 2022

This is the final version of the patch that fixes both, double-move problem and extra getter call on set(lens,...) operation.

The patch ensures the following requirements:

  1. When view(lens, whole) is called:

    • the setter lambda is neither called nor constructed. It avoids unnecessary copies of the 'whole' object.

    • the whole object is directly forwarded into the getter lambda, without any copies.

  2. When set(lens, whole, part) is called:

    • no getter is called at the toplevel lens (which value was previously discarded).

    • on the lower lenses, the getter receives an 'lvalue' reference to the whole object, and setter receives an 'rvalue' reference to the whole object. It ensures that there is no double-move problem.

Fixes #160

Basically, we shouldn't calculate the getter when we know that the
value will be dropped.

The patch partially fixes arximboldi#160, because now the getter is not called
at all during the set operation. But the double-move can still
happen when both, getter and setter calls are performed.
error: 'auto' not allowed in function prototype
@codecov-commenter
Copy link

codecov-commenter commented Nov 22, 2022

Codecov Report

Merging #165 (0b6ab3e) into master (24887ac) will increase coverage by 0.19%.
The diff coverage is 96.68%.

@@            Coverage Diff             @@
##           master     #165      +/-   ##
==========================================
+ Coverage   94.30%   94.50%   +0.19%     
==========================================
  Files          80       80              
  Lines        2616     2910     +294     
==========================================
+ Hits         2467     2750     +283     
- Misses        149      160      +11     
Impacted Files Coverage Δ
test/lenses.cpp 98.31% <96.37%> (-1.69%) ⬇️
lager/lenses.hpp 100.00% <100.00%> (ø)
lager/lenses/attr.hpp 100.00% <100.00%> (ø)
lager/lenses/unbox.hpp 100.00% <100.00%> (ø)
lager/context.hpp 88.23% <0.00%> (-2.68%) ⬇️
lager/lenses/optional.hpp 92.30% <0.00%> (-1.03%) ⬇️
lager/lenses/at.hpp 100.00% <0.00%> (ø)
lager/lenses/tuple.hpp 100.00% <0.00%> (ø)
test/type_erased_lens.cpp 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@dimula73 dimula73 changed the title WIP: Fix extra call to 'getter' in 'lenses::getset' on set call Fix double std::move on objects in 'lenses::getset' on set call Dec 22, 2022
@dimula73
Copy link
Contributor Author

Hi, @arximboldi!

The patch should now be ready for the final revew and merge. It fixes all the known issues with use-after-move and adds a comprehensive set of tests for that :)

@@ -103,15 +224,10 @@ namespace lenses {
//! @{

template <typename Getter, typename Setter>
auto getset(Getter&& getter, Setter&& setter)
auto getset(const Getter& getter, const Setter& setter)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: the change to const Getter& getter is not a regression, because they were previously bound to a const reference inside a const (immutable) lambda.

The patch ensures the following requirements:

1) When `view(lens, whole)` is called:

   * the setter lambda is neither called nor constructed.
     It avoid unnecessary copies of the 'whole' object.

   * the `whole` object is directly forwarded into the getter lambda,
     without any copies.

3) When `set(lens, whole, part)` is called:

   * no getter is called at the toplevel lens (which
     value was previously discarded).

   * on the lower lenses, the getter receives an 'lvalue'
     reference to the `whole` object, and setter receives an
     'rvalue' reference to the `whole` object. It ensures
     that there is no double-move problem.

Fixes arximboldi#160
When the moved-from object is assigned, it becomes valid again. None
of our code seems to use it though.
Such lens causes:

1) Compilaion failure due to the missing const functor in getset_t
   (also fixed in the patch)

2) A crash caused by double-move in the attr implementation
   (cannot be fixed without rewriting attr into getset)
The previous implementation cased severe issues:

1) Double-move of `p` argument when passing a prvalue into a
   nested chain of two attr lenses, which usually causes a
   crash.

2) `f = LAGER_FWD(f)` creates a `const decltype(f) f` object
   inside the lambda, which prevents the value to be perfect-
   forwarded with `identity_functor` and `const_functor` when
   setting/fetching it.
});
};
});
return getset(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @arximboldi!

I had to rewrite attr into (the fixed version of) getset, because otherwise it also caused double-move when nested and passed with a prvalue. Basically, a code like that is enough to crash it:

auto birthday_lens = attr(&debug_person::birthday);
auto person_lens = attr(&debug_freelancer::person);
auto freelancer_birthday_lens = person_lens | birthday_lens;

auto birthday =
    set(freelancer_birthday_lens,
        debug_freelancer(freelancer_copy_info, person_copy_info, birthday_copy_info),
        debug_yearday(birthday_copy_info));

I have a weird feeling like all the lenses in lager have to be rewritten like that... :( Not only they do double-move, capturing functors into immutable lambdas prevents proper forwarding of the identity functors in view and set.

The functor should be forwarded into the lens.
@arximboldi
Copy link
Owner

Hi @dimula73 !

Thank you a lot for looking into this... this is definitelly not an easy problem... Oh, and before we dive right in, happy holidays!!

I have some questions, firstly, you mention the view and set cases, what about the over case? In that case it sounds to me, the getter needs to be called.

Also, with regards to double moves... I can imagine some cases where double-moves are saves, for example, somewhere where what happens is something like this:

struct S { std::string a; };

...
auto x = S{"..."};
auto b = std::move(s).a;
auto y = std::move(x);
y.a = std::move(b);
return y;

But I get that what is happening in the following invalid scenario, correct?

...
auto x = S{"..."};
auto&& b = std::move(s).a;
auto y = std::move(x);
y.a = std::move(b);
return y;

There is however something that makes me a bit unease about just trying to apply all sorts of heuristics in that getset implementation... It looks very fiddly, and how can other lenses implementors then provide their own correct lenses in Van-Laarhoven style? Do you expect simply users to implement all their lenses in terms of getset instead of in Van-Laarhoven style?

I guess that's fair enough... at the end of the day, lenses implemented with getset are more intelligible for the general public. It has the problem though that then you need to capture the context twice instead of once (in the getter and setter)...

Which makes me wonder... should we just ditch the Van-Laarhoven representation, and use a simpler lenses definition of the form:

struct name_lens 
{
  decltype(auto) get(auto&& x) { 
    return x.name; 
  }
  auto set(auto x, auto&& y) { 
    x.name = FWD(y);
    return x; 
  }
};

The problem, of course, is lens composability: we could then not use zug::comp to compose them, but we would need to provide our own composition. Not very hard though with something like:

template <class L1, class L2>
struct composed_lens 
{
  L1 l1;
  L2 l2.
  decltype(auto) get(auto&& x) { 
    return l1.get(l2.get(FWD(x))); 
  }
  auto set(auto x, auto&& y) { 
    l1.set(x, l2.set(l1.get(x), FWD(y))));
    return x; 
  }
};

There are still some inefficiencies for some cases because basically we do not know which parts of are consumed by the nested lenses. Sometimes we could do moves that we are not doing if we know that the nested lenses is consuming things indepdendly to some of the other stuff. This can also be said about some of the cases for the Van-Laarhoven lens. Also in this case over may incur extra cost, but maybe in practice we use over less than the other lenses...

Anyway... long story to say that I'm not sure what to do about this. There is some elegance about the Van-Laarhoven lens that maybe it would be nice to preserve. I wonder if we could also improve things by changing the way we call into the functors instead... E.g. instead of passing the value directly into the functor on the "getter bit", we can pass a lambda that returns the value, basically we simulate lazy-evaluation by explicitly passing thunks, for example:

template <typename Member>
auto attr(Member member)
{
    return zug::comp([member](auto&& f) {
        return [&, f = LAGER_FWD(f)](auto&& p) {
            return f([&p] { return LAGER_FWD(p).*member; )})([&](auto&& x) {
                auto r    = LAGER_FWD(p);
                r.*member = LAGER_FWD(x);
                return r;
            });
        };
    });
}

...
// const_functor would not change, but identity_functor would:
template <typename T>
struct identity_functor
{
    T value;

    template <typename Fn>
    auto operator()(Fn&& f) &&
    {
        return make_identity_functor(
            std::forward<Fn>(f)() // extra () here!
              (std::forward<T>(value)));
    }
};

That solves the problem of not calling the getter too often when we are only interested in the setter part.

There is still the problem of being safe for moves without introducing too many intermediate copies. I guess to really do that safely in a general performant way we need to somehow annotate the lens to know wheter it's "independent" or not. E.g whether this is safe or not:

auto lens = ...;
auto whole =  ...;
auto part = view(lens, std::move(whole));
auto whole2 = set(lens, std::move(whole), std::move(part));

For most lenses that just "zoom in" into an individual part, that holds. But for some others that's not generally safe...

Anyway... lots of food for thought... But I agree we should find a solution at some point...

@dimula73
Copy link
Contributor Author

Hi, @arximboldi!

Happy holidays to you too! :)

I have some questions, firstly, you mention the view and set cases, what about the over case?

Speaking truly, I haven't much tested it. When I looked into it originally, it seemed to me that it doesn't have the extra-get-call issue. Though I'm not sure anymore. It would be better to test it.

But I get that what is happening in the following invalid scenario, correct?

I'm not sure I understand what you mean by s variable in your example. What happens there is:

struct S { std::string a; };
struct T { S s; };

auto lens = attr(&T::s) | attr(&S::a);

// in scenario when we build a lens for a prvalue, T&& binds to 
// it directly and becomes an lvalue
S toplevel_getter(T t) {
    return T.s;
}

// same thing happens in the setter, because it is 
// passed with a prvalue
S toplevel_setter(S s, T t) {
    s.t = t;
    return s;
}

T set_the_top_lens(T t, std::string a) {
    // std::forward translates into std::move for a prvalue
    S subordinate = toplevel_getter(std::move(t));

    // ask the lower level lens
    subordinate = lowerlevel_setter(std::move(subordinate ), a);   

    // std::forward translates into std::move again, so it
    // moves `t` again :(
    return toplevel_setter(std::move(t), subordinate );
}

Do you expect simply users to implement all their lenses in terms of getset instead of in Van-Laarhoven style?

Yes...

...at the end of the day, lenses implemented with getset are more intelligible for the general public. It has the problem though that then you need to capture the context twice instead of once (in the getter and setter)...

Could you elaborate a bit on that topic? As far as I can see, even in the current implementation we do capture the context for the setter even when it is not called. Or you mean something else?

There is some elegance about the Van-Laarhoven lens that maybe it would be nice to preserve

Speaking truly, I don't much find it elegant :) Even after spending really lots of hours with it this year [1] I still cannot reason about it without the help of the debugger. I kind of understand how to write a lens and how to use it, but I cannot reason about internals of it. I just don't know how to visualize it on a piece of paper, which is usually not a good sign :)

E.g. instead of passing the value directly into the functor on the "getter bit", we can pass a lambda that returns the value, basically we simulate lazy-evaluation by explicitly passing thunks

It solves only one(!) issue of the three(!). Namely, it solves the issue of making an extra get() call in the toplevel lens. It neither solves the double-move problem in nested lenses, nor extra copy-constructions in identity_functor and const_functor when passing the value to and fro.

template <typename Member>
auto attr(Member member)
{
    return zug::comp([member](auto&& f) {
        return [&, f = LAGER_FWD(f)](auto&& p) {
//                 ^
//                 + - problem number one: here we create
//                     `const identity_functor<T> f` whose
//                     constness prevents moves of the value;
//                     the value is **copied** at every step
//                     of the call chain

            return f([&p] { return LAGER_FWD(p).*member; )})
//                    ^                      ^
//                    + -------------------- + - the first move happens 
// somewhere here, I'm not sure where

                ([&](auto&& x) {

                auto r    = LAGER_FWD(p);
//                                    ^
//                                    + - second move
                r.*member = LAGER_FWD(x);
                return r;
            });
        };
    });
}

Now when thinking about it, I think that Van-Laarhoven lenses have two requirements that we break when trying to use them in C++:

  1. Van-Laarhoven lens expects founctions to be pure, which is not true in C++, where a function can move-consume its arguments, hence have a side-effect;

  2. Van-Laarhoven lens expects a guarantee that a pure function is not called when its result is known to be discarded. [2]

  3. C++'s tools (perfect forwarding, RVO and NRVO) are supposed to optimize function return values, but Van-Laarhoven lens returns the value using lambda-capture and class-members, which are not optimized.

Theoretically, if we make all our functions inside lenses work with const-ref only, we will make them somewhat "pure". It might solve all these three issues. But...

It will require Lager to drop this requirement:

auto name = attr(&person::name);
auto birthday_month = attr(&person::birthday) | attr(&yearday::month);

CHECK(&view(name, p1) == &p1.name);
int& x = view(birthday_month, p1);

This requirement doesn't seem to be useful for the Lager's cursors framework, but it might be a nice feature of lenses in general.

We could actually have two sets of lenses: const and non-const, but it would be really painful to implement with lambdas anyway...

Which makes me wonder... should we just ditch the Van-Laarhoven representation, and use a simpler lenses definition of the form

As you might have guessed already, I would support this decision ;) Actually, my first attempt to fix the issue was to implement a separate subset of lenses without the use of Van-Laarhoven representation. But I wanted them to be interchangeable and composable with stuff declared with zug::comp, which appeared to be impossible.

PS:
Even though I haven't really found the Van-Laarhoven lenses elegant or nice, I think transduces are a nice thing. I've been actually thinking about pairing them with SYCL engine. It could potentially make some of our algorithms much simpler and maintainable. Though now, when I found out about these capture/forward issues, I have some doubts... do they cause the same move/copy issues?

[1] - I spent almost 11 months on porting Krita's brush editor into lager. In case you are interested in our usage of lager, here is the working branch and the documentation.

[2] - I'm not familiar with Haskel, does it have such a guarantee?

kdesysadmin pushed a commit to KDE/krita that referenced this pull request Mar 11, 2023
It fixes a critical bug, so we'd better use this one.

The upstream MR:
arximboldi/lager#165
@arximboldi
Copy link
Owner

Sorry @dimula73 for having left for so long. Your comment on #175 about how this is a must for you to use makes me think that we should probably find a workable solution for this.

Pinging @Siapran as she has ended up digging way deeper than me into lenses. What do you think @Siapran about this PR? Or should we go one step further and completely ditch the Van-Laarhoven lenses?

@dimula73
Copy link
Contributor Author

dimula73 commented May 30, 2023

Hi, @arximboldi and @Siapran!

Is there any update on the topic? We plan to make the first public (beta-) release of Krita on July, 17th, so it would be nice to have at least some version of Lager released by then (to make Linux distributions happy).

Our final release deadline set on the 1st of September, so distributions will start running like crazy after lager on that date, unless we invent anything.

On the topic of the MR: Krita can technically work without this MR because all our lenses use const-refs for the reading part, so it should be safe. The only worry might be that lager may change API a bit between different released versions. Which, I guess, is fine if the first released version is something like 0.1.0 :)

@arximboldi
Copy link
Owner

arximboldi commented Jun 26, 2023

Hi @dimula73 !

I have been meditating on this and I have decided that I am cool with moving away from Van-Laarhoven lenses. That's a big revamp but if you wanna to take on that job I will be happy to accept that PR. We would need to provide a get/setter based representation supported by our own definition of comp and operator|. To be fair, I think everything would become so much simpler then. The only downside is that sometimes there would be an extra capture in memory (one for the getter and one for the setter) when using lambdas to define the lens, which is one of the reasons I thought initially Van-Laarhoven was also interesting from a performance POV. In many cases we can work around that by defining an object for the lens though.

@arximboldi
Copy link
Owner

The question remains, what to do about this one? The implementation is complicated and I haven't even managed to work through all the details. However the unit testing is very rigurous, so I would be happy to merge it in a leap of faith, if you think this helps as an interim step before we move away from Van-Laarhoven lenses.

@dimula73
Copy link
Contributor Author

The question remains, what to do about this one?

I guess the project should be split into two independent steps: one to port all the lenses to getset() and the other one to implement getset() via some different implementation.

What do you think? Are there any lenses that can be implemented easier without getset(), but via some base API?

@arximboldi
Copy link
Owner

Hmmm... I think some lenses could be implemented more efficiently without getset (one closure vs two) however moving from a getset based implementation to a direct one should be easy (it would still be two functions, but sharing the closure). This is what I mean specifically:

A getset based implementation for attr

template <typename Member>
auto attr(Member member)
{
  return getset(
    [member] (auto&& x) -> decltype(auto) {
       return LAGER_FWD(x).*member;
    },
    [member] (auto x, auto&& v){
       x.*member = LAGER_FWD(v);
       return x;
    })
}

Vs. a "direct" one:

template <typename Member>
struct attr_t
{
  Member member;

   template <typename T>
   decltype(auto) get (T&& x) {
      return LAGER_FWD(x).*member;
   }

   template <typename T, typename U>
   T set(T x, U&& v) {
      x.*member = LAGER_FWD(v);
      return x;
   }
}

template <typename Member> auto attr(Member m) { return attr_t<Member>{m}; }

They are very similar they just change in how we close over member in order to share the data between getter and setter.

Am I making sense?

@dimula73
Copy link
Contributor Author

Yes, this case sounds sensible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use of destructed object in lager::lenses::getset
3 participants