Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultisetExpression rework, again #203

Open
HighDiceRoller opened this issue Sep 14, 2024 · 5 comments
Open

MultisetExpression rework, again #203

HighDiceRoller opened this issue Sep 14, 2024 · 5 comments

Comments

@HighDiceRoller
Copy link
Owner

HighDiceRoller commented Sep 14, 2024

Basically the idea is that expressions outside of multiset_function would attach to generators, and expressions inside would attach to evaluators. This is mostly driven by cache reusability.

Here the "cut point" for unbinding would be at the first non-fully-bound expression node, rather than immediately at the generator.

@HighDiceRoller HighDiceRoller changed the title Revisit ExpressionGenerator MultisetExpression rework, again Sep 21, 2024
@HighDiceRoller
Copy link
Owner Author

HighDiceRoller commented Sep 21, 2024

This could turn into a more comprehensive rework of MultisetExpression. Issues to look at:

  • A lot of the overhead is due to the cost of function calls. I don't think it's conscionable to do opcodes and a giant if-else, but perhaps we can keep it to one function call per node. This may also call for inlining some of the current operators.
  • Somewhat against this, currently the expression nodes themselves are responsible for propagating next_state etc. to the subexpressions, which makes writing new expression types kind of annoying. Possible alternatives include:
    • We could have next_state call an abstract function that only handles the current node and implement the subexpression handling in the base class. Though this would introduce an extra function call per node.
    • We could put more of the intelligence in the evaluator, though I'm not sure this would work well on the generator side.
  • We may need to make next_state only consider local information anyways, because the generation and the evaluation will probably use it in different ways. The generator needs to produce a cacheable unit, which is currently an instance itself, and probably should remain so. On the other hand, the evaluator side uses an separate state object rather than an instance itself.

@HighDiceRoller
Copy link
Owner Author

Possible class hierarchy:

  • MultisetExpression
    • MultisetGenerator: Leaf nodes. Has no inputs.
    • MulitsetTransform: Interior nodes. Recursively inherits generator attributes from leaves.

@HighDiceRoller
Copy link
Owner Author

HighDiceRoller commented Dec 25, 2024

One annoyance: we want to be able to have the == operator for expressions to determine multiset equality, but we also need it to have a truth value for cache. Options:

  • Attach a truth value to the result. Only fully-bound expressions get cached, so the result is always a die?
  • Expressions are not hashable. Store a hash key inside the cache rather than the expression itself.

@HighDiceRoller
Copy link
Owner Author

HighDiceRoller commented Dec 26, 2024

Various TODOs:

  • Clean up "generator" vs. "expression" terminology inside MultisetEvaluator. Maybe use "input"?
  • We're again running into the question of which attributes should be properties and which should be methods.
    • I briefly had a bug where I was missing @property on _local_hash_key, which was causing some hashes to hash the function rather than its return value.
    • Going to revisit this later.
  • Now that any expression can be used as a cache key, should unbinding happen a) at the leaf nodes, or b) as far from the leaf nodes as possible, i.e. when we first find a fully-bound expression?
    • If unbinding happens far from the leaf nodes, the operator state would be stored in MultisetEvaluator._cache's keys, since the entire expression is the cache key; if unbinding happens at the leaf nodes, then the operator state would be stored in the cache's values, since we probably want to reuse the expression object as the state object. This may incline towards the former since it stores the operator state fewer times.
  • Is using flat iteration a good idea?
    • Going to revisit this later.
  • Figure out a good MultisetFunctionEvaluator.__str__().
  • Check output arity is equal to 1 in expressions?
  • Expression hashing has bad performance.
    • I'll separate this to another post.

HighDiceRoller added a commit that referenced this issue Dec 27, 2024
@HighDiceRoller
Copy link
Owner Author

HighDiceRoller commented Dec 29, 2024

The first attempt at the overhaul is done, and allows for persistent caching of chained expressions, but there is an issue: the performance of @multiset_function is much worse, roughly 3x. Much of the additional time is due to hashing. It seems that the previous implementation benefited from a smaller state representation for operations. When we perform an evaluation, the characteristics of the expression can be separated into three scopes:

  • The raw expression.
    • Implicitly, the tree structure of the expression.
  • The expression after _generate_initial. Currently this is only used by MixtureGenerator. (Maybe we can extend this to a MixtureExpression?)
  • The dynamic state during an evaluation, e.g. remaining keep_tuple, side advantage in match expressions, etc.

At time of writing, everything above is appended to each evaluator state, which makes for relatively expensive hashing. This is in contrast to the old version, which effectively hashed only the last category, and is wasteful since the data in the first two scopes are constant during any evaluation. To get back this performance, we could re-separate the data in these scopes.

  • The evaluator cache might gain another level, becoming initialized_expression -> dynamic_expression_state (including generators?) -> evaluator_state -> quantity.
    • Bound expressions could still produce multiple initialized_expressions, so they are not necessarily constant per evaluator instance.
  • Should dynamic expression state be stored flat or nested? Should external expressions be evaluated in a separate stage or should we consider this as a single DAG?
  • How separate should generators and operators be?
    • Generators tend to care more about order (generate_min vs. generate_max vs. next_state) and can produce multiple results per update.
    • All generators have purely dynamic state?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant