`MultisetExpression` rework, again #203

HighDiceRoller · 2024-09-14T00:36:35Z

Basically the idea is that expressions outside of multiset_function would attach to generators, and expressions inside would attach to evaluators. This is mostly driven by cache reusability.

Here the "cut point" for unbinding would be at the first non-fully-bound expression node, rather than immediately at the generator.

The text was updated successfully, but these errors were encountered:

HighDiceRoller · 2024-09-21T05:36:40Z

This could turn into a more comprehensive rework of MultisetExpression. Issues to look at:

A lot of the overhead is due to the cost of function calls. I don't think it's conscionable to do opcodes and a giant if-else, but perhaps we can keep it to one function call per node. This may also call for inlining some of the current operators.
Somewhat against this, currently the expression nodes themselves are responsible for propagating next_state etc. to the subexpressions, which makes writing new expression types kind of annoying. Possible alternatives include:
- We could have next_state call an abstract function that only handles the current node and implement the subexpression handling in the base class. Though this would introduce an extra function call per node.
- We could put more of the intelligence in the evaluator, though I'm not sure this would work well on the generator side.
We may need to make next_state only consider local information anyways, because the generation and the evaluation will probably use it in different ways. The generator needs to produce a cacheable unit, which is currently an instance itself, and probably should remain so. On the other hand, the evaluator side uses an separate state object rather than an instance itself.

HighDiceRoller · 2024-12-25T04:36:15Z

Possible class hierarchy:

MultisetExpression
- MultisetGenerator: Leaf nodes. Has no inputs.
- MulitsetTransform: Interior nodes. Recursively inherits generator attributes from leaves.

HighDiceRoller · 2024-12-25T05:48:59Z

One annoyance: we want to be able to have the == operator for expressions to determine multiset equality, but we also need it to have a truth value for cache. Options:

Attach a truth value to the result. Only fully-bound expressions get cached, so the result is always a die?
Expressions are not hashable. Store a hash key inside the cache rather than the expression itself.

HighDiceRoller · 2024-12-26T04:07:17Z

Various TODOs:

Clean up "generator" vs. "expression" terminology inside MultisetEvaluator. Maybe use "input"?
We're again running into the question of which attributes should be properties and which should be methods.
- I briefly had a bug where I was missing @property on _local_hash_key, which was causing some hashes to hash the function rather than its return value.
- Going to revisit this later.
Now that any expression can be used as a cache key, should unbinding happen a) at the leaf nodes, or b) as far from the leaf nodes as possible, i.e. when we first find a fully-bound expression?
- If unbinding happens far from the leaf nodes, the operator state would be stored in MultisetEvaluator._cache's keys, since the entire expression is the cache key; if unbinding happens at the leaf nodes, then the operator state would be stored in the cache's values, since we probably want to reuse the expression object as the state object. This may incline towards the former since it stores the operator state fewer times.
Is using flat iteration a good idea?
- Going to revisit this later.
Figure out a good MultisetFunctionEvaluator.__str__().
Check output arity is equal to 1 in expressions?
Expression hashing has bad performance.
- I'll separate this to another post.

…ion of `@multiset_function` #203

HighDiceRoller · 2024-12-29T01:43:20Z

The first attempt at the overhaul is done, and allows for persistent caching of chained expressions, but there is an issue: the performance of @multiset_function is much worse, roughly 3x. Much of the additional time is due to hashing. It seems that the previous implementation benefited from a smaller state representation for operations. When we perform an evaluation, the characteristics of the expression can be separated into three scopes:

The raw expression.
- Implicitly, the tree structure of the expression.
The expression after _generate_initial. Currently this is only used by MixtureGenerator. (Maybe we can extend this to a MixtureExpression?)
The dynamic state during an evaluation, e.g. remaining keep_tuple, side advantage in match expressions, etc.

At time of writing, everything above is appended to each evaluator state, which makes for relatively expensive hashing. This is in contrast to the old version, which effectively hashed only the last category, and is wasteful since the data in the first two scopes are constant during any evaluation. To get back this performance, we could re-separate the data in these scopes.

The evaluator cache might gain another level, becoming initialized_expression -> dynamic_expression_state (including generators?) -> evaluator_state -> quantity.
- Bound expressions could still produce multiple initialized_expressions, so they are not necessarily constant per evaluator instance.
Should dynamic expression state be stored flat or nested? Should external expressions be evaluated in a separate stage or should we consider this as a single DAG?
How separate should generators and operators be?
- Generators tend to care more about order (generate_min vs. generate_max vs. next_state) and can produce multiple results per update.
- All generators have purely dynamic state?

HighDiceRoller changed the title ~~Revisit ExpressionGenerator~~ MultisetExpression rework, again Sep 21, 2024

HighDiceRoller added a commit that referenced this issue Dec 25, 2024

start work on multiset expression rework #203

0327ef9

HighDiceRoller added a commit that referenced this issue Dec 25, 2024

more progress on expression rework #203

ee7c305

HighDiceRoller added a commit that referenced this issue Dec 25, 2024

convert over adjust_counts #203

49f752f

HighDiceRoller added a commit that referenced this issue Dec 26, 2024

migrate remaining MultisetExpression methods #203

bf49e74

HighDiceRoller added a commit that referenced this issue Dec 26, 2024

remove old expression directory #203

b810a60

HighDiceRoller added a commit that referenced this issue Dec 26, 2024

fix various bugs in new MultisetExpression #203

a7ac9dd

HighDiceRoller added a commit that referenced this issue Dec 27, 2024

add MultisetExpression._apply_variables() and finish re-implementat…

090a52d

…ion of `@multiset_function` #203

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`MultisetExpression` rework, again #203

`MultisetExpression` rework, again #203

HighDiceRoller commented Sep 14, 2024 •

edited

Loading

HighDiceRoller commented Sep 21, 2024 •

edited

Loading

HighDiceRoller commented Dec 25, 2024

HighDiceRoller commented Dec 25, 2024 •

edited

Loading

HighDiceRoller commented Dec 26, 2024 •

edited

Loading

HighDiceRoller commented Dec 29, 2024 •

edited

Loading

MultisetExpression rework, again #203

MultisetExpression rework, again #203

Comments

HighDiceRoller commented Sep 14, 2024 • edited Loading

HighDiceRoller commented Sep 21, 2024 • edited Loading

HighDiceRoller commented Dec 25, 2024

HighDiceRoller commented Dec 25, 2024 • edited Loading

HighDiceRoller commented Dec 26, 2024 • edited Loading

HighDiceRoller commented Dec 29, 2024 • edited Loading

`MultisetExpression` rework, again #203

`MultisetExpression` rework, again #203

HighDiceRoller commented Sep 14, 2024 •

edited

Loading

HighDiceRoller commented Sep 21, 2024 •

edited

Loading

HighDiceRoller commented Dec 25, 2024 •

edited

Loading

HighDiceRoller commented Dec 26, 2024 •

edited

Loading

HighDiceRoller commented Dec 29, 2024 •

edited

Loading