Description
The `aggregate_all_ever` relationship can cause exploding joins when it self-joins an activity with high cardinality. For example, on a customer stream with a `placed_order` activity, a dataset with primary activity `placed_order` and appended activity `aggregate_all_ever_placed_order` would hit this issue.
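To make the scale of the problem concrete, here is a minimal sketch using Python's built-in `sqlite3`. It is not the package's actual macro SQL: the `stream` table, its `activity` and `ts` columns, and the data are hypothetical, and the self-join is simplified to a join on `entity_id` with no time condition, which is what "all ever" amounts to. The number of intermediate rows grows with the square of the orders per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified activity stream; column names are assumptions.
conn.execute(
    "create table stream ("
    "activity_id integer primary key, entity_id integer, activity text, ts integer)"
)

# 3 customers with 100 placed_order activities each.
rows = [(c * 1000 + i, c, "placed_order", i) for c in range(3) for i in range(100)]
conn.executemany("insert into stream values (?, ?, ?, ?)", rows)

# Rows in the primary activity.
primary_rows = conn.execute(
    "select count(*) from stream where activity = 'placed_order'"
).fetchone()[0]

# Rows produced when every primary row is joined to every appended row
# of the same customer, before any aggregation happens.
joined_rows = conn.execute(
    """
    select count(*)
    from stream as p
    join stream as a
      on a.entity_id = p.entity_id
     and a.activity = 'placed_order'
    where p.activity = 'placed_order'
    """
).fetchone()[0]

print(primary_rows, joined_rows)  # 300 30000
```

With 100 orders per customer the join is already 100 times larger than the primary activity, and that ratio keeps growing as customers place more orders.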
Proposal
Add conditional logic in the `dataset` macro:
- If the relationship is `aggregate_all_ever`, aggregate the appended activity directly from the stream without joining to the primary activity CTE (change in here)
- If the relationship is `aggregate_all_ever`, alter the join in the final rejoin CTE to join on `entity_id` instead of `activity_id` (change in here)
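The two changes in the proposal can be sketched with Python's built-in `sqlite3` as follows. This is an illustration under assumptions, not the macro's real output: the `stream` table, the `activity` column, the `appended` CTE name, and the `order_count` aggregate are all hypothetical. The first query aggregates after a self-join, keyed by `activity_id`; the second aggregates the appended activity per entity first and then joins on `entity_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified activity stream; column names are assumptions.
conn.execute(
    "create table stream ("
    "activity_id integer primary key, entity_id integer, activity text)"
)
rows = [(c * 1000 + i, c, "placed_order") for c in range(3) for i in range(100)]
conn.executemany("insert into stream values (?, ?, ?)", rows)

# Join first, aggregate second: one group per primary activity_id,
# built from an intermediate result that is quadratic per customer.
join_then_agg = conn.execute(
    """
    select p.activity_id, count(a.activity_id) as order_count
    from stream as p
    join stream as a
      on a.entity_id = p.entity_id
     and a.activity = 'placed_order'
    where p.activity = 'placed_order'
    group by p.activity_id
    order by p.activity_id
    """
).fetchall()

# Aggregate first, join second: one row per entity in the appended CTE,
# then a rejoin on entity_id instead of activity_id.
agg_then_join = conn.execute(
    """
    with appended as (
        select entity_id, count(activity_id) as order_count
        from stream
        where activity = 'placed_order'
        group by entity_id
    )
    select p.activity_id, a.order_count
    from stream as p
    left join appended as a
      on a.entity_id = p.entity_id
    where p.activity = 'placed_order'
    order by p.activity_id
    """
).fetchall()

# Both strategies return the same dataset.
assert join_then_agg == agg_then_join
print(len(agg_then_join), agg_then_join[0])
```

Because an "all ever" aggregate does not depend on when the primary activity occurred, the per-entity result is the same for every primary row of that entity, which is why joining on `entity_id` is sufficient here.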
@tnightengale ran into this in my own version of the project, and the proposed fix worked great - a dataset with >30MM rows that wasn't materializing in 20+ minutes is now running successfully in 30 seconds. This issue definitely creates a scaling problem for users, so good to resolve quickly. I can take a stab at a fix, but let me know if you have any questions.
Sounds good! I'm on vacation the rest of this week but can tackle it next week. If you want to get this fixed sooner, though, have at it! For reference, my approach in my own project was to add conditional logic based on the relationship of the appended activity. For everything but aggregate_all, use the existing join logic. Otherwise, don't join back to the primary CTE: just select the appended activity and aggregate on its own customer column. Then add the same conditional logic in the final join statement so that those CTEs join on customer instead of activity_id.