Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allocation of Total_Warehouse_Period_DBUs to queries to be challenged when trying to allocate based on user usage #11

Open
enr0c opened this issue Nov 27, 2024 · 1 comment
Assignees

Comments

@enr0c
Copy link

enr0c commented Nov 27, 2024

First of all - Great work!!!

Iin my project we strive for allocating approximate usage allocation towards users. And i am looking into your solutions.

Challenge:

When using the dashboard and consuming 'Query Cost Allocation' and there the 'Query Signature Cost Allocatio Detail' the 'Allocated $' vary if one select different start and end times.

Example:

  • Query Signature = abcd, Allocated $ = $5, when selecting start time = day1, end time = day2
  • Query Signature = abcd, Allocated $ = $500, when selecting start time = day1, end time = day5

From my PoV this is not correct, as a Query shall not be dependent on the start/end time, but only dependent on the approximated Usage.

I think the issue arises from the query 'Query Statement Resource Attribution' and there from the section of filtered_warehouse_usage
In there the period of a warehouse gets summed up based on start and end time. Afterwards Top Ns TotalStatementTimeUsed gets divided by the TotalWarehouseTimeUsed and finally multiplied with dollars/dbus. In this equation TotalStatementTimeUsed is a absolute number, while TotalWarehouseTimeUsed is not fixed. Hence one sees different values when selecting different start/end time.

What i would propose:

Usage is based on hourly intervals in system.billing.usage, hence the queries (and their attribution in terms of usage) have to be sliced as well in hourly intervalls. And then the hourly based DBUs can be allocated to either number of queries run in the interval, or the execution time, or the IO ops, or a combination of all.
Once this preparatory work would be done, one can aggregate on user or query level and filter the dimension one would like in the Query Statement Cost Allocation Advisor.

Happy to discuss and learn:)

@CodyAustinDavis
Copy link
Owner

Hey there! We are actually thinking through this now and updating the logic. Id love to meet to discuss some options and ideas around this. Is there an email I can reach you at?

@CodyAustinDavis CodyAustinDavis self-assigned this Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants