Description
Problem
We're developing a system in which users submit inquiries (tickets, issues), for which the status is tracked (so very much like GitHub issues).
We store the following in the database:
- Inquiries
- Status changes for each inquiry
Now, the idea is to present a graph in the user interface which shows how many inquiries there were in system for each status at a specific time. Here's an example:
For this purpose, we built a table with a snapshot for each day (which represents the finest granularity that is required) and inquiry, which is exposed as cube inquiry_snapshots
.
day | inquiry | status |
---|---|---|
2025-01-01 | 1 | open |
2025-01-01 | 2 | open |
2025-01-02 | 1 | open |
2025-01-02 | 2 | resolved |
With the following query, the results are as expected:
{
"measures": [
"inquiry_snapshots.count"
],
"dimensions": [
"inquiry_snapshots.status"
],
"timeDimensions": [
{
"dimension": "inquiry_snapshots.day",
"dateRange": [
"2025-01-01",
"2025-01-31"
],
"granularity": "day"
}
]
}
Our problem is how to accomplish the same with higher granularities (week
, month
, etc.), as now the inquiry could've been in more than one status over the span of each time frame (as seen in the above example: inquiry 2), but only the last status (in each time frame) is relevant.
Related schema
cubes:
- name: inquiry_snapshots
sql_table: inquiry_snapshots
dimensions:
- name: day
sql: day
type: time
primary_key: true
- name: inquiry
sql: inquiry
type: string
primary_key: true
- name: status
sql: status
type: string
measures:
- name: count
sql: inquiry
type: count_distinct
Practical example
As we can see, in week 10 there were many status changes (to resolved
), which causes inflated numbers, because a single inquiry is counted more than once (i.e. once for each status it was in during the given week).
Did someone face a similar challenge and could point us in the right direction about how to model this? Any help would be much appreciated!