Issue #930 - Add new dimension to matchmaking search duration in prometheus #1010
@@ -19,8 +19,9 @@
class MatchmakerSearchTimer:
    def __init__(self, queue_name):
        self.queue_name = queue_name
    def __init__(self, queue_name: str, players_in_queue: int):
I considered changing the constructor to store a reference to the queue itself rather than store the number of players when the search starts, but I didn't want to hold open any potential connections.
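The trade-off described above, storing a snapshot of the player count instead of a reference to the queue, could look roughly like the sketch below. This is not the PR's actual code: `SearchTimerSketch`, the `record` callback (standing in for the Prometheus metric), and the injectable `clock` are hypothetical names introduced only for illustration.

```python
import time
from typing import Callable, Optional


class SearchTimerSketch:
    """Hypothetical stand-in for MatchmakerSearchTimer (illustration only)."""

    def __init__(
        self,
        queue_name: str,
        players_in_queue: int,
        record: Callable[[str, int, float], None],
        clock: Callable[[], float] = time.monotonic,
    ):
        self.queue_name = queue_name
        # Snapshot taken when the search starts; no reference to the queue is kept.
        self.players_in_queue = players_in_queue
        self._record = record  # assumed callback replacing the real metric
        self._clock = clock
        self._start: Optional[float] = None

    def __enter__(self) -> "SearchTimerSketch":
        self._start = self._clock()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        elapsed = self._clock() - self._start
        # Report the duration together with both dimensions.
        self._record(self.queue_name, self.players_in_queue, elapsed)
        return False  # never swallow exceptions
```

Because only an integer is stored, the timer cannot observe later changes to the queue, which matches the "number of players when the search starts" semantics.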
Closing this PR following the discussion in #930 to move this data to the DB rather than Prometheus.
Prometheus seems the right place for analytical data with historical updating. The db should only contain the "current" state.
I think this could use more discussion, then, as @BlackYps was saying that the DB would be the better location.
Depends on what is supposed to be done with it. Look at a graph to see the development and derive decisions from it? -> Prometheus
My thoughts were to put it into Prometheus to build a model. Once we have that model, we can implement the model predictions in the server code to provide a predicted wait time. I'm not super great with statistics, so I'm open to suggestions on tackling this a different way.
My idea was to store the data in the db to be able to retrieve up-to-date information. Implementation idea: store the deques in the db once a day and load them on server startup so we don't lose the info on a restart. We could in a first step only collect the info and then get a db dump to analyze the data. For estimates: data could be visualized in a rating vs. queue size plane, with the wait time of data points coded as color.
I think we'd be better off going for the estimation option rather than trying to worry about accuracy. We know that even with perfect historical data, we are always going to be off to some degree with a prediction. I'd think the linear regression (or some other regression model) would be much faster to implement and have a decent accuracy.
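For reference, the linear regression idea amounts to an ordinary least-squares fit of wait time against queue population. Below is a minimal sketch in Python, assuming a single predictor (players in queue). The function names are illustrative and not taken from the server code.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all x values are identical; slope is undefined")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_wait_seconds(players_in_queue: float, slope: float, intercept: float) -> float:
    """Evaluate the fitted line, clamped so a prediction is never negative."""
    return max(0.0, slope * players_in_queue + intercept)
```

The clamp at zero is a design choice of this sketch: a straight line fitted to a falling trend will eventually predict negative wait times for large queues, which is one reason another regression model might fit better.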
Sorry, but I don't understand what exactly you are comparing and arguing against?
My bad. I think I misread your comment. I spent some time last night trying to figure out if we can perform this kind of analysis with Prometheus. I'm not super well versed in Prometheus, and I was having a really hard time trying to figure out a query that works. I think @BlackYps may be right here that we should have a dequeue history table in the DB. Here is a structure of the table I had in mind. Any additional columns we could use in this table?
I took a stab at the database approach. I set up the database table as above. I wrote a stored procedure that takes the number of players in queue and the queue ID and returns the wait time in seconds using linear regression. I've limited it to using 1000 entries as the query is not super optimal. We can fiddle with this number as needed. The image below shows the simulated data (randomly generated points) in red and what the server calculated in blue. I can put up a draft PR for this to demonstrate how this works. My concern with this approach is the load on the database, as the query is not optimal. To mitigate this, we could set up a job to cache the results of this query for each queue and update it at a set interval.
About the columns:
Yes, I agree that we should cache the db query in some way to not constantly query the db.
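One way to picture the caching idea is a small per-queue cache with a time-to-live. Note that this sketch swaps the scheduled job suggested above for lazy refresh on access, which is simpler to show. `QueueResultCache` and the `compute` callable (standing in for the stored-procedure call) are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class QueueResultCache:
    """Caches one computed value per queue ID for `ttl_seconds`."""

    def __init__(
        self,
        compute: Callable[[int], float],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._compute = compute  # assumed wrapper around the expensive DB query
        self._ttl = ttl_seconds
        self._clock = clock
        # queue_id -> (time the value was computed, cached value)
        self._entries: Dict[int, Tuple[float, float]] = {}

    def get(self, queue_id: int) -> float:
        now = self._clock()
        entry = self._entries.get(queue_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh, skip the database
        value = self._compute(queue_id)
        self._entries[queue_id] = (now, value)
        return value
```

With this shape the database is hit at most once per queue per interval, regardless of how many players ask for an estimate.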
These are good suggestions that I want to respond to one at a time to make sure I don't miss anything.
I think this is spot on. I was thinking it may be helpful to know the identity of the player, but we really only care about rating of the player. I thought about factoring in average rating of the queue as well into this prediction.
I am thinking we probably don't need this information. Really, this is what the linear regression is doing for us. I think storing the event rather than meta information is better as we can still calculate the metadata from the raw events.
Yeah, total online players could be a factor, but I would think this could introduce a lot of error. For instance, if we have some event going on where one queue is seeing a lot more traffic than the others, the calculations may confuse a larger online player count with shorter queue times. But in reality, in this scenario the wait times for the other queues would be pretty much normal.
This one I disagree with. Wait time is but a difference of start time and end time. This is the same thing as above where we can calculate the metadata if we need it, but it would be better to store the raw data and calculate from there.
I think we can do a generalized estimation for queues (given any player, what will be their wait time?) and do a more nuanced estimation once the player joins the queue or something like that.
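The argument for storing raw events and deriving wait time on demand can be illustrated with a small record type. The field names below are hypothetical, since the table structure from the thread is not reproduced here. Only the two timestamps matter for the point being made.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DequeueEvent:
    """Hypothetical raw event row; field names are assumptions."""

    queue_id: int
    players_in_queue: int
    search_started_at: datetime
    search_ended_at: datetime

    @property
    def wait_seconds(self) -> float:
        # Derived value: computed from the stored timestamps, never stored itself.
        return (self.search_ended_at - self.search_started_at).total_seconds()
```

Keeping both timestamps means other derived values, such as time-of-day effects, remain recoverable later without a schema change.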
Makes #930 possible by adding the queue population dimension to the search time in prometheus.