Fingerprint in bulk #56
base: master
Conversation
Hotsapi.Uploader.Common/Manager.cs
Outdated
try {
    var file = await processingQueue.TakeAsync();
    await rateLimitParsing.Locked(() => {
        _ = WorkerPool.RunBackground(async () => {
I think you want to wait for this 'WorkerPool.RunBackground' task to finish here; otherwise you fire off the background thread and leave the lock immediately, meaning you will attempt to parse all replays at once instead of respecting 'MaxThreads'.
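A minimal, self-contained sketch of this concern (names and numbers are illustrative, not from this PR): a 'SemaphoreSlim' only bounds concurrency if the work is awaited while the slot is held.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class SemaphoreDemo {
    // Runs 'jobs' simulated parses through a SemaphoreSlim(maxThreads) gate
    // and returns the highest concurrency observed.
    public static async Task<int> PeakConcurrencyAsync(int maxThreads, int jobs) {
        using var gate = new SemaphoreSlim(maxThreads);
        int concurrent = 0, peak = 0;

        async Task Parse() {
            int now = Interlocked.Increment(ref concurrent);
            // Record the highest concurrency we ever observe.
            int seen;
            do { seen = Volatile.Read(ref peak); }
            while (now > seen && Interlocked.CompareExchange(ref peak, now, seen) != seen);
            await Task.Delay(50); // simulate parsing work
            Interlocked.Decrement(ref concurrent);
        }

        // Awaiting Parse() while the slot is held keeps concurrency <= maxThreads.
        // A fire-and-forget '_ = Parse()' here would release the slot immediately
        // and let every job run at once, which is the bug suspected above.
        var tasks = Enumerable.Range(0, jobs).Select(async _ => {
            await gate.WaitAsync();
            try { await Parse(); } finally { gate.Release(); }
        }).ToArray();
        await Task.WhenAll(tasks);
        return peak;
    }

    static async Task Main() {
        Console.WriteLine(await PeakConcurrencyAsync(2, 8)); // at most 2
    }
}
```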
This is messed up, the parse pool is limited by the background pool. This shouldn't be protected by the semaphore at all. Good catch!
Ah okay, so it was working as it should; I just got stuck on the wrong bit.
Ok, so you are saying the 'while (await processingQueue.OutputAvailableAsync())' will always return true; it will just wait until something is available for processing.
It will return true until 'processingQueue.doneAdding()' is called, which signals that there will be no more entries. That doesn't happen in this codebase, so, for now, it's the equivalent of 'while (true)'.
It will finish if something ever does call that. It could be part of a graceful shutdown procedure, for example.
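The queue in this PR is a custom structure, but 'System.Threading.Channels' has analogous semantics and makes the contract easy to see in isolation: 'Writer.Complete()' plays the role of 'doneAdding()', and the wait-for-output call blocks rather than returning false until then. A runnable sketch:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

class QueueCompletionDemo {
    public static async Task<int> DrainAsync(ChannelReader<int> reader) {
        int sum = 0;
        // WaitToReadAsync() returns true while items remain or more may still
        // arrive; it returns false only once the writer has completed and the
        // channel is drained. Until then the loop just waits; it never exits.
        while (await reader.WaitToReadAsync()) {
            while (reader.TryRead(out var item)) sum += item;
        }
        return sum;
    }

    static async Task Main() {
        var channel = Channel.CreateUnbounded<int>();
        var consumer = DrainAsync(channel.Reader);
        await channel.Writer.WriteAsync(1);
        await channel.Writer.WriteAsync(2);
        channel.Writer.Complete(); // analogous to doneAdding(): no more entries
        Console.WriteLine(await consumer); // 3
    }
}
```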
Hotsapi.Uploader.Common/Manager.cs
Outdated
    _log.Error(ex, "Error in upload loop");
}
private async Task ParseLoop() {
    using (var rateLimitParsing = new SemaphoreSlim(MaxThreads)) {
Does this need to be wrapped in a 'while (true)'? I think this 'ParseLoop' thread may be ending the first time there are no more replays to parse, and not starting again when new games are played while the uploader is open.
Not according to the contract of that enumerator; it should block on next until new stuff is thrown on the queue. It's a different data structure than the one I implemented in this PR, and has no bulk dequeue (which I need for bulk fingerprint checking).
This looks really good, and the review is definitely knowledgeable and way over my head :) Do either of you have a means of testing? It would be great to be able to incorporate some tests into the repo itself, or at least docs on how to set that up. I do almost no desktop dev, so sorry if that's a silly question.
Yes, I forgot to say I just reviewed the code here; I haven't actually tested it, so someone else probably should test it locally.
I meant more like: is there a way to make a Docker instance that can build this and then try uploading some sample replay files, something that we could pipeline?
All pull requests are supposed to be built by our CI, but it is broken after the project transfer to a separate org. Testing could be done via uploading to a local hotsapi instance. I'll try to look into it more closely soon.
@MGatner Yeah, you definitely can set up test cases for desktop C# software. I'm a lazy, selfish developer, so I don't like to write test cases :) I like it when other people write them, though. In this case, if my review comments actually are issues, I don't think they would have been caught in tests anyway, which is part of why I'm lazy.
Integration with the data services is pretty tight, which makes this hard to unit test. To make it testable, the backend needs to be factored out and injected: the uploader has to be abstracted to an interface, passed in as a mock service, and become a constructor parameter of the manager. It's a lot of work, arguably worth it (and arguably not), but maybe not in the scope of this PR.
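A sketch of that refactor; all names here ('IUploader', 'Manager', 'FakeUploader') are hypothetical stand-ins, not the actual types in this repo.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public interface IUploader {
    Task UploadAsync(string replayPath);
}

public class Manager {
    private readonly IUploader _uploader;
    // The backend arrives as a constructor parameter, so tests can inject a fake.
    public Manager(IUploader uploader) => _uploader = uploader;
    public Task ProcessAsync(string replayPath) => _uploader.UploadAsync(replayPath);
}

// A test double that records calls instead of hitting the real service.
public class FakeUploader : IUploader {
    public readonly List<string> Uploaded = new List<string>();
    public Task UploadAsync(string replayPath) {
        Uploaded.Add(replayPath);
        return Task.CompletedTask;
    }
}

public class Program {
    public static async Task Main() {
        var fake = new FakeUploader();
        var manager = new Manager(fake);
        await manager.ProcessAsync("replay1.StormReplay");
        Console.WriteLine(fake.Uploaded.Count); // 1
    }
}
```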
I appreciate the detailed comments 👍
re: testability, I opened #57 to track that.
This is looking great. I'm very ignorant of the architecture, so I'm relying on you all to say when this is ready (unless @poma wants to review and decide).
@MGatner I think it's ready!
I didn't have a local API client, so my own testing has been rather limited: it worked up until my uploads were done, and after that I had dupes. A little more trying it out would be good.
Rebased on top of #62.
Looking into the validation failure.
Made sure that uploads happen in order, by holding back the younger concurrent uploads so that the older ones complete first.
Three queues: one for parsing, one for (bulk) duplicate checks, one for uploads, each running at its own pace.
Rebased on master, dropping the commits shared with #62.
Hmmm, why is it closed?
I started it on a big batch and there were two issues.
Fixing the concurrent uploads issue also fixed the UI freezes (so I suspect that was a resource contention issue), so I'm re-opening.
We can just mark it as WIP in the title while working on it; it's easier to track progress this way, and nobody looks into closed PRs.
It should be good now, though.
Restructure the manager so that it has three separate queues: the OG queue, a fingerprinting queue, and an uploading queue.
All three queues run independently. The OG queue parses the replays on background threads, and puts the results on the fingerprinting queue.
The fingerprinting queue checks the fingerprinting API in bulk, marks duplicates as such, and puts non-duplicates on the upload queue.
The upload queue uploads the file.
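The three-stage flow above can be sketched as a runnable toy using 'System.Threading.Channels' as stand-ins for the queues; the stage bodies (the '"dup"' check, the upload step) are simplified placeholders, not the PR's actual implementation, and the real fingerprinting stage batches its API calls.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

class PipelineSketch {
    // Feeds 'replays' through fingerprinting into uploading and returns the
    // list of replays that reached the upload stage.
    public static async Task<List<string>> RunAsync(string[] replays) {
        var parsed = Channel.CreateUnbounded<string>();
        var toUpload = Channel.CreateUnbounded<string>();
        var uploaded = new List<string>();

        // Fingerprinting queue: drop duplicates, forward the rest.
        var fingerprinter = Task.Run(async () => {
            while (await parsed.Reader.WaitToReadAsync())
                while (parsed.Reader.TryRead(out var replay))
                    if (!replay.Contains("dup")) // placeholder for the bulk API check
                        await toUpload.Writer.WriteAsync(replay);
            toUpload.Writer.Complete();
        });

        // Upload queue: consume survivors at its own pace.
        var uploader = Task.Run(async () => {
            while (await toUpload.Reader.WaitToReadAsync())
                while (toUpload.Reader.TryRead(out var replay))
                    uploaded.Add(replay); // placeholder for the actual upload
        });

        // Parsing queue: "parse" each replay and hand it to fingerprinting.
        foreach (var r in replays)
            await parsed.Writer.WriteAsync(r);
        parsed.Writer.Complete();

        await Task.WhenAll(fingerprinter, uploader);
        return uploaded;
    }

    static async Task Main() {
        var result = await RunAsync(new[] { "a.StormReplay", "dup.StormReplay", "b.StormReplay" });
        Console.WriteLine(string.Join(",", result)); // a.StormReplay,b.StormReplay
    }
}
```

Each stage only sees its input channel, so the stages naturally run at independent rates, matching the "each running at its own pace" description.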
Tested to a limited extent, as I don't have an environment set up for a local API, and only a limited number of replays available.
Intended to fix #17.