
Fingerprint in bulk #56

Open · wants to merge 7 commits into master

Conversation

martijnhoekstra
Copy link
Collaborator

Restructure manager so that it has three separate queues. The OG queue, a fingerprinting queue, and an uploading queue.

All three queues run independently. The OG queue parses the replays on background threads, and puts the results on the fingerprinting queue.

The fingerprinting queue checks the fingerprinting API in bulk, marks duplicates as such, and puts non-duplicates on the upload queue.

The upload queue uploads the file.

Tested only to a limited extent, as I don't have an environment set up for a local API, and I had only a limited number of replays available.

Intended to fix #17
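The three stages above could be sketched with BlockingCollection<T> as below. This is a sketch only: the types and helpers (Replay, Parse, TakeBatches, CheckFingerprintsBulk, Upload) are hypothetical stand-ins, not the PR's actual queue implementation.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Three independent stages, each feeding the next queue.
var parseQueue       = new BlockingCollection<string>();
var fingerprintQueue = new BlockingCollection<Replay>();
var uploadQueue      = new BlockingCollection<Replay>();

// Stage 1: parse replay files on background threads.
void ParseLoop() {
    foreach (var path in parseQueue.GetConsumingEnumerable())
        fingerprintQueue.Add(Parse(path));
}

// Stage 2: check fingerprints against the API in bulk, mark and drop duplicates.
void FingerprintLoop() {
    foreach (var batch in TakeBatches(fingerprintQueue, maxBatch: 50))
        foreach (var replay in CheckFingerprintsBulk(batch))
            if (!replay.IsDuplicate)
                uploadQueue.Add(replay);
}

// Stage 3: upload whatever survived deduplication.
void UploadLoop() {
    foreach (var replay in uploadQueue.GetConsumingEnumerable())
        Upload(replay);
}
```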

```csharp
try {
    var file = await processingQueue.TakeAsync();
    await rateLimitParsing.Locked(() => {
        _ = WorkerPool.RunBackground(async () => {
```


I think you want to wait for this 'WorkerPool.RunBackground' task to finish here, otherwise I think you fire off the background thread and leave the lock immediately, meaning you will attempt to parse all replays at once instead of respecting 'MaxThreads'
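The concern here can be illustrated with a minimal sketch (ParseAsync is a hypothetical stand-in; the replies below note the worker pool itself already bounds concurrency, so this is illustrative only): if the semaphore slot only guards the *launch* of the background task, it is released as soon as the task starts and MaxThreads is never enforced; holding the slot inside the worker would enforce it.

```csharp
var rateLimitParsing = new SemaphoreSlim(MaxThreads);

// Not limited: the slot is released as soon as the task is *started*.
await rateLimitParsing.WaitAsync();
_ = Task.Run(async () => await ParseAsync(file));
rateLimitParsing.Release();

// Limited: the slot is held until the work is *done*.
await rateLimitParsing.WaitAsync();
_ = Task.Run(async () => {
    try { await ParseAsync(file); }
    finally { rateLimitParsing.Release(); }
});
```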

Copy link
Collaborator Author


This is messed up, the parse pool is limited by the background pool. This shouldn't be protected by the semaphore at all. Good catch!


Ah okay so it was working as it should, I just got stuck on the wrong bit


Ok, so you are saying the 'while (await processingQueue.OutputAvailableAsync())' will always return true; it will just wait until something is available for processing

Copy link
Collaborator Author


It will return true until processingQueue.doneAdding() is called, which signals that there will be no more entries. That doesn't happen in this codebase, so, so far, it's the equivalent of while (true). It will finish if something ever does call that; it could be part of a graceful shutdown procedure, for example.
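The contract described here matches, for example, TPL Dataflow's BufferBlock<T> (assuming the custom queue behaves analogously): OutputAvailableAsync keeps awaiting and returning true until the producer marks the queue complete and it drains.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var queue = new BufferBlock<string>();

// Consumer: loops until the queue is marked complete AND drained.
async Task ConsumeAsync() {
    while (await queue.OutputAvailableAsync()) {
        var item = await queue.ReceiveAsync();
        // process item...
    }
    // Only reached after Complete() has been called and the buffer is empty.
}

// Producer, e.g. as part of a graceful shutdown:
queue.Post("replay1.StormReplay");
queue.Complete();   // the analogue of doneAdding(): no more entries
```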

```csharp
    _log.Error(ex, "Error in upload loop");
}

private async Task ParseLoop() {
    using (var rateLimitParsing = new SemaphoreSlim(MaxThreads)) {
```


Does this need to be wrapped in a 'while (true)'? I think this 'ParseLoop' thread may be ending the first time there are no more replays to parse, and not starting again when new games are played while the uploader is open

Copy link
Collaborator Author


Not according to the contract of that enumerator: it should block on next until new stuff is thrown on the queue. It's a different data structure than the one I implemented in this PR, and it has no bulk dequeue (which I need for bulk fingerprint checking).
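A bulk dequeue of the kind mentioned here could look roughly like this sketch over BlockingCollection<T> (TakeBatch is a hypothetical helper, not the PR's API): block for the first item, then greedily drain whatever else is already queued, up to a cap.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Blocks for at least one item, then drains up to maxBatch without blocking.
static List<T> TakeBatch<T>(BlockingCollection<T> queue, int maxBatch) {
    var batch = new List<T> { queue.Take() };   // blocks until something arrives
    while (batch.Count < maxBatch && queue.TryTake(out var item))
        batch.Add(item);
    return batch;
}
```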

@MGatner
Copy link

MGatner commented Dec 4, 2019

This looks really good and the review is definitely knowledgeable and way over my head :)

Do either of you have a means of testing? It would be great to be able to incorporate some tests into the repo itself, or at least docs on how to set that up. I do almost no desktop dev, so sorry if that's a silly question.

@barrett777
Copy link

Yes, I forgot to say: I just reviewed the code here, I haven't actually tested it, so someone else probably should test it locally

@MGatner
Copy link

MGatner commented Dec 4, 2019

I meant more like: is there a way to make a docker instance that can build this and then try uploading some sample replay files, something that we could pipeline.

@poma
Copy link
Member

poma commented Dec 4, 2019

All pull requests are supposed to be built by our CI, but it is broken after project transfer to a separate org. Testing could be done via uploading to a local hotsapi instance. I'll try to look into it more closely soon.

@barrett777
Copy link

@MGatner Yeah you definitely can set up test cases for desktop C# software

I'm a lazy, selfish developer, so I don't like to write test cases :) I like it when other people write them though

In this case, if my review comments actually are issues, I don't think they would have been caught in tests anyway, which is part of why I'm lazy

@martijnhoekstra
Copy link
Collaborator Author

Integration with the data services is pretty tight, which makes this hard to unit test. To make it testable, the backend needs to be factored out and injected: the uploader has to be abstracted behind an interface, passed in as a mockable service, and become a constructor parameter of the manager.

It's a lot of work, and arguably worth it (and arguably not), but maybe not in the scope of this PR.

Copy link

@barrett777 barrett777 left a comment


I appreciate the detailed comments 👍

@martijnhoekstra
Copy link
Collaborator Author

re: testability, I opened #57 to track that.

@MGatner
Copy link

MGatner commented Dec 5, 2019

This is looking great. I'm very ignorant of the architecture so I'm relying on you all to say when this is ready (unless @poma wants to review and decide).

@barrett777
Copy link

@MGatner I think it's ready!

@martijnhoekstra
Copy link
Collaborator Author

I didn't have a local API client, so my own testing has been rather limited: it only covered up until my uploads were done, and after that I had dupes. A little more trying it out would be good.

@martijnhoekstra
Copy link
Collaborator Author

rebased on top of #62

@martijnhoekstra
Copy link
Collaborator Author

Looking into the validation failure

@martijnhoekstra
Copy link
Collaborator Author

Made sure that uploads are uploaded in order by holding back the younger concurrent uploads, so that the older ones complete first.
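One way to get that ordering while still overlapping the uploads is to start them concurrently but complete them oldest-first, by having each upload await its predecessor's task before committing. A sketch under assumed names (UploadAsync, Commit, replaysOldestFirst are hypothetical), not the PR's exact code:

```csharp
using System.Threading.Tasks;

// Start uploads concurrently, but gate each one's completion on its predecessor,
// so results are committed oldest-first.
Task previous = Task.CompletedTask;
foreach (var replay in replaysOldestFirst) {
    var older = previous;
    previous = Task.Run(async () => {
        var result = await UploadAsync(replay);   // uploads run concurrently
        await older;                              // younger waits for older
        Commit(result);                           // committed in order
    });
}
await previous;   // all uploads done, in order
```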

Martijn Hoekstra added 6 commits January 3, 2020 11:46
@martijnhoekstra
Copy link
Collaborator Author

rebased on master, dropping the commits shared with #62

@poma
Copy link
Member

poma commented Jan 3, 2020

Hmmm why is it closed?

@martijnhoekstra
Copy link
Collaborator Author

I started it on a big batch and there were two issues.

  • Max concurrent uploads weren't respected
  • UI freezes during upload

Fixing the concurrent uploads issue also fixed the UI freezes (so I suspect that was a resource contention issue), so I'm re-opening.

@poma
Copy link
Member

poma commented Jan 3, 2020

We can just mark it as WIP in the title while working on it; it's easier to track progress this way, since nobody looks into closed PRs

@martijnhoekstra
Copy link
Collaborator Author

It should be good now though

Development

Successfully merging this pull request may close these issues: Parallel processing

4 participants