Bulk Label Deletion Task #285
Comments
@nathanielrindlaub This is turning out to be more complicated than I imagined -- I'm wondering, if an image has an object with multiple validated labels, and we remove the 'first validated,' should the image remain locked (because there's still a validated label) or should it be unlocked? I think if it remains locked, this query is simplified. Assuming that's the case we could:
I think my instinct is to unlock the object and not assume that just because another label is validated it should automatically become the new most-accurate source-of-truth. I also think it's probably good to provide some indication to users that something has changed on this image and may need to be re-reviewed, which unlocking the object would do. I could be swayed otherwise though. I hadn't thought about the approach you're describing and I want to keep entertaining it, but I had been envisioning looking into iterating over an array of
@nathanielrindlaub Gotcha. Thanks for explaining the use case for having that unlocked behavior.
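
The unlock rule discussed above (removing the first validated label unlocks the object, even if other validated labels remain) could be sketched roughly as follows. This is only an illustration under assumed data shapes: the `locked` flag, the `validation.validated` field, and the helper name `removeLabelFromObject` are hypothetical, and "first" is taken to mean first in array order.

```javascript
// Sketch only. Assumed (hypothetical) shapes:
//   object: { _id, locked: boolean, labels: Label[] }
//   label:  { _id, validation: { validated: boolean } | null }
function removeLabelFromObject(object, labelId) {
  // "First validated" is assumed to mean the first validated label in array order.
  const firstValidated = object.labels.find(
    (label) => label.validation && label.validation.validated
  );
  const removingFirstValidated =
    firstValidated !== undefined && String(firstValidated._id) === String(labelId);

  // Drop the label; leave every other label untouched.
  const labels = object.labels.filter(
    (label) => String(label._id) !== String(labelId)
  );

  return {
    ...object,
    labels,
    // Unlock only when the label being removed was the first validated one,
    // so the object shows up for re-review. Otherwise keep the current state.
    locked: removingFirstValidated ? false : object.locked,
  };
}
```

Keeping the decision in a pure function like this would make it possible to compute the outcome for every affected object first and only then decide which database operations are needed.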
Similarly, I didn't think of this 😄. I think we're both hitting at the same thing from different angles though (reduce the number of db calls by 'categorizing' operations in some way and then doing a Like you said, I was finding that the query gets complex really fast. I'm going to explore the approach you mentioned as I think that will be easier to read (from our perspective) and more performant than what we have now. My general approach will be:
That sounds great, @lessej. If you don't mind doing some benchmarking on the current execution times, that would be good to record too.
@nathanielrindlaub Happy holidays! I did a little bit of work on this feature. The unlocking operation is the one I think we're going to have the most trouble with because we aren't able to interact with the 'images.objects' collection directly (we can only use it in the context of an image). I would like to pick your brain about the filtering operations that are available in MongoDB.
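
One MongoDB filtering mechanism that may be relevant here is `arrayFilters` with the filtered positional operator `$[<identifier>]`, which targets specific elements of an embedded array from within an update on the parent document. The sketch below only builds the operations array that would be passed to `bulkWrite()`. The field names (`objects`, `labels`, `locked`), the `unlock` flag, and the helper name `buildLabelDeletionOps` are assumptions for illustration, not the project's confirmed schema, and the operations would need to be verified against the real collection.

```javascript
// Sketch only. Builds updateOne operations for bulkWrite().
// Each target is a hypothetical { imageId, objectId, labelId, unlock } record,
// where `unlock` has already been decided by whatever categorization step runs first.
function buildLabelDeletionOps(targets) {
  return targets.map(({ imageId, objectId, labelId, unlock }) => {
    const update = {
      // Remove the label from the matching object's labels array.
      $pull: { 'objects.$[obj].labels': { _id: labelId } },
    };
    if (unlock) {
      // Unlock the same object in the same write.
      update.$set = { 'objects.$[obj].locked': false };
    }
    return {
      updateOne: {
        filter: { _id: imageId },
        update,
        // `obj` binds $[obj] above to the embedded object with this _id.
        arrayFilters: [{ 'obj._id': objectId }],
      },
    };
  });
}
```

Because each operation is scoped by the image `_id` and narrowed with `arrayFilters`, the embedded objects are still only touched in the context of their image, which matches the constraint described above.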
Sub-task of #148.
Label deletion is particularly expensive because we need to read each affected Object and Image into memory and inspect the entire Objects' Labels array to determine how it needs to be updated once the Label we're trying to delete is removed. This makes it hard to optimize with a `bulkWrite()` operation, but maybe not impossible?

The first step would be to try to see what efficiency gains we can make by restructuring the `deleteAnyLabels()` and `deleteAnyLabel()` methods. The second is likely to move the operation to the `task` Lambda so that we aren't limited by the 30 second timeout.

Some open questions/to-dos:
`bulkWrite()`? Would there be memory implications?

Note: in addition to `deleteAnyLabels()`, we have a `deleteLabels()` method that is only used when reverting `labelsAdded` from the frontend. This gets called when a user adds a bunch of Labels to a bunch of Objects using the multi-image selection menu, then uses ctrl-z to revert them. `deleteLabels()` is less of a worry right now because it uses `bulkWrite()`, so is probably pretty fast and can probably remain a synchronous operation, but I would like to know how many Images/Objects we can accommodate within the 30 second timeout. Also, because `DeleteLabelsInput` is an array of objects that contain an `imageId`, `objectId`, and `labelId`, I imagine we also face the POST request payload-size bottleneck (the payload must be shorter than 262144 bytes). So it would be good to understand those limits and enforce them / handle them gracefully.