refactor(api): Remove concurrency from LegacyContextPlugin
#16098
Conversation
I think it's a nice simplification, and I'm okay with doing it. I will say that OT-2 usage is on the decline, and this can't happen on the Flex, so it's possible that most OT-2 users are using the older protocol API level, which is why we don't see the problem.
Closing for now: this isn't immediately helpful for anything, so why risk it. If that changes, we can reopen.
Reopening after discussion with @sanni-t. #16171 noticed some concurrency problems. We want to improve the situation by:
This definitely helps improve our background tasks. I'm all for it. 🙌
Once it's merged, we should do some thorough benchmarking tests to document the changes in performance.
I ran a quick test with:

```python
requirements = {"robotType": "OT-2", "apiLevel": "2.13"}


def run(protocol):
    tip_rack = protocol.load_labware("opentrons_96_tiprack_300ul", 1)
    well_plate = protocol.load_labware("biorad_96_wellplate_200ul_pcr", 2)
    pipette = protocol.load_instrument(
        "p300_single_gen2", mount="right", tip_racks=[tip_rack]
    )
    pipette.pick_up_tip()
    for _ in range(10000):
        pipette.touch_tip(well_plate["H12"])
    pipette.return_tip()
```
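For the benchmarking, a variation of that quick test could time the command loop directly and report the result in the run log. This is only a sketch, not part of this PR; the loop size and the `protocol.comment()` report are illustrative choices.

```python
import time

requirements = {"robotType": "OT-2", "apiLevel": "2.13"}


def run(protocol):
    tip_rack = protocol.load_labware("opentrons_96_tiprack_300ul", 1)
    well_plate = protocol.load_labware("biorad_96_wellplate_200ul_pcr", 2)
    pipette = protocol.load_instrument(
        "p300_single_gen2", mount="right", tip_racks=[tip_rack]
    )
    pipette.pick_up_tip()

    n = 1000  # smaller loop; enough to average out per-command overhead
    start = time.perf_counter()
    for _ in range(n):
        pipette.touch_tip(well_plate["H12"])
    elapsed = time.perf_counter() - start

    # Surface the timing in the run log so before/after runs can be compared.
    protocol.comment(f"{n} touch_tip() calls took {elapsed:.1f} s")
    pipette.return_tip()
```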
Overview
This removes a potential source of deadlocks and race conditions.
I originally thought this would help with EXEC-417. I'm not sure it actually will at this point, but maybe we should still do it.
Background
When the robot executes a Python protocol, the `ProtocolEngine` and higher-level HTTP stuff live in the main thread, and the user's Python script runs in a worker thread. For older (`apiLevel` ≤ 2.13) protocols, `LegacyContextPlugin` transports progress reports from the worker thread to the main thread.

When we initially built all of this, we had severe performance problems where the main thread would get bogged down serializing a big HTTP response or something, and then the backpressure through `LegacyContextPlugin` would stall the protocol thread, causing visible jankiness in robot motion. We did a bunch of stuff to try to fix this. One fix was to insert an infinite non-blocking queue between the two threads to remove the backpressure.

I strongly suspect that that fix is unnecessary today. As evidence: it only ever applied to older, `apiLevel` ≤ 2.13, protocols. Newer, `apiLevel` ≥ 2.14 protocols have always run through a different pipeline, which still has backpressure. And the newer pipeline is apparently not perceived to have janky motion.

Removing the fix would remove concurrency, which would be a meaningful simplification. For instance, I've seen hints that this part of run cleanup is racy, because it depends on whether all the `pickUpTip`/`dropTip` commands got consumed from the queue by the time we reach there.
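To make the handoff pattern concrete, here is a minimal, self-contained sketch. It is not the actual opentrons code and every name in it is made up; it only illustrates why an unbounded non-blocking queue removes backpressure: the protocol thread's `put()` never blocks, so the main thread can fall arbitrarily far behind, which is what makes cleanup ordering racy.

```python
import asyncio
import queue
import threading


def worker(progress_queue: queue.SimpleQueue) -> None:
    """Stand-in for the protocol thread: emit progress reports without backpressure."""
    for i in range(5):
        progress_queue.put(f"progress report {i}")  # unbounded put() never blocks
    progress_queue.put(None)  # sentinel: protocol finished


async def drain(progress_queue: queue.SimpleQueue) -> None:
    """Stand-in for the main thread: consume reports whenever it gets around to it."""
    loop = asyncio.get_running_loop()
    while True:
        # The blocking get() runs in a thread pool so the event loop stays free.
        report = await loop.run_in_executor(None, progress_queue.get)
        if report is None:
            return
        print("handled:", report)


async def main() -> None:
    progress_queue: queue.SimpleQueue = queue.SimpleQueue()
    thread = threading.Thread(target=worker, args=(progress_queue,))
    thread.start()
    await drain(progress_queue)
    thread.join()


asyncio.run(main())
```

Removing the queue would roughly mean the worker hands each report to the main thread synchronously (for example via `asyncio.run_coroutine_threadsafe(...).result()`), so a slow main thread is once again felt as backpressure in the protocol thread.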
Test Plan and Hands on Testing
- Run a protocol that has lots of quick commands (e.g. `touch_tip()`s) and has `apiLevel` ≤ 2.13. Click around in the app and induce some network requests. There might be some jankiness, but make sure it's not severe. Really, I think we just need to make sure it's not any worse than `apiLevel` ≥ 2.14.
- Note that any such jankiness would also apply to `apiLevel` ≥ 2.14, so if we're concerned about this, retaining this non-blocking queue in `LegacyContextPlugin` is not the correct solution.
Changelog
Review requests
Is this a good idea?
Risk assessment
Medium. There is a risk that this is covering up enduring jankiness problems, and removing it will bring the problems back. This may not be obvious to us in the office because we're more focused on Flexes, which don't run `apiLevel` ≤ 2.13 protocols.

But there's also risk to having the concurrency, so... 🤷