
refactor(api): Remove concurrency from LegacyContextPlugin #16098

Merged: 3 commits into edge from non_concurrent_action_dispatch, Sep 18, 2024

Conversation

@SyntaxColoring (Contributor) commented on Aug 22, 2024

Overview

This removes a potential source of deadlocks and race conditions.

I originally thought this would help with EXEC-417. I'm not sure it actually will at this point, but maybe we should still do it.

Background

When the robot executes a Python protocol, the ProtocolEngine and higher-level HTTP stuff live in the main thread, and the user's Python script runs in a worker thread. For older (apiLevel≤2.13) protocols, LegacyContextPlugin transports progress reports from the worker thread to the main thread.

When we initially built all of this, we had severe performance problems where the main thread would get bogged down serializing a big HTTP response or something, and then the backpressure through LegacyContextPlugin would stall the protocol thread, causing visible jankiness in robot motion. We did a bunch of stuff to try to fix this. One fix was to insert an infinite non-blocking queue between the two threads to remove the backpressure.
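
For illustration, here is a minimal sketch of that decoupling pattern. This is not the actual LegacyContextPlugin code, and all names are illustrative; the point is that put() on an unbounded queue never blocks, so the protocol thread never waits on the main thread:

# A minimal sketch of the decoupling pattern, not the real LegacyContextPlugin
# code; all names here are illustrative.
import queue
import threading

progress_queue = queue.SimpleQueue()  # unbounded, so put() never blocks


def protocol_thread():
    # Producer: even if the main thread is busy, these calls return
    # immediately, so robot motion is never stalled by reporting.
    for step in ["pickUpTip", "touchTip", "dropTip"]:
        progress_queue.put(step)
    progress_queue.put(None)  # sentinel: protocol finished


def main_thread_consumer():
    # Consumer: drains progress reports whenever the main thread gets to them.
    while (step := progress_queue.get()) is not None:
        print("report:", step)


worker = threading.Thread(target=protocol_thread)
worker.start()
main_thread_consumer()
worker.join()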

I strongly suspect that that fix is unnecessary today. As evidence: it only ever applied to older (apiLevel≤2.13) protocols. Newer (apiLevel≥2.14) protocols have always run through a different pipeline, which still has backpressure, and that newer pipeline is apparently not perceived to have janky motion.

Removing the fix would remove concurrency, which would be a meaningful simplification. For instance, I've seen hints that this part of run cleanup is racy, because it depends on whether all the pickUpTip/dropTip commands got consumed from the queue by the time we reach there.
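
To make that race concrete, here is an illustrative sketch (not the real cleanup code): whether cleanup sees a tip as attached depends on whether the consumer thread happens to have drained the queue yet.

# Illustrative sketch of the race, not the real cleanup code: the cleanup
# check can run before the consumer has drained all the tip events.
import queue
import threading

events = queue.SimpleQueue()
tip_state = {"attached": False}


def consumer():
    while (event := events.get()) is not None:
        tip_state["attached"] = event == "pickUpTip"


consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()

events.put("pickUpTip")
events.put("dropTip")

# Racy: depending on scheduling, "dropTip" may not have been consumed yet, so
# cleanup can incorrectly conclude that a tip is still attached.
print("tip attached at cleanup?", tip_state["attached"])

events.put(None)  # sentinel to stop the consumer
consumer_thread.join()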

Test Plan and Hands-On Testing

  • On an OT-2, run some protocols that are movement-intensive (e.g. lots of touch_tip()s) and have apiLevel ≤ 2.13. Click around in the app and induce some network requests. There might be some jankiness, but make sure it's not severe. Really, I think we just need to make sure it's not any worse than apiLevel ≥ 2.14.
    • I'm noticing jank lasting ~0-1 seconds when you: navigate to the device details page (understandable); navigate through the device settings (weird); and navigate to the device list page (concerning). All of this applies equally to apiLevel≥2.14, so if we're concerned about this, retaining this non-blocking queue in LegacyContextPlugin is not the correct solution.

Changelog

  • Delete the non-blocking queue between the Python protocol thread and the main thread. Replace it with simple inter-thread function calls (see the sketch below).
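
For illustration, a minimal sketch of what those inter-thread function calls can look like, assuming the main thread runs an asyncio event loop (as the robot server does). The names here are illustrative, not the PR's actual API:

# A minimal sketch of the replacement, assuming the main thread runs an
# asyncio event loop; names are illustrative, not the PR's actual API.
import asyncio
import threading


async def dispatch_progress(step):
    # Runs on the main thread's event loop.
    print("report:", step)


def protocol_thread(loop):
    # Called from the worker thread. Blocking on .result() restores
    # backpressure: the protocol waits until the main thread has handled each
    # report, so no unbounded backlog can build up and cleanup ordering is
    # deterministic.
    for step in ["pickUpTip", "touchTip", "dropTip"]:
        future = asyncio.run_coroutine_threadsafe(dispatch_progress(step), loop)
        future.result()


async def main():
    loop = asyncio.get_running_loop()
    worker = threading.Thread(target=protocol_thread, args=(loop,))
    worker.start()
    # Join in a thread so the event loop stays free to run dispatch_progress().
    await asyncio.to_thread(worker.join)


asyncio.run(main())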

Review requests

Is this a good idea?

Risk assessment

Medium. There is a risk that this is covering up enduring jankiness problems, and removing it will bring the problems back. This may not be obvious to us in the office because we're more focused on Flexes, which don't run apiLevel≤2.13 protocols.

But there's also risk to having the concurrency, so... 🤷

@SyntaxColoring requested a review from a team as a code owner on August 22, 2024 15:59
@sfoster1 (Member) left a comment

I think it's a nice simplification, and I'm okay with doing it. I will say that OT-2 usage is on the decline, and this can't happen on the Flex, so it's possible that most OT-2 users are using the older protocol API levels, which is why we don't see the problem.

@SyntaxColoring (Contributor, Author) commented:

Closing for now: this isn't immediately helpful for anything, so why risk it.

If that changes, we can reopen.

@SyntaxColoring deleted the non_concurrent_action_dispatch branch on August 28, 2024 19:39
@SyntaxColoring restored the non_concurrent_action_dispatch branch on September 18, 2024 14:47
@SyntaxColoring (Contributor, Author) commented on Sep 18, 2024

Reopening after discussion with @sanni-t. #16171 noticed some concurrency problems. We want to improve the situation by:

  1. Making sure higher-level code cleans up the things that need to be cleaned up (fix(robot-server): matching RTPs check during analysis of non-RTP protocols #16171, and future work)
  2. Removing unnecessary concurrency and unnecessary sensitivity to cleanup order (this PR)
  3. Making sure server shutdown cleans things up in the correct order (a follow-up PR)

@sanni-t (Member) left a comment

This definitely helps improve our background tasks. I'm all for it. 🙌
Once it's merged, we should do some thorough benchmarking tests to document the changes in performance.

@SyntaxColoring (Contributor, Author) commented on Sep 18, 2024

I ran a quick test with opentrons_simulate on my laptop, using a protocol with 10000 touch_tip() calls. I didn't find a performance difference with or without this PR.

requirements = {"robotType": "OT-2", "apiLevel": "2.13"}


def run(protocol):
    # Load a tip rack and a plate, then stress the progress-report path with
    # a large number of touch_tip() calls.
    tip_rack = protocol.load_labware("opentrons_96_tiprack_300ul", 1)
    well_plate = protocol.load_labware("biorad_96_wellplate_200ul_pcr", 2)
    pipette = protocol.load_instrument(
        "p300_single_gen2", mount="right", tip_racks=[tip_rack]
    )
    pipette.pick_up_tip()
    for _ in range(10000):
        pipette.touch_tip(well_plate["H12"])
    pipette.return_tip()
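
For reference, a timing harness along these lines could reproduce the comparison. This assumes the opentrons.simulate.simulate() entry point from the opentrons package; the file name touch_tip_stress.py is hypothetical:

# A minimal timing harness for the protocol above, assuming the
# opentrons.simulate.simulate() entry point; "touch_tip_stress.py" is a
# hypothetical file name for the protocol shown above.
import time

from opentrons import simulate

start = time.perf_counter()
with open("touch_tip_stress.py") as protocol_file:
    simulate.simulate(protocol_file, file_name="touch_tip_stress.py")
print(f"simulated in {time.perf_counter() - start:.1f} s")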

@SyntaxColoring merged commit 6451bdb into edge on Sep 18, 2024
39 checks passed
@SyntaxColoring deleted the non_concurrent_action_dispatch branch on September 18, 2024 17:41
SyntaxColoring added a commit that referenced this pull request Sep 20, 2024