-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move from crossbar/autobahn to gRPC #1469
Conversation
Crossbar and autobahn are used to communicate between client and coordinator as well as between exporter and coordinator. Both are unfortunately not very well maintained anymore. The crossbar component was moved to its own virtualenv for quite a while to cope with conflicting dependencies. python3.13 support for crossbar is still not available (at least not in a version releases on PyPi). That's why labgrid will now move to gRPC. It's a well maintained RPC framework. As a side effect, the message transfer is more performant and the import times are shorter. gRPC relies on the protocol buffers compiler (protoc) to generate code. Use grpcio-tools 1.62.2 to do that. It's used in Yocto scarthgap and is incompatible with generated code from newer grpcio-tools. Co-developed-by: Rouven Czerwinski <[email protected]> Co-developed-by: Bastian Krause <[email protected]> Signed-off-by: Jan Luebbe <[email protected]>
The coordinator binds to port 20408 by default. The client tries to connect to that port if no port is specified via --cordinator/-x, LG_COORDINATOR or via coordinator_address option in the environment config. Only the exporter does not yet default to 20408. Change that by appending the default port 20408 to the address given via --coordinator/-c or LG_COORDINATOR if no port was specified. Closes #1429 Signed-off-by: Bastian Krause <[email protected]>
If one of these fails, the others should not be skipped. Signed-off-by: Jan Luebbe <[email protected]>
_update_acquired_places() is only called when resources are added by the exporter or when they are removed after it has disconnected. So we never want to call back to the exporter for removed resources. Signed-off-by: Jan Luebbe <[email protected]>
In future commits, more code will call the newly introduced _acquire_resource() method. Signed-off-by: Jan Luebbe <[email protected]>
In cases when an exporter used by an acquired place disconnects, the place still references those orphaned resources. If the exporter reconnects, they are reaquired by the coordinator automatically. Signed-off-by: Jan Luebbe <[email protected]>
If an exporter disconnects, corresponding resources that are acquired become orphaned. If the exporter reconnects, the resources are not locked. Requiring the user to unlock and lock the corresponding place again is inconvenient and would change the previous behavior. Prior to this commit, reacquiring did not work due to a logic error: ExporterCommand.complete() und ExporterCommand.wait() are both called in ExporterStream.request_task(). The blocking wait() prevents further processing of exporter messages. That also means responses for ExporterSetAcquiredRequest are not handled anymore. This leads to a state where resources cannot be acquired/released by their place anymore. Ultimatively, this leads to an inconsistent state requiring a coordinator restart. Refactor handling of these orphaned resources to solve this. Signed-off-by: Jan Luebbe <[email protected]>
It's not really clear how keepalive_timeout_ms and the ping_timeout_ms experiment should interact, so we set them both. Signed-off-by: Jan Luebbe <[email protected]>
We only want to run one instance of the coordinator, so enabling so_reuseport can lead to confusing situations where multiple coordinator are running in parallel. Signed-off-by: Jan Luebbe <[email protected]>
Instead of using asyncio.get_event_loop() in various places in the Coordinator class, store it in an attribute called loop. Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
These checks are already performed by get_idle_place(). Signed-off-by: Bastian Krause <[email protected]>
…n allow()/release_from() Signed-off-by: Bastian Krause <[email protected]>
Calling asyncio.get_event_loop() with no current event loop is deprecated since Python 3.10 and will be an error in some future Python release [1]. Whenever we don't expect to run with an event loop, create one explicitly. In coroutine and callbacks from asynchronous code, use the more explicit asyncio.get_running_loop() to get the loop. Note that this does not work in labgrid.resources.ethernetport.EthernetPortManager: This code is usually not called in coroutines and callbacks from asynchronous code, so asyncio.get_running_loop() does not work there. So stick to asyncio.get_event_loop() there and just expect to be called with a running event loop (which is the non-deprecated use case for this function). Users that do not have an event loop running will see a justified DeprecationWarning with Python >= 3.12 and an error in some future Python version. [1] https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop Signed-off-by: Bastian Krause <[email protected]>
Calling asyncio.get_event_loop() with no current event loop is deprecated since Python 3.10 and will be an error in some future Python release [1]. Using it in labgrid.remote.client.start_session() causes errors in IPython when using a RemotePlace: In [1]: from labgrid.resource.remote import RemotePlace ...: from labgrid import Target ...: ...: target = Target("example") ...: RemotePlace(target, name="example-place") [...] RuntimeError: There is no current event loop in thread 'MainThread'. For labgrid.remote.client.start_session() there is no reliable way of retrieving the thread's event loop without being called from an async context (which we cannot assume here). Instead of using asyncio.get_event_loop(), use a new helper function ensure_event_loop() that returns the first available loop instance from: - externally provided event loop - stashed event loop - OS thread's running event loop (when called from async code) - new event loop The returned loop is stashed for future calls. See also [2] for a similar approach. start_session() now accepts a new optional argument "loop" for providing an external event loop. [1] https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop [2] jupyter/jupyter_core#387 Signed-off-by: Bastian Krause <[email protected]>
The Environment was always optional. Before users of ClientSession had to pass an explicit None for this attribute. While we're changing the ClientSession for gRPC anyway, let's make env really optional. This again allows us to make start_session()'s extra argument optional, too. It is used to pass extra arguments to the ClientSession, which means it can be an empty dictionary now. Signed-off-by: Bastian Krause <[email protected]>
During the move to gRPC, start_session()'s arguments changed. Since this is one of the functions used from outside of labgrid, add typing hints and force the kwargs to be passed with names. This should make users aware of the changes, so their code can be adapted. Signed-off-by: Bastian Krause <[email protected]>
Calling asyncio.get_event_loop() with no current event loop is deprecated since Python 3.10 and will be an error in some future Python release [1]. SNMPEthernetPort expects a running event loop. So create and set one. [1] https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop Signed-off-by: Bastian Krause <[email protected]>
This allows stopping and starting the exporter during a test. More functionality will be moved into the class in future commits. Signed-off-by: Bastian Krause <[email protected]>
…idComponent Most functionality of this class is not exporter- or coordinator-specific, so move the common parts into a common class. Both Exporter (and a future) Coordinator class will inherhit it. Signed-off-by: Bastian Krause <[email protected]>
The same was previously implemented for the exporter. Since we need the same functionality also for the coordinator (along with suspend_tree()/resume_tree() functionality to be moved in a future commit), let's refactor it now. Signed-off-by: Bastian Krause <[email protected]>
Previously only tests/test_client.py could use the suspend_tree() and resume_tree() functionality. Since this is also useful for exporter + coordinator tests, move it into the generic LabgridComponent class. Signed-off-by: Bastian Krause <[email protected]>
Previously the exporter had blocking issues when the coordinator was not available. Add a test to prevent future regressions. Signed-off-by: Bastian Krause <[email protected]>
A previous commit added a test for exporter startup with an unreachable coordinator. Now also add a test simulating a dissappearing coordinator during operation. The exporter should notice the coordinator disappearing and should exit with exitcode 100. This way systemd can try restarting the exporter regularly until the coordinator is available again. Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Rouven Czerwinski <[email protected]> Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]> Signed-off-by: Rouven Czerwinski <[email protected]>
Signed-off-by: Bastian Krause <[email protected]> Signed-off-by: Rouven Czerwinski <[email protected]>
pylint finds false positives for Python 3.8: labgrid/remote/coordinator.py:188:21: E1136: Value 'dict' is unsubscriptable (unsubscriptable-object) labgrid/remote/coordinator.py:194:24: E1136: Value 'dict' is unsubscriptable (unsubscriptable-object) labgrid/remote/coordinator.py:195:22: E1136: Value 'dict' is unsubscriptable (unsubscriptable-object) Since Python 3.8 reaches EOL in October 2024 anyways, let's just skip pylint for this version until then. Signed-off-by: Bastian Krause <[email protected]>
The latest crossbar release was not compatible with python3.12. Now that the crossbar dependency is gone, we can finally advertise python3.12 compatibility. Closes #1260 Signed-off-by: Bastian Krause <[email protected]>
Rebased on latest master (due to conflicts with #1470) . |
Signed-off-by: Bastian Krause <[email protected]>
With the autobahn/crossbar dependencies removed, numpy is no longer an indirect dependency of labgrid. Signed-off-by: Bastian Krause <[email protected]>
Signed-off-by: Bastian Krause <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although there are probably still some issues in this PR, I think it's better to fix those with smaller PRs on master instead of iterating here. Merging now would also make testing easier and hopefully lead to more feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested on my lab setup
Let's go. |
This marks a breaking change compared to the previous release version of labgrid, as this new development version now uses grpc instead of crossbar to communicate between the client, coordinator and exporters. This means all of these components need to be updated to the grpc-based versions at the same time. See labgrid-project/labgrid#1469 for more information about this change. Signed-off-by: Leonard Göhrs <[email protected]>
This marks a breaking change compared to the previous release version of labgrid, as this new development version now uses grpc instead of crossbar to communicate between the client, coordinator and exporters. This means all of these components need to be updated to the grpc-based versions at the same time. See labgrid-project/labgrid#1469 for more information about this change. Signed-off-by: Leonard Göhrs <[email protected]>
This marks a breaking change compared to the previous release version of labgrid, as this new development version now uses grpc instead of crossbar to communicate between the client, coordinator and exporters. This means all of these components need to be updated to the grpc-based versions at the same time. See labgrid-project/labgrid#1469 for more information about this change. Signed-off-by: Leonard Göhrs <[email protected]>
Description
As announced in #1417 and in the v24.0 release, labgrid will move from crossbar/autobahn to gRPC for communication between client/exporter and coordinator.
Crossbar/autobahn are unfortunately not very well maintained anymore. The crossbar component was moved to its own virtualenv to cope with the high number of dependencies leading to conflicts. Support for Python 3.13 is still not
available in a crossbar release on PyPI.
That's why labgrid moves to gRPC with this release. gRPC is a well maintained RPC framework with a lot of users. As a side effect, the message transfer is more performant and the import times are shorter.
Maintaining support for both crossbar/autobahn as well as gRPC in labgrid would be a lot of effort due to the different architectures of those frameworks. Therefore, a hard migration to gRPC is deemed the lesser issue.
Due to the migration, this PR introduces the following breaking changes:
crossbar_url
was renamed tocoordinator_address
. The environment variableLG_CROSSBAR
was renamed toLG_COORDINATOR
.crossbar_realm
is now obsolete as well as the environment variableLG_CROSSBAR_REALM
.labgrid-coordinator
(instead ofcrossbar start
). No additional configuration file is required.contrib/systemd/
were updated.Our plan is to merge this before the next release v24.1.
Checklist
Closes #1059
Closes #1260
Fixes #1429