Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Multithread Requests #410

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

TommasoPino
Copy link

In order to safely use the spice library in a multithread pool process, it is necessary to lock access to the spice resource.

This is tested with 10000 calls with 16 concurrent threads.

@TommasoPino TommasoPino changed the title definition of locking decorator Handle Multithread Requests May 5, 2021
Copy link
Owner

@AndrewAnnex AndrewAnnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @TommasoPino, first of all thanks for your contribution, as Spice is explicitly single threaded this is not something I've thought about adding myself as of yet but it could make things kinder to users trying to use spiceypy in such a manner without impacting single-thread users. I posted some general review comments that would firstly need to be answered/addressed but I see a bigger issue in that tests are needed and necessary to demonstrate this working. That is, I would like to see one or two of the tests, like for b1900 and spkezr, duplicated and then modified to make multiple threads to illustrate how this can be used practically. I also think a documentation page/mini tutorial describing this / demoing it would be a great addition although I won't make that a requirement before merging.

A secondary issue is that I am a little out of practice for using/doing multithreading in Python, so there are some questions I have regarding the particular lock used, and if instead spiceypy should use the synchronised decorator available in the wrapt package (http://wrapt.readthedocs.io/en/latest/examples.html). I also wonder a bit about if external users are implementing their own thread locks and monkey patching the library and how this would effect them, maybe it make sense to optionally disable the decorator in the same manner I provide context managers for the foundflags/error raising decorators.

spiceypy/spiceypy.py Outdated Show resolved Hide resolved
spiceypy/spiceypy.py Outdated Show resolved Hide resolved
spiceypy/spiceypy.py Outdated Show resolved Hide resolved
@TommasoPino
Copy link
Author

First of all thanks for the fast replay and the suggestions. I will produce an example file to test the current version as you requested.
I prefer the usage of a native package to respect an external one in order to keep the dependency to the minimum.
But, I will study it too

Thanks for the suggestions. I will come back when the modification status will be a more mature state. Thanks for your time

@jdiazdelrio
Copy link
Contributor

jdiazdelrio commented May 6, 2021

Hi @TommasoPino and @AndrewAnnex. I'm curious about the outcome of this activity :)

The CSPICE library is not thread-safe in itself. For example, CSPICE would not work on a multi-threaded server where the CSPICE library is dynamically loaded and shared among all threads. Imagine a use case where two users are loading/unloading kernels in different threads started at exactly the same time (T), following this sequence:

  • User 1 on thread 1 at T+0s loads a kernel, an LSK;
  • User 2 on thread 2 at T+ 5s clears the kernel pool;
  • User 1 on thread 1 at T+20s calls str2et to convert an UTC string to ET.

If this is implemented using CSPICE library using the SPICE Standard error handling settings, thread 1 will produce an error at T+20s upon calling str2et since there will be no kernels loaded at the time user 1 is trying to convert from UTC to ET, and terminate the complete application (both threads and main).

@TommasoPino
Copy link
Author

Hi @jdiazdelrio, the usage I meant for this modification is not for a server-like environment but for guidances database generation that required for different geometries the invocation of the kernels. The guidances are independent of each other then a concurrent computation could be done. Spawn different processes that load a new instance of the spice library could be a solution but cost a lot in performances and retrieving the separated result is a pain. I found in multithreading a good comprimise.
Obviously, the example you pointed out required a completely different approach.

To answer the @AndrewAnnex's questions I prepared an example file that uses the current 4.0.0 version of and testes the functionalities for multithreading showing the issues in spkezr routine.

For b1900 the issue is not evident because it reads a variable without modifying it.

I found in this link an appropriate way to disable the decorator in order to allow some user that has a personal solution for multithreading calls to disable it in their environments (or enable if we decide to disable it by default).

@pep8speaks
Copy link

pep8speaks commented May 6, 2021

Hello @TommasoPino! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 236:1: E302 expected 2 blank lines, found 1
Line 203:74: W291 trailing whitespace
Line 201:80: W291 trailing whitespace
Line 189:84: W291 trailing whitespace
Line 187:81: W291 trailing whitespace
Line 184:1: E302 expected 2 blank lines, found 1
Line 99:1: W293 blank line contains whitespace
Line 83:1: E302 expected 2 blank lines, found 1

Line 5979:1: W293 blank line contains whitespace
Line 89:1: E302 expected 2 blank lines, found 1
Line 87:20: E712 comparison to True should be 'if cond is True:' or 'if cond:'
Line 86:21: E712 comparison to False should be 'if cond is False:' or 'if not cond:'
Line 81:1: E302 expected 2 blank lines, found 1
Line 72:29: E226 missing whitespace around arithmetic operator
Line 70:24: E226 missing whitespace around arithmetic operator
Line 65:1: E302 expected 2 blank lines, found 1
Line 60:93: E231 missing whitespace after ','
Line 58:32: E251 unexpected spaces around keyword / parameter equals
Line 58:30: E251 unexpected spaces around keyword / parameter equals
Line 57:29: E226 missing whitespace around arithmetic operator
Line 55:24: E226 missing whitespace around arithmetic operator
Line 45:1: E302 expected 2 blank lines, found 1

Comment last updated at 2021-11-27 10:44:23 UTC

Copy link
Owner

@AndrewAnnex AndrewAnnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey thanks for continuing to work on this PR. I may not be entirely clear in my contribution documentation, but tests need to be integrated into the existing pytest test infrastructure contained in SpiceyPy (https://github.com/AndrewAnnex/SpiceyPy/blob/main/spiceypy/tests/test_wrapper.py or in a separate file as necessary in that module) rather than the way they are presented here. Sorry if that was not clear. I think that basically the first two (or 4 if you consider the lock/nonlock) tests as written can be added as 2/4 test functions in test_wrapper. You can run the tests yourself by running "pytest" within the top spiceypy directory on your machine, assuming you have installed all of the test-dependencies (pytest and pandas IIRC). As I also handle downloading test kernels for other functions, you should be able to copy/paste what you need from there to follow the existing style established by other tests (like clearing the kernel pool before/after each test). As is these tests are just "dead code" and would not be run by the CI infrastructure. If you are not able to spend more time on this I can probably make the changes needed for you.

spiceypy/__init__.py Outdated Show resolved Hide resolved
spiceypy/spiceypy.py Outdated Show resolved Hide resolved
spiceConcurrentCall_test.py Outdated Show resolved Hide resolved
spiceConcurrentCall_test.py Outdated Show resolved Hide resolved
spiceConcurrentCall_test.py Outdated Show resolved Hide resolved
spiceConcurrentCall_test.py Outdated Show resolved Hide resolved
spiceConcurrentCall_test.py Outdated Show resolved Hide resolved
spiceConcurrentCall_test.py Outdated Show resolved Hide resolved
spiceConcurrentCall_test.py Outdated Show resolved Hide resolved
spiceConcurrentCall_test.py Outdated Show resolved Hide resolved
@AndrewAnnex
Copy link
Owner

@jdiazdelrio Good points on what SPICE can and can't do. My understanding was that this PR wouldn't actually solve the problem of that exact scenario, but would make it safer to say have multiple threads reading from the kernel pool, same as your example, minus user 2 clearing the kernel pool. For multiple users, multiple processes reading and writing, they would need to maintain separate kernel pools as I understand it. However, maybe there is nothing unsafe with multithreading reads from the kernel pool (ie multiple threads calling spkezr, with a main thread maintaining the kernel pool), in which case that may negate the need of this PR.

do you have more thoughts on whether this would be useful?

@jessemapel
Copy link
Contributor

We've seen SPICE errors from concurrent kernel pool reads using bodvar_c, so I don't think it's safe to just lock kernel loading and unloading.

@TommasoPino

This comment has been minimized.

@jdiazdelrio
Copy link
Contributor

jdiazdelrio commented May 7, 2021

@AndrewAnnex, when having concurrency accessing the kernel pool, it's important to know the state of the kernel pool itself. There are some parameters and features of the kernel pool the will be shared among all threads and that may have an impact on status of the loaded data, the status of the kernel pool itself, and even the performance. As an example using spkezr, the SPICE SPK system (actually, the underlying DAF subsystem) does buffering in order to improve performance: as records are read from the DAF files, they are saved in an internal buffer maintained by the DAF subsystem. If any part of that record is needed in order to compute a state, the record is returned without accessing the file. If two concurrent threads "compete" for data in the same file but in different records of that file, the final outcome might be way slower than accessing the data sequentially. Note that due to the nature of the the SPK subsystem, two calls that may look independent, e.g. the state of Phobos w.r.t to Mars and the position of the Earth w.r.t. the Sun, may both require to know the location of the Solar System Barycenter (e.g. if aberration corrections are applied). Similar issues may happen when reading CK, binary PCK or DSKs. Another case is accessing DLA or DAF subsystems directly.

Kernel priority is important as well. Since the kernel pool is shared among all threads, any modification to the kernel pool (adding a new kernel, or manually adding new data using "put-pool" routines pcpool, pipool or pdpool) may lead to unexpected results for those threads that do not expect such modifications.

The error subsystem is something to look into as well. If I'm not mistaken, SpiceyPy uses the underlying CSPICE error subsystem in "RETURN" mode, which means that until reset is called, all non "Error free." calls to CSPICE will return immediately, with undefined results, independently of which thread is calling them.

Any call to dskstl will have an impact on all threads, no matter if its locked or not, as it changes the DSK tolerance for all subsequent calls to the DSK subsystem.

@AndrewAnnex
Copy link
Owner

AndrewAnnex commented May 7, 2021

@jdiazdelrio thanks for the details. I think, unless I am mistaken, that this addition would actually help address some, but not all, of the situations you describe, although it does not/cannot address the underlying issues with CSPICE, obviously. As I wrote the following, I think there is a bit of a confusion between multi-threading and multi-processing in the comments above (including mine). In short, this PR could help some situations relating to multi-threading but does not/cannot address the larger issue of side-effect-free/deterministic use of spice with multiple process/threads.

For an example, for the error subsystem, only a single thread at a time would be allowed to access spice, so if an error occurs in one thread, the call to reset would occur and safely return because of the blocks for the other threads. Same with the example of spkezr, as each thread would be blocked from calling spekzr until the lock is lifted. This should then in effect enforce the serial access aspect, as only one thread at a time would be able to interact with spice. It could be that a lock is not actually appropriate to ensure this, and that instead a Semaphore of 1 should be used instead (equivalent to a mutex iirc), but I will need to think about it more/get some examples coded.

I think the kernel priority/kernel pool modification bit remains unsolved, and can not be addressed by any small code contribution. If I write a variable to the pool from one thread, delete it in another, and expect to read it from a third that wouldn't work even with the locks. Currently however, spiceypy provides no guard rails of any kind, so perhaps this PR should continue forward but with a different "scope" to just address the 1-at-a-time aspect of this question.

For dskstl, yes I think that and other functions that influence the global state this would be the case, but solving that problem would require using multiple processes in some way to maintain independent spice libraries. That is a bigger/different problem/question than the multithreading question that this PR is trying to help.

Is that how you understand it?

@jdiazdelrio
Copy link
Contributor

@AndrewAnnex, your answer goes in line with my understanding.

But my feeling, in aspects like spkezr, is that a user solution based on multiple threads to retrieve state data will be slower than a single-threaded one, because of the likely need of underlying buffering. Note that some high-level SPICE APIs feed from similar data: SPK data is needed for some frame transformations if these need to have aberration corrections; SCLK data is needed for CK access; CK and PCK data might be needed to lookup states on body-fixed or spacecraft-based reference frames... this would mean that this solution, for SPICE intensive applications, might slow down the overall process.

Regarding kernel pool modification, users should think about modification in the sense of reassigning a variable to a new value, or extending/reducing the length of an array within the kernel pool. Therefore, this solution does not help with kernel-pool or CSPICE global state modifications, as you point out.

My guess is that this solution is a good safeguard for non-SPICE-intensive multi-thread applications, where the core of the application can be divided among multiple threads, and each of them can occasionally perform read operations on the SPICE system.

As I said, I'm really curious about the outcome of this activity. If it works it'd be a great contribution to SpiceyPy, even if it's only for some uses (in which case, I'd document very well what can and cannot be done when using SpiceyPy in a multi-threaded environment).

@jdiazdelrio
Copy link
Contributor

@TommasoPino, is there a way to prohibit certain functions, e.g. furnsh, to be called from within a multi-thread environment?

@TommasoPino
Copy link
Author

TommasoPino commented May 8, 2021

@TommasoPino, is there a way to prohibit certain functions, e.g. furnsh, to be called from within a multi-thread environment?

Hi @jdiazdelrio , the only way to prevent accessing furnsh or other function that modifies the kernel pool from child threads is to memorize the name of the father thread and giving access to some specific functions only to it.

@TommasoPino
Copy link
Author

Hi @AndrewAnnex , I added some modifications and the test case. I am not familiar with pytest, I am not sure about the test itself, please check it.
Hope the PR is now enough mature, but any other comments are well accepted.

@TommasoPino
Copy link
Author

Hi @AndrewAnnex, I committed the required modifications for testing and the corrections for CI. Could you please approve the running workflow? Thanks

fixing test import
@codecov
Copy link

codecov bot commented May 30, 2021

Codecov Report

Merging #410 (7aeee8d) into main (dca3b8a) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

❗ Current head 7aeee8d differs from pull request most recent head 3654fad. Consider uploading reports for the commit 3654fad to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main     #410      +/-   ##
==========================================
- Coverage   99.88%   99.84%   -0.04%     
==========================================
  Files          12       12              
  Lines       15206    15835     +629     
==========================================
+ Hits        15188    15810     +622     
- Misses         18       25       +7     
Impacted Files Coverage Δ
spiceypy/spiceypy.py 99.60% <ø> (-0.09%) ⬇️
spiceypy/config.py 100.00% <100.00%> (ø)
spiceypy/tests/test_wrapper.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dca3b8a...3654fad. Read the comment docs.

@AndrewAnnex
Copy link
Owner

apologies @TommasoPino, I've been preoccupied. This PR has improved greatly since it's initial posting! However, there appears to be some test failures that need to be investigated. I also would like to see short/small tests that demonstrate the usage of the context managers to provide code coverage of those codes (see codecov CI report). See the other context manager tests to see an example of this.

I also think that this feature needs a dedicated documentation page as @jdiazdelrio suggests to clearly explain situations this can/cannot be used for and demonstrate usage of the context managers.

In any case, I plan to do a release of spiceypy soon to capture small changes since the last release, this PR won't be merged until after that release to keep this change isolated given the breadth of the changes.

@functools.wraps(f)
def lock(*args, **kwargs):
if config.enable_threading_lock:
with _spicelock:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using contextlib.suppress here instead could remove the if statement and the duplicated code below in the else clause.
I believe this would work:

ctx = _spicelock if config.enable_threading_lock else contextlib.suppress()
with ctx: 
    try:
    ....

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I am not familiar with context manager. It is a good point to increase the readability. Thanks

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention in the initial post to add a note reminder that once python 3.6 is deprecated the statement could switch to use 'nullcontext' instead which was introduced in python 3.7. Maybe this post is a sufficient reminder to myself...

This was referenced Jun 28, 2021
@AndrewAnnex
Copy link
Owner

AndrewAnnex commented Jul 12, 2021

@TommasoPino I think that the build failures were due to changes I made to the test code, and that if we re-run the tests they should all pass, although I can't seem to restart them from within GitHub's ui. If you push any commit (can be empty) it should trigger a rebuild.

@TommasoPino
Copy link
Author

@AndrewAnnex a new commit has been pushed. Waiting for the approval for the running workflow.
Thanks

spiceypy/spiceypy.py Outdated Show resolved Hide resolved
@functools.wraps(f)
def lock(*args, **kwargs):
if config.enable_threading_lock:
with _spicelock:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to mention in the initial post to add a note reminder that once python 3.6 is deprecated the statement could switch to use 'nullcontext' instead which was introduced in python 3.7. Maybe this post is a sufficient reminder to myself...

@@ -47,6 +47,22 @@ def setup_module(module):
download_kernels()


def test_threading_lock():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to add additional tests for the enable/disable methods and no_threading_lock to improve the codcov report, preferably as additional def test_... functions, but they don't need to be to as complicated as this test (you could just use a simple spice function like b1900 to avoid loading kernels etc)

@AndrewAnnex
Copy link
Owner

@TommasoPino yay the tests pass, I left some additional comments that are all pretty simple additions/edits to improve the code coverage. I still think a short documentation section is needed, something like https://github.com/AndrewAnnex/SpiceyPy/blob/main/docs/exceptions.rst#not-found-errors as a new '.rst' file dedicated to this addition. A short explanation derived from the discussions about thread vs process safety and spice would also go a long way. I think you could just contribute a first pass version of this and I could improve it as needed in later commits.

@michaelaye
Copy link
Contributor

As an example using spkezr, the SPICE SPK system (actually, the underlying DAF subsystem) does buffering in order to improve performance: as records are read from the DAF files, they are saved in an internal buffer maintained by the DAF subsystem.

@jdiazdelrio This sounds like even using Python's multiprocessing that doesn't use threads, it's still impossible to isolate the kernel pool properly, because it still uses the same installed CSPICE library underneath, correct?
So, the only way to make this safe is to actually use multiple CSPICE installations, IIUC?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants