
message-box/bt class now restarts the thread if it was aborted because of an error. #101

Closed

Conversation

svetlyak40wt
Contributor

No description provided.

@mdbergmann
Owner

mdbergmann commented Jan 15, 2025

I'm wondering a bit. The condition handling in actor-cell was meant to prevent the message-box thread from dying, at the level of the actor. The intended behavior is that nothing happens to the message-box thread. An alternative behavior is supervision, as Erlang and other actor frameworks do it. The thread handling you added in message-box goes in that direction. But honestly, I'm not ready to change this behavior yet. Supervision is in the tickets as a TODO, but there is a lot more to it, so it needs to be thought through.

I'm also wondering: isn't error caught by the serious-condition we're catching now? Anything that could potentially end up in the debugger or crash should be prevented by the handler-case in the message handler.

Having said all this, I was wondering: what is your exact use case? Maybe these changes aren't needed at all. Effectively the message is handled by a pretty flat call stack; there is essentially just the thread, and I'm not sure condition signalling will work as in non-event-driven Common Lisp code, because there is nowhere to handle the signal except in the receive handler itself.

@svetlyak40wt
Contributor Author

svetlyak40wt commented Jan 15, 2025

At least during debugging, when you insert a (break) point, one of the restarts in SLY/SLIME is ABORT, which kills the thread. The thread dies and the actor stops processing its messages. Currently I have to recreate the whole actor system to recover.

The sad side of this problem is that the thread dies silently: the actor continues to accept messages and its queue grows, but the messages aren't processed.

@mdbergmann
Owner

Yeah, so the thread shouldn't die. With the handler-case catching t, does the thread still die when using (break)?

@svetlyak40wt
Contributor Author

I'm a little worried that my current solution could slow down the library, because it checks whether the thread is alive on each message push.

I'll see if it is possible to do such a check only when the message handler unwinds the stack unexpectedly.
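
The rough idea would be something like this (just a sketch; call-handler-detecting-unwind is an illustrative name, not Sento code): wrap the handler call in an unwind-protect and trigger the recovery (or at least a warning) from the cleanup, instead of checking thread liveness on every push.

;; Sketch: detect an abnormal unwind of the handler instead of polling
;; thread liveness on each message push. Names are illustrative only.
(defun call-handler-detecting-unwind (handler message)
  (let ((completed-normally nil))
    (unwind-protect
         (prog1 (funcall handler message)
           (setf completed-normally t))
      (unless completed-normally
        ;; Reached on any non-local exit, including the ABORT restart.
        (log:warn "Message handler unwound unexpectedly")))))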

@svetlyak40wt
Contributor Author

Yeah, so the thread shouldn't die. With the handler-case catching t, does the thread still die when using (break)?

I'll make a minimal test to check it.
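
Roughly along these lines, I expect (a quick REPL sketch, not the actual test):

;; error is a subtype of serious-condition, so the existing clause
;; already covers ordinary errors:
(subtypep 'error 'serious-condition)   ; => T, T

(handler-case (error "boom")
  (serious-condition (c) (format nil "caught: ~a" c)))   ; => "caught: boom"

;; BREAK, however, enters the debugger directly and cannot be
;; intercepted by handler-case, whatever the clause type is.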

@mdbergmann
Owner

Yeah, the message-box loop is performance critical.

From experience I can say that debugging (using the debugger) in a message-driven context is difficult, because you stop just one thread while the rest keeps running normally and processing messages. It's best to use debug logging.

@svetlyak40wt
Contributor Author

From experience I can say that debugging (using the debugger) in a message-driven context is difficult, because you stop just one thread while the rest keeps running normally and processing messages. It's best to use debug logging.

I'd prefer to use a debugger, especially one as powerful as Common Lisp's. It would be a shame to give up and fall back to "debugging by prints" :)))

@svetlyak40wt
Contributor Author

Please see my latest commit. I've added a test, bt-box-resurrects-thread-after-abort-if-handler-catches-all-signals, and turned off thread recreation for now.

The problem is still there, because handler-case does not catch non-local exits made by INVOKE-RESTART.
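
A minimal illustration (not Sento code): INVOKE-RESTART performs a transfer of control rather than signalling a condition, so handler-case never sees it; only unwind-protect cleanups run during the unwind.

(restart-case
    (handler-case
        (unwind-protect
             (invoke-restart 'give-up)
          (format t "cleanup runs~%"))
      ;; never reached: no condition is signalled during the unwind
      (t (c) (format t "caught: ~a~%" c)))
  (give-up () (format t "landed in the restart~%")))
;; prints "cleanup runs" and then "landed in the restart"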

@svetlyak40wt
Contributor Author

By the way, do you have any performance benchmark?

I'd like to use it to see if there is some degradation because of my changes.

@svetlyak40wt
Contributor Author

Another problem I've just found is the tests. They aren't running on CI because of this error:

debugger invoked on a SB-INT:SIMPLE-READER-PACKAGE-ERROR in thread
#<THREAD tid=2074 "main thread" RUNNING {1000BA8093}>:
  Package SENTI.TESTS does not exist.

    Stream: #<dynamic-extent STRING-INPUT-STREAM (unavailable) from "(unless ...">

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

but the PR "check" is still green.

This issue should probably be fixed too. I don't mean just making the tests run, but checking whether they ran and, if not, failing the CI job.

@mdbergmann
Owner

From experience I can say that debugging (using the debugger) in a message-driven context is difficult, because you stop just one thread while the rest keeps running normally and processing messages. It's best to use debug logging.

I'd prefer to use a debugger, especially one as powerful as Common Lisp's. It would be a shame to give up and fall back to "debugging by prints" :)))

Well, debuggers in IntelliJ or Eclipse in the Java world are equally powerful, if not more so. And yet, when working with massively asynchronous environments, debuggers are mostly not useful.

@mdbergmann
Owner

Another problem I've just found is the tests. They aren't running on CI because of this error:

debugger invoked on a SB-INT:SIMPLE-READER-PACKAGE-ERROR in thread
#<THREAD tid=2074 "main thread" RUNNING {1000BA8093}>:
  Package SENTI.TESTS does not exist.

    Stream: #<dynamic-extent STRING-INPUT-STREAM (unavailable) from "(unless ...">

Type HELP for debugger help, or (SB-EXT:EXIT) to exit from SBCL.

but the PR "check" is still green.

This issue should probably be fixed too. I don't mean just making the tests run, but checking whether they ran and, if not, failing the CI job.

Yeah, definitely. No good if tests actually don't run.

@mdbergmann
Owner

Please see my latest commit. I've added a test, bt-box-resurrects-thread-after-abort-if-handler-catches-all-signals, and turned off thread recreation for now.

The problem is still there, because handler-case does not catch non-local exits made by INVOKE-RESTART.

Maybe I don't understand. When do you call invoke-restart, from a handler-bind? In particular, I don't understand the use case in the actor's message handling.

@svetlyak40wt
Contributor Author

Maybe I don't understand. When do you call invoke-restart, from a handler-bind? In particular, I don't understand the use case in the actor's message handling.

Imagine that instead of a HANDLER-BIND there is a funcall to some function I gave to the actor to process its messages. In my current project, the actor processes messages from the Telegram Bot API.

Now, there is a lot of code in this Telegram message handling, and I want to debug some place (in my handler, not in Sento itself), so I put a (break) into my code to inspect my program's environment, such as the Telegram bot state and other variables on the stack.

The debugger in Emacs shows me a bunch of restarts:

break
   [Condition of type SIMPLE-CONDITION]

Restarts:
 0: [CONTINUE] Return from BREAK.
 1: [RETRY] Retry SLY mREPL evaluation request.
 2: [ABORT] abort thread (#<THREAD tid=2354933 "sly-channel-1-mrepl-remote-1" RUNNING {1003CA0003}>)

If I hit 0, my program continues processing the messages for the actor. But sometimes I hit A, because I have a habit of hitting A after inspecting the backtrace of some error (not the result of a BREAK form). When I hit A, the debugger invokes the ABORT restart. It kills the thread, and my program continues running without processing messages for the actor :(

That is the case I want to fix.

(bt2:condition-wait withreply-cvar withreply-lock)))
(cond
(time-out
(bt2:with-lock-held (withreply-lock)
Owner

Please try to remove the duplicated code.

Contributor Author

I could move these three lines into a local function:

         (log:trace "~a: pushing item to queue: ~a" (name msgbox) push-item)
         (queue:pushq queue push-item)
         (ensure-thread-is-running msgbox)

Is that OK with you?

Owner

A local function must still be created at runtime for each call of submit, and a function call in itself must save and restore registers and so on.
We're just talking about a handful of lines of code. I would say it was OK as before, with the common code factored out and just the differences left. Effectively we just have the additional (ensure-thread-is-running msgbox) call, no?

Contributor Author

I've refactored this code and moved these two branches outside of bt2:with-lock-held not because of (ensure-thread-is-running msgbox); it is a fix for another problem I encountered while trying to write a test.

The problem is that this call:

(submit box "The Message" t 1
        (list (lambda (msg)
                (reverse msg))))

When you specify both withreply-p = t and a non-nil timeout, this submit call hangs. Why does it hang? Because of this old version of the code in submit/reply:

(bt2:with-lock-held (withreply-lock)
      (log:trace "~a: pushing item to queue: ~a" (name msgbox) push-item)
      (queue:pushq queue push-item)

      (if time-out
          (wait-and-probe-for-msg-handler-result msgbox push-item)
          (bt2:condition-wait withreply-cvar withreply-lock)))

If time-out is specified, then withreply-lock is held during (wait-and-probe-for-msg-handler-result msgbox push-item). But the process-queue-item function, called in the box's thread, also tries to acquire withreply-lock (here). It can't, because the lock is already held by the other thread. As a result, the box's thread can't execute the handler and return a value, so (wait-and-probe-for-msg-handler-result) runs for the given timeout seconds and then fails because it didn't receive any result.

I'd call this situation an "almost dead-lock" :(

That is why I've made these two branches separate: in the branch where time-out is given, we release the lock before the wait-and-probe-for-msg-handler-result call, whereas in the branch where timeout is NIL, the lock is released by the call to bt2:condition-wait.

By the way, I've just found that bt2:condition-wait has its own timeout argument. Why didn't you use it instead of the call to wait-and-probe-for-msg-handler-result? This code could probably be simpler if we handled the timeout from bt2:condition-wait.
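
For example, something like this appears to work with bordeaux-threads APIv2 (my assumption from its documentation is that condition-wait returns NIL when the timeout elapses):

;; Self-contained REPL check of the :timeout argument; nobody notifies
;; the condition variable, so the wait should time out and return NIL.
(let ((lock (bt2:make-lock))
      (cvar (bt2:make-condition-variable)))
  (bt2:with-lock-held (lock)
    (bt2:condition-wait cvar lock :timeout 0.5)))   ; => NIL after ~0.5s

If that holds, the timeout branch of submit/reply could rely on it and drop wait-and-probe-for-msg-handler-result, although spurious wakeups would still need to be handled.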

Owner

Yeah, indeed there is a problem: when the timeout is big, the message is only processed after the wait time is over.

I need to wrap my head around this code. Right now I'm wondering why we don't just remove the lock altogether, and what it is needed for.

Contributor Author

The lock is needed to use a condition variable; its API requires a lock. Without the condition variable you would have to use assert-cond with an unlimited timeout.

Owner

OK, got it I think.
So I think this code is OK. We can keep that bit of duplication on each branch.

(in-suite message-box-tests)


(defun wait-while-thread-will-die (msgbox &key (timeout 10))
Owner

I think you can use one of the assert-cond or await-cond utilities from the miscutils package.

@mdbergmann
Owner

By the way, do you have any performance benchmark?

I'd like to use it to see if there is some degradation because of my changes.

I used bench.lisp as benchmark.

runner-bt uses :pinned actor using the message-box/bt and is probably what you need.
You can run with and without reply (ask/ask-s).

@@ -179,6 +198,23 @@ This function sets the result as `handler-result' in `item'. The return of this
(bt2:condition-notify withreply-cvar)))
(handler-fun)))))


(declaim (ftype (function (message-box/bt)
Owner

What's the purpose of the declaim?

Contributor Author

Recently I've discovered that such type information sometimes helps find issues during compilation. So I've started adding declarations to the code I touch in my own projects, and I decided it wouldn't hurt to use them here.

But if you wish, I'll remove it.
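
For example (a toy illustration, not Sento code; SBCL in particular warns at compile time when a call violates the declared ftype):

;; A toy example of the kind of declaration I mean:
(declaim (ftype (function (string) (integer 0)) count-chars))

(defun count-chars (s)
  (length s))

;; (count-chars 42)   ; on SBCL: compile-time warning, 42 is not a STRING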

(bt2:with-lock-held (withreply-lock)
(log:trace "~a: pushing item to queue: ~a" (name msgbox) push-item)
(queue:pushq queue push-item)
(ensure-thread-is-running msgbox)
Owner

mdbergmann commented Jan 15, 2025

We also need to add this to submit/no-reply. And then it's probably better to put it into submit, to catch both ways of submitting.

In that case, we do indeed need the additional lock.

msgbox
(bt2:with-lock-held (thread-lock)
(unless (bt2:thread-alive-p queue-thread)
(log:trace "Restarting thread ~A"
Owner

I'd like to have this as warn logging, so we know exactly when it happens. It should be an exception.

@mdbergmann
Owner

By the way, do you have any performance benchmark?

I'd like to use it to see if there is some degradation because of my changes.

I used bench.lisp as benchmark.

runner-bt uses :pinned actor using the message-box/bt and is probably what you need.
You can run with and without reply (ask/ask-s).

From what I measured, it's roughly a difference of 4-5% using runner-bt.
But I think it's actually not that critical. I'm sure some performance improvements could be made in that code overall.

@svetlyak40wt
Contributor Author

From what I measured, it's roughly a difference of 4-5% using runner-bt. But I think it's actually not that critical. I'm sure some performance improvements could be made in that code overall.

The current benchmark code uses get-universal-time to measure time, which is not very precise. If you don't mind, I'll improve the benchmark code.

@mdbergmann
Owner

From what I measured, it's roughly a difference of 4-5% using runner-bt. But I think it's actually not that critical. I'm sure some performance improvements could be made in that code overall.

The current benchmark code uses get-universal-time to measure time, which is not very precise. If you don't mind, I'll improve the benchmark code.

OK, but keep in mind that I use this bench for various Lisp implementations, and on some of them INTERNAL-TIME-UNITS-PER-SECOND differs.

@mdbergmann
Owner

That is the case I want to fix.

OK, thanks for the explanation.
On the one hand I'd say: just make sure you press "0". :)
On the other hand, it would be good to be more resilient against such things.

@mdbergmann
Owner

From what I measured, it's roughly a difference of 4-5% using runner-bt. But I think it's actually not that critical. I'm sure some performance improvements could be made in that code overall.

The current benchmark code uses get-universal-time to measure time, which is not very precise. If you don't mind, I'll improve the benchmark code.

OK, but keep in mind that I use this bench for various Lisp implementations, and on some of them INTERNAL-TIME-UNITS-PER-SECOND differs.

Having said this, I'm mostly looking at what `time` reports. The other `get-universal-time` I just added to have a cross-check. So I'm not sure it's worth spending time on it.

@svetlyak40wt
Contributor Author

Having said this, I'm mostly looking at what `time` reports. The other `get-universal-time` I just added to have a cross-check. So I'm not sure it's worth spending time on it.

But this way it is hard to compare performance between runs, or to compare current performance with some historical value: the time reported will depend on the number of messages you sent.

What I want to change:

  1. Get rid of the limit on the number of messages to send from each thread.
  2. Add a time limit instead - you just set it like "I want to run the benchmark for 60 seconds".
  3. Add a times setting which runs the benchmark N times and then reports an average "messages per sec" plus the deviation.

This way it will be more convenient to see whether there is any performance degradation because of my changes.
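
A rough sketch of the time-limited runner I have in mind (run-for-seconds is a hypothetical helper, not current bench.lisp code; the usage line assumes Sento's act:tell):

;; Sketch: call SEND-FN in a loop for DURATION seconds and report the
;; total count plus messages per second.
(defun run-for-seconds (duration send-fn)
  (let ((deadline (+ (get-internal-real-time)
                     (* duration internal-time-units-per-second)))
        (count 0))
    (loop while (< (get-internal-real-time) deadline)
          do (funcall send-fn)
             (incf count))
    (values count (float (/ count duration)))))

;; e.g. (run-for-seconds 60 (lambda () (act:tell actor :ping)))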

@svetlyak40wt
Contributor Author

By the way, did you experience a heap exhaustion problem when running the benchmark? If I run it for more than 15-20 seconds, SBCL dies because, for some reason, its garbage collector does not delete all the garbage.

I have a 4G limit for the process, and if I run the benchmark for 15 seconds, (room) shows there is about 2.5G of garbage. However, (sb-ext:gc :full t) frees all this memory.

I'm wondering why the GC doesn't free more memory during the benchmark!?

@mdbergmann
Owner

By the way, did you experience a heap exhaustion problem when running the benchmark? If I run it for more than 15-20 seconds, SBCL dies because, for some reason, its garbage collector does not delete all the garbage.

I have a 4G limit for the process, and if I run the benchmark for 15 seconds, (room) shows there is about 2.5G of garbage. However, (sb-ext:gc :full t) frees all this memory.

I'm wondering why the GC doesn't free more memory during the benchmark!?

I have also seen this. That's why the load is conditionally reduced for SBCL in bench.lisp.
Other implementations have no such issues with GC, only SBCL.
I didn't try to get to the bottom of it.

@mdbergmann
Owner

Having said this, I'm mostly looking at what `time` reports. The other `get-universal-time` I just added to have a cross-check. So I'm not sure it's worth spending time on it.

But this way it is hard to compare performance between runs, or to compare current performance with some historical value: the time reported will depend on the number of messages you sent.

What I want to change:

1. Get rid of the limit on the number of messages to send from each thread.

2. Add a time limit instead - you just set it like "I want to run the benchmark for 60 seconds".

3. Add a `times` setting which runs the benchmark N times and then reports an average "messages per sec" plus the deviation.

This way it will be more convenient to see whether there is any performance degradation because of my changes.

Yeah, OK. For me it was sufficient to calculate the msg/sec from the data time provided, because we're not in Java, where you have a warm-up phase until the hot path is optimized. CL is compiled directly, so the numbers you get are the numbers.
Though there is a bit of variation, around 100-200 milliseconds; maybe this is due to GC time, which differs slightly between runs.

The biggest impact on time that I noticed comes from the queue implementation. I've experimented with a few. The one currently in use is an implementation from the book "Programming Algorithms in Lisp" (see queue-locked.lisp). It requires a bit more memory than the cons-queue used by lparallel, which is equally fast.
The fastest one I tried is sb-concurrency:queue; it's based on CAS, but for some reason it requires a lot of CPU time when idle (may be a bug), so I switched back to a lock-based queue for SBCL as well.
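
For reference, its API surface is tiny (SBCL-only REPL sketch, assuming the sb-concurrency contrib is loaded):

(require :sb-concurrency)

;; CAS-based FIFO queue from the sb-concurrency contrib.
(let ((q (sb-concurrency:make-queue :name "demo")))
  (sb-concurrency:enqueue :msg-1 q)
  (sb-concurrency:enqueue :msg-2 q)
  ;; DEQUEUE returns the oldest item plus a flag telling whether the
  ;; queue was non-empty.
  (multiple-value-bind (item found-p) (sb-concurrency:dequeue q)
    (list item found-p)))   ; => (:MSG-1 T)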

@svetlyak40wt
Contributor Author

After some research, I developed a hypothesis as to why the GC does not clean up memory.

The point is that in the benchmark, N threads generate messages for a single actor. If the actor cannot process the messages fast enough, they accumulate in the queue. The test ends when all the messages in the queue are processed.

When there are many messages in the queue and the GC is triggered, it sees that there are references to these messages and cannot clean them up, so it moves these objects into an older generation. The longer the queue is being processed during the object generation phase, the more such objects end up in the older generations of the garbage collector.

When the test ends, there are no longer any references to the messages, but because the GC placed them in the older generations, it does not clean them up during regular runs, and they remain in memory. However, (gc :full t) does pick them up and clears them out.

How did I figure this out? I’d like to say: "Very easily!" but no :(

Initially, I decided to investigate the nature of the objects that remain in memory after the benchmark, and I wrote the following function:

(defun get-random-dynamic-object ()
  (let ((count 0))
    (sb-vm:map-allocated-objects (lambda (obj type size)
                                   (declare (ignore obj type size))
                                   (incf count))
                                 :dynamic)
    (let ((random-idx (random count))
          (found-obj nil)
          (current-idx 0))
      (sb-vm:map-allocated-objects (lambda (obj type size)
                                     (declare (ignore type size))
                                     (when (= current-idx random-idx)
                                       (setf found-obj
                                             (trivial-garbage:make-weak-pointer obj)))
                                     (incf current-idx))
                                   :dynamic)
      (values found-obj
              random-idx
              count))))

This function retrieves a random object from memory and returns a weak pointer to it. Why a weak pointer? To avoid creating an additional reference to the object.

It turned out that a significant portion of the objects are messages from the actor's queue:

#<weak pointer: (#<ACTOR path: /user/actor-365, cell: #<ACTOR actor-365, running: NIL, state: NIL, message-box: #<MESSAGE-BOX/BT mesgb-366, processed messages: 8000001, max-queue-size: 0, queue: #<QUEUE-UNBOUNDED {70050E0113}>>>>
                    NIL NIL)>

Next, I tried to figure out whether any references to these objects were being held. For this, SBCL has a function for searching roots:

CL-USER> (sb-ext:search-roots (loop repeat 3 collect (get-random-dynamic-object))
                              :print :verbose)
Path to "MODILETTERDA":
 6       70031144DF [   2] SB-IMPL::*ALL-PACKAGES*
 5       700518C41F [ 146] a (COMMON-LISP:SIMPLE-VECTOR 513)
 5       7005281513 [   8] #<PACKAGE "CL-UNICODE">
 5       70053D8533 [   2] a SB-IMPL::SYMBOL-TABLE
 5       70054FB6F7 [   1] a cons = (# . #)
 5       70056F16EF [  34] a (COMMON-LISP:SIMPLE-VECTOR 307)
 5       7005919B9F [   2] CL-UNICODE::*NAMES-TO-CODE-POINTS*
 5       7005B1CBC3 [   6] #<HASH-TABLE :TEST EQUALP :COUNT 33698 {7005B1CBC3}>
 5       700D21000F [38446] a (COMMON-LISP:SIMPLE-VECTOR 83559)
; No values

;; Let's check if such a key really exists in the dictionary:
CL-USER> (gethash "MODILETTERDA" CL-UNICODE::*NAMES-TO-CODE-POINTS*)
71199

This is just an example. But for the objects created as a result of the benchmark, search-roots did not return anything, which indicated that these objects were "hanging in the air," and the GC could have removed them.

Then I additionally tested my hypothesis that memory is not being freed because the queue is overloaded with too many objects. To do this, I modified the code that sends messages to actors so that every 10,000–20,000 messages, a (sleep 0.1) would occur. And this helped—the GC started cleaning up messages in a timely manner, and they stopped accumulating in the older generations of the GC.
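
The change was roughly of this shape (an illustrative sketch assuming Sento's act:tell; the real sender loop in bench.lisp differs):

;; Sketch: pause the producer every BATCH-SIZE messages so the queue
;; (and the garbage kept alive through it) stays small.
(defun send-throttled (actor total &key (batch-size 10000) (pause 0.1))
  (loop for i from 1 to total
        do (act:tell actor :ping)
        when (zerop (mod i batch-size))
          do (sleep pause)))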

But what was even more surprising was that slowing down message generation led to an increase in the actor's throughput. Without the sleep, it processed about 777,000 messages per second, but with the sleep, it managed to process 821,000.

This acceleration is likely due to the fact that with slower garbage generation, it does not accumulate in memory, and the GC spends less time collecting it.

Without sleep:

Times: 16000000
Evaluation took:
  20.572 seconds of real time
  58.627387 seconds of total run time (23.984544 user, 34.642843 system)
  [ Real times consist of 4.297 seconds GC time, and 16.275 seconds non-GC time. ]
  [ Run times consist of 3.778 seconds GC time, and 54.850 seconds non-GC time. ]
  284.98% CPU
  208 forms interpreted
  3,068,032,944 bytes consed

With delayed message generation:

Evaluation took:
  19.483 seconds of real time
  24.618729 seconds of total run time (17.348749 user, 7.269980 system)
  [ Real times consist of 0.044 seconds GC time, and 19.439 seconds non-GC time. ]
  [ Run times consist of 0.044 seconds GC time, and 24.575 seconds non-GC time. ]
  126.36% CPU
  208 forms interpreted
  3,065,746,080 bytes consed

From this data, it is clear that although the non-GC time increased by a few seconds, the GC time decreased by an order of magnitude.

Sometimes, slowing down leads to speeding up. That's how it is!

@svetlyak40wt
Contributor Author

I think the benchmark should be modified so that it does not fill the actor's queue with thousands of messages. What do you think?

@mdbergmann
Owner

mdbergmann commented Jan 18, 2025

I think the benchmark should be modified so that it does not fill the actor's queue with thousands of messages. What do you think?

Nice findings.
Yeah, that seems to be the issue: the queues are filled faster than they can be processed, in particular because there is contention over acquiring the lock while the queues are being filled.
It seems, though, that the GCs of other implementations handle that load better.

I think that for the robustness of the test itself, adding a small delay will be fine. The test case is pretty artificial anyway; it's just there to give a baseline for comparison and a rough idea of the throughput.

@svetlyak40wt
Contributor Author

It seems, though, that the GCs of other implementations handle that load better.

No. I tried running the benchmark under SBCL compiled with the new parallel GC, and it has the same problem (while being 15% slower). Under LispWorks it is the same issue: memory is not released after the benchmark until the GC is invoked manually in full mode.

@mdbergmann
Owner

mdbergmann commented Jan 18, 2025 via email

@svetlyak40wt
Contributor Author

@mdbergmann I've made a new version of the fix in a separate PR: #103
