Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model Checked - MPSC queue + Rework State Machines #128

Merged
merged 14 commits into from
May 9, 2020

Conversation

mratsim
Copy link
Owner

@mratsim mratsim commented May 9, 2020

Trying to fix:

The implementation is run through CDS checker https://github.com/computersforpeace/model-checker but it internally uses dlmalloc mspace to replace uer's malloc and somehow it runs out of space and trigger this assert: https://github.com/computersforpeace/model-checker/blob/5c4efe5cd8bdfe1e85138396109876a121ca61d1/mymemory.cc#L166-L171

Might need to complete my own at #127 or use Relaxy (which cannot properly deal with compiler relaxed reordering unfortunately).

@@ -133,8 +133,11 @@ proc tryRecv*[T](chan: var ChannelMpscUnboundedBatch[T], dst: var T): bool =
## This can fail spuriously on the last element if producer
## enqueues a new element while the consumer was dequeing it

let first = cast[T](chan.front.next.load(moRelaxed))
let first = cast[T](chan.front.next.load(moAcquire))
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This moRelaxed triggered the race detector every single time:

image

The rest of the changes were not checked

@mratsim
Copy link
Owner Author

mratsim commented May 9, 2020

@mratsim
Copy link
Owner Author

mratsim commented May 9, 2020

Tried to use the RCMC VM (why GPL code with no source published?) at http://plv.mpi-sws.org/rcmc/

But:

  • RCMC doesn't support malloc in threaded code, which makes it useless for testing queues (+ the source were removed ...)
  • CDSChecker exhibits the same memory bug as on my machine
  • Nidhugg doesn't implement C++11 memory model

So seems like even for C/C++ there is an lack of model checker to deal with C++11 concurrency bugs.

@mratsim
Copy link
Owner Author

mratsim commented May 9, 2020

Could there be a bug on OSX similar to https://linux-arm-kernel.infradead.narkive.com/jTmi6ANX/patch-arm-change-definition-of-cpu-relax-for-arm11mpcore

The cpu_relax() macro is often used in the body of busy-wait loops to ensure
that the variable being spun on is re-loaded for each iteration. On the
ARM11MPCore processor [where loads are prioritised over stores], spinning in
such a loop will prevent the write buffer from draining. If a write contained
in the write buffer indirectly affects the variable being spun on, there is a
potential for deadlock. This deadlock is experienced when executing the KGDB
testsuite on an SMP ARM11MPCore configuration.

This patch changes the definition of cpu_relax() to smp_mb() for ARMv6 cores,
forcing a flushing of the write buffer on SMP systems before the next load
takes place. If the Kernel is not compiled for SMP support, this will expand
to a barrier() as before.

Cc: Russell King - ARM Linux ***@arm.linux.org.uk
Acked-by: Catalin Marinas ***@arm.com
Signed-off-by: Will Deacon ***@arm.com

There is no obvious reason why this deadlock again, I don't use sleep:

template dummy_cpt(): untyped =
# Dummy computation
# Calculate fib(30) iteratively
var
fib = 0
f2 = 0
f1 = 1
for i in 2 .. 30:
fib = f1 + f2
f2 = f1
f1 = fib
proc sleepingLion(stop_ms: int64): int64 =
let start = getTime()
while true:
let elapsed = inMilliseconds(getTime() - start)
if elapsed >= stop_ms:
return elapsed
dummy_cpt()
proc main2() =
echo "Sanity check 3: isReady"
const target = 123
init(Weave)
echo "Spawning sleeping thread for ", target, " ms"
let start = getMonoTime()
let f = spawn sleepingLion(123)
while not f.isReady():
cpuRelax()
let stopReady = getMonoTime()
let res = sync(f)
let stopSync = getMonoTime()
exit(Weave)

So it might be that with cpuRelax() the main thread leaves priority forever and deadlock ...

And ARM deadlocked too:

@mratsim
Copy link
Owner Author

mratsim commented May 9, 2020

Found the bug, there is a race condition in the test

   let f = spawn sleepingLion(123) 
   while not f.isReady(): 
     cpuRelax() 

The spinlock was (intentionally :/) not dispatching incoming steal requests. But if between initialization of the threadpool and spawn the freshly created threads didn't have time to send steal requests, the spawn stays in the root thread queue and the spinlock doesn't loadBalance so we get a deadlock.

@mratsim mratsim changed the title [WIP] - Model Checked - MPSC queue Model Checked - MPSC queue + Rework State Machines May 9, 2020
@mratsim
Copy link
Owner Author

mratsim commented May 9, 2020

Still 2 bugs (maybe one):

Fixes for another time.

@mratsim mratsim closed this May 9, 2020
@mratsim mratsim reopened this May 9, 2020
@mratsim mratsim merged commit 46cf323 into master May 9, 2020
@mratsim mratsim deleted the model-checked-mpsc-queue branch May 17, 2020 23:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant