use a waitgroup to wait for reserve holders of MFiles before unmapping #876
base: main
Conversation
(force-pushed 13753ff → 17e4d7a → c4df472)
```crystal
if truncate_to_size && !@readonly && !@deleted && @fd > 0
  code = LibC.ftruncate(@fd, @size)
  raise File::Error.from_errno("Error truncating file", file: @path) if code < 0
end
```
```crystal
# unmap if none has reserved the file, race condition prone?
if @wg.@counter.get(:acquire).zero?
```
Since we're in `#close`, I guess we can be sure here that the counter won't be increased, only decreased? So no risk that the counter is zero in this check but non-zero when unmap is called?

What about always spawning instead of this if-statement? It would be nice to not have to access an instance variable. Or maybe monkey patch WaitGroup with a `done?` method.
"Risk" that it is zero? The optimization here was for the fact that it will always be zero for single-node clusters and for transient queues etc., and then we don't need to spawn a fiber.

Could monkey patch, though.
I mean if it's zero when we do the check, but it has changed when we call unmap, so we're unmapping even though it's reserved. However, I think we can assume that the counter will never be increased at this point: since we're in `#close`, the `MFile` shouldn't be used anymore?
src/lavinmq/clustering/actions.cr
Outdated
```crystal
@@ -14,6 +14,7 @@ module LavinMQ

    abstract def lag_size : Int64
    abstract def send(socket : IO, log = Log) : Int64
    abstract def abort
```
What about naming it `done` and calling it from the action loop when an `Action` has been sent, instead of assuming that `send` will be called only once for an `Action` and calling `unreserve` from inside `send`?
makes sense, thanks!
put `done` in `ensure` in the `Action#send` method instead
yes, the assumption is that `Action#send` is only called once
Instead of assuming that, which would be an internal implementation detail the caller needs to know about, I think it would make sense to do

```crystal
action.send(lz4, Log)
action.done
```

in the action loop.
It gets ugly because you have to have a begin/rescue block, and then you have two loops, so two blocks on top of that.
True
Two things that we probably must remove
probably won't do much against race conditions, but at least in theory it could be better
I don't know, both call this: `lavinmq/src/lavinmq/amqp/queue/message_store.cr`, lines 231 to 237 in c8ca36a.
Isn't it possible that a segment that isn't yet fully replicated is unmapped? If it's a large message I think the unmap may occur in the middle of the lz4 compression. Will these unmaps be necessary if we handle unmapping properly, which this PR hopefully solves?
True, there's a chance for that, if the unmap is triggered just when we're rolling over to a new segment. And yes, shouldn't be needed anymore.
also started to tinker with a ref counter as you did, but yes, it quickly gets messy, as we return a `MFile`:

```crystal
class MFile
  @counter = 0
  @counter_lock = Mutex.new(:unchecked)

  def borrow : self
    @counter_lock.synchronize do
      counter = @counter
      if counter.zero?
        mmap
      end
      @counter = counter + 1
    end
    self
  end

  def unborrow : Nil
    @counter_lock.synchronize do
      counter = @counter -= 1
      if counter.zero?
        unmap
      end
    end
  end
end
```
I'm starting to long for the io_uring implementation, then we could skip mmappings altogether…
Yes! But I think this PR is working pretty well? I've done some runs without crashes.
Can cause seg faults if a follower is still trying to replicate it.
So that the errors can be caught in specs. The exit is done in lavinmq.cr.
Add a random delay to the stream queue GC loops, so that not all loops execute at the same time.
WHAT is this pull request doing?
Uses a WaitGroup to wait for reserve holders of MFiles before unmapping them, so that a segment is no longer unmapped while a follower is still replicating it (which can cause seg faults).

HOW can this pull request be tested?
Run the specs; replication errors are raised so they can be caught in specs. Manual runs with clustering have also been done without crashes.