Skip to content
This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

Fix: LDK remove mutex #92

Merged
merged 6 commits into from
Mar 8, 2024
Merged

Fix: LDK remove mutex #92

merged 6 commits into from
Mar 8, 2024

Conversation

rolznz
Copy link
Collaborator

@rolznz rolznz commented Mar 6, 2024

#89 needs to be merged first

Tested by doing multiple successful multi_pay_invoice commands (each paying 10 invoices in parallel)

@rolznz rolznz marked this pull request as ready for review March 6, 2024 16:04
ldk.go Outdated
workdir string
node *ldk_node.LdkNode
ldkEventBroadcaster LDKEventBroadcaster
ctx context.Context
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: storing context in structs is discouraged. It's better to pass them to specific methods instead.

(I see it has been done this way before in other places in the codebase, so just FYI 😉)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this wasn't even used. Thanks.

For the other ones I think it should be possible to remove them. We need to move the signal.NotifyContext out of service into the wails/http main functions and then it shouldn't be necessary to store it on the service. I'm not sure why I did it that way. I will create a separate issue for this.

eventListener := gs.subscribeLdkEvents()
defer gs.unsubscribeLdkEvents(eventListener)
ldkEventSubscription := gs.ldkEventBroadcaster.Subscribe()
defer gs.ldkEventBroadcaster.CancelSubscription(ldkEventSubscription)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking of it, I still feel a bit uneasy about this. Imagine this scenario:

  • We subscribe and defer cancelling the subscription;
  • When we initiate cancellation, we don't listen to ldkEventSubscription;
  • If the broadcaster happens to broadcast an event at this exact moment, it will block trying to send to ldkEventSubscription;
  • Since the dispatch loop is blocked, it will never get to read the removeListener channel;
  • Since that channel is never read, CancelSubscription() never unblocks;
  • Deadlock?

Not sure if this logic is correct though; WDYT?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdmitr I changed the code to close the channel and then drain it as part of the CancelSubscription function. I think this should remove the chance of a deadlock. Then when sending events to the channel, I also recover in case an event is sent to a closed channel. What do you think?

CancelSubscription(<-chan *ldk_node.Event)
}

func NewLDKEventBroadcaster(logger *logrus.Logger, ctx context.Context, source <-chan *ldk_node.Event) LDKEventBroadcaster {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: the common idiom in Go is "return structs, accept interfaces". That is, unless you have a good reason to not expose the concrete type, it is recommended to return structs, not interfaces, from constructors. And the interfaces are generally implemented on the consumer side, not where the type is implemented :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the concrete type should be exposed. The consumer only needs to access the public interface.

This is the same how we expose our LNClient to ensure the different implementations (LND, LDK, Greenlight, Breez) follow the same interface. Could you give a simple example of how we would re-structure this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a good discussion of the principle and why it is encouraged for Go: https://medium.com/@cep21/preemptive-interface-anti-pattern-in-go-54c18ac0668a

I'm not saying it's absolutely necessary to follow it — just felt I should mention it while I'm at it 😉

}

func (s *ldkEventBroadcastServer) CancelSubscription(channel chan *ldk_node.Event) {
close(channel)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this will not work either. After you close the channel, it doesn't make sense to drain it, as there's nothing to drain (and all reads will return nil, false anyways), and all attempts to write into it will panic.

This approach kind of goes against the intended usage scenario of channels in Go: it's the sender who drives communication and is responsible for closing channels, therefore the language is designed to panic on write into closed channel.

I suggest a different approach:

  • In CancelSubscription(), do not close the channel. Instead, make a read loop to drain just like in your code, but without the default clause (you don't need it, since it will likely break the loop before anything is read from it, which kind of defeats the purpose);
  • Remove the panic() handler in the broadcaster loop.

Additionally, you may want to update the broadcaster loop to discard events if they fail to be sent immediately (or within a small period of time).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdmitr removing the cancel seems to still give a chance of deadlock because even if I read all the messages at that point, a new message could be received immediately after the loop

Copy link
Collaborator Author

@rolznz rolznz Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example I tested which I think works quite well - you can see the broadcaster is unblocked if the channel is closed.

        testChannel := make(chan int)
	go func() {
		time.Sleep(5 * time.Second)
		close(testChannel)
	}()
	func() {
		defer func() {
			if r := recover(); r != nil {
				log.Printf("Failed to send event to listener: %v", r)
			}
		}()
		log.Printf("Waiting")
		testChannel <- 1
	}()
	log.Printf("Unblocked")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added a timeout for sending events to the listeners.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will merge this now - it's not perfect but a definite improvement over what we have currently in master. Thanks for your review @rdmitr

@rolznz rolznz merged commit 028ec1a into master Mar 8, 2024
2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants