
rpcclient: handle RPC requests based on their priority #2027

Closed
wants to merge 9 commits into from

Conversation

yyforyongyu
Collaborator

This PR introduces a new flag lowPriority on RPC requests to differentiate their priorities so that more urgent RPC requests can be processed earlier. This is needed because bitcoind will block when too many RPC requests are made at the same time, and our recent addition of the mempool notifier puts a lot of pressure on it by calling getrawtransaction for every new transaction, causing other RPC requests to halt. To solve this, we introduce a new method GetRawTransactionLazy, which makes those RPC calls friendlier and leaves room for other, more urgent RPC calls to go through.

Resulted from running `go get "golang.org/x/sync/errgroup"` and `go mod tidy`.
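For context, a rough usage sketch of where each call is meant to live. This assumes GetRawTransactionLazy mirrors GetRawTransaction's signature; the helper below is purely illustrative:

```go
package example

import (
	"log"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/rpcclient"
)

// fetchTx illustrates the intended split: urgent lookups keep using the
// normal call, while bulk mempool lookups use the low priority variant.
func fetchTx(client *rpcclient.Client, txHash *chainhash.Hash, urgent bool) {
	if urgent {
		// High priority: goes onto the normal queue.
		tx, err := client.GetRawTransaction(txHash)
		if err != nil {
			log.Printf("urgent lookup failed: %v", err)
			return
		}
		log.Printf("got %v", tx.Hash())
		return
	}

	// Low priority: yields to more urgent RPCs (assumed signature).
	tx, err := client.GetRawTransactionLazy(txHash)
	if err != nil {
		log.Printf("lazy lookup failed: %v", err)
		return
	}
	log.Printf("got %v", tx.Hash())
}
```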
@coveralls

coveralls commented Aug 30, 2023

Pull Request Test Coverage Report for Build 6038665903

  • 0 of 152 (0.0%) changed or added relevant lines in 2 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.1%) to 55.121%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| rpcclient/rawtransactions.go | 0 | 19 | 0.0% |
| rpcclient/infrastructure.go | 0 | 133 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| rpcclient/rawtransactions.go | 1 | 0.0% |

Totals Coverage Status
  • Change from base Build 5957307420: -0.1%
  • Covered Lines: 26914
  • Relevant Lines: 48827

💛 - Coveralls

Collaborator

@guggero guggero left a comment

Great idea and optimizations! Though I think some assumptions around reading from channels have changed and might introduce new blocking behavior.

@@ -892,7 +929,7 @@ out:
// Send any messages ready for send until the shutdown channel
// is closed.
select {
case jReq := <-c.highPriorityPostQueue:
case jReq := <-c.nextPostRequest():
Collaborator

This now no longer considers the c.shutdown channel, because we'll block on the method call c.nextPostRequest() and only actually enter the select once that returns. So I think we need to consider c.shutdown within nextPostRequest() and then we can simply return the next request (instead of a channel) or a boolean that indicates we're shutting down.
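Something along these lines is what I'd have in mind (a sketch only, reusing this PR's queue fields; the actual fix may look different):

```go
// nextPostRequest returns the next request to send, always preferring the
// high priority queue. The boolean is false once the client is shutting
// down, so the caller can exit its loop.
func (c *Client) nextPostRequest() (*jsonRequest, bool) {
	// Give high priority requests a chance to win first.
	select {
	case jReq := <-c.highPriorityPostQueue:
		return jReq, true
	case <-c.shutdown:
		return nil, false
	default:
	}

	// Otherwise block on either queue or on shutdown.
	select {
	case jReq := <-c.highPriorityPostQueue:
		return jReq, true
	case jReq := <-c.lowPriorityPostQueue:
		return jReq, true
	case <-c.shutdown:
		return nil, false
	}
}
```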

Collaborator Author

good catch, fixed!

@@ -905,7 +942,7 @@ out:
cleanup:
for {
select {
case jReq := <-c.highPriorityPostQueue:
case jReq := <-c.nextPostRequest():
Collaborator

Same here, this breaks assumptions and will block forever on shutdown.

Collaborator Author

This one is a bit different since we want to drain the channels here, so I changed it to read the channels directly.
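For reference, the drain-on-shutdown shape being described looks roughly like this (a sketch; the exact diff may differ):

```go
// Fail any queued requests directly during shutdown rather than going
// through nextPostRequest, so both priority queues are fully drained.
cleanup:
	for {
		select {
		case jReq := <-c.highPriorityPostQueue:
			jReq.responseChan <- &Response{err: ErrClientShutdown}

		case jReq := <-c.lowPriorityPostQueue:
			jReq.responseChan <- &Response{err: ErrClientShutdown}

		default:
			break cleanup
		}
	}
```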

// TODO(yy): Ideally we should pass the `-rpcthreads` used in
// `bitcoind` here to better increase the performance.
const numThreads = 4
eg.SetLimit(numThreads)
Collaborator

Don't we ever need to wait for this error group? Or we're basically using it as a simple "thread pool"?

Collaborator Author

Yeah, it's more like limiting concurrent calls, but I think we should also wait at the end, so I've now added that.
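For readers unfamiliar with the pattern, here's a minimal, standalone illustration of errgroup used as a bounded worker pool with a final Wait (not the PR's code):

```go
package main

import (
	"fmt"

	"golang.org/x/sync/errgroup"
)

func main() {
	var eg errgroup.Group

	// Cap the number of in-flight goroutines, mirroring bitcoind's default
	// of 4 RPC threads.
	const numThreads = 4
	eg.SetLimit(numThreads)

	for i := 0; i < 10; i++ {
		i := i

		// Go blocks here once numThreads goroutines are already running.
		eg.Go(func() error {
			fmt.Println("handling request", i)
			return nil
		})
	}

	// Wait for all outstanding work before returning.
	if err := eg.Wait(); err != nil {
		fmt.Println("error:", err)
	}
}
```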

eg.SetLimit(numThreads)

// lowReqCounter counts the number of low priority requests.
lowReqCounter := 0
Collaborator

Should we use an explicit uint64 type here to avoid rolling over into negative? Wouldn't really matter too much as we're only using the modulus, but would give us some more explicit behavior.

Collaborator Author

updated

// requests to bitcoind. This is only used by low priority messages.
// For every 4 messages, we would sleep this duration before making
// more post requests.
rpcRequestInterval = time.Millisecond * 10
Collaborator

Could we maybe make this configurable when instantiating the client?

Collaborator Author

@yyforyongyu yyforyongyu Aug 31, 2023

added a new commit to achieve this
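A hypothetical shape for that knob (the field name below is illustrative only, not necessarily what the new commit uses):

```go
// ConnConfig (excerpt): a hypothetical option controlling the low priority
// throttle; when zero, the client falls back to the 10ms default.
type ConnConfig struct {
	// ... existing fields elided ...

	// RPCRequestInterval is how long the post handler sleeps between
	// batches of low priority requests.
	RPCRequestInterval time.Duration
}

// requestInterval returns the configured throttle or the default.
func (c *Client) requestInterval() time.Duration {
	if c.config.RPCRequestInterval != 0 {
		return c.config.RPCRequestInterval
	}
	return rpcRequestInterval
}
```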

// associated server and returns a response channel on which the reply will be
// delivered at some point in the future. It handles both websocket and HTTP
// POST mode depending on the configuration of the client.
func (c *Client) SendCmdLazy(cmd interface{}) chan *Response {
Collaborator

Low prio: To me, the word "lazy" in programming indicates "don't do it immediately, only when we actually need the result". Which kind of fits, but also not fully. Wasn't able to come up with a better suffix though, so I asked ChatGPT. Not sure if any of those fit better:

SendCmdLight - Using "Light" suggests a less intensive or lower priority operation.
SendCmdLazy - "Lazy" often implies that the action isn't urgently prioritized.
SendCmdIdle - "Idle" might indicate that the task is of low urgency.
SendCmdSlow - Directly indicates that it's a less priority-sensitive function.
SendCmdBack - As in "backburner", where it's not the immediate focus.
SendCmdMinor - Signifies lesser importance without being too long.
SendCmdRelax - Suggests that the function can "relax" and doesn't need to hurry.
SendCmdSoft - A softer version, possibly less critical.
SendCmdCasual - Implying the task is non-urgent.
SendCmdChill - A colloquial way to denote low urgency.

Collaborator Author

Maybe SendCmdWithLowPriority? kinda long...

Collaborator

Yeah, it's a bit long. My favorite from the list is actually SendCmdSlow.

Collaborator Author

switched to slow

This commit adds a new message queue `lowPriorityPostQueue` and a new flag on `jsonRequest` so we can split the requests based on priorities.

This commit changes how the messages are handled based on their priorities. For low priority messages, they are processed sequentially and the processing would sleep for 10ms for every four messages. For high priority messages, they will be handled in goroutines so they will be sent to `bitcoind` faster.

This commit adds a new method `GetRawTransactionLazy` to handle low priority requests. To support it, the underlying `SendCmd` is refactored to make use of the priority flag.
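The low priority path described in the second commit could look roughly like this (reconstructed from the commit message, not the exact diff):

```go
// handleLowPriority sends a low priority request sequentially and sleeps
// for rpcRequestInterval after every fourth request, leaving room for more
// urgent calls to reach bitcoind.
func (c *Client) handleLowPriority(jReq *jsonRequest, lowReqCounter *uint64) {
	c.handleSendPostMessage(jReq, c.shutdown)

	*lowReqCounter++
	if *lowReqCounter%4 == 0 {
		time.Sleep(rpcRequestInterval)
	}
}
```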
@guggero guggero self-requested a review August 31, 2023 15:23
@Roasbeef
Member

Roasbeef commented Sep 1, 2023

What about taking a more generic approach here? The request type is the parameter, and we can have a generic method that'll handle the prioritization.
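One way to read that suggestion (names and shapes here are mine, purely illustrative, not an agreed-upon API):

```go
// Priority is carried on every outgoing request instead of adding a
// per-command "Lazy" variant.
type Priority int

const (
	PriorityHigh Priority = iota
	PriorityLow
)

// sendCmdWithPriority is a single entry point that routes any command onto
// the right queue based on the supplied priority. buildJSONRequest stands in
// for whatever SendCmd does today to marshal the command.
func (c *Client) sendCmdWithPriority(cmd interface{}, prio Priority) chan *Response {
	jReq := c.buildJSONRequest(cmd)

	if prio == PriorityLow {
		c.lowPriorityPostQueue <- jReq
	} else {
		c.highPriorityPostQueue <- jReq
	}

	return jReq.responseChan
}
```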

I also think we should step back a bit before proceeding to make sure we're actually fixing something here:

  • For bitcoind, the client runs in POST mode.
  • In POST mode, a single goroutine handles all the incoming requests (it'll block and retry up to 10 times before giving up):

```go
// sendPostHandler handles all outgoing messages when the client is running
// in HTTP POST mode. It uses a buffered channel to serialize output messages
// while allowing the sender to continue running asynchronously. It must be run
// as a goroutine.
func (c *Client) sendPostHandler() {
out:
	for {
		// Send any messages ready for send until the shutdown channel
		// is closed.
		select {
		case jReq := <-c.sendPostChan:
			c.handleSendPostMessage(jReq, c.shutdown)

		case <-c.shutdown:
			break out
		}
	}

	// Drain any wait channels before exiting so nothing is left waiting
	// around to send.
cleanup:
	for {
		select {
		case jReq := <-c.sendPostChan:
			jReq.responseChan <- &Response{
				result: nil,
				err:    ErrClientShutdown,
			}

		default:
			break cleanup
		}
	}

	c.wg.Done()
	log.Tracef("RPC client send handler done for %s", c.config.Host)
}
```
  • Similarly, bitcoind also has a set of RPC threads that consume the requests, and block, eventually returning an error if they can't be processed.

In short, adding another channel to consume requests off of doesn't actually address the issue of head-of-line blocking. If we're sending 10 getblock requests, and one of them blocks because of bitcoind hitting a queue limit, then the entire sequence will block. I think what we're after here is either utilizing JSON-RPC batching properly, or doing a re-architecture to make the POST mode as concurrent as the normal ws mode.

The ws mode has proper pipelining: a message request adds an entry to match the response, then sends out the request, not blocking on the response:

```go
// Add the request to the internal tracking map so the response from the
// remote server can be properly detected and routed to the response
// channel. Then send the marshalled request via the websocket
// connection.
if err := c.addRequest(jReq); err != nil {
	jReq.responseChan <- &Response{err: err}
	return
}
log.Tracef("Sending command [%s] with id %d", jReq.method, jReq.id)
c.sendMessage(jReq.marshalledJSON)
```

This can further be improved by having multiple worker goroutines read off the queue to handle the responses.
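A minimal version of that worker-pool idea could look like this (illustrative only; numWorkers is an assumed knob):

```go
// startPostWorkers is an illustrative alternative to the single
// sendPostHandler goroutine: several workers consume from the same queue,
// so one slow request no longer holds up everything queued behind it.
func (c *Client) startPostWorkers(numWorkers int) {
	for i := 0; i < numWorkers; i++ {
		c.wg.Add(1)
		go func() {
			defer c.wg.Done()
			for {
				select {
				case jReq := <-c.sendPostChan:
					c.handleSendPostMessage(jReq, c.shutdown)
				case <-c.shutdown:
					return
				}
			}
		}()
	}
}
```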

@yyforyongyu
Collaborator Author

The initial idea was to implement a special case that can be used by our mempool poller, as it consumes a lot of RPC resources. The original observation was that a lot of getrawtransaction requests were queued, causing the getblockchaininfo call made via getinfo in lnd to be blocked.

However, after digging into bitcoind's debug log, the cause of the slowness seemed to be on the bitcoind side - whenever a new block tip is connected, or a new peer is connected, the daemon acquires a lock, blocking the RPC responses until the aforementioned tasks are finished, as shown in the logs:

2023-08-31T09:17:35Z ThreadRPCServer method=getrawtransaction user=yy
2023-08-31T09:17:35Z ThreadRPCServer method=getrawtransaction user=yy
2023-08-31T09:17:39Z New outbound peer connected: version: 70016, blocks=805566, peer=24 (outbound-full-relay)
2023-08-31T09:17:42Z New outbound peer connected: version: 70016, blocks=805566, peer=22 (outbound-full-relay)
2023-08-31T09:17:42Z ThreadRPCServer method=getrawtransaction user=yy
2023-08-31T09:17:42Z ThreadRPCServer method=getrawtransaction user=yy
2023-08-31T09:17:48Z ThreadRPCServer method=getrawtransaction user=yy
2023-08-31T09:05:35Z ThreadRPCServer method=getrawtransaction user=yy
2023-08-31T09:05:39Z UpdateTip: new best=00000000000000000003134988bcace49938ffe175379a14c66529580c202473 height=805559 version=0x20400000 log2_work=94.390304 tx=886852495 date='2023-08-31T08:29:35Z' progress=0.999993 cache=53.6MiB(319643txo)
2023-08-31T09:05:48Z ThreadRPCServer method=getrawtransaction user=yy

Thus this PR is no longer the right fix. If we are looking for async RPC calls, we should instead use rpcclient.NewBatch as suggested by @Roasbeef.
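For reference, the batching route looks roughly like this (based on my reading of rpcclient's batch mode; exact details may differ):

```go
package example

import (
	"log"

	"github.com/btcsuite/btcd/chaincfg/chainhash"
	"github.com/btcsuite/btcd/rpcclient"
)

// batchFetch queues many getrawtransaction calls and sends them as a single
// JSON-RPC batch instead of one POST per transaction.
func batchFetch(cfg *rpcclient.ConnConfig, hashes []*chainhash.Hash) {
	// NewBatch creates a client in batch mode: Async calls are queued
	// locally until Send is called.
	client, err := rpcclient.NewBatch(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Shutdown()

	futures := make([]rpcclient.FutureGetRawTransactionResult, 0, len(hashes))
	for _, h := range hashes {
		futures = append(futures, client.GetRawTransactionAsync(h))
	}

	// Send flushes the whole batch in one request.
	if err := client.Send(); err != nil {
		log.Fatal(err)
	}

	for _, f := range futures {
		tx, err := f.Receive()
		if err != nil {
			log.Printf("lookup failed: %v", err)
			continue
		}
		log.Printf("got tx %v", tx.Hash())
	}
}
```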

@yyforyongyu yyforyongyu closed this Sep 4, 2023
@yyforyongyu yyforyongyu deleted the priority-queue branch September 4, 2023 04:02