-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
All methods should have asynchronous versions. #57
Comments
Hi Ishan, |
Hi Dan,
Note the call only changes slightly. It requires an additional parameter of a callback which should implement a specific interface depending on the method call. Also note that once a method is called on a client object, it cannot be used till the method completes as it is backed by single SocketChannel. To overcome this limitation, I have also written some wrapping code around this to manage multiple client objects and choose a free one on a new call. An interesting property this leads to is that it is completely asynchronous and only an additional Selector thread runs throughout the execution of the program which is shared by all these clients for requests and responses. I am seeing a throughput of something like 4000 calls/sec using this approach. Also, Ill be happy to share this code. Currently, I am on a deadline but I would be interested later to integrate this async interface completely in Pelops. |
Hi Ishan, I'm really interested in an async mode; any idea when you might be able to share your code, even as a branch or fork? |
+1 for async support. I'm doing large-scale bulk loads and reporting, and async methods would save the need to spin up large numbers of threads simply to achieve acceptable performance. With async far fewer threads would be needed due to the use of callbacks. |
+1 for sharing :) |
Hi andrew, ben, |
I've forked Pelops and refactored it to use async calls internally, see: andrewswan@0bbdbff. If the committers are happy with what I've done so far, I can go the rest of the way and add a public async method for each existing synchronous method. I'd send a pull request, but I already have one outstanding and didn't want to upset the apple cart. |
@andrewswan it might be useful to outline the basic workflow of the async methods - off the top of my head I think we'd want something like:
Maybe @ishan and @benalexau (thanks for Spring Security btw) could provide feedback based on their experience and use cases? Also, @ishan made a comment above about the client object being backed by a single socket channel. We'll need understand the impact on the connection pooling as a result of this (you'd need a pool with a LOT of connections to get the most out of async with the current approach)... |
Dan, I work with @andrewswan and took a look over his commit before it was pushed; it looked good to me. I have an analytics system which has to handle (a) large numbers of bulk inserts every few minutes and (b) generation of very large report sets each day (over 100,000 report files are produced daily). The latter use case would especially benefit from async support because each report file requires several Cassandra queries to build the report file. At present I spin up lots of threads on the machine building the reports and run the queries, as most of the queries sit there blocking waiting for Cassandra to return the result. This requires complex thread pool tuning to balance (a) not overloading the JVM and (b) wanting enough threads so that Cassandra is as busy as possible to rapidly complete the millions of queries necessary to complete the report batch. With async support this approach would be avoided. I'd have a few threads running at a time and each would manage a single report. They'd simply present a callback that would be invoked when each query is complete. Tuning would be simpler because throttling is possible at the Polops level without needing to worry about blocking the thread that made the query; the calling code was explicitly programmed to use a callback as opposed to blocking, so it has completed as much other work as it can pending the query result becoming available. The result would be far fewer threads, elimination of thread pool tuning complexity, and the ability to add throttling at a lower lever without concern for the impact on the calling code. |
Ben, it sounds like a good challenging bit of work you have there. :) The suggested approach of presenting a callback that would be invoked when each query is complete sounds good. I notice that the BlockingCallback provided in the commit is well, blocking. I'm not sure I'd be keen for wholesale changes to the internals to switch everything to async, however I'd be more than happy with async methods that sit next to the sync methods. I get the impression that that's what you're thinking but I'd be very keen to get an example of the changes before putting my vote in (Selector.getColumnsFromRows would be a good choice). p.s. I revoke the comment about camel case instead of underscores - I just noticed they are defined by Cassnadra. |
I think the current pull request was aiming at a zero public API change and minimising the amount of code you guys would have to maintain in the future. BlockingCallback is intended to be used for the current blocking methods defined by Pelops, but for the async versions of those methods the caller would pass in a callback that would instead be used. I actually suggested this approach to Andrew to avoid breaking the existing API and adding lots more code to Pelops. But if you guys prefer both Client and AsyncClient be used, I'm guessing he could do that - just let him know. |
That's a fair comment and suggestion. My concern with this approach is the impact of the change on existing users (e.g. my company), going from approach that is proven to a new approach seems risky and we don't really have the resources to really understand and mitigate the risk. However, if you guys do have the resources to verify the change then great! Are any of your use cases user initiated and interactive (that would help ease my concerns)...? |
Hi Dan, we have this under load already and I can easily test any changes. Can you please email me balex at vmware dot com and I can discuss with you in more detail? |
Thrift supports generating asynchronous java clients in version 0.6, but pelops does not expose any asynchronous methods for its operations.
The text was updated successfully, but these errors were encountered: