Projected result of a SELECT query is too big #233
Could you please provide a querying example?
The query is as follows:

```scala
_sc.riakTSTable(ts_table_name).sql(s"SELECT * FROM $ts_table_name WHERE time > 1382846337500 and time < 1383846337500 and source = '123' and instance = 'instance1'")
```

The table definition starts with `"CREATE TABLE FloatStream " +` (truncated in the original). The query spans 10 quanta.
Additional info: I ran

```scala
val df = sqlContext.read
  .option("spark.riak.input.split.count", "100")
  .option("spark.riak.connection.host", "myhost.media.com:8087")
  .format("org.apache.spark.sql.riak")
  .load(ts_table_name)
  .select("time", "source", "instance")
  .filter(s"time >= $from AND time <= $to AND source = 'mysource' AND instance = 'myinstance'")
```

and received the following error:

```
coveragePlanResult - CoveragePlanOperation returns at least one coverage entry: 'CoverageEntry [host=0.0.0.0, port=8087, fieldName=time, lowerBound=1382846337500, lowerBoundInclusive=true, upperBound=1382918400000, upperBoundInclusive=false, description=FloatStream / time >= 1382846337500 and time < 1382918400000]' -- with IP address '0.0.0.0'.
```
Managed to get past that one as well. Apparently I did not have the partitioning field configured. I added `.option("spark.riak.partitioning.ts-range-field-name", "time")` and now the DataFrame is created successfully. However, when I try to create an additional DataFrame using the existing Spark context, I get the following error:

```
Caused by: java.util.concurrent.ExecutionException: com.basho.riak.client.core.netty.RiakResponseException: overload
```
@yadid The reason for the error you got in the first DataFrame case is the default Riak TS configuration. You just need to reconfigure Riak TS to use real IPs instead of 0.0.0.0 for the PB listener. Here is an example: http://docs.basho.com/riak/kv/2.2.3/using/running-a-cluster/#select-an-ip-address-and-port
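For reference, a minimal sketch of the relevant `riak.conf` change, assuming a default config layout; the IP address below is a placeholder and should be replaced with the node's real address:

```
## riak.conf: bind the protocol buffers (PB) listener to the node's
## real IP instead of the default 0.0.0.0 (address is a placeholder)
listener.protobuf.internal = 10.0.0.1:8087
```

With the listener bound to a real IP, the coverage plan entries returned to the connector carry reachable host addresses instead of `0.0.0.0`.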
We are using the spark-riak-connector to query a Riak TS cluster and are getting a `com.basho.riak.client.core.netty.RiakResponseException: Projected result of a SELECT query is too big`.
We thought that increasing the number of executors would avoid this error, but that does not seem to be the case.
Is the query range divided into several smaller range queries and then distributed to the executors?
We are trying to avoid having to segment the range into several smaller ranges in our own code.
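For intuition, here is a hypothetical sketch of how a time range can be cut into `spark.riak.input.split.count` contiguous sub-ranges, each of which could then be fetched by a separate task. This is an illustration only, not the connector's actual code; the object and function names are invented for the example:

```scala
// Hypothetical illustration: split a half-open time range [from, to)
// into `splitCount` contiguous sub-ranges, similar in spirit to what a
// connector might do with spark.riak.input.split.count.
object RangeSplit {
  def splitRange(from: Long, to: Long, splitCount: Int): Seq[(Long, Long)] = {
    require(splitCount > 0 && to > from, "need a positive split count and a non-empty range")
    val span = to - from
    (0 until splitCount).map { i =>
      val lo = from + span * i / splitCount
      val hi = from + span * (i + 1) / splitCount
      (lo, hi) // half-open sub-range [lo, hi)
    }
  }

  def main(args: Array[String]): Unit = {
    // e.g. 100 sub-ranges, as with .option("spark.riak.input.split.count", "100")
    val parts = splitRange(1382846337500L, 1383846337500L, 100)
    println(parts.take(2))
  }
}
```

Under this scheme the sub-ranges tile the original range exactly, so increasing the split count changes how many queries are issued, not the total amount of data projected by each underlying SELECT per quantum.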