Errors in calling a DAP within a DAP #57
Comments
Hi Shahamit, I could not tell whether APP.PARTITIONED_AIRLINE_TEST is a row table or a column table. For a row table it should work, and we will look into the problems you reported. If the table is a column table, though, problems are expected, as we have not tested DAPs with the SnappyData product. In fact, we are thinking of removing DAP support completely (still under deliberation), since SnappyData will be able to run both submitted jobs and any type of query supported by Spark SQL; we think that will probably make all of the DAP capabilities redundant. |
Thanks for the follow-up.
Thanks! |
Please see answers inline. I didn't specify "USING row OPTIONS()" or "USING column OPTIONS()" in the create table statement. What is the default table type?
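For reference, a minimal sketch of making the table type explicit in the DDL; the table names, columns, and connection URL here are illustrative, not taken from the original post:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical locator host/port; adjust for your cluster.
        Connection conn = DriverManager.getConnection("jdbc:snappydata://localhost:1527/");
        try (Statement stmt = conn.createStatement()) {
            // Explicitly create a partitioned row table ...
            stmt.execute("CREATE TABLE APP.ROW_EXAMPLE (COL1 VARCHAR(32), COL2 INT) " +
                         "USING row OPTIONS (PARTITION_BY 'COL1')");
            // ... versus an explicitly declared column table.
            stmt.execute("CREATE TABLE APP.COLUMN_EXAMPLE (COL1 VARCHAR(32), COL2 INT) " +
                         "USING column OPTIONS (PARTITION_BY 'COL1')");
        }
        conn.close();
    }
}
```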
How do you get the capability that a DAP provides, i.e. executing custom code on the node where the data resides, with Snappy's Spark support?
Investigating the DAP execution further, I found inconsistencies in the number of times the procedure gets called on the server when executed with the "ON TABLE WHERE" clause. I could see the procedure getting called twice, with the connection id as -2 and the remote flag set to true, which led to an NPE. Executing the DAP without the "ON TABLE WHERE" clause works as expected.
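For context, a minimal sketch of the two invocation forms being compared; the procedure name, table name, and connection URL are hypothetical:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;

public class CallForms {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection URL; adjust for your cluster.
        Connection conn = DriverManager.getConnection("jdbc:snappydata://localhost:1527/");

        // Data-aware form: routed to the server(s) whose buckets satisfy the
        // WHERE clause. This is the form that showed duplicate invocations.
        try (CallableStatement cs = conn.prepareCall(
                "CALL APP.MYPROC() ON TABLE APP.PARTITIONED_TABLE_TEST WHERE COL1 = 'x'")) {
            cs.execute();
        }

        // Plain form without the routing clause; this one behaved as expected.
        try (CallableStatement cs = conn.prepareCall("CALL APP.MYPROC()")) {
            cs.execute();
        }
        conn.close();
    }
}
```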
A related question: with a partitioned table, how can we query rows from only a single partition present on a node? This is basically to verify and test the DAP execution.
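One possible way to check the distribution, assuming the DSID() built-in (and grouping by it) is supported in this build; the connection URL is hypothetical and the table name matches the CALL statement below:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PartitionDistribution {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection URL; adjust host/port for your cluster.
        try (Connection conn = DriverManager.getConnection("jdbc:snappydata://localhost:1527/");
             Statement stmt = conn.createStatement();
             // DSID() reports the member that holds each row, so grouping by
             // it shows how the rows are spread across the servers.
             ResultSet rs = stmt.executeQuery(
                 "SELECT DSID(), COUNT(*) FROM APP.PARTITIONED_TABLE_TEST GROUP BY DSID()")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
            }
        }
    }
}
```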
|
Some more follow-up questions:
I followed the discussion on implementing the DAP capabilities in Snappy's Spark execution. I wonder how the Spark API would determine where the data resides. Would it query the GemFire sys tables to find the server where the row resides?
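For background, Spark generally addresses this through per-partition preferred locations rather than system-table lookups in user code: a store-backed RDD can report the server hosting each bucket as that partition's preferred location, and the scheduler then places the task on that node. A minimal runnable sketch of the per-partition execution style, with parallelize standing in for a table-backed RDD:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LocalitySketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("locality-sketch").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Stand-in for an RDD backed by a partitioned table; a table-backed
        // RDD would report each bucket's host as the preferred location.
        JavaRDD<String> rows = sc.parallelize(Arrays.asList("r1", "r2", "r3", "r4"), 2);

        rows.foreachPartition(iter -> {
            // Custom per-partition logic; with a table-backed RDD this body
            // runs next to the data, much like a DAP body would.
            while (iter.hasNext()) {
                System.out.println("processing " + iter.next());
            }
        });

        sc.stop();
    }
}
```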
The table on which I call the DAP (the command is "CALL myProc ON TABLE PARTITIONED_TABLE_TEST WHERE COL1 = 'x'") is a partitioned table without any replication, so I assume that would not lead to multiple copies of the data on different nodes. Connecting to the leader's JMX port, I could see that the table data (rows) is distributed across the cluster, and the total matches the total number of rows in the table. I have shared my create table statement for reference.
The bug, or inconsistency, that I am seeing right now with the DAP execution is that the DAP sometimes gets executed multiple times on a node, and subsequent invocations of the nested call (kindly have a look at the code I shared in my original post) fail because they are executed on an invalid connection object.
Would that be the right test? I basically want to see whether changing the WHERE clause in the CALL myProc statement actually executes the code on different nodes in the cluster. Is there no way to know (from the system tables) which rows are present on which partition? Thanks. |
I looked at your code and the src code as well, and it looks like there is a bug for nested calls. Will file a jira and take this up. BTW, can you please try a workaround: instead of using the same connection for the recursive call, use another pool of connections, or a dedicated connection in a thread local, and see if that helps.
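A minimal sketch of the suggested workaround, with hypothetical procedure, table, and connection URL names; the nested CALL goes out on a dedicated thread-local connection instead of the procedure's own connection:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class NestedDapWorkaround {
    // Dedicated per-thread connection for the nested CALL, so the recursive
    // invocation never reuses the outer procedure's (possibly invalid)
    // connection. Left open for reuse; a pool would work equally well.
    private static final ThreadLocal<Connection> NESTED_CONN =
        ThreadLocal.withInitial(() -> {
            try {
                // Hypothetical URL; point this at a locator/server of your cluster.
                return DriverManager.getConnection("jdbc:snappydata://localhost:1527/");
            } catch (SQLException e) {
                throw new IllegalStateException(e);
            }
        });

    // Body of the outer DAP: do the local work, then issue the nested CALL
    // on the dedicated connection rather than the procedure's own connection.
    public static void myProc() throws SQLException {
        Connection conn = NESTED_CONN.get();
        try (CallableStatement cs = conn.prepareCall(
                "CALL APP.MY_INNER_PROC() ON TABLE APP.PARTITIONED_TABLE_TEST WHERE COL1 = 'x'")) {
            cs.execute();
        }
    }
}
```

Note that a separate connection also means a separate transaction scope, which is the caveat raised in the next comment.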
|
I will elaborate on the use case we are trying to solve with DAP, and you can suggest whether it can be done through Snappy's Spark API. We are trying to implement cell-level dependencies across multiple tables. It's similar to the functionality Excel offers, except that in our case it can span multiple tables. Imagine multiple cells in an Excel worksheet having interdependent formulas, where updating a value in one cell causes another cell's value to change, that cell update causes yet another cell's value to update, and so on. It is a kind of graph of dependencies that determines which row cell needs to be updated next. Kindly let me know if you have follow-up questions.
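To make the use case concrete, a minimal sketch of the propagation being described; the cell identifiers and dependency graph are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CellPropagation {
    // cell -> cells whose formulas depend on it (possibly in other tables)
    static Map<String, List<String>> dependents = new HashMap<>();

    static void update(String cell) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        queue.add(cell);
        seen.add(cell);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            recompute(current); // re-evaluate this cell's formula
            for (String dep : dependents.getOrDefault(current, Collections.emptyList())) {
                if (seen.add(dep)) {
                    queue.add(dep); // schedule every cell that depends on it
                }
            }
        }
    }

    static void recompute(String cell) {
        System.out.println("recomputing " + cell);
    }

    public static void main(String[] args) {
        dependents.put("T1.A1", Arrays.asList("T2.B1"));
        dependents.put("T2.B1", Arrays.asList("T3.C1"));
        update("T1.A1"); // cascades: T1.A1 -> T2.B1 -> T3.C1
    }
}
```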
Thinking more about this test, I wonder how transactions would work when we create new connections for nested DAP calls. Can you please elaborate on that?
I will let you know if this helps. |
I was getting myself acquainted with the Spark and Snappy APIs that you mention above and tried to visualize the code flow.
I also have a few queries.
Let me know your thoughts on the approach and my queries. Thanks! |
With Snappy 0.2.1 I am trying to call a data-aware procedure (DAP) inside a DAP. There are a couple of issues with the nested call.
The DAP code is shared here.
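A rough sketch of the nested-call pattern being described; the procedure and table names are hypothetical, and the default-connection URL is the Derby-style one used inside GemFireXD procedures:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class OuterProc {
    // Outer DAP body: obtains the invoking connection and issues a nested
    // data-aware CALL on it. This nested call is what intermittently fails.
    public static void myProc() throws SQLException {
        Connection conn = DriverManager.getConnection("jdbc:default:connection");
        try (CallableStatement cs = conn.prepareCall(
                "CALL APP.MY_INNER_PROC() ON TABLE APP.PARTITIONED_TABLE_TEST WHERE COL1 = 'x'")) {
            cs.execute();
        }
    }
}
```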
I would appreciate any inputs on this. Thanks!