
Getting java.base/java.util.ArrayDeque.removeFirst(ArrayDeque.java:363) exception intermittently #104

Open
shraddha-ca opened this issue Nov 14, 2023 · 10 comments


shraddha-ca commented Nov 14, 2023

Driver version

2.1.0.26

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.68555

Client Operating System

linux/amd64

Table schema

Problem description

We use a Kafka Connect Docker image deployed on ECS, with the JDBC sink connector v10.6.0 and the Redshift JDBC driver, to transfer data from MSK into a Redshift Serverless instance.
This intermittently fails, typically when an INSERT query is executed after a long gap, with the following stack trace:

Caused by: java.util.NoSuchElementException
	at java.base/java.util.ArrayDeque.removeFirst(ArrayDeque.java:363)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread(QueryExecutorImpl.java:2368)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1881)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1873)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:595)
	at com.amazon.redshift.jdbc.RedshiftStatementImpl.internalExecuteBatch(RedshiftStatementImpl.java:988)
	at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeBatch(RedshiftStatementImpl.java:1016)
	at com.amazon.redshift.jdbc.RedshiftPreparedStatement.executeBatch(RedshiftPreparedStatement.java:1765)
	at io.confluent.connect.jdbc.sink.BufferedRecords.executeUpdates(BufferedRecords.java:196)
	at io.confluent.connect.jdbc.sink.BufferedRecords.flush(BufferedRecords.java:186)
	at io.confluent.connect.jdbc.sink.JdbcDbWriter.write(JdbcDbWriter.java:80)
	at io.confluent.connect.jdbc.sink.JdbcSinkTask.put(JdbcSinkTask.java:84)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:587)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:336)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:237)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:206)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:202)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:257)
	at org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:181)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Redshift Serverless logs suggest that the query itself succeeds, and the connector then fails.
Usually restarting the task fixes the issue, but since the driver throws java.util.NoSuchElementException rather than a java.sql.SQLException, we can't automate the restarts.
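
A minimal sketch of the kind of exception-translating wrapper that would make this failure actionable, assuming one controls the code that calls executeBatch (in our setup that code lives in the Confluent connector, so we can't apply it directly); the class and method names below are illustrative only, not part of the driver or the connector:

    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.NoSuchElementException;

    // Illustrative wrapper: re-throws the driver's unchecked NoSuchElementException
    // as a SQLException so that SQL-error-based retry/restart logic can react to it.
    final class SqlExceptionTranslatingBatch {
        static int[] executeBatch(PreparedStatement ps) throws SQLException {
            try {
                return ps.executeBatch();
            } catch (NoSuchElementException e) {
                throw new SQLException("Redshift driver failed during batch execution", e);
            }
        }
    }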

Reproduction code

The following configuration is set on the connector; all other values are left at their defaults:

connection.url: jdbc:redshift://host:port/dbname
connection.user: username
connection.password: password
table.name.format: tablename

The connection parameters are fetched here and passed on to the driver here.
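
For reference, a minimal standalone sketch of what the connector effectively does, under the assumption that the trigger is a batched INSERT executed over a connection that has sat idle for a long time; the host, credentials, table, and idle duration below are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class IdleBatchInsertRepro {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:redshift://host:port/dbname", "username", "password")) {
                // Let the connection sit idle, as it does between sink flushes.
                Thread.sleep(60L * 60L * 1000L); // placeholder: one hour
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO tablename (id, value) VALUES (?, ?)")) {
                    ps.setInt(1, 1);
                    ps.setString(2, "example");
                    ps.addBatch();
                    ps.executeBatch(); // intermittently fails with NoSuchElementException
                }
            }
        }
    }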

@bhvkshah
Contributor

Hi @shraddha-ca, thanks for opening this issue. I will take a look and get back to you once I have an update.

@shawnazhaoxy

Hi @bhvkshah, any update on the issue?

@bhvkshah
Contributor

Hi @shawnazhaoxy @shraddha-ca, thanks for following up. I believe this could be an issue in the server. Are you able to provide DEBUG-level driver logs, as well as a Wireshark trace from a repro, which I could then pass on to the internal server team?
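
For reference, driver-side logging can be enabled through connection properties; LogLevel and LogPath are the documented options for the Redshift JDBC v2 driver, though the exact values and behavior should be verified against the driver version in use. A minimal sketch, with placeholder credentials and paths:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class DebugLoggingExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.setProperty("user", "username");
            props.setProperty("password", "password");
            props.setProperty("LogLevel", "6");                  // most verbose driver logging
            props.setProperty("LogPath", "/tmp/redshift-jdbc");  // directory for the log files
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:redshift://host:port/dbname", props)) {
                System.out.println("Connected with driver logging enabled");
            }
        }
    }

In the Kafka Connect setup, these could presumably be appended to connection.url as query parameters instead.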

@bhvkshah
Contributor

@shraddha-ca any updates on providing more info on this?

@shraddha-ca
Author

shraddha-ca commented Jun 5, 2024

@bhvkshah apologies for the delay. We didn't have an easy way to get DEBUG logs; however, I have attached the logs from the latest failure, which produced the same NoSuchElementException stack trace as above:

extract-2024-06-05T13_32_04.239Z.csv

@shraddha-ca
Author

My suspicion is that the portal has already been closed by the time results are processed, based mostly on these log lines:

Date Host Service Message
2024-06-05T12:03:13.711Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.711 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.receiveRFQ: <=BE ReadyForQuery(I)
2024-06-05T12:03:13.711Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.711 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.processSyncOnClose: <=BE CloseComplete
2024-06-05T12:03:13.711Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.711 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.processSyncOnClose: <=BE CloseComplete
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.sendSync: FE=> Sync
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.sendFlush: FE=> Flush
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.sendClosePortal: FE=> ClosePortal(C_10-1848892317217922)
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.sendCloseStatement: FE=> CloseStatement(S_9-1848892317145757)
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 FUNCTION [57817 task-thread] com.amazon.redshift.jdbc.RedshiftStatementImpl.cancel: Return
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 ERROR [57817 task-thread] com.amazon.redshift.jdbc.RedshiftStatementImpl.cancel: statementState is StatementCancelState.IDLE
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 FUNCTION [57817 task-thread] com.amazon.redshift.jdbc.RedshiftStatementImpl.cancel: Enter (value)
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 FUNCTION [57817 task-thread] com.amazon.redshift.jdbc.RedshiftStatementImpl.close: Enter (value)
2024-06-05T12:03:13.709Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.709 FUNCTION [57817 task-thread] com.amazon.redshift.jdbc.RedshiftPreparedStatement.executeUpdate: Return 0
2024-06-05T12:03:13.708Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.708 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread: ReadyForQuery will exit from processResultsOnThread loop
2024-06-05T12:03:13.708Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.708 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.receiveRFQ: <=BE ReadyForQuery(I)
2024-06-05T12:03:13.708Z "kafka-ops" "cloudwatch" Jun 05 12:03:13.708 DEBUG [57817 task-thread] com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread: FE=> Received packet of type:90=Z
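
To make that suspicion concrete, here is an illustration only (not the driver's actual code): if a close/cancel path drains the pending-query queue while the result-processing loop still expects one entry per backend message, removeFirst() on the now-empty ArrayDeque throws exactly the unchecked NoSuchElementException seen at the top of the stack trace:

    import java.util.ArrayDeque;

    public class PortalCloseRaceSketch {
        public static void main(String[] args) {
            ArrayDeque<String> pendingQueries = new ArrayDeque<>();
            pendingQueries.addLast("INSERT ...");  // a query has been sent to the backend

            pendingQueries.clear();                // a close/cancel path drains the queue early

            // The result-processing side still assumes one pending entry per response,
            // so this throws java.util.NoSuchElementException (unchecked, not a SQLException):
            String next = pendingQueries.removeFirst();
            System.out.println(next);
        }
    }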

@shraddha-ca
Author

@bhvkshah Is there any update on this? Even throwing a java.sql.SQLException would suffice for our use case, since that would trigger an automatic restart on our side.

@bhvkshah
Contributor

Hey @shraddha-ca, I no longer work on this project. Pinging @bsharifi, who does.

@paoliniluis

Hi there, we're hitting this issue in Metabase as well, @bsharifi.

@aiguofer

Hi, we just saw this issue on our end as well. We're a pass-through query service, so we don't know exactly what the customer is querying, but here's the relevant part of the stack traces we're seeing. What's odd is that we've been running the service for over a year and had never encountered this until today, yet we've seen it happen for at least two customers in the past few hours.

java.util.NoSuchElementException: null
	at java.base/java.util.ArrayDeque.getFirst(ArrayDeque.java:402)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread(QueryExecutorImpl.java:1940)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1886)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1878)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1098)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1123)
	at com.amazon.redshift.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:366)
	at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeInternal(RedshiftStatementImpl.java:519)
	at com.amazon.redshift.jdbc.RedshiftStatementImpl.execute(RedshiftStatementImpl.java:440)
	at com.amazon.redshift.jdbc.RedshiftPreparedStatement.executeWithFlags(RedshiftPreparedStatement.java:202)
	at com.amazon.redshift.jdbc.RedshiftPreparedStatement.execute(RedshiftPreparedStatement.java:186)
...
