Describe the bug
In a production environment, we found that our cluster cannot recover from a disk failure.
Although the cluster is green, bulk requests on some shards fail with AlreadyClosedException.
It seems that the InternalEngine is closed, but the ShardFailure is not sent to the master.
The root cause is that when failEngine() is called because of a FlushFailure, markStoreCorrupted() inside failEngine() throws a java.nio.file.DirectoryIteratorException. This exception is not caught, because DirectoryIteratorException is an unchecked exception and not an IOException.
The DirectoryIteratorException propagates out of failEngine(), so eventListener.onFailedEngine(reason, failure); is never executed.
As a result, the index cannot process bulk requests, and the master does not reroute this shard because the shard failure is never reported.
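The failure mode can be illustrated with a minimal, self-contained sketch. This is not the actual InternalEngine code: the class FailEngineSketch, its EventListener interface, and the simplified markStoreCorrupted() and failEngine() bodies are hypothetical stand-ins that only mirror the control flow described above.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the failure path; not the real InternalEngine code. */
class FailEngineSketch {

    /** Hypothetical listener, mirroring eventListener.onFailedEngine(reason, failure). */
    interface EventListener {
        void onFailedEngine(String reason, Exception failure);
    }

    /** Stand-in for markStoreCorrupted(): directory iteration fails on the broken disk. */
    static void markStoreCorrupted() throws IOException {
        throw new DirectoryIteratorException(new IOException("Input/output error"));
    }

    /** Mirrors the reported flow: only IOException is handled around markStoreCorrupted(). */
    static void failEngine(String reason, Exception failure, EventListener listener) {
        try {
            markStoreCorrupted();
        } catch (IOException e) {
            // Never reached here: DirectoryIteratorException is unchecked, not an IOException.
            failure.addSuppressed(e);
        }
        // Skipped whenever the exception above escapes the try block.
        listener.onFailedEngine(reason, failure);
    }

    /** Runs failEngine() and returns the reasons the listener was notified with. */
    static List<String> run() {
        List<String> notified = new ArrayList<>();
        try {
            failEngine("flush failed", new IOException("disk failure"),
                    (reason, failure) -> notified.add(reason));
        } catch (DirectoryIteratorException e) {
            // The exception escaped failEngine(); the listener was never called.
        }
        return notified;
    }

    public static void main(String[] args) {
        System.out.println("listener notifications: " + run()); // prints an empty list
    }
}
```

In this sketch the listener receives no notification, which corresponds to the shard failure never being reported to the master.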
Related component
Cluster Manager
To Reproduce
Manually mock a DirectoryIteratorException being thrown when markStoreCorrupted() is called.
Expected behavior
ShardFailure should still be sent to the master when the filesystem throws a DirectoryIteratorException during failEngine().
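One possible way to get this behavior, shown only as a sketch and not as the project's actual fix, is to handle DirectoryIteratorException alongside IOException and to notify the listener from a finally block. As before, FailEngineFixSketch, EventListener, and the simplified method bodies are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.util.ArrayList;
import java.util.List;

/** Sketch of a failEngine() variant that always notifies the listener. */
class FailEngineFixSketch {

    /** Hypothetical listener, mirroring eventListener.onFailedEngine(reason, failure). */
    interface EventListener {
        void onFailedEngine(String reason, Exception failure);
    }

    /** Stand-in for markStoreCorrupted() failing on a broken disk. */
    static void markStoreCorrupted() throws IOException {
        throw new DirectoryIteratorException(new IOException("Input/output error"));
    }

    static void failEngine(String reason, Exception failure, EventListener listener) {
        try {
            markStoreCorrupted();
        } catch (IOException | DirectoryIteratorException e) {
            // Record the secondary failure instead of letting it abort the failure path.
            failure.addSuppressed(e);
        } finally {
            // Runs even if some other unexpected exception is thrown above.
            listener.onFailedEngine(reason, failure);
        }
    }

    /** Runs failEngine() and returns the reasons the listener was notified with. */
    static List<String> run() {
        List<String> notified = new ArrayList<>();
        failEngine("flush failed", new IOException("disk failure"),
                (reason, failure) -> notified.add(reason));
        return notified;
    }

    public static void main(String[] args) {
        System.out.println("listener notifications: " + run());
    }
}
```

The finally block is a deliberate choice in this sketch: reporting the shard failure matters more than marking the store as corrupted, so the notification should not depend on which exception type the filesystem happens to throw.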