Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable JetStream on disk errors #6292

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

souravagrawal
Copy link

@souravagrawal souravagrawal commented Dec 22, 2024

Currently, there are scenarios where NATS JetStream may encounter permission errors when file system goes into read only mode, which can lead to an inconsistent state. In such cases, the system continues to allow publishing messages by resetting stream state, leading to a misaligned consumer stream sequence.

This PR introduces changes to gracefully handle these permission errors and prevent NATS from continuing in an inconsistent state when:

  • Flushing stream state to disk
  • Deleting expired messages on startup
  • Creating new block for messages
    After this PR, If NATS is running in non-clustered mode, the user will be unable to issue write requests until the issue is resolved. In clustered mode, only the affected node will stop accepting requests, while the system will continue to function as long as the required quorum remains healthy.

PR potentially fixes : #6211 which leads to consumer sequence reaching higher than stream sequence.
Signed-off-by: Sourabh Agrawal [email protected]

Copy link
Member

@neilalexander neilalexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably have better detection of a wider array of filesystem errors like permissions etc, agreed, but I'm not entirely sure why this would be a configurable option and I don't particularly like it either.

The disk being pulled out from underneath JetStream is effectively a catastrophic operational issue and for JS to continue to try to operate in those circumstances feels problematic, i.e. with the potential for data loss that no one may even notice. Even more so in a clustered scenario where the metalayer and Raft logs are being written out to disk.

@souravagrawal
Copy link
Author

We should probably have better detection of a wider array of filesystem errors like permissions etc, agreed, but I'm not entirely sure why this would be a configurable option and I don't particularly like it either.

The disk being pulled out from underneath JetStream is effectively a catastrophic operational issue and for JS to continue to try to operate in those circumstances feels problematic, i.e. with the potential for data loss that no one may even notice. Even more so in a clustered scenario where the metalayer and Raft logs are being written out to disk.

Thank you for taking the time to review this.
I will remove the flag. As this is my first patch in the server, I wanted to err on the side of caution.
Are there any other cases you think I should address as part of this?

@souravagrawal
Copy link
Author

Reopening PR, as it was closed mistakenly

@souravagrawal souravagrawal reopened this Dec 25, 2024
@souravagrawal
Copy link
Author

We should probably have better detection of a wider array of filesystem errors like permissions etc, agreed, but I'm not entirely sure why this would be a configurable option and I don't particularly like it either.
The disk being pulled out from underneath JetStream is effectively a catastrophic operational issue and for JS to continue to try to operate in those circumstances feels problematic, i.e. with the potential for data loss that no one may even notice. Even more so in a clustered scenario where the metalayer and Raft logs are being written out to disk.

Thank you for taking the time to review this. I will remove the flag. As this is my first patch in the server, I wanted to err on the side of caution. Are there any other cases you think I should address as part of this?

Thank you for the suggestion @neilalexander. I have implemented the recommended change and removed the flag.

@souravagrawal
Copy link
Author

Hello @neilalexander, Happy New Year.
Just following up to check if you’ve had a chance to review the changes.

Copy link
Member

@neilalexander neilalexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having the filestore.go call DisableJetStream directly may not be the best approach here. In a number of places we already surface errors up to stream.go, i.e. for isOutOfSpaceErr.

It feels like the more natural thing to do here would be to extend that same pattern, such that the stream examines surfaced errors and reacts, similar to how we do with the disk out-of-space warnings. Otherwise we end up with different types of disk errors being handled differently and that makes the code difficult to trace.

fs.expireMsgsOnRecover()
err := fs.expireMsgsOnRecover()
if err != nil && err == errFileSystemPermissionDenied {
fs.srv.Warnf("file system permission denied while expiring msgs, disabling jetstream: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure any log lines are capitalised/cased correctly.

os.Remove(mb.mfn)
err := os.Remove(mb.mfn)
if err != nil && os.IsPermission(err){
return errFileSystemPermissionDenied
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to do this rather than just check os.IsPermission further up the callstack?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense, I will change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may even make sense to define a new error-detect function for this case, similar to how we already have this in store.go for out-of-space:

func isOutOfSpaceErr(err error) bool {
	return err != nil && (strings.Contains(err.Error(), "no space left"))
}

@souravagrawal
Copy link
Author

I think having the filestore.go call DisableJetStream directly may not be the best approach here. In a number of places we already surface errors up to stream.go, i.e. for isOutOfSpaceErr.

It feels like the more natural thing to do here would be to extend that same pattern, such that the stream examines surfaced errors and reacts, similar to how we do with the disk out-of-space warnings. Otherwise we end up with different types of disk errors being handled differently and that makes the code difficult to trace.

I agree; it makes sense to handle this within the same pattern. I have implemented the change as you suggested.
However, there are a few other cases where stream-related operations, such as expireMsgs and writeFullState, are executed within a Go routine and need to be addressed in filestore.go, I believe.

@souravagrawal souravagrawal force-pushed the main branch 2 times, most recently from 9b4d96d to 0caaf62 Compare January 16, 2025 15:57
@@ -8562,3 +8563,81 @@ func TestFileStoreDontSpamCompactWhenMostlyTombstones(t *testing.T) {
fmb.bytes /= 2
require_True(t, fmb.shouldCompactInline())
}

func TestStoreRawMessageThrowsPermissionErrorIfFSModeReadOnly(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All tests in filestore_test.go need to start TestFileStore, otherwise CI won't pick them up properly.

fs.writeFullState()
err := fs.writeFullState()
if isPermissionError(err) {
fs.warn("File system permission denied when flushing stream state, disabling jetstream %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Capitalisation JetStream, pop a : before the %v too.

fs.warn("File system permission denied when flushing stream state, disabling jetstream %v", err)
// messages in block cache could be lost in the worst case.
// In the clustered mode it is very highly unlikely as a result of replication.
fs.srv.DisableJetStream()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think fs.srv can be nil here but let's add a fs.srv != nil guard just in case.

@@ -8743,6 +8772,10 @@ func (fs *fileStore) _writeFullState(force bool) error {
// Protect with dios.
<-dios
err := os.WriteFile(fn, buf, defaultFilePerms)
// if file system is not writable isPermissionError is set to true
if isPermissionError(err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This must be done after the dios <- struct{}{} line, otherwise the semaphore becomes unbalanced in a failure scenario.

server/stream.go Outdated
// messages in block cache could be lost in the worst case.
// In the clustered mode it is very highly unlikely as a result of replication.
mset.srv.DisableJetStream()
mset.srv.Warnf("File system permission denied while writing msg, disabling jetstream: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same nit here with capitalisation etc.

Filesystem permission denied while writing message, disabling JetStream: %v

@souravagrawal souravagrawal force-pushed the main branch 2 times, most recently from 491222c to 0cde6ae Compare January 20, 2025 18:54
@souravagrawal
Copy link
Author

Thank you, @neilalexander. All the comments have been addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Stream state resets if the Jetstream store directory goes into read only mode [v2.10.23]
2 participants