
Feature Request: Stronger Writing Guarantee on ReplicaSet #98

Open
Magnitus- opened this issue Jul 16, 2020 · 6 comments
Labels
medium This is important to work on, but not the most important new-feature Request is a new feature SP:2

Comments

@Magnitus-

Magnitus- commented Jul 16, 2020

The write guarantee on a replica set is not as robust as it could be. If the primary crashes before a write is propagated to the secondaries, the client still gets a success acknowledgement, but the write won't be there.

Detailed Description

I recommend changing the write concern to "majority". I think that for this particular case, the gain in guarantee greatly outweighs a slight loss in latency.

See: https://mongoosejs.com/docs/guide.html#writeConcern

Possible Implementation

src/models/Dictionary.ts:

import mongoose from 'mongoose';

const defaultWriteConcern = {
  w: 'majority',  // acknowledged by a majority of data-bearing voting members
  j: true,        // acknowledged only once the write reaches the on-disk journal
  wtimeout: 5000, // error out if the write concern cannot be met within 5s
};
const DictionarySchema = new mongoose.Schema(
  {
    name: String,
    version: String,
    schemas: Array,
    references: Object,
  },
  { 
    timestamps: true,
    writeConcern: defaultWriteConcern
  },
);
@Magnitus- Magnitus- added the new-feature Request is a new feature label Jul 16, 2020
@Magnitus- Magnitus- changed the title Feature Request Feature Request: Stronger Writing Guarantee on ReplicaSet Jul 16, 2020
@rosibaj rosibaj added specs-wip Specs are in progress new-feature Request is a new feature and removed new-feature Request is a new feature labels Aug 10, 2020
@blabadi
Contributor

blabadi commented Aug 10, 2020

I'd say we can make this a configurable env var, since it depends on the DB architecture and how many nodes you have. Good point @Magnitus-, thanks for the heads up.
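A minimal sketch of the env-var approach: the variable names (`WRITE_CONCERN_W`, `WRITE_CONCERN_JOURNAL`, `WRITE_CONCERN_TIMEOUT_MS`) and the helper are hypothetical, but the resulting object matches the `w`/`j`/`wtimeout` shape MongoDB expects.

```typescript
interface WriteConcern {
  w: string | number; // "majority" or an explicit node count
  j: boolean;         // wait for the journal write
  wtimeout: number;   // ms before an unmet write concern errors out
}

// Hypothetical env var names; defaults to the majority write concern
// proposed above when nothing is configured.
function buildWriteConcern(env: Record<string, string | undefined>): WriteConcern {
  const w = env.WRITE_CONCERN_W ?? 'majority';
  return {
    // Accept either an explicit node count ("2") or a mode ("majority")
    w: /^\d+$/.test(w) ? Number(w) : w,
    j: (env.WRITE_CONCERN_JOURNAL ?? 'true') === 'true',
    wtimeout: Number(env.WRITE_CONCERN_TIMEOUT_MS ?? '5000'),
  };
}
```

The result of `buildWriteConcern(process.env)` could then be passed as the schema's `writeConcern` option in place of the hard-coded constant.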

@Magnitus-
Author

Magnitus- commented Aug 11, 2020

I believe it would still work with a single node: I vaguely recall using majority writes in local development with one node in the past (1/1 is still a majority after all). I think where it would fail in some cases is if you made a hard assumption about the number of nodes and set a write concern of 2, for example.

I guess the main case I could think of against it is if you have >50ms latency in your replica set from multi-data-center replication with heavy traffic (though frankly, I think that if your replication cannot keep up with your write throughput, you'll be in a lot of trouble when your primary crashes).

Either way, it makes sense to make it configurable so that people can adapt to unforeseen use cases.
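The single-node point can be made concrete: a majority of n voting members is floor(n/2) + 1, which is why w:"majority" is still satisfiable with one node (1 of 1). A tiny illustration:

```typescript
// A majority of n voting members is floor(n/2) + 1, so a single-node
// replica set (n = 1) needs only its own acknowledgement.
function majorityCount(votingMembers: number): number {
  return Math.floor(votingMembers / 2) + 1;
}

// majorityCount(1) === 1, majorityCount(3) === 2, majorityCount(5) === 3
```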

@blabadi
Contributor

blabadi commented Aug 11, 2020

> I believe it would still work with a single node: I vaguely recall using majority writes in local development with one node in the past (1/1 is still majority afterall). I think where it would fail in some cases would be if you made a hard assumption about the number of nodes and set a write concern of 2 for example.
>
> I guess the main case I could think of against it is if you have a >50ms latency in your replicaset from multiple data center replication with heavy traffic (though frankly, I think that if your replication cannot keep up with your write throughput, you'll be in a lot of trouble when your master crashes).
>
> Either way, it makes sense to make it configurable so that people can adapt to unforeseen use-cases.

Yeah, my concern was having arbiters, which may influence the majority vote (not sure how they behave, to be frank). If they vote, we may want full consensus; and someone may want just eventual consistency by keeping writes on the primary. So a configuration var seems to give enough flexibility.

Thanks for the valuable clarifications, I appreciate it.

@Magnitus-
Author

Magnitus- commented Aug 11, 2020

Not sure if this is the right forum to discuss this, but there are a couple of different concepts at play here.

Unless things have changed a lot since a couple of years back, an arbiter is strictly for voting (in the case of a network partition, it breaks the tie on who should be the primary). It does not participate in any data operation (read, write, etc.), so I do not think it would be involved at all in the write concern.

A majority write concern doesn't guarantee by itself strong read consistency. All a majority write concern does is give the client a guarantee that when the write call returns, a majority of writeable members will have acknowledged the write (and if that is not possible, an error will be returned).

Whether what you read is strongly consistent depends on your read concern (I see now that it got more complex over time): https://docs.mongodb.com/manual/reference/read-concern/

For example, assume the following scenarios where a writer has majority write concern and a separate reader reads right after the primary got the write, but not the secondaries (and then to make it really fun, assume the primary crashes before the write is propagated to the replicas so the write will be lost and the writer will receive a failure notice):

  • If the reader reads locally on the primary, he'll read a value that will never be persisted on the replicaset (the write will be lost after the primary crashes since the write has not been yet replicated)
  • If the read is of type available, it may read from a secondary which has yet to receive the write of the primary (which in that particular case may ironically be a good thing)

If you want strong consistency (both in your acknowledged writes and the values you are reading), you need a write concern of majority and a read concern of majority.
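One way to request both guarantees without touching application code is through standard MongoDB connection-string options (`w`, `journal`, `wtimeoutMS`, `readConcernLevel` are real URI options; the helper below and its values are just an illustrative sketch):

```typescript
// Sketch: append write-concern and read-concern options to a MongoDB URI.
// Option names are the driver's standard URI options; the helper itself
// and the 5000ms timeout are assumptions for illustration.
function withMajorityConsistency(baseUri: string): string {
  const sep = baseUri.includes('?') ? '&' : '?';
  return baseUri + sep +
    'w=majority&journal=true&wtimeoutMS=5000&readConcernLevel=majority';
}

// e.g. withMajorityConsistency('mongodb://localhost:27017/dictionaries')
```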

MongoDB gives you A LOT of granular control on the kind of consistency you want (with the resulting performance tradeoffs). Honestly, I think many of its detractors in that department fail to understand how much you can tweak its behavior to do what you want.

@blabadi
Contributor

blabadi commented Aug 11, 2020

> If the reader reads locally on the primary, he'll read a value that will never be persisted on the replicaset (the write will be lost after the primary crashes since the write has not been yet replicated)

Shouldn't the replication log avoid this case? Even if a crash happens, any non-replicated changes will eventually be propagated when replication resumes where it stopped. I think this is a general concept in all leader/follower DBs, not only Mongo replica sets?

@Magnitus-
Author

Magnitus- commented Aug 11, 2020

> If the reader reads locally on the primary, he'll read a value that will never be persisted on the replicaset (the write will be lost after the primary crashes since the write has not been yet replicated)
>
> shouldn't the replication log avoid this case ? even if a crash happens, eventually any non replicated changes will be propagated when replication resumes where it stopped, I think this is a general concept in all leader/follower dbs not only mongo replicaset ?

There is an oplog on the primary (technically, all data nodes have an oplog, but the primary's is the one of interest here) which is used by the secondaries to sync up (basically, as long as all the operations a secondary is missing are still in the primary's oplog, syncing up is not too expensive).

However, the primary can accept a write (and put it in its oplog), but then crash before the write is propagated in the secondaries. Unless things changed since I took my certification a couple of years back, what happens then is that when the primary is back up and joins the cluster (as a secondary if another node was elected in the interim), any operations it has in its oplog that are not in the new primary's oplog are "rolled back".

They are still there in the background and retrievable, though it requires a manual intervention at that point (otherwise, the cluster will just ignore them).

Much simpler than the above is just to always acknowledge with a majority write concern so that all your acknowledged writes never end up stuck in the rollback of some former primary that you have to manually restore.

@rosibaj rosibaj added SP:2 and removed specs-wip Specs are in progress labels Aug 13, 2020
@rosibaj rosibaj added the medium This is important to work on, but not the most important label Aug 27, 2020
@rosibaj rosibaj removed this from the Donkey Kong - Sprint 33 milestone Sep 10, 2020
Development

No branches or pull requests

3 participants