
Feature Request: Stronger Writing Guarantee on ReplicaSet #98

Open
Magnitus- opened this issue Jul 16, 2020 · 6 comments
Labels
medium This is important to work on, but not the most important new-feature Request is a new feature SP:2

Comments

@Magnitus-

Magnitus- commented Jul 16, 2020

The write guarantee on a replica set is not as robust as it could be. If the primary crashes before a write is propagated to the secondaries, the client still gets a success acknowledgement, but the write won't be there.

Detailed Description

I recommend changing the write concern to "majority". I think that for this particular case, the gain in guarantee greatly outweighs a slight loss in latency.

See: https://mongoosejs.com/docs/guide.html#writeConcern

Possible Implementation

src/models/Dictionary.ts:

import mongoose from 'mongoose';

const defaultWriteConcern = {
  w: 'majority',  // acknowledged by a majority of data-bearing voting members
  j: true,        // acknowledged only once the write reaches the on-disk journal
  wtimeout: 5000, // error out if the write concern cannot be met within 5s
};
const DictionarySchema = new mongoose.Schema(
  {
    name: String,
    version: String,
    schemas: Array,
    references: Object,
  },
  { 
    timestamps: true,
    writeConcern: defaultWriteConcern
  },
);
@Magnitus- Magnitus- added the new-feature Request is a new feature label Jul 16, 2020
@Magnitus- Magnitus- changed the title Feature Request Feature Request: Stronger Writing Guarantee on ReplicaSet Jul 16, 2020
@rosibaj rosibaj added specs-wip Specs are in progress new-feature Request is a new feature and removed new-feature Request is a new feature labels Aug 10, 2020
@blabadi
Contributor

blabadi commented Aug 10, 2020

I'd say we can make this a configurable env var, since it depends on the DB architecture and how many nodes you have. Good point @Magnitus-, thanks for the heads up.
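A minimal sketch of the env-var approach: the variable names (`WRITE_CONCERN_W`, `WRITE_CONCERN_JOURNAL`, `WRITE_CONCERN_TIMEOUT_MS`) and the helper are hypothetical, but the resulting object matches the `w`/`j`/`wtimeout` shape MongoDB expects.

```typescript
interface WriteConcern {
  w: string | number; // "majority" or an explicit node count
  j: boolean;         // wait for the journal write
  wtimeout: number;   // ms before an unmet write concern errors out
}

// Hypothetical env var names; defaults to the majority write concern
// proposed above when nothing is configured.
function buildWriteConcern(env: Record<string, string | undefined>): WriteConcern {
  const w = env.WRITE_CONCERN_W ?? 'majority';
  return {
    // Accept either an explicit node count ("2") or a mode ("majority")
    w: /^\d+$/.test(w) ? Number(w) : w,
    j: (env.WRITE_CONCERN_JOURNAL ?? 'true') === 'true',
    wtimeout: Number(env.WRITE_CONCERN_TIMEOUT_MS ?? '5000'),
  };
}
```

The result of `buildWriteConcern(process.env)` could then be passed as the schema's `writeConcern` option in place of the hard-coded constant.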

@Magnitus-
Author

Magnitus- commented Aug 11, 2020

I believe it would still work with a single node: I vaguely recall using majority writes in local development with one node in the past (1/1 is still a majority after all). I think where it would fail in some cases is if you made a hard assumption about the number of nodes and set a write concern of 2, for example.

I guess the main case I could think of against it is if you have >50ms latency in your replica set from multi-data-center replication with heavy traffic (though frankly, I think that if your replication cannot keep up with your write throughput, you'll be in a lot of trouble when your primary crashes).

Either way, it makes sense to make it configurable so that people can adapt to unforeseen use cases.
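The single-node point can be made concrete: a majority of n voting members is floor(n/2) + 1, which is why w:"majority" is still satisfiable with one node (1 of 1). A tiny illustration:

```typescript
// A majority of n voting members is floor(n/2) + 1, so a single-node
// replica set (n = 1) needs only its own acknowledgement.
function majorityCount(votingMembers: number): number {
  return Math.floor(votingMembers / 2) + 1;
}

// majorityCount(1) === 1, majorityCount(3) === 2, majorityCount(5) === 3
```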

@blabadi
Contributor

blabadi commented Aug 11, 2020

> I believe it would still work with a single node: I vaguely recall using majority writes in local development with one node in the past (1/1 is still majority afterall). I think where it would fail in some cases would be if you made a hard assumption about the number of nodes and set a write concern of 2 for example.
>
> I guess the main case I could think of against it is if you have a >50ms latency in your replicaset from multiple data center replication with heavy traffic (though frankly, I think that if your replication cannot keep up with your write throughput, you'll be in a lot of trouble when your master crashes).
>
> Either way, it makes sense to make it configurable so that people can adapt to unforeseen use-cases.

Yeah, my concern was having arbiters, which may influence the majority vote (not sure how they behave, to be frank). If they vote, we may want full consensus; and someone may want just eventual consistency by keeping writes on the primary. So a configuration var seems to give enough flexibility.

Thanks for the valuable clarifications, I appreciate it.

@Magnitus-
Author

Magnitus- commented Aug 11, 2020

Not sure if this is the right forum to discuss this, but there are a couple of different concepts at play here.

Unless things have changed a lot since a couple of years back, an arbiter is strictly for voting (in the case of a network partition, it breaks the tie on who should be the primary). It does not participate in any data operation (read, write, etc.), so I do not think it would be involved at all in the write concern.

A majority write concern doesn't guarantee by itself strong read consistency. All a majority write concern does is give the client a guarantee that when the write call returns, a majority of writeable members will have acknowledged the write (and if that is not possible, an error will be returned).

Whether what you read is strongly consistent depends on your read concern (I see now that it got more complex over time): https://docs.mongodb.com/manual/reference/read-concern/

For example, assume the following scenarios where a writer has majority write concern and a separate reader reads right after the primary got the write, but not the secondaries (and then to make it really fun, assume the primary crashes before the write is propagated to the replicas so the write will be lost and the writer will receive a failure notice):

  • If the reader reads locally on the primary, he'll read a value that will never be persisted on the replicaset (the write will be lost after the primary crashes since the write has not been yet replicated)
  • If the read is of type available, it may read from a secondary which has yet to receive the write of the primary (which in that particular case may ironically be a good thing)

If you want strong consistency (both in your acknowledged writes and the values you are reading), you need a write concern of majority and a read concern of majority.
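One way to request both guarantees without touching application code is through standard MongoDB connection-string options (`w`, `journal`, `wtimeoutMS`, `readConcernLevel` are real URI options; the helper below and its values are just an illustrative sketch):

```typescript
// Sketch: append write-concern and read-concern options to a MongoDB URI.
// Option names are the driver's standard URI options; the helper itself
// and the 5000ms timeout are assumptions for illustration.
function withMajorityConsistency(baseUri: string): string {
  const sep = baseUri.includes('?') ? '&' : '?';
  return baseUri + sep +
    'w=majority&journal=true&wtimeoutMS=5000&readConcernLevel=majority';
}

// e.g. withMajorityConsistency('mongodb://localhost:27017/dictionaries')
```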

MongoDB gives you A LOT of granular control on the kind of consistency you want (with the resulting performance tradeoffs). Honestly, I think many of its detractors in that department fail to understand how much you can tweak its behavior to do what you want.

@blabadi
Contributor

blabadi commented Aug 11, 2020

> If the reader reads locally on the primary, he'll read a value that will never be persisted on the replicaset (the write will be lost after the primary crashes since the write has not been yet replicated)

Shouldn't the replication log avoid this case? Even if a crash happens, any non-replicated changes will eventually be propagated when replication resumes where it stopped. I think this is a general concept in all leader/follower DBs, not only Mongo replica sets?

@Magnitus-
Author

Magnitus- commented Aug 11, 2020

> If the reader reads locally on the primary, he'll read a value that will never be persisted on the replicaset (the write will be lost after the primary crashes since the write has not been yet replicated)
>
> shouldn't the replication log avoid this case ? even if a crash happens, eventually any non replicated changes will be propagated when replication resumes where it stopped, I think this is a general concept in all leader/follower dbs not only mongo replicaset ?

There is an oplog on the primary (technically, all data nodes have an oplog, but the primary's is the one of interest here) which is used by the secondaries to sync up (basically, as long as all the operations a secondary is missing are still in the primary's oplog, syncing up is not too expensive).

However, the primary can accept a write (and put it in its oplog), but then crash before the write is propagated in the secondaries. Unless things changed since I took my certification a couple of years back, what happens then is that when the primary is back up and joins the cluster (as a secondary if another node was elected in the interim), any operations it has in its oplog that are not in the new primary's oplog are "rolled back".

They are still there in the background and retrievable, though it requires a manual intervention at that point (otherwise, the cluster will just ignore them).

Much simpler than the above is just to always acknowledge with a majority write concern so that all your acknowledged writes never end up stuck in the rollback of some former primary that you have to manually restore.

@rosibaj rosibaj added SP:2 and removed specs-wip Specs are in progress labels Aug 13, 2020
@rosibaj rosibaj added the medium This is important to work on, but not the most important label Aug 27, 2020
@rosibaj rosibaj removed this from the Donkey Kong - Sprint 33 milestone Sep 10, 2020
Development

No branches or pull requests

3 participants