Nhse d34 nhskv.i33 token #35

martinsumner · 2024-05-31T23:04:54Z

Add support for a token manager and token session process to act as a requestor for tokens, and wrap riak_client calls within a session.

This is then used to provide a stronger conditional PUT API, allowing check-and-set (CAS) operations that behave reliably when the cluster is healthy, as well as in simple failure scenarios and during operational changes. The CAS will fall back to eventual consistency in extreme failure scenarios.

Start a token manager on each node. On request, the token manager registers a session against a token, and removes the registration when the session terminates. If another request is received for the same token, it is queued until the previous session releases the token (or the queued session dies). A session server can be started to run a session; it requests a token from the local token_manager on startup, and allows the session to be used by making calls to riak_client functions. Each call renews the session. After the token timeout, the session terminates.
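
As an illustration only, the queueing behaviour described above can be modelled in a few lines of Python. This is a single-process sketch, not the Erlang implementation: the class name `TokenManager` and the methods `request` and `session_down` are hypothetical and are not taken from riak_kv_token_manager. Session renewal and token timeouts are not modelled, and it is an assumption that waiting sessions are served in arrival order.

```python
from collections import deque


class TokenManager:
    """Toy model of per-token session registration and queueing."""

    def __init__(self):
        self.active = {}   # token -> session currently holding the grant
        self.queues = {}   # token -> deque of sessions waiting for the grant

    def request(self, token, session):
        """Grant the token if it is free, otherwise queue the session."""
        if token not in self.active:
            self.active[token] = session
            return "granted"
        self.queues.setdefault(token, deque()).append(session)
        return "queued"

    def session_down(self, token, session):
        """Handle a session terminating; return the holder afterwards."""
        queue = self.queues.get(token, deque())
        if self.active.get(token) == session:
            # The holder has gone: pass the grant to the next waiter, if any.
            if queue:
                self.active[token] = queue.popleft()
            else:
                del self.active[token]
        elif session in queue:
            # A queued session died before it was ever granted the token.
            queue.remove(session)
        return self.active.get(token)
```

In the real system the manager would learn of a session terminating through process monitoring rather than an explicit call; `session_down` stands in for that notification here.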

If the strong conditional check is made at the API, it becomes much harder to make the check identically and reliably in the HTTP API. Webmachine has many potential exit points outside of our control, so it is hard to reliably release a session on exit.

Discussion of the reliability of process monitoring, including the monitoring of processes on remote nodes, led to a new design which simplifies the message passing required by relying instead on monitoring. A request should now be refused if there exists a currently active grant and the preflist has changed since that grant was activated. In all other situations grants should be made (either immediately or after being queued awaiting a release).
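
The refusal rule can be written as a small decision function. The Python sketch below is a model under stated assumptions, not the code in riak_kv_token_manager: the function name `decide_request` is hypothetical, a preflist is represented as a plain list of node names, and "changed" is assumed to mean any difference in the list, including order.

```python
from typing import Optional, Sequence


def decide_request(
    active_preflist: Optional[Sequence[str]],
    current_preflist: Sequence[str],
) -> str:
    """Return 'grant', 'queue' or 'refuse' for a new token request.

    active_preflist is the preflist recorded when the currently active
    grant was made, or None if no grant is active for the token.
    """
    if active_preflist is None:
        # No active grant: the request can be granted immediately.
        return "grant"
    if list(active_preflist) != list(current_preflist):
        # Active grant exists and the preflist has changed since it was made.
        return "refuse"
    # Active grant with an unchanged preflist: wait for the release.
    return "queue"
```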

The naming has been changed following review to try to clarify meaning. The downstream_check was previously synchronous, but two token_managers could attempt to check in with each other at the same time (for different tokens), resulting in deadlock. These messages are now asynchronous.

The token_manager will no longer monitor remote upstream sessions. Add a backup GC process, whereby if a request is blocked, a check is made that an association with the session exists in the remote manager. The only thing that must be reliable is therefore the monitoring of local sessions by managers. Before using a local session, a check is now made that the local token manager has an active association for that session.

Once a HEAD has been converted to a GET, disqualify it from the find_bestobject calculation. As GETs occur after HEADs in the GET_FSM, it should by default be the case when allow_mult=true that VCget >= VChead. This is not the case with lww=true, where objects may regress in VC terms. However, as lww=true, the GET which came after the HEAD is destined to be preferred. The code as-is could cause a HEAD to continuously require re-fetching, and lead to requests timing out. This is demonstrated in the riak_test associated with the conditional PUT token implementation.

martinsumner · 2024-07-08T09:37:53Z

Looking at the results of a full performance test, the overhead of performing basic_consensus for a proportion of PUTs is not large.

martinsumner added 12 commits May 2, 2024 18:16

Initial support for OTP26

79a0e55

Remove tracers

246774a

Update rebar.config

6c46b9d

Update rebar.config

1d04e59

Add strong conditional check to PB API

4e3489d

Allow, by configuration, for the conditional check to be strengthened to use the new token-based mechanism

Reset state on PB connection after completion

b615d2e

Add downstream recording of state

cb53d98

This means that if a topology change occurs without node failure, sibling protection is still maintained.

Fix uniqueness of node list

deb3ce3

Also catch erpc errors when initially connecting to a node, and back off and retry. This smooths handling of the case where a node is suddenly killed.

Refactor riak_kv_pb_object

6854977

To make it simpler and clearer to remove session token

Add support to HTTP API

39a3e87

martinsumner mentioned this pull request May 31, 2024

Nhse d34 nhskv.i33 token OpenRiak/riak_test#19

Open

martinsumner added 9 commits June 5, 2024 07:34

Conditional PUT to require GET not HEAD

09fcda2

Change the GET run as part of a conditional PUT, so as not to require a full object fetch.

Add configuration

56b6833

Add profile function

a0b027b

To help in riak_test

Add profiler

ef61d8b

Merge branch 'nhse-d34-otp26' into nhse-d34-nhskv.i30-profiler

a11d4d2

Type fix

664b6e5

Add to extending list of defaults

5b8cb9f

Merge branch 'nhse-d34-nhskv.i30-profiler' into nhse-d34-nhskv.i33-token

346f276

Initial write-up of riak_kv_token_manager

87cf5b1

Also allows for the async update to verifications that an upstream node has granted a token

ThomasArts reviewed Jun 13, 2024

src/riak_kv_token_session.erl (outdated)

ThomasArts reviewed Jun 13, 2024

src/riak_kv_token_session.erl (outdated)

ThomasArts reviewed Jun 13, 2024

src/riak_kv_token_session.erl

martinsumner added 5 commits June 13, 2024 11:29

Change to config parameters

665d467

With aim to improve clarity

Remove duplicate

2f95b40

Clarify erpc errors

30a871b

ThomasArts reviewed Jun 17, 2024

priv/riak_kv.schema (outdated)

ThomasArts reviewed Jun 17, 2024

src/riak_kv_sup.erl

ThomasArts reviewed Jun 18, 2024

src/riak_kv_token_manager.erl (outdated)

ThomasArts reviewed Jun 18, 2024

src/riak_kv_token_manager.erl (outdated)

martinsumner added 6 commits June 18, 2024 12:58

Add GC process

ce10992

It is very hard to test the GC process, as it is hard to create a PID which does not trigger a 'DOWN' message when it dies. There was a discrepancy in sloppy_quorum between the PB and WM APIs for the conditional check. This is resolved.

Merge branch 'nhse-d34-nhskv.i33-token' of https://github.com/nhs-ria…

c4a033f

…k/riak_kv into nhse-d34-nhskv.i33-token

Extend eunit test of token_manager

09ad183

Try and keep github formatter happy

10de288

ThomasArts reviewed Jun 19, 2024

src/riak_kv_token_manager.erl

ThomasArts approved these changes Jun 19, 2024

martinsumner added 3 commits June 19, 2024 14:59

Use monitor rather than spoof 'DOWN'

11caa89

Change following review

Use uniq code in OTP 24

6ba1736

lists:uniq introduced in OTP25

Update comments for perceived clarity

8f9b5e0

Base automatically changed from nhse-d34-otp26 to nhse-develop-3.4 September 10, 2024 11:32

martinsumner added 2 commits September 23, 2024 15:10

Merge branch 'nhse-develop-3.4' into nhse-d34-nhskv.i33-token

904fd69

Remove double-definition on merge

3a102b3
