Draw faint shaded areas around the eval line to indicate likeliness of a decisive result #248

yuzisee · 2023-09-28T19:55:31Z

Showing "decisive" vs. "dead drawn" in some way lets Nibbler users:

* (when eval is close to zero) easily identify which "drawn" positions are actually dead drawn vs. unclear/complex/sharp
* (when eval favors one side) easily identify which portions of the game contain more counterplay/complexity, despite the advantage

Testing

According to Lc0 evaluating Kramnik vs. Topalov's World Chess Championship 2006 Round 4:

* Kramnik was pushing for a draw after 36. …Qh4, but Topalov played 37. Ra1 rather than 37. e4 to keep the position sharp
* the game ultimately transitions from "equal but unclear" → "nearly certain draw" after the exchange 47. …Bxc4 48. Raxc4

The "Immortal Draw" from 1872: Karl Hamppe vs. Philipp Meitner
↓

(screenshot using the famous Karpov vs. Kasparov World Chess Championship 1987, Game 24)
↓

According to Lc0 evaluating Gelfand vs. Anand World Chess Championship 2012 Round 7:

* high "sharpness" throughout the game predicts an eventually decisive outcome
* only a two-result game remains after 25. …f6 by Anand, giving up his final winning chances

(one of the more "exciting" draws in recent history: Karjakin vs. Carlsen World Chess Championship 2016, Game 2)
↓

And, just for fun: Deep Blue vs. Humankind 1997 Game 2

Old screenshots (ignore)

without laplace smoothing

raw drawrate centered on eval line

rooklift · 2023-09-29T10:45:43Z

It's an interesting idea. The main line of the graph seems to be missing for the start position somehow?

rooklift · 2023-09-29T10:53:05Z

Also, I suspect a lot of people want a WDL graph, which I don't think this quite is?

(Not that I'd promise to accept such a thing.)

yuzisee · 2023-09-29T15:47:29Z

> It's an interesting idea. The main line of the graph seems to be missing for the start position somehow?

Oh, thanks! Fixed in be6715d

yuzisee · 2023-09-29T17:36:23Z

> people want a WDL graph, which I don't think this quite is?

Here's what a WDL version would look like (same game as the upper screenshot, in case you want to compare directly: Kramnik vs. Topalov's World Chess Championship 2006, Game 4)

(Kasparov vs. Karpov World Chess Championship 1987, Game 24)
↓

rooklift · 2023-09-29T19:12:12Z

By the way, I recall telling this to someone but not exactly who I told it to - Nibbler is barely maintained these days, and the codebase is a mess; I'm not enthusiastic about making any changes at all, unless there's bugs or feature requests from Lc0 devs...

At least a couple of people maintain their own Nibbler forks (e.g. this one) and that might be the happier way to proceed.

yuzisee · 2023-09-29T20:03:14Z

> I'm not enthusiastic about making any changes at all… At least a couple of people maintain their own Nibbler forks (e.g. this one) and that might be the happier way to proceed.

Oh, yes! Maybe a top-level message on the README.md that endorses a specific fork could be a great way to gather community interest in one place

e.g.

Otherwise, https://github.com/rooklift/nibbler/forks currently lists 59 different forks so new contributors (such as myself) will simply end up pushing PRs into the main repo instead.

yuzisee · 2023-09-30T05:29:28Z

> the codebase is a mess

Take it from someone who has been in Software Engineering for a long time… all codebases eventually become a mess. The more widely used a product is, the more of a mess its code becomes over time 🙂

> Here's what a WDL version would look like

In any case, whether you ultimately decide to accept/merge this or not, I think you're right that the WDL version turned out to be both (a) more aesthetically pleasing, and (b) easier for a new user to understand — so the pull request has been updated into the WDL version.

rooklift · 2023-09-30T19:33:29Z

Hmm - is it simply drawing the Draw score as centred on the main line of the graph? I don't think that would be correct, e.g. in this image:

The infobox tells me the current position has Black win of only 15 out of 1000, but it's certainly drawn as if it's more than that.

I think to do it correctly you would need to actually use all 3 of the WDL numbers? (Or at least 2, the third can be inferred...)
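A minimal sketch of the "use all three numbers" approach, assuming a 0..1 vertical axis where White's share is stacked at the bottom and Black's at the top, so that the middle of the draw band sits at the expected score W + D/2. The function name `wdlBand` and that axis convention are illustrative assumptions, not Nibbler's actual drawing code, which plots a centipawn-based curve.

```javascript
// Hypothetical helper, not part of Nibbler: converts per-mille WDL counts
// into the vertical extent of the draw band on a 0..1 axis where the
// bottom region belongs to White wins and the top region to Black wins.
function wdlBand(w, d, l) {
  const total = w + d + l;
  if (!Number.isFinite(total) || total <= 0) {
    return null; // no usable WDL for this position
  }
  const lower = w / total;     // White's win share fills the bottom
  const upper = 1 - l / total; // Black's win share fills the top
  return { lower, upper };     // the draw band spans lower..upper
}

// Black wins only 15 out of 1000, as in the infobox mentioned above.
// The W and D values here are made up for illustration.
const band = wdlBand(335, 650, 15);
console.log(band);
```

With Black at 15 out of 1000, the region above the band takes up only 1.5% of the graph height, which is the behaviour the comment above asks for.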

yuzisee · 2023-09-30T20:24:28Z

> is it simply drawing the Draw score as centred on the main line of the graph?

Yeah.

> I think to do it correctly you would need to actually use all 3 of the WDL numbers? (Or at least 2, the third can be inferred...)

I did test various things, but ultimately ran into the same question that @Naphthalin describes when trying to read winrates directly from the engine:

> I personally don't think that engine WDL (and the derived expected score) is particularly useful for analyzing human games, and requires some translation to be of use, which is mostly what LeelaChessZero/lc0#1791 is about. After working on that quite some time, I came to the conclusion that what is now the WDL_mu score type is the most (or possibly only) natural way of assigning a single number to a position

This way, the centipawn value coming from WDL_mu remains the "source of truth" for which player has the advantage and by how much, rather than the "W" and "L" values given by the engine.

If we were to calculate, by hand, the "translated" win% and loss% the same way as in #237 (comment), treating the WDL_mu eval as the source of truth, it lands us around 23% for White : 64% draw : 13% for Black, which looks about right based on what's in the Kasparov-Karpov image and, for the same reasons, lines up well with the sigmoid squishing of #244.

Naphthalin · 2023-10-01T19:20:53Z

Interesting work, I really like the shaded background!

I personally see two possibilities; one is displaying the actual WDL (and leaving it to the user to use adequate WDL sharpness through the contempt settings), the other is to do even more maths, and directly use the `WDL_mu` stddev formula `2 / (ln(1/W-1) + ln(1/L-1))` to draw a 95% interval around the eval line.

yuzisee · 2023-10-01T22:37:40Z

> directly use the `WDL_mu` stddev formula `2 / (ln(1/W-1) + ln(1/L-1))` to draw a 95% interval around the eval line.

@Naphthalin We'd want to use log2 rather than ln in this case, since Nibbler is drawing the graph as

nibbler/files/src/renderer/50_table.js, line 40 in 34dba0a:

this.graph_y = 1 / (1 + Math.pow(0.5, cp / 100));

right?
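For reference, a small sketch of that mapping and its inverse. Only the formula itself comes from the quoted Nibbler line; the function names `graphY` and `cpFromGraphY` are made up here. Because the curve is a base-2 sigmoid, the inverse comes out naturally in terms of log2.

```javascript
// Same formula as the quoted line from 50_table.js, wrapped in a function.
// Maps a centipawn eval to a height between 0 and 1.
function graphY(cp) {
  return 1 / (1 + Math.pow(0.5, cp / 100));
}

// Inverse of graphY (hypothetical helper): solving
// y = 1 / (1 + 2^(-cp/100)) for cp gives cp = 100 * log2(y / (1 - y)).
function cpFromGraphY(y) {
  return 100 * Math.log2(y / (1 - y));
}

console.log(graphY(0));   // 0.5, an equal eval sits in the middle
console.log(graphY(100)); // 2/3, each extra pawn doubles the odds ratio y / (1 - y)
console.log(cpFromGraphY(graphY(250)));
```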

Naphthalin · 2023-10-01T22:53:19Z

No, the idea would be to calculate cp(WDL_mu) +/- 2 * stddev, calculate the y with the formula you posted, and shade the part in between.
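A rough sketch of that procedure, under stated assumptions: the stddev formula is the one quoted in this thread, `graphY` repeats the quoted line from 50_table.js, and everything else (the function names, the multiplier parameter `k`, and treating the stddev as pawn units that are multiplied by 100 to get centipawns) is an assumption for illustration rather than Nibbler or Lc0 code.

```javascript
// Formula quoted in this thread: 2 / (ln(1/W - 1) + ln(1/L - 1)),
// with W and L given as probabilities strictly between 0 and 1.
function wdlMuStddev(w, l) {
  if (!(w > 0 && w < 1 && l > 0 && l < 1)) return null;
  const denom = Math.log(1 / w - 1) + Math.log(1 / l - 1);
  if (!(denom > 0)) return null; // zero when there is no draw probability at all
  return 2 / denom;
}

// Same mapping as the quoted line from 50_table.js.
function graphY(cp) {
  return 1 / (1 + Math.pow(0.5, cp / 100));
}

// cp: the WDL_mu eval in centipawns. k: how many stddevs to shade on each side.
// ASSUMPTION: the stddev is in pawn units, hence the factor of 100.
function shadedBand(cp, w, l, k = 2) {
  const s = wdlMuStddev(w, l);
  if (s === null) return null; // caller should skip shading for this position
  const half = k * 100 * s;
  return { yLow: graphY(cp - half), yMid: graphY(cp), yHigh: graphY(cp + half) };
}

// W and L taken from the 23% / 13% figures earlier in the thread;
// the cp value is made up for illustration.
console.log(shadedBand(20, 0.23, 0.13));
```

Passing a smaller `k` narrows the shaded region, so the same helper can be used to try out different interval widths.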

yuzisee · 2023-10-02T00:02:59Z

> No, the idea would be to calculate cp(WDL_mu) +/- 2 * stddev, calculate the y with the formula you posted, and shade the part in between.

Got it, yes I see that here

https://github.com/LeelaChessZero/lc0/blob/076299b1f1ca21993b2c5e82ab3e80edb5367057/src/mcts/search.cc#L237-L239

And, would we want a 50% confidence interval (rather than 95%) so that it expresses something akin to:

* shaded region: decisive outcome more likely than draw (i.e. >50%)
* middle region: draw more likely than decisive outcome (i.e. <50%)

UPDATE:

Worked! It's looking good:

I've committed 107710b and will update the remaining screenshots

Naphthalin · 2023-10-02T08:33:03Z

I just realized that optically it suggests the opposite of what your initial plots did (i.e. the draw area going to 100% in case of a draw), and if you stick to the idea of the shaded areas representing uncertainty about the eval, I think the currently black part should be shaded instead. There are probably some edge case issues with W or L being reported as 0 you need to deal with, e.g. by scaling WDL by 0.994 and adding 0.2% to each of W,D,L when calculating the line.

yuzisee · 2023-10-02T15:35:29Z

> There are probably some edge case issues with W or L being reported as 0 you need to deal with, e.g. by scaling WDL by 0.994 and adding 0.2% to each of W,D,L when calculating the line.

Yup, incorporated in bfe752a

Looks like there was a similar warning here as well https://github.com/LeelaChessZero/lc0/blob/076299b1f1ca21993b2c5e82ab3e80edb5367057/src/mcts/search.cc#L232-L236
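The edge-case handling discussed above can be sketched as follows. The helper name `smoothWdl` is hypothetical and this is not the code from bfe752a; only the constants 0.994 and 0.2% come from the suggestion in this thread. Since 0.994 + 3 * 0.002 = 1, the adjusted values still sum to 1.

```javascript
// Hypothetical helper illustrating the suggested adjustment.
// Inputs are probabilities (not per-mille counts) that sum to 1.
function smoothWdl(w, d, l) {
  const scale = 0.994; // shrink each probability slightly...
  const floor = 0.002; // ...then add 0.2% so none of them can be exactly 0
  return {
    w: w * scale + floor,
    d: d * scale + floor,
    l: l * scale + floor,
  };
}

// An engine report of "certain White win" no longer produces ln(1/0 - 1).
const smoothed = smoothWdl(1, 0, 0);
console.log(smoothed, Math.log(1 / smoothed.l - 1));
```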

> I just realized that optically it suggests the opposite of what your initial plots did (i.e. the draw area going to 100% in case of a draw), and if you stick to the idea of the shaded areas representing uncertainty about the eval, I think the currently black part should be shaded instead.

Aha yes, committed 696c576

Here's what it all looks like now, altogether:

rooklift · 2023-10-02T16:34:39Z

I doubt I'm going to accept this honestly. As I say I barely maintain Nibbler these days.

yuzisee · 2023-10-04T15:30:32Z

In the interest of reducing maintenance burden to the bare minimum, I'll leave everything in "display raw WDL" mode so everything can be kept as simple as possible, codewise.

Screenshots have been updated at the top, and it does add a nice aesthetic to Nibbler overall. Maybe this final simplified version is simple enough to be worth considering?

Anyway the final decision is still yours, of course — hopefully this helps in some small part to make the decision easier.

yuzisee added 3 commits September 28, 2023 12:23

No functional change, make it easier to adjust the return type

d0e823b

Compute "complexity/uncertainty/counterplay" value based on the draw value in WDL

09e91bb

yuzisee changed the title ~~Draw a faint shaded area around the eval line to indicate the likeliness of a decisive result~~ Draw a faint shaded area around the eval line to indicate likeliness of a decisive result Sep 28, 2023

yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 29, 2023

No functional change.

57a1f22

Reduce potential merge conflicts w/ rooklift#248

yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 29, 2023

No functional change.

8941e30

Reduce potential merge conflicts w/ rooklift#248

yuzisee force-pushed the dev-complexity-counterplay-graph branch from cf4f1ae to db77356 Compare September 29, 2023 02:28

yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 29, 2023

No functional change.

1342d2c

Reduce potential merge conflicts w/ rooklift#248

yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 29, 2023

No functional change.

1b41cbf

Reduce potential merge conflicts w/ rooklift#248

No functional change.

279e905

Help to reduce merge conflicts w/ rooklift#237

yuzisee force-pushed the dev-complexity-counterplay-graph branch from db77356 to 279e905 Compare September 29, 2023 02:52

Minor bugfix

13097a2

yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 29, 2023

Bugfix (the main line of the graph was missing for the start position)

be6715d

Review response to rooklift#248 (comment)

yuzisee changed the title ~~Draw a faint shaded area around the eval line to indicate likeliness of a decisive result~~ Draw a faint shaded areas around the eval line to indicate likeliness of a decisive result Sep 30, 2023

yuzisee changed the title ~~Draw a faint shaded areas around the eval line to indicate likeliness of a decisive result~~ Draw faint shaded areas around the eval line to indicate likeliness of a decisive result Sep 30, 2023

yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 30, 2023

Review response to rooklift#248 (comment) Draw as a "WDL graph" instead

3f345dd

Bugfix (the main line of the graph was missing for the start position)

aa07184

Review response to rooklift#248 (comment)

yuzisee added 2 commits October 1, 2023 17:34

Rename variable for readability only (no functional change)

a2516d3

Review response to rooklift#248 (comment) Draw as a "WDL graph" instead

51c4687

yuzisee force-pushed the dev-complexity-counterplay-graph branch from 3f345dd to 51c4687 Compare October 2, 2023 00:35

yuzisee added 2 commits October 1, 2023 22:19

Bugfix (corner case)

9dad848

* Handle null values correctly

Review response to rooklift#248 (comment)

107710b

Directly use the `WDL_mu` stddev formula `2 / (ln(1/W-1) + ln(1/L-1))` to calculate the interval around the eval line.

Review response to rooklift#248 (comment)

bfe752a

"There are probably some edge case issues with `W` or `L` being reported as 0 you need to deal with, e.g. by scaling WDL by 0.994 and adding 0.2% to each of W,D,L when calculating the line."

yuzisee force-pushed the dev-complexity-counterplay-graph branch from 2cf337d to bfe752a Compare October 2, 2023 15:38

Draw the actual (raw) WDL instead

592478a

(and leaving it to the user to use adequate WDL sharpness through the contempt settings)
