Draw faint shaded areas around the eval line to indicate likeliness of a decisive result #248

Open
wants to merge 13 commits into
base: master
from

Conversation


@yuzisee yuzisee commented Sep 28, 2023

Showing "decisive" vs. "dead drawn" in some way allows Nibbler users to:

  • (when eval is close to zero) easily identify which "drawn" positions are actually dead drawn vs. unclear/complex/sharp
  • (when eval favors one side) easily identify which portions of the game contain more counterplay/complexity, despite the advantage

Testing

According to Lc0 evaluating Kramnik vs. Topalov's World Chess Championship 2006 Round 4:

  • Kramnik was pushing for a draw after 36. …Qh4 but Topalov plays 37. Ra1 rather than 37. e4, to keep the position sharp
  • the game ultimately transitions from "equal but unclear" → "nearly certain draw" after the exchange 47. …Bxc4 48. Raxc4

image


The "Immortal Draw" from 1872: Karl Hamppe vs. Philipp Meitner

image


(screenshot using the famous Karpov vs. Kasparov World Chess Championship 1987, Game 24)

image


According to Lc0 evaluating Gelfand vs. Anand World Chess Championship 2012 Round 7:

  • high "sharpness" throughout the game predicts an eventually decisive outcome
  • only a two-result game remains after 25. …f6 by Anand, giving up his final winning chances

image


(one of the more "exciting" draws in recent history: Karjakin vs. Carlsen World Chess Championship 2016, Game 2)

image


And, just for fun: Deep Blue vs. Humankind 1997 Game 2
image

Old screenshots (ignore)

without laplace smoothing

image

raw drawrate centered on eval line

image

a decisive result according to WDL

* When eval is close to zero, this allows users to more easily identify
  which "drawn" positions are actually dead drawn vs.
  unclear/complex/sharp
* When eval favors one side, this allows users to more easily identify
  which portions of the game contained more counterplay/complexity
  despite the advantage
@yuzisee yuzisee changed the title from "Draw a faint shaded area around the eval line to indicate the likeliness of a decisive result" to "Draw a faint shaded area around the eval line to indicate likeliness of a decisive result" Sep 28, 2023
yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 29, 2023
Reduce potential merge conflicts w/ rooklift#248
@yuzisee yuzisee force-pushed the dev-complexity-counterplay-graph branch from cf4f1ae to db77356 Compare September 29, 2023 02:28
yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 29, 2023
Reduce potential merge conflicts w/ rooklift#248
Help to reduce merge conflicts w/ rooklift#237
@yuzisee yuzisee force-pushed the dev-complexity-counterplay-graph branch from db77356 to 279e905 Compare September 29, 2023 02:52
@rooklift
Owner

It's an interesting idea. The main line of the graph seems to be missing for the start position somehow?

@rooklift
Owner

rooklift commented Sep 29, 2023

Also, I suspect a lot of people want a WDL graph, which I don't think this quite is?

(Not that I'd promise to accept such a thing.)

yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 29, 2023
@yuzisee
Author

yuzisee commented Sep 29, 2023

> It's an interesting idea. The main line of the graph seems to be missing for the start position somehow?

Oh, thanks! Fixed in be6715d.

@yuzisee
Author

yuzisee commented Sep 29, 2023

> people want a WDL graph, which I don't think this quite is?

Here's what a WDL version would look like (same game as the upper screenshot, in case you want to compare directly: Kramnik vs. Topalov's World Chess Championship 2006, Game 4)

image


(Kasparov vs. Karpov World Chess Championship 1987, Game 24)

image

@rooklift
Owner

By the way, I recall telling this to someone but not exactly who I told it to - Nibbler is barely maintained these days, and the codebase is a mess; I'm not enthusiastic about making any changes at all, unless there's bugs or feature requests from Lc0 devs...

At least a couple of people maintain their own Nibbler forks (e.g. this one) and that might be the happier way to proceed.

@yuzisee
Author

yuzisee commented Sep 29, 2023

> I'm not enthusiastic about making any changes at all… At least a couple of people maintain their own Nibbler forks (e.g. this one) and that might be the happier way to proceed.

Oh, yes! Maybe a top-level message on the README.md that endorses a specific fork could be a great way to gather community interest in one place


e.g.
image

Otherwise, https://github.com/rooklift/nibbler/forks currently lists 59 different forks so new contributors (such as myself) will simply end up pushing PRs into the main repo instead.

@yuzisee yuzisee changed the title from "Draw a faint shaded area around the eval line to indicate likeliness of a decisive result" to "Draw a faint shaded areas around the eval line to indicate likeliness of a decisive result" Sep 30, 2023
@yuzisee yuzisee changed the title from "Draw a faint shaded areas around the eval line to indicate likeliness of a decisive result" to "Draw faint shaded areas around the eval line to indicate likeliness of a decisive result" Sep 30, 2023
yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Sep 30, 2023
rooklift#248 (comment)

Draw as a "WDL graph" instead
@yuzisee
Author

yuzisee commented Sep 30, 2023

> the codebase is a mess

Take it from someone who has been in Software Engineering for a long time… all codebases eventually become a mess. The more widely used a product is, the more of a mess its code becomes over time 🙂

> Here's what a WDL version would look like

In any case, whether you ultimately decide to accept/merge this or not, I think you're right that the WDL version turned out to be both (a) more aesthetically pleasing, and (b) easier for a new user to understand — so the pull request has been updated into the WDL version.

@rooklift
Owner

Hmm - is it simply drawing the Draw score as centred on the main line of the graph? I don't think that would be correct, e.g. in this image:

image

The infobox tells me the current position has Black win of only 15 out of 1000, but it's certainly drawn as if it's more than that.

I think to do it correctly you would need to actually use all 3 of the WDL numbers? (Or at least 2, the third can be inferred...)
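
A minimal sketch of what "using all three WDL numbers" could look like, as opposed to centring the draw share on the eval line. The helper name `wdlBoundaries` and the bottom-to-top orientation (Black's share at the bottom, White's at the top) are assumptions for illustration and are not taken from Nibbler's code; the input values are made up apart from the Black win of 15 per mille mentioned above.

```javascript
// Hypothetical helper, not part of Nibbler: turns WDL counts (e.g. per mille,
// as shown in the infobox) into the two boundaries of a stacked WDL graph.
// Fractions are measured from the bottom of the graph (assumed orientation).
function wdlBoundaries(w, d, l) {
  const total = w + d + l;
  if (total <= 0) return null;      // no usable data for this position
  return {
    lower: l / total,               // below this line: Black's winning share
    upper: (l + d) / total,         // above this line: White's winning share
  };
}

// Illustrative numbers only: W=300, D=685, L=15 per mille.
const band = wdlBoundaries(300, 685, 15);
console.log(band); // Black's strip occupies only 1.5% of the graph height
```

Drawn this way, the thickness of Black's strip is exactly the reported loss share, so a value of 15 out of 1000 can never look larger than it is.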

@yuzisee
Author

yuzisee commented Sep 30, 2023

> is it simply drawing the Draw score as centred on the main line of the graph?

Yeah.

> I think to do it correctly you would need to actually use all 3 of the WDL numbers? (Or at least 2, the third can be inferred...)

I did test various things, but ultimately ran into the same question that @Naphthalin describes when trying to read winrates directly from the engine:

> I personally don't think that engine WDL (and the derived expected score) is particularly useful for analyzing human games, and requires some translation to be of use, which is mostly what LeelaChessZero/lc0#1791 is about. After working on that quite some time, I came to the conclusion that what is now the WDL_mu score type is the most (or possibly only) natural way of assigning a single number to a position

This way, the centipawn value coming from WDL_mu remains the "source of truth" for which player has the advantage and by how much, rather than the "W" and "L" values given by the engine.

If we were to calculate the "translated" win% and loss% by hand, the same way as in #237 (comment), treating the WDL_mu eval as the source of truth,
image
it lands us around 23% for White : 64% draw : 13% for Black, which looks about right based on what's in the Kasparov-Karpov image and, for the same reasons, lines up well with the sigmoid squishing of #244.

@Naphthalin

Interesting work, I really like the shaded background!

I personally see two possibilities: one is displaying the actual WDL (and leaving it to the user to use adequate WDL sharpness through the contempt settings); the other is to do even more maths and directly use the `WDL_mu` stddev formula `2 / (ln(1/W-1) + ln(1/L-1))` to draw a 95% interval around the eval line.
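
A rough sketch of the quoted stddev formula in JavaScript. The function name `wdlToMuSigma` is hypothetical. The `mu` expression is not quoted in this thread; it is an assumption that follows from modelling W and L as the two tails of one logistic curve with draw boundaries at ±1, which is the model the stddev formula appears to come from.

```javascript
// Hypothetical helper: w and l are win/loss probabilities (fractions of 1).
// Returns null when the formula is undefined (w or l at 0, or no draw share).
function wdlToMuSigma(w, l) {
  if (!(w > 0 && l > 0 && w + l < 1)) return null;
  const a = Math.log(1 / l - 1);
  const b = Math.log(1 / w - 1);
  return {
    mu: (a - b) / (a + b),  // assumed: eval in units where +/-1 are the draw boundaries
    sigma: 2 / (a + b),     // the spread formula quoted above
  };
}

// A balanced position with 25% wins for each side: mu is 0, sigma is 1 / ln(3).
console.log(wdlToMuSigma(0.25, 0.25));
```

The guard at the top matters because `W` or `L` reported as exactly 0 makes one of the logarithms infinite.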

@yuzisee
Author

yuzisee commented Oct 1, 2023

> directly use the WDL_mu stddev formula 2 / (ln(1/W-1) + ln(1/L-1)) to draw a 95% interval around the eval line.

@Naphthalin We'd want to use log2 rather than ln in this case, since Nibbler is drawing the graph as

this.graph_y = 1 / (1 + Math.pow(0.5, cp / 100));

right?
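
For reference, the quoted expression is a base-2 logistic curve over centipawns. A standalone sketch (the wrapper name `graphY` is illustrative; only the expression itself comes from the line above):

```javascript
// Same expression as the quoted graph_y line, wrapped in a plain function.
// Equivalent to 1 / (1 + 2^(-cp / 100)).
function graphY(cp) {
  return 1 / (1 + Math.pow(0.5, cp / 100));
}

console.log(graphY(0));    // 0.5, the middle of the graph
console.log(graphY(100));  // about 0.667
console.log(graphY(-100)); // about 0.333
```

Every 100 centipawns doubles the odds ratio y / (1 - y), which is why the question of log2 versus ln comes up here.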

@Naphthalin

No, idea would be to calculate cp(WDL_mu) +/- 2 * stddev, calculate the y with the formula you posted, and shade the part in between.
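
The suggestion above could be sketched roughly as follows. The names `shadedBand` and `stddevCp` are hypothetical, and the sketch assumes the stddev has already been converted to centipawns on the same scale as the eval; that conversion is not spelled out in this thread.

```javascript
// The graph mapping quoted earlier in the thread.
function graphY(cp) {
  return 1 / (1 + Math.pow(0.5, cp / 100));
}

// Hypothetical helper: cp and stddevCp are both in centipawns (assumption).
// Returns the y positions of the eval line and of the band edges at +/- 2 stddev.
function shadedBand(cp, stddevCp) {
  return {
    lo: graphY(cp - 2 * stddevCp),
    mid: graphY(cp),
    hi: graphY(cp + 2 * stddevCp),
  };
}

// The band is computed in centipawn space first, then mapped to graph space,
// so it is not symmetric around the line once the eval is away from zero.
console.log(shadedBand(150, 50));
```

Doing the arithmetic in centipawns and only then applying the graph mapping means no change of logarithm base is needed.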

@yuzisee
Author

yuzisee commented Oct 2, 2023

> No, idea would be to calculate cp(WDL_mu) +/- 2 * stddev, calculate the y with the formula you posted, and shade the part in between.

Got it, yes I see that here

https://github.com/LeelaChessZero/lc0/blob/076299b1f1ca21993b2c5e82ab3e80edb5367057/src/mcts/search.cc#L237-L239

And, would we want a 50% confidence interval (rather than 95%) so that it expresses something akin to:
  • shaded region: decisive outcome more likely than draw (i.e. >50%)
  • middle region: draw more likely than decisive outcome (i.e. <50%)
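
One way to turn a target coverage (50% or 95%) into a band half-width, offered only as a sketch: it assumes the spread value is treated as the scale parameter of a logistic distribution, which this thread does not state (the thread calls it a stddev and uses a factor of 2). The function name `logisticHalfWidth` is hypothetical.

```javascript
// Hypothetical helper. For a logistic distribution with the given scale,
// the central interval [mu - h, mu + h] covers `coverage` of the mass when
// h = scale * ln((1 + coverage) / (1 - coverage)).
function logisticHalfWidth(scale, coverage) {
  if (!(coverage > 0 && coverage < 1)) return NaN;
  return scale * Math.log((1 + coverage) / (1 - coverage));
}

console.log(logisticHalfWidth(1, 0.5));  // ln(3), roughly 1.10 scale units
console.log(logisticHalfWidth(1, 0.95)); // ln(39), roughly 3.66 scale units
```

Under that assumption a 50% band is roughly a third as wide as a 95% band.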

UPDATE:

Worked! It's looking good:

image

I've committed 107710b and will update the remaining screenshots

@yuzisee yuzisee force-pushed the dev-complexity-counterplay-graph branch from 3f345dd to 51c4687 Compare October 2, 2023 00:35
* Handle null values correctly
Directly use the `WDL_mu` stddev formula `2 / (ln(1/W-1) + ln(1/L-1))` to calculate the interval around the eval line.
@Naphthalin

I just realized that optically it suggests the opposite of what your initial plots did (i.e. the draw area going to 100% in case of a draw), and if you stick to the idea of the shaded areas representing uncertainty about the eval, I think the currently black part should be shaded instead. There are probably some edge case issues with W or L being reported as 0 you need to deal with, e.g. by scaling WDL by 0.994 and adding 0.2% to each of W,D,L when calculating the line.
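
The suggested smoothing can be sketched as below. The function name `smoothWDL` is hypothetical and the inputs are assumed to be fractions that sum to 1; the constants 0.994 and 0.2% are the ones given above.

```javascript
// Hypothetical helper: shrink each of W, D, L by 0.994 and add 0.2%,
// so none of them is ever exactly 0 and the total stays at 1
// (0.994 + 3 * 0.002 = 1).
function smoothWDL(w, d, l) {
  const scale = 0.994;
  const floor = 0.002;
  return {
    w: w * scale + floor,
    d: d * scale + floor,
    l: l * scale + floor,
  };
}

// A position reported as a certain White win still gets a nonzero L,
// so ln(1/L - 1) stays finite.
console.log(smoothWDL(1, 0, 0));
```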

"I just realized that optically it suggests the opposite of what your initial plots did (i.e. the draw area going to 100% in case of a draw), and if you stick to the idea of the shaded areas representing uncertainty about the eval, I think the currently black part should be shaded instead"
yuzisee added a commit to yuzisee/nibbler-dev that referenced this pull request Oct 2, 2023
"There are probably some edge case issues with `W` or `L` being reported as 0 you need to deal with, e.g. by scaling WDL by 0.994 and adding 0.2% to each of W,D,L when calculating the line."
@yuzisee
Author

yuzisee commented Oct 2, 2023

> There are probably some edge case issues with W or L being reported as 0 you need to deal with, e.g. by scaling WDL by 0.994 and adding 0.2% to each of W,D,L when calculating the line.

Yup, incorporated in bfe752a.

Looks like there was a similar warning here as well: https://github.com/LeelaChessZero/lc0/blob/076299b1f1ca21993b2c5e82ab3e80edb5367057/src/mcts/search.cc#L232-L236

> I just realized that optically it suggests the opposite of what your initial plots did (i.e. the draw area going to 100% in case of a draw), and if you stick to the idea of the shaded areas representing uncertainty about the eval, I think the currently black part should be shaded instead.

Aha yes, committed 696c576


Here's what it all looks like now, altogether:
image

@yuzisee yuzisee force-pushed the dev-complexity-counterplay-graph branch from 2cf337d to bfe752a Compare October 2, 2023 15:38
@rooklift
Owner

rooklift commented Oct 2, 2023

I doubt I'm going to accept this honestly. As I say I barely maintain Nibbler these days.

(and leaving it to the user to use adequate WDL sharpness through the contempt settings)
@yuzisee
Author

yuzisee commented Oct 4, 2023

In the interest of reducing maintenance burden to the bare minimum, I'll leave everything in "display raw WDL" mode so everything can be kept as simple as possible, codewise.

Screenshots have been updated at the top, and it does add a nice aesthetic to Nibbler overall. Maybe this final simplified version is simple enough to be worth considering?

Anyway the final decision is still yours, of course — hopefully this helps in some small part to make the decision easier.


3 participants