Are draws hard to predict?

Daniel Shawul · Post by **Daniel Shawul** » Tue Nov 27, 2018 3:03 pm

While trying to figure out why my neural network training sucks, I noticed the following:

If I use only won and lost games (excluding draws) the accuracy is about 91%; however, if I use
all games the accuracy is 35-40%!? This surprised me a bit, but it may be OK for one of these reasons:

a) Sniffing out a slight advantage and assigning it a W/L result, when the actual game result is always W or L, is an easy job.

b) I do not have repetition counts (no history input planes) or fifty-move counts (I just didn't feed them),
so this hurts draw prediction badly.

c) I only have a value network that outputs a single value; it may be better to classify into three classes (W, D, L) using
softmax.

d) And, by Occam's razor, I have a BUG.

Maybe something else I didn't think of.

Any thoughts?

Daniel

Daniel Shawul · Post by **Daniel Shawul** » Tue Nov 27, 2018 8:01 pm

I didn't expect it, but it seems (c) could be the reason.
Instead of using a single output node for the winning percentage, I used a softmax layer with 3 nodes
for W/D/L probabilities, trained with cross_entropy loss instead of the mean squared error loss I used with the single output node.
The accuracy has gone up from 37% to 85%, though I am not sure whether the actual performance will go up at all.
This is very weird, and I am not sure why outputting a single value for the value network gives me only 35% accuracy when used with mean_squared_error.
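
To make the two setups concrete, here is a minimal pure-Python sketch of the two losses for a single drawn position. The raw scores, the 0.62 prediction and the [W, D, L] ordering are made-up assumptions for illustration, not values from the actual network.

```python
import math

def softmax(logits):
    """Turn three raw scores into W/D/L probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs, eps=1e-12):
    """Categorical cross-entropy for a single sample."""
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, probs))

def squared_error(target, prediction):
    """Squared error for the single-output (winning percentage) head."""
    return (target - prediction) ** 2

# Hypothetical raw outputs for one drawn position, order assumed [W, D, L].
wdl = softmax([0.2, 1.5, -0.3])
loss_wdl = cross_entropy([0, 1, 0], wdl)   # target class is "draw"
loss_single = squared_error(0.5, 0.62)     # target 0.5, hypothetical prediction 0.62
predicted_class = wdl.index(max(wdl))      # index of the most likely class
```

With three outputs there is an explicit "draw" class that can be the arg-max; with one output there is only a number near 0.5, and whether that counts as "correct" depends entirely on how the accuracy metric is defined.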

Daniel

AlvaroBegue · Post by **AlvaroBegue** » Wed Nov 28, 2018 3:57 pm

I am not sure comparing the accuracy of these different setups is very informative, since they really are different things.

Having W/D/L probabilities has some practical advantages, though. For instance, you can assign a value for draws different than 0, and your engine will then make reasonable decisions about simplifying into a dead draw or keeping things complicated.
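
A small sketch of that idea, assuming a value scale of -1 (loss) to +1 (win) and two invented candidate positions; the function name and the numbers are hypothetical.

```python
def expected_value(w, d, l, draw_value=0.0):
    """Expected game value on a -1..+1 scale (win = +1, loss = -1)."""
    return w * 1.0 + d * draw_value + l * -1.0

# Two hypothetical candidate positions, given as (W, D, L) probabilities.
dead_draw = (0.05, 0.90, 0.05)
complicated = (0.30, 0.35, 0.35)

for draw_value in (0.0, -0.2):
    simplify = expected_value(*dead_draw, draw_value=draw_value)
    keep_tension = expected_value(*complicated, draw_value=draw_value)
    choice = "simplify" if simplify > keep_tension else "keep it complicated"
    print(f"draw_value={draw_value:+.1f}: {choice}")
```

With draws valued at 0 the dead draw scores higher here; once a draw is valued at -0.2 (for example against a weaker opponent) the complicated position comes out ahead.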

chrisw · Post by **chrisw** » Wed Nov 28, 2018 10:05 pm

Errrrm, accuracy is a tough one, and I think you are dependent on how the classifier software decides to calculate it. I think (caveat) a net with binary win/loss only computes whether you’re on the right side of positive/negative, so any result between 0.51 and 1.0 is “accurate”.
Add draws to the balance and I assume the classifier decides there are “three” categories (and how does it know that? By looking at the desired outcomes, the y-data, and that varies from batch to batch). Does it band each type equally? Are there equal numbers in each batch?
I’m treating accuracy as a relative value, and stopped caring what the absolute figure means. It’s a figure thrown at me by the code of the guy who wrote the network software, and who knows what he was thinking. Questions I ask are: Is it improving? Does it change? Is there a sudden change? And so on. There’s enough complexity in this subject without worrying about every self-critical evaluation the network software throws out while it’s being verbosely helpful. Put it down to banding problems, and worry about something else. My 2c.

Daniel Shawul · Post by **Daniel Shawul** » Wed Nov 28, 2018 11:23 pm

That helped me to understand what is going on! I think what is happening is that with a single output node, like you said, 0.51 is a win and 0.49 is a loss. There is no band like 0.45<x<0.55 that counts as a draw, so if you have drawn games in your database, every draw counts as a miss and the accuracy plummets. The fact that the accuracy shoots up to 90% when I remove them confirms it.

Keras assumes the binary (W,L) accuracy metric, as for binary_crossentropy, when you have a single-node output, so it doesn't matter whether you interpret the output as WDL. But when you explicitly classify the output as WDL, it calculates the accuracy metric correctly. In the end, the issue is just cosmetic.
https://github.com/keras-team/keras/blo ... metrics.py
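
The effect can be reproduced without Keras. The sketch below assumes the binary accuracy metric simply rounds the prediction to 0 or 1 and compares it with the label (my reading of the metrics file linked above); the labels and predictions are invented.

```python
def binary_accuracy(y_true, y_pred):
    """Fraction of samples whose prediction, thresholded at 0.5, equals the label."""
    hits = sum(1 for t, p in zip(y_true, y_pred)
               if t == (1.0 if p > 0.5 else 0.0))
    return hits / len(y_true)

# Hypothetical labels: 1 = win, 0.5 = draw, 0 = loss.
y_true = [1.0, 0.0, 0.5, 0.5, 0.5, 1.0, 0.5, 0.0, 0.5, 0.5]
# Hypothetical network outputs; every draw is predicted close to 0.5.
y_pred = [0.9, 0.1, 0.52, 0.48, 0.55, 0.8, 0.45, 0.2, 0.51, 0.49]

all_games = binary_accuracy(y_true, y_pred)

# Same metric with the drawn games filtered out.
decisive = [(t, p) for t, p in zip(y_true, y_pred) if t != 0.5]
no_draws = binary_accuracy([t for t, _ in decisive], [p for _, p in decisive])
```

A label of 0.5 can never equal a rounded prediction of 0 or 1, so every draw is a miss no matter how good the prediction is. In this toy data 6 of 10 games are draws, so the reported accuracy cannot exceed 40% even though every decisive game is classified correctly.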

chrisw · Post by **chrisw** » Thu Nov 29, 2018 12:10 am

oh, I hadn’t worked out how to explicitly tell it WDL, I figured it was treating value as continuous via MSE and then deciding for itself it was binary or whatever by reading the y-values. Because, when I adapted to continuous value, the accuracy figure dropped to zero (I assumed that was its way of telling me it couldn’t treat value as binary anymore).

How did you tell it to expect 0, 0.5, 1.0 only? Or is it working that out for itself?

Daniel Shawul · Post by **Daniel Shawul** » Thu Nov 29, 2018 12:41 am

Instead of predicting the winning percentage (i.e. a single combined value of W + D/2), estimate the win, draw and loss probabilities separately.
I replaced my final layer with one that estimates three probabilities for W/D/L:

output = Dense(3, activation='softmax', name='value')(x)

instead of the single-value ('continuous') estimator:

output = Dense(1, activation='sigmoid', name='value')(x)

You will also have to change your y values to categorical, i.e. instead of 1, 0.5, 0 for WDL you pass [1, 0, 0], [0, 1, 0] and [0, 0, 1].
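
A minimal sketch of that label conversion in plain Python; the helper name `to_one_hot` and the sample results are hypothetical. Keras also ships `keras.utils.to_categorical`, which does the same job for integer class indices.

```python
def to_one_hot(result):
    """Map a game result (1 = win, 0.5 = draw, 0 = loss) to a [W, D, L] vector."""
    mapping = {1.0: [1, 0, 0], 0.5: [0, 1, 0], 0.0: [0, 0, 1]}
    return mapping[float(result)]

# Hypothetical game results as stored for the single-output network.
results = [1, 0.5, 0, 0.5]
y_categorical = [to_one_hot(r) for r in results]
```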

With the categorical method, you have to combine the win and draw probabilities as W + D/2 to get a single evaluation value for a position.
This method also has some advantages, though not significant ones:
- As Alvaro pointed out, the method has a "draw model" similar to what bayeselo has.
- You may choose a more adventurous position that would lead to more wins/losses, e.g.
choosing a position with WDL=60/20/20 instead of a position with WDL=50/40/10, though both have the same winning percentage (70%).
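
The arithmetic behind that example, as a short sketch. The function names are made up for illustration; only the W + D/2 formula and the two WDL triples come from the post.

```python
def winning_percentage(w, d, l):
    """Collapse W/D/L probabilities into one value: W + D/2."""
    return w + d / 2

def decisive_share(w, d, l):
    """Probability that the game does not end in a draw."""
    return w + l

solid = (0.50, 0.40, 0.10)
adventurous = (0.60, 0.20, 0.20)

for name, wdl in (("solid", solid), ("adventurous", adventurous)):
    print(f"{name}: score={winning_percentage(*wdl):.2f}, "
          f"decisive={decisive_share(*wdl):.2f}")
```

Both positions collapse to the same single value, so a single-output network could not tell them apart; the second number is the extra information a W/D/L head makes available.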

chrisw · Post by **chrisw** » Thu Nov 29, 2018 12:57 am

Ah! Right. I'm not at that stage (I'm experimenting with the easier and faster Othello to see what works and what doesn't, and Othello is very rarely a draw). I will think about the idea of the three categories, firstly wondering what the NN would mean by "draw possibility". For example, Tal might say about a position, "you can sacrifice here, there's definitely a draw by perpetual, but it might also be a win, it's unclear". Would your categories say the same as Tal, in the same sense? L=0%, D=100%, W=some unknown value%?
I wonder. Yes, no, or something else?
Or, if you don't sacrifice, what would the W, D, L figures be and mean? And then how do you decide whether or not to sacrifice? Difficult, I will think on it ...

jp · Post by jp » Fri Nov 30, 2018 1:03 pm

What do Leela etc. do re. W/D/L vs. W/L?

chrisw · Post by **chrisw** » Fri Nov 30, 2018 2:40 pm

jp wrote: ↑Fri Nov 30, 2018 1:03 pm What do Leela etc. do re. W/D/L vs. W/L?

Treat it as one continuous value, I think, last time I looked. Value is not treated as categorical; the trained network output is a value between 0 and 1, representing win probability.
