TalkChess.com

Posted: **Sat Mar 16, 2019 1:04 am**

Daniel Shawul wrote: ↑Fri Mar 15, 2019 4:24 pm I have had a similar frustrating experience with reinforcement learning before I deemed it too big a project for one man.

What input planes do you use besides piece placement? I use attack tables and piece counts as additional inputs and they
seem to help accelerate learning.

Do you use a replay buffer in your training, and how much does it help to stabilize it? I suspect this was one of the key things
I missed when I was doing RL. I was also using a value net only (no policy) at the time, which makes matters worse. The 6x64 value net
was not improving after 300k games or so.

Good luck.
Daniel

It uses attack tables as well. My initial attempt used 12 planes for the pieces, 12 for the squares attacked by each piece and additional planes for side to move, castling and en passant. This worked up to a point. It was reaching around 2300-2400 and won some games against 2500 engines but then I noticed it started falling for some very easy tactics, and falling for them even more as training progressed. In particular tactics involving pinned pieces. For example if a piece is being protected, and the protecting piece is pinned then the first piece is actually hanging but the (policy) net wasn't able to 'see' this (since the attack planes were 'saying' it was protected).
You might think that with more training it would learn that when a piece is pinned its attack squares are nullified but on the contrary as training progresses it simply learns to trust whatever custom inputs you give it more and more.
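As a rough illustration of the kind of input stack described above, here is a minimal sketch that turns a FEN string into 8x8 planes: 12 piece planes plus planes for side to move, castling and en passant. The plane order, the one-plane-per-castling-right layout and the function name `encode_fen` are assumptions made for this example, not the actual Chess Fighter encoding. The 12 attack planes are left out on purpose, since they are exactly the hand-crafted input that turned out to mislead the net around pins.

```python
PIECES = "PNBRQKpnbrqk"  # plane order is an arbitrary choice for this sketch


def empty_plane():
    return [[0] * 8 for _ in range(8)]


def full_plane():
    return [[1] * 8 for _ in range(8)]


def encode_fen(fen):
    """Return 18 planes of 8x8: 12 piece planes, 1 side-to-move plane,
    4 castling planes and 1 en passant plane (attack planes omitted)."""
    placement, side, castling, ep = fen.split()[:4]
    planes = [empty_plane() for _ in range(18)]

    # Piece planes: row 0 is rank 8, column 0 is file a.
    for row, rank in enumerate(placement.split("/")):
        col = 0
        for ch in rank:
            if ch.isdigit():
                col += int(ch)  # a digit means that many empty squares
            else:
                planes[PIECES.index(ch)][row][col] = 1
                col += 1

    # Side to move: all ones when White is to move, all zeros otherwise.
    if side == "w":
        planes[12] = full_plane()

    # One constant plane per castling right (assumed layout).
    for i, flag in enumerate("KQkq"):
        if flag in castling:
            planes[13 + i] = full_plane()

    # En passant target square, if any.
    if ep != "-":
        planes[17][8 - int(ep[1])]["abcdefgh".index(ep[0])] = 1

    return planes
```

Calling `encode_fen` on the starting position gives 18 planes with 32 set squares across the piece planes. Extra hand-crafted planes would be appended to this list, which is where the question of misleading exceptional cases comes in.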

I don't want to say exactly what changes I made, but you will definitely need to think carefully about what additional data you give the net, and in particular whether there are any exceptional situations where the data becomes misleading.

I'm curious what you meant by 'piece counts' as one of your inputs. Does the main convolution body receive it as a plane or do you enter it after as a single input?

I do use a replay buffer. Each generation consists of 2048 positions (approx 19-20 games). For training I select random positions from the previous 64 generations. In other words I select from the most recent 1200-1300 games (this contrasts with Leela / AlphaZero where they select from the previous million or so games).
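The sliding-window scheme described above can be sketched in a few lines. The generation size (2048) and window (64) come from the post; the class name `ReplayBuffer`, its methods and the choice to sample without replacement are assumptions for illustration only. With these numbers the window holds 64 × 2048 = 131,072 positions, or roughly 64 × 19-20 ≈ 1,200-1,300 games.

```python
import random
from collections import deque

GENERATION_SIZE = 2048  # positions per generation (from the post)
WINDOW = 64             # number of recent generations kept (from the post)


class ReplayBuffer:
    """Keeps the most recent generations and samples positions from them."""

    def __init__(self, window=WINDOW):
        # A bounded deque drops the oldest generation automatically.
        self.generations = deque(maxlen=window)

    def add_generation(self, positions):
        self.generations.append(list(positions))

    def __len__(self):
        return sum(len(g) for g in self.generations)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement over every stored position.
        pool = [p for g in self.generations for p in g]
        return rng.sample(pool, min(batch_size, len(pool)))
```

A training loop would call `add_generation` once per batch of self-play positions and `sample` for each minibatch, so every minibatch mixes positions from up to 64 generations rather than only the newest one.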

I've only ever used a replay buffer. If you didn't, what were you using? What instability issues were you facing?

I would recommend that you try to get something working with only the raw board as input. Once you've got the framework and main techniques sorted then you can experiment with extra inputs.

Incidentally on one of my training runs I accidentally deleted the call to fill the attack planes and only realised it much later. And it had no issues learning. So even if it's a one man job, using the board input only should still work.

Alex

Posted: **Sat Mar 16, 2019 2:00 pm**

Alexander Lim wrote: ↑Sat Mar 16, 2019 1:04 am I'm curious what you meant by 'piece counts' as one of your inputs. Does the main convolution body receive it as a plane or do you enter it after as a single input?

I feed 5 single inputs of material differences to the beginning of the value head, so the convolution steps are skipped.
That seems to help initial learning greatly. I still don't understand how a resnet with convolution is able to count pieces precisely -- what kind of filter can do that?
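A minimal sketch of the five material-difference inputs, assuming they are (white count − black count) for pawn, knight, bishop, rook and queen, read from the piece-placement field of a FEN string. The function names and the plain-list concatenation are hypothetical; in a real net the five numbers would be concatenated with the flattened features entering (or inside) the value head.

```python
def material_differences(fen):
    """Return [dP, dN, dB, dR, dQ], each being white count minus black count."""
    placement = fen.split()[0]  # piece-placement field only
    return [placement.count(p.upper()) - placement.count(p) for p in "pnbrq"]


def value_head_input(conv_features, fen):
    """Append the five scalar inputs to the features coming out of the conv body."""
    return list(conv_features) + [float(d) for d in material_differences(fen)]
```

For example, a position where White is missing the queen yields a last entry of -1, and `value_head_input` returns a vector five elements longer than the convolutional features it was given.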

I do use a replay buffer. Each generation consists of 2048 positions (approx 19-20 games). For training I select random positions from the previous 64 generations. In other words I select from the most recent 1200-1300 games (this contrasts with Leela / AlphaZero where they select from the previous million or so games).

I've only ever used a replay buffer. If you didn't, what were you using? What instability issues were you facing?

Without one, you can potentially overfit to either the new or the old data. Learning is more stable with it.

Posted: **Sat Mar 16, 2019 2:11 pm**

Alexander Lim wrote: ↑Sat Mar 16, 2019 1:04 am
Daniel Shawul wrote: ↑Fri Mar 15, 2019 4:24 pm I have had a similar frustrating experience with reinforcement learning before I deemed it too big a project for one man.

What input planes do you use besides piece placement? I use attack tables and piece counts as additional inputs and they
seem to help accelerate learning.

Do you use a replay buffer in your training, and how much does it help to stabilize it? I suspect this was one of the key things
I missed when I was doing RL. I was also using a value net only (no policy) at the time, which makes matters worse. The 6x64 value net
was not improving after 300k games or so.

Good luck.
Daniel
It uses attack tables as well. My initial attempt used 12 planes for the pieces, 12 for the squares attacked by each piece and additional planes for side to move, castling and en passant. This worked up to a point. It was reaching around 2300-2400 and won some games against 2500 engines but then I noticed it started falling for some very easy tactics, and falling for them even more as training progressed. In particular tactics involving pinned pieces. For example if a piece is being protected, and the protecting piece is pinned then the first piece is actually hanging but the (policy) net wasn't able to 'see' this (since the attack planes were 'saying' it was protected).
You might think that with more training it would learn that when a piece is pinned its attack squares are nullified but on the contrary as training progresses it simply learns to trust whatever custom inputs you give it more and more.

I don't want to say exactly what changes I made, but you will definitely need to think carefully about what additional data you give the net, and in particular whether there are any exceptional situations where the data becomes misleading.

I'm curious what you meant by 'piece counts' as one of your inputs. Does the main convolution body receive it as a plane or do you enter it after as a single input?

I do use a replay buffer. Each generation consists of 2048 positions (approx 19-20 games). For training I select random positions from the previous 64 generations. In other words I select from the most recent 1200-1300 games (this contrasts with Leela / AlphaZero where they select from the previous million or so games).

I've only ever used a replay buffer. If you didn't, what were you using? What instability issues were you facing?

I would recommend that you try to get something working with only the raw board as input. Once you've got the framework and main techniques sorted then you can experiment with extra inputs.

Incidentally on one of my training runs I accidentally deleted the call to fill the attack planes and only realised it much later. And it had no issues learning. So even if it's a one man job, using the board input only should still work.

Alex

I think Daniel is using five differential piece counts (#whitepawns - #blackpawns, and so on for each piece type). Not as planes, but concatenated with the lowest value head dense layer outputs and given as input to the next layer.
Agree about extra inputs. I had a similar case where I forgot to include them and found practically no difference. Nets are able to work things out for themselves, it seems. Inputs can help to kickstart training, but the problem is that you are then stuck with their effect, and later have to rely on hopium to get past the initially desirable learnt feature.

Everybody appears to have the same idea, that with more and more training, and input selection, and so on, NNs become more and more “accurate”, in the sense that the Sf10 evaluation function is accurate. But if we consider other problems “solved” by convolutional nets, we don’t see 100% accuracy. The famous cats and dogs categorization problem is solved at around 98-99%, maybe more with massive training and tricks, but good nets get it wrong perhaps 1% of the time, let’s say, calling a cat a dog or vice versa. Try taking the eval function of SF and inverting its sign 1 time out of 100, and see what chaos AB search makes out of that. Or 5 times out of 100. It strikes me as highly unlikely (and experiment agrees) that chess NNs are not making evaluation and policy “errors” at a rate higher than on the way easier cats and dogs problem.
So, what am I saying? Your, our, networks are learning to generally generalise, they’re not going to learn specifics other than in a general sort of way, and their outputs are going to be generally in the right sort of region. Accuracy, or specifics, we are not going to get. Hopeless “errors” we will continue to get. Magically the averaging capability of pseudo-MCTS kind of resolves all this inaccurate generalisation with added (occasional) errors, and usually, but as we see, not always, plays strong.
Back to your original point about inputs, apart from accelerating early stage learning, it may not make much ultimate difference if you just set your target at “wide generalisation skill” and don’t worry about either accuracy or specifics. I might change my mind in a few more weeks, of course.
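The sign-flip thought experiment above is easy to set up as a wrapper around any evaluation function. This is only a sketch of the experiment's setup: `with_sign_flips` and its parameters are hypothetical names, and the `evaluate` callable stands in for whatever static evaluation an alpha-beta searcher would call. It makes no claim about how badly a real search would actually degrade.

```python
import random


def with_sign_flips(evaluate, flip_rate, seed=0):
    """Wrap an evaluation function so its sign is inverted at random.

    evaluate  -- callable taking a position and returning a score
    flip_rate -- probability of inverting the sign, e.g. 0.01 or 0.05
    seed      -- fixed seed so that runs are repeatable
    """
    rng = random.Random(seed)

    def noisy(position):
        score = evaluate(position)
        # With probability flip_rate, report the opposite assessment.
        return -score if rng.random() < flip_rate else score

    return noisy
```

Plugging `with_sign_flips(evaluate, 0.01)` into a searcher in place of `evaluate` would reproduce the "1 time out of 100" case, and `0.05` the "5 times out of 100" case.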

Posted: **Fri Apr 12, 2019 3:10 am**

Alexander Lim wrote: ↑Fri Mar 15, 2019 7:16 am
Been playing games online?
Show me how! Chess Fighter would love to play online.
Seriously, Chess Fighter has not played online and I don't (yet) know how to implement that kind of stuff. Though I hope to get some help through this forum.

You can make a bot account on lichess and run your engine with this bridge https://github.com/careless25/lichess-bot

Chess Fighter Neural Net
