Search found 39 matches

by trulses
Thu May 09, 2019 2:43 pm
Forum: Computer Chess Club: Programming and Technical Discussions
Topic: SL vs RL
Replies: 8
Views: 2241

Re: SL vs RL

RL has the same problem: weak moves from early self-play games are rapidly forgotten. In Go, the AlphaZero method has a very severe problem with ladders, similar to what you describe. A fundamental flaw of the AlphaZero approach is that it learns only from games between strong players. When th...
by trulses
Thu May 09, 2019 2:36 pm
Forum: Computer Chess Club: Programming and Technical Discussions
Topic: SL vs RL
Replies: 8
Views: 2241

Re: SL vs RL

... The SL-trained policy gave the king-move evasions normal probabilities, values like 0.25 or 0.12. But the obvious queen capture (which would be outright winning) gets a probability of 0.005 or so. This situation happened often enough to warrant an investigation. I think it’s because SL games are so...
by trulses
Thu May 09, 2019 2:19 pm
Forum: Computer Chess Club: Programming and Technical Discussions
Topic: Training using 1 playout instead of 800
Replies: 12
Views: 2171

Re: Training using 1 playout instead of 800

... I actually decided to try policy gradient when I realized that training the policy head on the losing side's moves just doesn't make sense. I think losses carry a lot more signal than draws do. If you just play randomly, a draw is much more likely than any other outcome. Going back to the grad...
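The post contrasts policy gradient with imitating the losing side's moves. Below is a minimal REINFORCE-style sketch for a single softmax decision, assuming returns of +1, 0 and -1 for a win, draw and loss from the mover's point of view; that mapping and the names `reinforce_grad` and `update` are hypothetical, not taken from the thread. With a draw worth 0 the gradient vanishes, which is one way to read the remark that draws carry less signal than losses.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical return per outcome, from the point of view of the side that moved.
RETURNS = {"win": 1.0, "draw": 0.0, "loss": -1.0}

def reinforce_grad(logits, action, ret):
    """Gradient of ret * log pi(action) w.r.t. the logits of a softmax policy.

    For a softmax, d log pi(a) / d z_j = 1[j == a] - pi_j.
    """
    probs = softmax(logits)
    return [ret * ((1.0 if j == action else 0.0) - p) for j, p in enumerate(probs)]

def update(logits, action, outcome, lr=0.1):
    """One gradient-ascent step on ret * log pi(action)."""
    grad = reinforce_grad(logits, action, RETURNS[outcome])
    return [z + lr * g for z, g in zip(logits, grad)]

logits = [0.0, 0.0, 0.0]
print(softmax(update(logits, 0, "win"))[0])   # above 1/3: the move is reinforced
print(softmax(update(logits, 0, "loss"))[0])  # below 1/3: the move is discouraged
print(update(logits, 0, "draw") == logits)    # True: a draw gives zero gradient
```

Unlike imitation, a negative return actively lowers the probability of the move that was played.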
by trulses
Sun Apr 28, 2019 3:08 pm
Forum: Computer Chess Club: Programming and Technical Discussions
Topic: Training using 1 playout instead of 800
Replies: 12
Views: 2171

Re: Training using 1 playout instead of 800

Daniel: ... Good point. I thought I had lowered it enough when reducing it from 0.25 to 0.15, but setting it to 0 (turning it off) already seems better overall. The noise is there to encourage finding bad-looking moves that turn out to be good later, but I guess that can wait till the basic stuf...
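The noise being tuned here is presumably AlphaZero-style root exploration noise, where the root priors are mixed as (1 - ε)·p + ε·η with η drawn from a Dirichlet distribution (AlphaZero used ε = 0.25 and, for chess, α = 0.3). A minimal stdlib sketch, assuming that scheme; `dirichlet` and `add_root_noise` are hypothetical names, and ε = 0 corresponds to "turning it off".

```python
import random

def dirichlet(alpha, n, rng=random):
    """Sample from a symmetric Dirichlet(alpha) via normalized Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    if total == 0.0:  # guard against underflow for very small alpha
        return [1.0 / n] * n
    return [d / total for d in draws]

def add_root_noise(priors, epsilon=0.25, alpha=0.3, rng=random):
    """Mix Dirichlet noise into the root priors: (1 - epsilon) * p + epsilon * eta.

    epsilon=0 turns the noise off and returns a copy of the priors.
    """
    if epsilon == 0.0:
        return list(priors)
    eta = dirichlet(alpha, len(priors), rng)
    return [(1.0 - epsilon) * p + epsilon * e for p, e in zip(priors, eta)]

rng = random.Random(0)
print(add_root_noise([0.7, 0.2, 0.1], epsilon=0.15, rng=rng))
```

Because both the priors and the noise sum to 1, the mixture is still a probability distribution, and no prior moves by more than ε.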
by trulses
Fri Apr 26, 2019 7:29 pm
Forum: Computer Chess Club: Programming and Technical Discussions
Topic: Training using 1 playout instead of 800
Replies: 12
Views: 2171

Re: Training using 1 playout instead of 800

... b) I train the policy head to match the actual game result. This is quite different from the AlphaZero algorithms, where the policy head is trained by imitation regardless of the outcome. So I do not train the policy head on moves made by the losing side. WDL are weighted by 1, 0.5 and...
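A minimal sketch of outcome-weighted imitation, assuming a softmax policy head and a target move distribution per position; `weighted_policy_loss` and `OUTCOME_WEIGHTS` are hypothetical names. The post's weight list is cut off after 1 and 0.5, so the 0.0 used for losses below is an assumption based on the earlier sentence about not training on the losing side's moves.

```python
import math

# Hypothetical outcome weights from the mover's point of view. The post lists
# 1 and 0.5 before the snippet is cut off; 0.0 for a loss is an assumption.
OUTCOME_WEIGHTS = {"win": 1.0, "draw": 0.5, "loss": 0.0}

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]

def weighted_policy_loss(logits, target, outcome, weights=OUTCOME_WEIGHTS):
    """Cross-entropy to the target move distribution, scaled by the outcome weight.

    The weight is never negative, so a losing game contributes at most zero
    gradient instead of actively lowering the probability of the moves played.
    """
    w = weights[outcome]
    if w == 0.0:
        return 0.0
    logp = log_softmax(logits)
    return -w * sum(t * lp for t, lp in zip(target, logp))
```

This differs from the policy-gradient sketch in sign handling: here a bad outcome only removes the training signal, it never reverses it.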
by trulses
Tue Dec 18, 2018 8:38 pm
Forum: Computer Chess Club: Programming and Technical Discussions
Topic: Policy training in Alpha Zero, LC0 ..
Replies: 26
Views: 3241

Re: Policy training in Alpha Zero, LC0 ..

I agree. The legal moves list is an attack map and, because of the way it is encoded, a weighted attack map, though only for one side. Unless you're talking about the policy label, you're not discriminating "bad" from "good" moves just by providing the legal moves, so I'm not sure what you mean by wei...
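To make the "weighted attack map" reading concrete, here is a sketch under stated assumptions: moves are given as hypothetical (from, to) square pairs numbered 0 to 63, and the map counts how many legal moves of the side to move land on each square, so squares reachable by several pieces get larger values. This only approximates an attack map (a pawn push, for example, is a legal move but not an attack), and the actual input encoding discussed in the thread is not shown in the snippet; `move_target_map` is a hypothetical name.

```python
from collections import Counter

def move_target_map(legal_moves):
    """Count, for each of the 64 squares, how many legal moves land on it.

    legal_moves: list of (from_square, to_square) pairs, squares numbered
    0..63 (a1 = 0, h8 = 63). Only the side to move is covered.
    """
    counts = Counter(to for _, to in legal_moves)
    return [counts.get(sq, 0) for sq in range(64)]

# Illustrative moves only: two different pieces reaching square 18 (c3).
print(move_target_map([(1, 18), (11, 18), (6, 21)])[18])  # 2
```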
by trulses
Tue Dec 18, 2018 7:58 pm
Forum: Computer Chess Club: Programming and Technical Discussions
Topic: Policy training in Alpha Zero, LC0 ..
Replies: 26
Views: 3241

Re: Policy training in Alpha Zero, LC0 ..

I agree. The legal moves list is an attack map and, because of the way it is encoded, a weighted attack map, though only for one side. Unless you're talking about the policy label, you're not discriminating "bad" from "good" moves just by providing the legal moves, so I'm not sure what you mean by wei...
by trulses
Tue Dec 18, 2018 6:38 pm
Forum: Computer Chess Club: Programming and Technical Discussions
Topic: Policy training in Alpha Zero, LC0 ..
Replies: 26
Views: 3241

Re: Policy training in Alpha Zero, LC0 ..

chrisw wrote:
Tue Dec 18, 2018 6:00 pm
...

Just a passing thought, but isn’t this breaching the zero-rule?
I think knowing which moves are legal falls under "being given perfect knowledge of the game rules".
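One common way knowledge of the legal moves enters the policy is masking: the AlphaZero paper describes setting the probabilities of illegal moves to zero and renormalising over the remaining legal moves. A minimal sketch of that step, assuming a flat list of policy logits and a parallel boolean legality mask; `masked_policy` is a hypothetical name.

```python
import math

def masked_policy(logits, legal):
    """Softmax restricted to legal moves; illegal moves get probability 0.

    logits: raw policy-head outputs, one per move index.
    legal:  booleans of the same length, True where the move is legal.
    """
    legal_logits = [z for z, ok in zip(logits, legal) if ok]
    if not legal_logits:
        raise ValueError("no legal moves")
    m = max(legal_logits)  # subtract the max over legal moves for stability
    exps = [math.exp(z - m) if ok else 0.0 for z, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]

print(masked_policy([5.0, 1.0, 1.0], [False, True, True]))  # [0.0, 0.5, 0.5]
```

Note that the large logit on the illegal move has no influence on the result.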
by trulses
Tue Dec 18, 2018 2:17 pm
Forum: Computer Chess Club: Programming and Technical Discussions
Topic: Policy training in Alpha Zero, LC0 ..
Replies: 26
Views: 3241

Re: Policy training in Alpha Zero, LC0 ..

The label for the policy head is the visit-count frequency from the tree search (potentially with a temperature applied).
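A minimal sketch of turning root visit counts into a policy label with a temperature, assuming the AlphaGo Zero form π(a) ∝ N(a)^(1/τ); `visit_count_policy` is a hypothetical name. τ = 1 gives the plain visit frequencies, and τ → 0 approaches a one-hot label on the most-visited move.

```python
def visit_count_policy(visits, temperature=1.0):
    """Policy label from root visit counts: pi(a) proportional to N(a) ** (1 / T).

    temperature == 1 gives plain visit frequencies; temperature == 0 is treated
    as the limit, a one-hot label on the most-visited move (ties split evenly).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    top = max(visits)
    if top <= 0:
        raise ValueError("need at least one visit")
    if temperature == 0:
        winners = [i for i, n in enumerate(visits) if n == top]
        return [1.0 / len(winners) if i in winners else 0.0 for i in range(len(visits))]
    # Scale by the largest count first so n ** (1 / T) cannot overflow for small T.
    powered = [(n / top) ** (1.0 / temperature) for n in visits]
    total = sum(powered)
    return [p / total for p in powered]

print(visit_count_policy([600, 150, 50]))  # roughly [0.75, 0.1875, 0.0625]
```

Dividing by the largest count before exponentiating leaves the normalised result unchanged but keeps small temperatures from overflowing.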