Sorry for the stupid question, but how do you input the position? (I got a bit lost in the code...)

And another question: it's probably a bad idea, but could one also let the NN learn the rules by itself? For example, by letting it try random square combinations and using a cost function like X minus the number of legal moves played from the start? (By minimizing the cost function, it would learn to play only legal moves.)
That might be slow, but that way one doesn't force it to output moves in any particular format and instead lets it decide for itself how it wants to generate them (and in what order, if that makes sense).
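
To make the idea a bit more concrete, here is a rough sketch of the kind of cost I have in mind. Everything in it is an assumption for illustration, not taken from the code in this thread: the "rules" are a toy stand-in (a lone knight on an empty 8x8 board, squares numbered 0 to 63), a move is a (from, to) pair of squares, and the names `knight_is_legal`, `cost`, and `random_moves` are made up.

```python
import random


def knight_is_legal(knight_sq, move):
    """Toy rule set (assumption): a lone knight on an empty 8x8 board."""
    src, dst = move
    if src != knight_sq:
        return False  # the move must start from the knight's square
    d_row = abs(src // 8 - dst // 8)
    d_col = abs(src % 8 - dst % 8)
    return {d_row, d_col} == {1, 2}  # L-shaped jump


def cost(moves, start_sq, x):
    """X minus the number of legal moves played from the start.

    Counting stops at the first illegal move.
    """
    sq = start_sq
    legal = 0
    for move in moves:
        if not knight_is_legal(sq, move):
            break
        sq = move[1]  # the knight now stands on the target square
        legal += 1
    return x - legal


def random_moves(n, rng):
    """Random square combinations, standing in for an untrained network."""
    return [(rng.randrange(64), rng.randrange(64)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(cost(random_moves(10, rng), start_sq=0, x=10))
```

One caveat with this sketch: a count of legal moves is not differentiable, so I assume it would have to be used as a reward signal (reinforcement-learning style) rather than as a loss for plain backpropagation.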