I have been reading up to refresh my knowledge and have studied some book formats and user guides, such as Polyglot's. I am considering either writing a completely new opening book implementation (as an open source project) or modifying Polyglot to meet my needs.
So far I have not looked at any code, since that is hard work (I'm lazy) and I want to keep my thinking from being influenced by other code and designs. As a result I still have some gaps and misunderstandings about opening books in general and Polyglot in particular, as follows:
1) Data of one side:
Suppose I want to build an opening book for the black side. Should I store the white positions as well, or discard them all?
In my previous implementation I kept data for both sides to make searching and verifying easier. But now I think that is redundant. Is there something I am missing?
2) Flip board:
In my previous implementation, if I could not find an opening move, I flipped the current board horizontally (Xiangqi is horizontally symmetrical) and vertically, then checked again. For chess, do you also check with vertical flipping? Are there any drawbacks?
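A minimal sketch of the flipping idea, assuming positions are given as the piece-placement field of a FEN string. The function names and the dict-based `book` are made up for illustration; a real book would key on a hash rather than on the string itself.

```python
def mirror_files(placement: str) -> str:
    """Left-right mirror: reverse every rank of a FEN piece-placement field."""
    return "/".join(rank[::-1] for rank in placement.split("/"))


def flip_colors(placement: str) -> str:
    """Top-bottom flip with colours swapped: reverse rank order, swap case."""
    return "/".join(rank.swapcase() for rank in reversed(placement.split("/")))


def probe_with_mirror(book: dict, placement: str):
    """Look up the position as-is, then its left-right mirror.

    `book` is a hypothetical dict keyed by placement string.
    Returns (entry, mirrored_flag), or None when neither form is found.
    If the hit came from the mirror, the caller must mirror the stored
    move back before playing it.
    """
    if placement in book:
        return book[placement], False
    mirrored = mirror_files(placement)
    if mirrored in book:
        return book[mirrored], True
    return None


after_e4 = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR"
print(mirror_files(after_e4))
print(flip_colors(after_e4))
```

This only handles piece placement. In chess the left-right mirror is not equivalent to the original position while either side can still castle, and a top-bottom flip would also have to swap the side to move, the castling rights and the en passant square.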
3) From CPW (https://chessprogramming.wikispaces.com/CTG) I have read that ChessBase builds opening books from the white side only and flips the board for the black side (this looks similar to my question 2). However, without real data for the black side, how could it get a hit? For example, after the first move e4, if we flip the board, it becomes the position below with White to move. I don't think any game starts from a board like that.
[d]
4) Weight size:
Polyglot uses 16 bits (my previous one used 8 bits). It looks like I will use 16 or 8 bits too. My dilemma is what to store as weights, and how. If I calculate and store percentages, any number of bits is fine. However, if I later let users add a few more opening lines to an existing book, I have no idea how to update those percentages to the right values. If I store the number of hits, both creating the book and adding to it later are easier, but the counts may overflow when creating huge opening books. In that case the app cannot tell which moves are better than others, since all overflowing values are clamped to the same constant. I could use some tricks, say dividing them all by a constant, but I still see some drawbacks. Any advice?
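To make the "divide by a constant" trick concrete, here is a rough sketch. It assumes raw hit counts are kept in a wider integer type while building and are only squeezed into 16 bits when the book is written, scaled per position so the most popular move gets the maximum value. The names `scale_weights` and `MAX_WEIGHT` are invented for this example.

```python
MAX_WEIGHT = 0xFFFF  # largest value that fits in a 16-bit field


def scale_weights(counts: dict, max_weight: int = MAX_WEIGHT) -> dict:
    """Scale the raw hit counts of one position so the largest fits.

    Counts that already fit are returned unchanged. Otherwise every count
    is multiplied by max_weight / largest (integer arithmetic), so ratios
    between moves are kept up to rounding. A nonzero count never rounds
    down to 0, so rare moves stay selectable.
    """
    largest = max(counts.values(), default=0)
    if largest <= max_weight:
        return dict(counts)
    return {
        move: max(1, count * max_weight // largest) if count > 0 else 0
        for move, count in counts.items()
    }


raw = {"e2e4": 200000, "d2d4": 100000, "a2a3": 1}
print(scale_weights(raw))  # {'e2e4': 65535, 'd2d4': 32767, 'a2a3': 1}
```

The drawback remains that once a book has been scaled, the stored values are no longer true counts, so lines added later are merged against rounded numbers rather than the original hits.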
5) Searching:
In my previous implementation I did a binary search over the sorted positions. That meant every probe had to search across the whole data set. Any tips or tricks to reduce the search scope?
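For reference, a minimal sketch of the binary search described above, assuming the book is held as a list of hash keys sorted in ascending order. The function name `find_entries` is made up; it returns the index range of all entries sharing the probed key.

```python
from bisect import bisect_left, bisect_right


def find_entries(keys: list, key: int) -> range:
    """Return the index range of all entries whose hash equals `key`.

    `keys` must be sorted ascending. Each probe needs O(log n) comparisons,
    so only a handful of entries are touched even in a large book.
    """
    return range(bisect_left(keys, key), bisect_right(keys, key))


keys = [3, 8, 8, 8, 15, 42]
print(list(find_entries(keys, 8)))  # [1, 2, 3]
print(list(find_entries(keys, 9)))  # []
```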
6) Moves:
My previous one did not store moves in the data, only positions in the form of Zobrist hash keys. From a given position I generate and make all legal moves, then look up the resulting positions in the book, so I can tell which moves are still in the opening and what their weights are. Why do Polyglot and ChessBase store moves with their opening positions?
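The position-only scheme can be sketched as follows. This is an illustration under assumptions, not real engine code: `legal_moves`, `make_move` and `key_of` are hypothetical stand-ins for a move generator, a make-move routine and Zobrist hashing, and `book` is a plain dict from key to weight.

```python
from typing import Callable, Hashable, Iterable, TypeVar

P = TypeVar("P")  # position type
M = TypeVar("M")  # move type


def book_moves(
    position: P,
    legal_moves: Callable[[P], Iterable[M]],
    make_move: Callable[[P, M], P],
    key_of: Callable[[P], Hashable],
    book: dict,
) -> dict:
    """Return {move: weight} for every legal move leading to a book position."""
    found = {}
    for move in legal_moves(position):
        child_key = key_of(make_move(position, move))
        if child_key in book:
            found[move] = book[child_key]
    return found


# Toy game: a position is the string of moves played so far.
toy_book = {"a": 10, "c": 3, "ab": 7}
print(book_moves("", lambda p: "abc", lambda p, m: p + m, lambda p: p, toy_book))
# {'a': 10, 'c': 3}
```

With this layout every probe costs one make-move and one hash computation per legal move.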
Thanks in advance for any help

