Carnivor, another unheard of chess engine!

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Carnivor, another unheard of chess engine!

Post by Michael Sherwin »

The problem with modern processors today is that they require a person to have the equivelant of a masters degree to compile for them. :?

The RomiChess team now has an opening for a master compilerator! :lol:
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Tony Thomas

Re: Carnivor, another unheard of chess engine!

Post by Tony Thomas »

Michael Sherwin wrote:The problem with modern processors today is that they require a person to have the equivelant of a masters degree to compile for them. :?

The RomiChess team now has an opening for a master compilerator! :lol:
Why not use DC, JA or BH?
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: Carnivor, another unheard of chess engine!

Post by Michael Sherwin »

Tony Thomas wrote:
Michael Sherwin wrote:The problem with modern processors today is that they require a person to have the equivelant of a masters degree to compile for them. :?

The RomiChess team now has an opening for a master compilerator! :lol:
Why not use DC, JA or BH?
Which one is the best programmer? They would have to realign variables as needed and reorder nested if constructs and other master guru type stuff inorder to be considered a master compilerator! :D

But, to be serious, it would be nice if one of them were to do a PGO compile for Romi when I have settled upon my next release version! 8-)
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
User avatar
hgm
Posts: 28383
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: Carnivor, another unheard of chess engine!

Post by hgm »

I guess optimization of the memory allocation is really a burden that should be placed on the programmer. You have to be aware of the nastities of the hardware, though. Esecially on very cramped hardware, like Pentium 4 wrt L1 size (8KB), or AMD chips wrt number of L1 cache ways (only 2-way).

Had I known in advance how critical memory layout on some processors is, and how much of a mess likers tend to make from this, I would have put all globals in a single struct g, at the expense of having to clutter my source code by prefixing 'g.' to all identifiers. Perhaps I am going to do a re-write that actually does this, as it would make conversion to SMP easier as well.

On AMD chips, with their 2-way caches, it also seems really important to keep the move stack out of the way of the globals, so that move stack and globals can share one cache way, and the system stack with its local variables can use the other cache way. With ~40 moves per position, and 4 bytes per move descriptor, each ply takes 160 bytes of move-stack space. That makes it rather unlikely the move stack 'wraps around' as teh cahe ways on AMD chips are large (32KB per way). So 60 ply still only takes 10KB of move stack, fitting well in a single way. But one must make sure that it doesn't collide with the globals, so it should be declared as part of the g structure.

On the P4 the L1 ways measure only 2KB, so wrap-around of the move stack through a cache way is unavoidable. A 12-ply search already spans an entire cache way, and thus must overlap the globals at some depth. Especially if the globals themselves already (nearly) span a full cache way. The best solution here might be to incorporate the move stack amongst the local variables, so that it resides in the stack frame on the system stack (and thus cannot collide with other variables on that system stack). Problem is that you have to keep track of the worst case. And 64 moves (256 bytes in Joker) already seems rather tight for that. With 256 bytes of other locals that would make a stack frame 512 bytes, or wrap-around after 4 ply in the P4 L1 cache. That would mean all locals of the last 4 ply of your tree remain cached, and only nodes that do searches deeper than 4 ply in some of their branches would suffer L1 misses on their local variables. But as this includes QS plies, this might still be a substantial fraction of the nodes, as QS trees are not very big...

What a crappy machine!