Scaling from FGRL results with top 3 engines

Discussion of anything and everything relating to chess playing software and machines.

Moderator: Ras

Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Scaling from FGRL results with top 3 engines

Post by Dann Corbit »

Lyudmil Tsvetkov wrote: I repeat my question: how does lower BF perform better at LTC?
Like I said, mathematically ignorant.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Scaling from FGRL results with top 3 engines

Post by Dann Corbit »

You seem to think that BF is relatively unimportant.

I suggest you understand what it actually means.

There is only such a thing as an average branching factor, because a program may fail high at a particular node and, on rare occasions, take a very long time there.

Suppose you are playing chess with someone. That person can visualize two moves ahead.
You have no problem visualizing 7 moves ahead.

Who is going to win?

That is how branching factor works.

If you take a huge number of nodes to get to the next ply, you will also take a huge amount of time.
If your opponent can see ahead quickly, this is an obvious advantage.

Now, there is danger in pruning too much. But there is far more danger in pruning too little.
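To put rough numbers on this, here is a tiny stand-alone sketch (not from any engine; the node rate and branching factors are made-up illustration values) of how the effective branching factor dictates the depth reachable in a fixed time budget:

Code: Select all

#include <math.h>
#include <stdio.h>

/* Toy model: reaching depth d costs roughly ebf^d nodes, so the depth
   reachable in a fixed time is about log(rate*time)/log(ebf).
   The node rate and the EBF values are assumptions, not measurements. */
int main(void) {
	const double rate    = 1.0e6;                    /* nodes per second (assumed) */
	const double times[] = { 10.0, 600.0, 3600.0 };  /* seconds per move */
	const double ebfs[]  = { 1.6, 2.0, 3.0 };        /* effective branching factors */
	for (int t = 0; t < 3; t++)
		for (int e = 0; e < 3; e++) {
			double nodes = rate * times[t];
			double depth = log(nodes) / log(ebfs[e]);
			printf("time %6.0fs  ebf %.1f  ->  depth ~%5.1f\n", times[t], ebfs[e], depth);
		}
	return 0;
}

In this toy model, cutting the EBF from 2.0 to 1.6 adds roughly 15 plies at one hour per move, which is the whole point of the two-moves-versus-seven-moves analogy.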
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Cardoso
Posts: 363
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: Scaling from FGRL results with top 3 engines

Post by Cardoso »

Lyudmil Tsvetkov wrote:
Cardoso wrote:
to do move ordering, you are using hash moves, where the score is based on eval; killer moves are based on eval
Lyudmil, the hash move and its associated score come not from the eval but from a search that has an eval, same as the killer moves, counter moves, follow-up moves, you name it. If you simply assigned the eval result to the hash move and to killers/counter moves/follow-up moves etc., that would hurt the engine badly.
Do you think Stockfish is so good because of its eval? Just drop the SF eval, use a simplistic material-only eval, play some games against it, and you will finally understand the power of the search; the search also regulates the branching factor.
To me SF success is 90% search and 10% eval!
Before making such assertive comments I think you should take a course on chess programming and steadily, gradually understand and implement the basics of an alpha-beta searcher.

Also drop the "expert" attitude, as you are not one; you are not even a student. That bit of overbearing pride is not healthy for you or those around you.
and you are a BS.

why are you teaching me?

SF 90% search, 10% eval. are you certain?

drop QS, and SF search will lead you nowhere with a primitive eval = less than 1000 elo.

if you have not understood by now that eval and search are completely inseparable, then what programmer are you?

a checkers programmer in a chess forum? :) :)

upgrade to chess, and then we will talk. :)

you say: "the hashmove and it's associated score, comes not from the eval, but from a search that has an eval".

what is the difference, man?

I guess you are drinking more than me.
and you are a BS.
Good argument! I don't have anything to counter that.

No, I wouldn't drop my QS, since QS is a special kind of search and tightly connected to the main search. So QS is part of those 90% I mentioned.
If I dropped the QS then I would have to drop the root search, and several mini searches I have in my engine, like a mini search for finding promotions, a mini search for finding passed pawns, or a material-only search I have to include in a certain algorithm I implemented.
This is not an eval, these are searches.
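For readers unfamiliar with the term, here is a minimal sketch of what a quiescence search does (all helpers are placeholders, not this or any other particular engine's code): at the horizon it keeps resolving captures, so the static eval is only applied to reasonably quiet positions instead of in the middle of an exchange.

Code: Select all

/* Minimal quiescence search sketch with placeholder helpers. */
extern int  evaluate(void);              /* static eval (placeholder) */
extern int  gen_captures(int *moves);    /* placeholder: capture moves only */
extern void make_move(int m);
extern void undo_move(int m);

int quiesce(int alpha, int beta) {
	int stand_pat = evaluate();          /* score of "doing nothing" */
	if (stand_pat >= beta) return beta;
	if (stand_pat > alpha) alpha = stand_pat;

	int moves[256], n = gen_captures(moves);
	for (int i = 0; i < n; i++) {
		make_move(moves[i]);
		int score = -quiesce(-beta, -alpha);
		undo_move(moves[i]);
		if (score >= beta) return beta;
		if (score > alpha) alpha = score;
	}
	return alpha;
}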
you say: "the hashmove and it's associated score, comes not from the eval, but from a search that has an eval".
what is the difference, man?
As I said, you should try to program an alpha-beta searcher on your own, starting simple and gradually implementing the various components. Any doubts will gladly be answered here.
When you do this you will understand the difference between the eval and the search (which naturally calls the eval), and also get an idea of the weight the search has, compared to the eval, in the overall strength of an engine.
As I said, search is the main reason chess programs work so well; it can be said it is really the brains of a chess engine. Of course we need an eval and we can't live without it, but it has a much smaller role in the overall strength of the engine. That's why I suggest testing SF with a material-only eval, against yourself. Now, there is one case where the eval could have a heavier role in the engine's strength: if you have an FPGA/ASIC searcher + eval, then you could implement a massive eval without the fear of it taking too much time to compute; it would just take space on the hardware and be computed fast. But this is a bit beyond the scope of this forum. I certainly thought of it (FPGA) for checkers, but it is very expensive for the final consumer and not justifiable for the number of clients I have for the variant I program.

Anyway, if you think the eval is of so much importance, maybe you could implement your own eval in SF and test it, this time against the standard SF version. As a chess player (maybe even an expert) you surely know a lot of things that are not implemented in the current version of SF's eval.

Anyway, I really don't think you want to discuss things in a spirit of learning. That overbearing pride of yours creates barriers between people and hurts the learning process; it also leads to putting words in other people's mouths. I've seen this behaviour more often than I wish I had, in several forums.
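To make the search-versus-eval distinction concrete, here is a heavily simplified, hypothetical alpha-beta sketch (helpers such as make_move, gen_moves and tt_store are placeholders, not any particular engine's API): the static eval is consulted only at the leaves, while the hash move and its score are whatever the search of that node returned, which is exactly the point about the hash move above.

Code: Select all

/* Minimal sketch, not from any real engine: the score stored with the
   hash move is a search result; evaluate() is called only at the leaves. */
typedef struct { unsigned long long key; int depth, score, best_move; } TTEntry;

extern int  evaluate(void);                 /* static eval (placeholder) */
extern int  gen_moves(int *moves);          /* placeholder move generator */
extern void make_move(int m);
extern void undo_move(int m);
extern unsigned long long hash_key(void);
extern TTEntry *tt_probe(unsigned long long key);
extern void     tt_store(unsigned long long key, int depth, int score, int move);

int alphabeta(int depth, int alpha, int beta) {
	if (depth == 0) return evaluate();      /* the eval enters only here (QS omitted) */

	int moves[256], n = gen_moves(moves);
	TTEntry *tte = tt_probe(hash_key());
	/* Move ordering: try the hash move first. Its stored score came from a
	   previous search of this position, not from a bare evaluate() call. */
	if (tte && tte->best_move) {
		for (int i = 1; i < n; i++)
			if (moves[i] == tte->best_move) {
				int t = moves[0]; moves[0] = moves[i]; moves[i] = t;
				break;
			}
	}

	int best = -32000, best_move = 0;
	for (int i = 0; i < n; i++) {
		make_move(moves[i]);
		int score = -alphabeta(depth - 1, -beta, -alpha);
		undo_move(moves[i]);
		if (score > best) { best = score; best_move = moves[i]; }
		if (score > alpha) alpha = score;
		if (alpha >= beta) break;           /* beta cutoff */
	}
	tt_store(hash_key(), depth, best, best_move);  /* a search result goes into the table */
	return best;
}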
Cardoso
Posts: 363
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: Scaling from FGRL results with top 3 engines

Post by Cardoso »

Lyudmil Tsvetkov wrote:
Uri Blass wrote:
Lyudmil Tsvetkov wrote:
Dann Corbit wrote:
Lyudmil Tsvetkov wrote:
Dann Corbit wrote: It is also true that better evaluation will reduce branching factor, principally by improvement in move ordering (which is very important to the fundamental alpha-beta step).

There are other things that tangentially improve branching factor like hash tables and IID.

It is also true that pure wood counting is not good enough. But examine the effectiveness of OliThink, which has an incredibly simple eval. It has more than just wood, but an engine can be made very strong almost exclusively through search. I guess that by grafting the Stockfish evaluation into a plain minimax engine you would get less than 2000 Elo.

I guess that by grafting the OliThink eval into Stockfish you would still get more than 3000 Elo.

Note that I did not test this, it is only a gedankenexperiment.
so, no search without eval.

I guess you are grossly wrong about both the 2000 and 3000 elo mark.

wanna try one of the 2?

Olithink eval into SF will play something like 1500 elo, wanna bet? :)

I guess it is time to change gedankenexperiment for realitaetsueberpruefung... :)
From CCRL 40/40;
216 OliThink 5.3.2 64-bit 2372 +19 −19 48.3% +12.5 25.6% 1011

With a super simple eval and a fairly simple search, it is already 2372.
Adding the incredibly sophisticated search of Stockfish will lower the rating by more than 872 points?
of course, it is all about tuning.

we are not speaking here of downgrading SF, leaving all its search and using just a dozen basic eval terms, in which case SF will still be somewhat strong, but of patching an entirely alien eval onto SF search.

as the eval and search will not be tuned to each other, you will mostly get completely random results.
Based on my experience with a different engine in the past (Strelka), that is not the case.
I changed Strelka's evaluation to a simple piece square table and was surprised to see that it was still very strong, at least at fast time controls, and beat an engine like Joker in most games, even though Joker has a CCRL rating near 2300.

Note that the piece square table was simply the one from Strelka's code, not optimized, and it was clear from watching the games that Strelka could do better by increasing the values of the knight and bishop so it knows that a knight and bishop are worth more than a rook and a pawn.

In the original code Strelka has an imbalance table, but I threw away all the code except the piece square table.

I guess that Strelka could get at least a 2400 CCRL rating with piece-square-table evaluation at 40/4 time control, and I have no reason to believe it would be worse for Stockfish, in spite of the fact that Stockfish is not tuned for OliThink's evaluation (Strelka is also not tuned for the simplified Strelka evaluation, which does not include pawn structure, mobility or king safety).
don't believe that, you did something wrong.

psqt, no mobility, no piece values, and playing at 2400?

don't buy it.

maybe Joker has just a very basic QS and Strelka (a clone too?) a very refined one, so this might partially explain the result trend, but QS is not the search proper.

disable QS for both engines, repeat the test and, if Strelka achieves more than 1500 elo, then I am damned.

btw., I have watched a range of Strelka games, and something strikes the knowledgeable watcher already at first glance: every 3rd or 4th move, Strelka plays completely random moves, obviously the result of random eval changes.

the chess impression is not appealing, I swear solemnly.
don't believe that, you did something wrong.
psqt, no mobility, no piece values, and playing at 2400?
don't buy it.
Lyudmil, you have to be Chinese to really understand another Chinese person, the culture, the language; a native speaker understands another native much better.
This is just to say that you must stop contradicting programmers who have years of implementing, testing, and hard thinking about the various algorithms of a chess engine. Please check my last post and accept my suggestion to become a chess engine programmer yourself; believe me, your horizons will expand immensely.
The other problem you have is your attitude towards people, as if you know it all, have been everywhere, done everything. Please stop it; being a chess player isn't the same thing as being an alpha-beta search programmer.
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Scaling from FGRL results with top 3 engines

Post by Dann Corbit »

Lyudmil Tsvetkov wrote: {snip}
I don't know what you are talking about. move ordering passing 95% correct? what does that mean?
That means the move that produced the fail-high at that node does not change.
no chess engine as of today has yet passed even the 50% correct mark, as modern top engines, SF and Komodo, never guess more than 1 out of every 3 best moves.
Engines are now far better than humans, so it is silly to criticize move choices that would humble Carlsen.
so, talking about fluff is simply ridiculous.

branching factor, branching factor... that is just a measure, man, nothing more.
Mathematical ignorance at its peak. One hundred years from now, schoolchildren in junior high will stare at this post with wide eyes and giggle.
you change things in eval and search, and if the branching factor happens to decrease, that is fine, but it is not the branching factor that is responsible for success, rather the implemented changes.

do you know what a measure is?
Success is measured in Elo for game play. Nothing else matters.

Now, both you and I care about analysis, and that is another matter.
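For readers wondering what a figure like "95% correct" move ordering refers to: it is commonly measured as the fraction of fail-high nodes at which the first move searched produced the cutoff, not as agreement with some externally chosen best move. A minimal sketch of such a counter (the names are illustrative only, not any engine's actual variables):

Code: Select all

/* Illustrative counters only; a real engine keeps such statistics in its own way. */
static unsigned long long fail_high_nodes = 0;  /* nodes that failed high          */
static unsigned long long fail_high_first = 0;  /* ...where the first move cut off */

/* called from the search's move loop whenever a beta cutoff occurs */
void record_cutoff(int move_index) {
	fail_high_nodes++;
	if (move_index == 0) fail_high_first++;
}

/* fraction of fail highs produced by the first move tried (the usual "move ordering" statistic) */
double move_ordering_quality(void) {
	return fail_high_nodes ? (double)fail_high_first / (double)fail_high_nodes : 0.0;
}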
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Uri Blass
Posts: 10801
Joined: Thu Mar 09, 2006 12:37 am
Location: Tel-Aviv Israel

Re: Scaling from FGRL results with top 3 engines

Post by Uri Blass »

Cardoso wrote:
Uri Blass wrote:
Cardoso wrote:
to do move ordering, you are using hash moves, where the score is based on eval; killer moves are based on eval
Lyudmil, the hash move and its associated score come not from the eval but from a search that has an eval, same as the killer moves, counter moves, follow-up moves, you name it. If you simply assigned the eval result to the hash move and to killers/counter moves/follow-up moves etc., that would hurt the engine badly.
Do you think Stockfish is so good because of its eval? Just drop the SF eval, use a simplistic material-only eval, play some games against it, and you will finally understand the power of the search; the search also regulates the branching factor.
To me SF success is 90% search and 10% eval!
Before making such assertive comments I think you should take a course on chess programming and steadily, gradually understand and implement the basics of an alpha-beta searcher.

Also drop the "expert" attitude, as you are not one; you are not even a student. That bit of overbearing pride is not healthy for you or those around you.
I think that it will be an interesting experiment to have 2 versions of Stockfish.

version A has the same search but only a simple piece square table evaluation (no mobility, no pawn structure and no king safety)

version B has the same evaluation but only a simple alpha-beta search.

Note that I expect version A to win convincingly, but it may be interesting if A also scales better than B.

Note that I expect both of them to scale significantly worse than Stockfish.

It means that we may see something like the following:
1) Stockfish at 0.1 seconds per move is at the same level as simple Stockfish at 10 seconds per move.

2) Stockfish at 1 second per move is at the same level as simple Stockfish at 300 seconds per move.
That was not exactly the experiment I suggested, I said:
and play some games against it
meaning that Lyudmil should try some games himself against SF with only a material eval.
I meant that he would find SFmat can play amazing chess against humans.
I do not believe that with only a material evaluation it is going to play well (unlike with only a piece square table).

With only a material evaluation it can play stupid moves in the opening, like 1.a4, because it does not lose material.

It can probably still beat weak humans, but I believe that if you want it to beat strong human players (let's say at the level of a 2300 FIDE rating) then you need at least a bit more evaluation, like a piece square table.
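For concreteness, the piece-square-table-only evaluation of Uri's "version A" needs only a handful of lines. Below is a generic sketch of such an eval; the values and the board accessors (piece_on, color_on, the psqt array) are illustrative placeholders, not Stockfish's or Strelka's actual code.

Code: Select all

/* Sketch of a PSQT-only eval: no mobility, no pawn structure, no king safety.
   Piece values and table contents are assumptions for illustration only. */
enum { P_PAWN, P_KNIGHT, P_BISHOP, P_ROOK, P_QUEEN, P_KING };

static const int piece_value[6] = { 100, 320, 330, 500, 900, 0 };
extern const int psqt[6][64];    /* one 64-entry table per piece type, white's view (made-up) */

extern int piece_on(int sq);     /* placeholder: piece type on sq, or -1 if empty */
extern int color_on(int sq);     /* placeholder: 0 = white, 1 = black             */

int eval_psqt(int side_to_move) {
	int score = 0;
	for (int sq = 0; sq < 64; sq++) {
		int p = piece_on(sq);
		if (p < 0) continue;
		if (color_on(sq) == 0)
			score += piece_value[p] + psqt[p][sq];
		else
			score -= piece_value[p] + psqt[p][sq ^ 56];  /* vertically mirrored square for black */
	}
	return side_to_move == 0 ? score : -score;  /* from the side to move's point of view */
}

Because the tables are static, such an eval is essentially free to compute, which is why experiments like this isolate the contribution of the search so cleanly.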
Dann Corbit
Posts: 12777
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: Scaling from FGRL results with top 3 engines

Post by Dann Corbit »

Uri Blass wrote:
Cardoso wrote:
Uri Blass wrote:
Cardoso wrote:
to do move ordering, you are using hash moves, where the score is based on eval; killer moves are based on eval
Lyudmil, the hash move and its associated score come not from the eval but from a search that has an eval, same as the killer moves, counter moves, follow-up moves, you name it. If you simply assigned the eval result to the hash move and to killers/counter moves/follow-up moves etc., that would hurt the engine badly.
Do you think Stockfish is so good because of its eval? Just drop the SF eval, use a simplistic material-only eval, play some games against it, and you will finally understand the power of the search; the search also regulates the branching factor.
To me SF success is 90% search and 10% eval!
Before making such assertive comments I think you should take a course on chess programming and steadily, gradually understand and implement the basics of an alpha-beta searcher.

Also drop the "expert" attitude, as you are not one; you are not even a student. That bit of overbearing pride is not healthy for you or those around you.
I think that it will be an interesting experiment to have 2 versions of Stockfish.

version A has the same search but only a simple piece square table evaluation (no mobility, no pawn structure and no king safety)

version B has the same evaluation but only a simple alpha-beta search.

Note that I expect version A to win convincingly, but it may be interesting if A also scales better than B.

Note that I expect both of them to scale significantly worse than Stockfish.

It means that we may see something like the following:
1) Stockfish at 0.1 seconds per move is at the same level as simple Stockfish at 10 seconds per move.

2) Stockfish at 1 second per move is at the same level as simple Stockfish at 300 seconds per move.
That was not exactly the experiment I suggested, I said:
and play some games against it
meaning that Lyudmil should try some games himself against SF with only a material eval.
I meant that he would find SFmat can play amazing chess against humans.
I do not believe that with only a material evaluation it is going to play well (unlike with only a piece square table).

With only a material evaluation it can play stupid moves in the opening, like 1.a4, because it does not lose material.

It can probably still beat weak humans, but I believe that if you want it to beat strong human players (let's say at the level of a 2300 FIDE rating) then you need at least a bit more evaluation, like a piece square table.
The OliThink engine adds (primarily and most importantly) mobility, along with x-rays/pins, to the eval.

Here is the eval in full:

Code: Select all


/* The evaluation for Color c. It's almost only mobility stuff. Pinned pieces are still awarded for limiting the opposite king */
int evalc(int c, int* sf) {
	int t, f;
	int mn = 0, katt = 0;
	int oc = c^1;
	u64 ocb = colorb[oc];
	u64 m, b, a, cb;
	u64 kn = kmoves[kingpos[oc]];
	u64 pin = pinnedPieces(kingpos[c], oc);

	b = pieceb[PAWN] & colorb[c];
	while (b) {
		int ppos = 0;
		f = pullLsb(&b);
		t = f + (c << 6);
		ppos = pawnprg[t];
		m = PMOVE(f, c);
		a = POCC(f, c);
		if (a & kn) katt += _bitcnt(a & kn) << 4;
		if (BIT[f] & pin) {
			if (!(getDir(f, kingpos[c]) & 16)) m = 0;
		} else {
			ppos += _bitcnt(a & pieceb[PAWN] & colorb[c]) << 2;
		}
		if (m) ppos += 8; else ppos -= 8;
		/* The only non-mobility eval is the detection of free pawns/hanging pawns */
		if (!(pawnfile[t] & pieceb[PAWN] & ocb)) { //Free file?
			if (!(pawnfree[t] & pieceb[PAWN] & ocb)) ppos *= 2; //Free run?
			if (!(pawnhelp[t] & pieceb[PAWN] & colorb[c])) ppos -= 33; //Hanging backpawn?
		}

		mn += ppos;
	}

	cb = colorb[c] & (~pin);
	b = pieceb[KNIGHT] & cb;
	while (b) {
		*sf += 1;
		f = pullLsb(&b);
		a = nmoves[f];
		if (a & kn) katt += _bitcnt(a & kn) << 4;
		mn += nmobil[f];
	}

	b = pieceb[KNIGHT] & pin;
	while (b) {
		*sf += 1;
		f = pullLsb(&b);
		a = nmoves[f];
		if (a & kn) katt += _bitcnt(a & kn) << 4;
	}

	xorBit(kingpos[oc], colorb+oc); //Opposite King doesn't block mobility at all
	b = pieceb[QUEEN] & cb;
	while (b) {
		*sf += 4;
		f = pullLsb(&b);
		a = RATT1(f) | RATT2(f) | BATT3(f) | BATT4(f);
		if (a & kn) katt += _bitcnt(a & kn) << 4;
		mn += bitcnt(a);
	}

	colorb[oc] ^= RQU & ocb; //Opposite Queen & Rook doesn't block mobility for bishop
	b = pieceb[BISHOP] & cb;
	while (b) {
		*sf += 1;
		f = pullLsb(&b);
		a = BATT3(f) | BATT4(f);
		if (a & kn) katt += _bitcnt(a & kn) << 4;
		mn += bitcnt(a) << 3;
	}

	colorb[oc] ^= pieceb[ROOK] & ocb; //Opposite Queen doesn't block mobility for rook.
	colorb[c] ^= pieceb[ROOK] & cb; //Own non-pinned Rook doesn't block mobility for rook.
	b = pieceb[ROOK] & cb;
	while (b) {
		*sf += 2;
		f = pullLsb(&b);
		a = RATT1(f) | RATT2(f);
		if (a & kn) katt += _bitcnt(a & kn) << 4;
		mn += bitcnt(a) << 2;
	}

	colorb[c] ^= pieceb[ROOK] & cb; // Back
	b = pin & (pieceb[ROOK] | pieceb[BISHOP] | pieceb[QUEEN]); 
	while (b) {
		int p;
		f = pullLsb(&b);
		p = identPiece(f);
		if (p == BISHOP) {
			*sf += 1; 
			a = BATT3(f) | BATT4(f);
			if (a & kn) katt += _bitcnt(a & kn) << 4;
		} else if (p == ROOK) {
			*sf += 2; 
			a = RATT1(f) | RATT2(f);
			if (a & kn) katt += _bitcnt(a & kn) << 4;
		} else {
			*sf += 4;
			a = RATT1(f) | RATT2(f) | BATT3(f) | BATT4(f);
			if (a & kn) katt += _bitcnt(a & kn) << 4;
		}
		t = p | getDir(f, kingpos[c]);
		if ((t & 10) == 10) mn += _bitcnt(RATT1(f));
		if ((t & 18) == 18) mn += _bitcnt(RATT2(f));
		if ((t & 33) == 33) mn += _bitcnt(BATT3(f));
		if ((t & 65) == 65) mn += _bitcnt(BATT4(f));
	}

	colorb[oc] ^= pieceb[QUEEN] & ocb; //Back
	xorBit(kingpos[oc], colorb+oc); //Back
	if (*sf == 1 && !(pieceb[PAWN] & colorb[c])) mn = -200; //No mating material
	if (*sf < 7) katt = katt * (*sf) / 7; //Reduce the bonus for attacking king squares
	if (*sf < 2) *sf = 2;
	return mn + katt;
}

int eval1 = 0;
int eval(int c) {
	int sf0 = 0, sf1 = 0;
	int ev0 = evalc(0, &sf0);
	int ev1 = evalc(1, &sf1);
	eval1++;

	if (sf1 < 6) ev0 += kmobil[kingpos[0]]*(6-sf1);
	if (sf0 < 6) ev1 += kmobil[kingpos[1]]*(6-sf0);

	return (c ? (ev1 - ev0) : (ev0 - ev1));
}
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
Cardoso
Posts: 363
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Re: Scaling from FGRL results with top 3 engines

Post by Cardoso »

Uri Blass wrote:
I do not believe that with only a material evaluation it is going to play well (unlike with only a piece square table).

With only a material evaluation it can play stupid moves in the opening, like 1.a4, because it does not lose material.

It can probably still beat weak humans, but I believe that if you want it to beat strong human players (let's say at the level of a 2300 FIDE rating) then you need at least a bit more evaluation, like a piece square table.
You are right, a PST would be much better; it would guide the game much better, especially in the opening.
With a 16-core Threadripper, it would be an interesting experiment against humans.
Cardoso
Posts: 363
Joined: Thu Mar 16, 2006 7:39 pm
Location: Portugal
Full name: Alvaro Cardoso

Amazing! (NT)

Post by Cardoso »

Amazing!
Laskos
Posts: 10948
Joined: Wed Jul 26, 2006 10:21 pm
Full name: Kai Laskos

Re: Scaling from FGRL results with top 3 engines

Post by Laskos »

Finally, Houdini 6 was tested by Andreas for FGRL at LTC: one core, 60 minutes + 15 seconds increment. The scaling from 10 minutes to 60 minutes (the most important and the hardest to measure) is:

Code: Select all

                                                 NORMALIZED ELO

   # PLAYER              : 10'+ 6''    ERROR 1SD = 0.019        60'+ 15''    ERROR 1SD = 0.027  |  SCALING   ERROR 1SD = 0.033 |
================================================================================================|==============================|
   1 Houdini 6           :  0.861                                0.785                          |  -0.076                      |
   2 Komodo 11.2         :  0.704                                0.726                          |  +0.022                      |
   3 Stockfish 8         :  0.704                                0.638                          |  -0.066                      |
   4 Deep Shredder 13    : -0.025                               -0.013                          |  +0.012                      |
   5 Fire 5              : -0.122                               -0.092                          |  +0.030                      |
   6 Fizbo 1.9           : -0.211                               -0.267                          |  -0.056                      |
   7 Gull 3              : -0.325                               -0.320                          |  +0.005                      |
   8 Andscacs 0.91       : -0.340                               -0.278                          |  +0.062                      |
   9 Booot 6.2           : -0.406                               -0.365                          |  +0.042                      |
  10 Chiron 4            : -0.530                               -0.539                          |  -0.009                      |
================================================================================================================================
What can be said with near certainty is that Komodo scales significantly better with time control than both Houdini and Stockfish, and that Andscacs scales best.

It will be a fascinating TCEC event.
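As a side note on reading the table: the SCALING column matches the 60'+15'' column minus the 10'+6'' column row by row, and the quoted 1SD error of 0.033 is consistent with adding the two per-column errors in quadrature, sqrt(0.019^2 + 0.027^2) ≈ 0.033. A tiny check, with the numbers copied from the Komodo row above:

Code: Select all

#include <math.h>
#include <stdio.h>

int main(void) {
	/* values taken from the table above */
	double err10 = 0.019, err60 = 0.027;
	double komodo_scaling = 0.726 - 0.704;                       /* = +0.022, as listed */
	double scaling_err    = sqrt(err10 * err10 + err60 * err60); /* ~0.033, as listed   */
	printf("Komodo 11.2 scaling: %+.3f +/- %.3f (1SD)\n", komodo_scaling, scaling_err);
	return 0;
}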