Toga The Killer 1Y MP 4CPU is the strongest Toga....

Post by **Stephen Ham** » Thu Jun 25, 2009 7:37 pm

Ryan Benitez wrote:
bob wrote:
jpqy wrote: It's a pity that some people don't show any results, so this is better than nothing. 10 or 1000 games... if I let every engine play every other engine 10 times, I easily get to 1000 games.

You have people who always complain... just be happy to see results from people who spend their free time on it!!

JP.
Actually it is _not_ "better than nothing". Unless you consider a completely random number to be "better than nothing"...
Value my random numbers! I don't

Engine Score St
1: Fruit090624k2_64 12.5/18 110=11011=1=1=0=11
2: Stockfish_13_x64_ja 12.5/35 ··················
3: Fruit090624k5_64 10.0/17 =0=1=1=1====01==1

Hi Ryan,

Are your scores derived from games against the aforementioned Toga? Where can I obtain a 64-bit Fruit?

All the best,
Steve

Post by **krazyken** » Thu Jun 25, 2009 7:44 pm

bob wrote:
yanquis1972 wrote: the results he posted are quite a bit better than nothing, obviously. you can glance at them & get a fairly good idea of TK's strength (i'd guess about 3000+ CCRL Elo). even if i'm off by a long shot, i'm going to be a lot closer than i would be if we just had random results, because chess is not random chaos. in fact it's much farther removed from chance than almost any game i can think of.

anyway, it's anyone's choice how to use their hardware & software, & by looking at the results posted & combining them with mine i can see that naum is probably not some kind of special poison for TK, but that its performance was what should be expected.
My point was that 10 games is worthless for determining anything. In a 1000-game match, you can probably find 10 games in a row where each side wins. If you trust 10-game results, that's up to you. I know the inaccuracy this involves.

Data points are never worthless. You are going to need much more than a 1000-game match before it becomes probable that both players get a streak of 10 wins. If the two players are equal, the chance of one of them getting 10 in a row is far less than 1%. The chance of both is far lower. True, 10 games has a large confidence interval, but it is far from worthless.

Post by **Ryan Benitez** » Thu Jun 25, 2009 8:17 pm

Stephen Ham wrote:
Ryan Benitez wrote:
bob wrote:
jpqy wrote: It's a pity that some people don't show any results, so this is better than nothing. 10 or 1000 games... if I let every engine play every other engine 10 times, I easily get to 1000 games.

You have people who always complain... just be happy to see results from people who spend their free time on it!!

JP.
Actually it is _not_ "better than nothing". Unless you consider a completely random number to be "better than nothing"...
Value my random numbers! I don't

Engine Score St
1: Fruit090624k2_64 12.5/18 110=11011=1=1=0=11
2: Stockfish_13_x64_ja 12.5/35 ··················
3: Fruit090624k5_64 10.0/17 =0=1=1=1====01==1
Hi Ryan,

Are your scores derived from games against the aforementioned Toga? Where can I obtain a 64-bit Fruit?

All the best,
Steve

It is games against Stockfish, just like the Toga games were against Stockfish. Please understand that my sample set of games, played on my laptop overnight, proves nothing, and I only posted it to prove a point: small sample sets really are useless unless put together with many other sample sets.

Engine Score St
1: Stockfish_13_x64_ja 21.5/52 ··························
2: Fruit090624k5_64 15.5/26 =0=1=1=1====01==10111011=0
3: Fruit090624k2_64 15.0/26 110=11011=1=1=0=110100100=

See, things have changed a bit already, and there are still far too few games to prove that either version of Fruit is better than the other or better than the engine being tested against. Because they are experimental versions and not well tuned, I would guess that after 2000 games for each Fruit version the results will show that both are weaker than Stockfish.
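Ryan's point about the numbers shifting can be checked from the per-game strings in his two tables. A minimal sketch, assuming the "St" column lists one character per game from the Fruit build's side ('1' win, '=' draw, '0' loss); that reading is an assumption, although it reproduces every score in both tables. The helper names and the 95% normal-approximation margin are illustrative choices, not anything from the thread.

```python
# Results strings copied from the second table; each character is one game
# from the Fruit build's point of view ('1' win, '=' draw, '0' loss).
# This interpretation of the "St" column is an assumption.
FRUIT_K5 = "=0=1=1=1====01==10111011=0"
FRUIT_K2 = "110=11011=1=1=0=110100100="

POINTS = {"1": 1.0, "=": 0.5, "0": 0.0}


def score(results):
    """Total points for a results string."""
    return sum(POINTS[c] for c in results)


def margin_95(results):
    """Approximate 95% margin on the score fraction (normal approximation)."""
    values = [POINTS[c] for c in results]
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # per-game variance
    return 1.96 * (var / n) ** 0.5


# The first table showed k5 after 17 games and k2 after 18 games.
for name, res, first in (("k5", FRUIT_K5, 17), ("k2", FRUIT_K2, 18)):
    early = res[:first]
    print(f"{name}: {score(early)}/{first} in the first table, "
          f"{score(res)}/{len(res)} now, "
          f"{100 * score(res) / len(res):.1f}% +/- {100 * margin_95(res):.1f}%")
```

With 26 games each, both builds score a little under 60%, but the margins come out well above ten percentage points, so the two intervals overlap almost completely and both still include 50%.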

Post by **bob** » Thu Jun 25, 2009 8:31 pm

krazyken wrote:
bob wrote:
yanquis1972 wrote: the results he posted are quite a bit better than nothing, obviously. you can glance at them & get a fairly good idea of TK's strength (i'd guess about 3000+ CCRL Elo). even if i'm off by a long shot, i'm going to be a lot closer than i would be if we just had random results, because chess is not random chaos. in fact it's much farther removed from chance than almost any game i can think of.

anyway, it's anyone's choice how to use their hardware & software, & by looking at the results posted & combining them with mine i can see that naum is probably not some kind of special poison for TK, but that its performance was what should be expected.
My point was that 10 games is worthless for determining anything. In a 1000-game match, you can probably find 10 games in a row where each side wins. If you trust 10-game results, that's up to you. I know the inaccuracy this involves.
Data points are never worthless. You are going to need much more than a 1000-game match before it becomes probable that both players get a streak of 10 wins. If the two players are equal, the chance of one of them getting 10 in a row is far less than 1%. The chance of both is far lower. True, 10 games has a large confidence interval, but it is far from worthless.

Sorry, but if the programs are within 100 (or even 200) Elo of each other, 10 games _is_ absolutely worthless for identifying which is best. Absolutely worthless...

BTW, as far as 10 in a row needing more than 1000 games? See "the birthday paradox". You don't even need 1000 to be reasonably sure...
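Bob's claim about programs within 100 to 200 Elo can be quantified with the usual logistic Elo formula and a normal approximation on the mean game score. A rough sketch: `elo_from_score`, `elo_interval` and the 1.96 multiplier are illustrative choices, not from the thread. The +4/-4/=2 record is jpqy's 10-game match against Stockfish posted later in the thread; the 1000-game record is a hypothetical scaling of the same ratio.

```python
import math


def elo_from_score(s):
    """Rating difference implied by a score fraction s under the logistic Elo model."""
    s = min(max(s, 1e-9), 1 - 1e-9)  # keep log10 finite for 0% or 100%
    return 400.0 * math.log10(s / (1.0 - s))


def elo_interval(wins, losses, draws, z=1.96):
    """Point estimate and approximate 95% interval for the Elo difference.

    Uses the normal approximation on the mean game score; draws count as
    0.5 and lower the variance compared with a pure win/loss record.
    """
    n = wins + losses + draws
    s = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    half_width = z * math.sqrt(var / n)
    return (elo_from_score(s),
            elo_from_score(s - half_width),
            elo_from_score(s + half_width))


for w, l, d in ((4, 4, 2), (400, 400, 200)):
    mid, lo, hi = elo_interval(w, l, d)
    print(f"+{w}/-{l}/={d}: {mid:+.0f} Elo, 95% interval {lo:+.0f} .. {hi:+.0f}")
```

At the same 50% score, the interval is roughly ±217 Elo after 10 games and roughly ±19 Elo after 1000, shrinking with the square root of the number of games.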

Post by **beachknight** » Thu Jun 25, 2009 8:40 pm

Stephen Ham wrote:
Ryan Benitez wrote:
bob wrote:
jpqy wrote: It's a pity that some people don't show any results, so this is better than nothing. 10 or 1000 games... if I let every engine play every other engine 10 times, I easily get to 1000 games.

You have people who always complain... just be happy to see results from people who spend their free time on it!!

JP.
Actually it is _not_ "better than nothing". Unless you consider a completely random number to be "better than nothing"...
Value my random numbers! I don't

Engine Score St
1: Fruit090624k2_64 12.5/18 110=11011=1=1=0=11
2: Stockfish_13_x64_ja 12.5/35 ··················
3: Fruit090624k5_64 10.0/17 =0=1=1=1====01==1
Hi Ryan,

Are your scores derived from games against the aforementioned Toga? Where can I obtain a 64-bit Fruit?

All the best,
Steve

I ask a similar question:

Where can I obtain a 64-bit MP Fruit?

Best,

Post by **jpqy** » Thu Jun 25, 2009 8:57 pm

Well, that's my way. You have at least 100 engines, and every week or month there is a new version, update or tuned chess engine. If I had to let them all run 1000 games, it would not be possible for me.
So after more than 20 years of testing, you find the way that works best for you, and I agree we need more games!
But there are so many lists; just put them together and you have the average Elo for this engine. When I do 10 games (for starting), you quickly find out how strong the engine is. Because I let them play so much, I can test against many different engines! And then you see very fast which engines it likes to play and which ones it loses to.
So if, just by luck, you choose an engine whose play style it doesn't like and you run 1000 games the first time, you get a bad Elo, so this engine ends up below its real value. If you play against an engine it likes very much, you get a much higher Elo. It's like with chess players: some people you like to play, and against others you don't know how to handle their style.
So I take a big range of engines, I also get to 1000 games (if I have the time), and you get a much nicer average Elo for this engine.
There are top engines that beat other top engines but have more difficulty against weaker engines, and vice versa.

Look at this example: TTK plays against TogaII141SE6 and doesn't like it at all,
while against TogaII142JD, which is stronger in my list, it likes very much to play.
So now, if I want, I can continue with these two versions, play 1000 games, and at the end give an average Elo when I put the games together.

And I haven't even talked yet about using all the different opening books. It's the same again: some engines like one book and not the other.

But this is just my way to see quickly how strong an engine is and where it places in the Elo list.
So don't think I only run 10 games against one engine. When I have enough time they get a second round, a third and so on, but by then there is already a new engine to test.

Blitz 5 min, Core i7 @ 3.89 GHz, 2009

TTK.cirebonb1Y.st.4cpu_b 2800 - Stockfish_13_win32_ja 2800 5.0 - 5.0 +4/-4/=2 50.00%
TTK.cirebonb1Y.st.4cpu_b 2800 - Grapefruit 1.0 alpha 3 2800 5.0 - 5.0 +3/-3/=4 50.00%
TTK.cirebonb1Y.st.4cpu_b 2800 - MP-x86-Inert---Thinker 5.4D 2800 5.5 - 4.5 +4/-3/=3 55.00%
TTK.cirebonb1Y.st.4cpu_b 2800 - TogaII141SE6-4cpu 2800 3.0 - 7.0 +1/-5/=4 30.00%
TTK.cirebonb1Y.st.4cpu_b 2800 - Glaurung22_win32_ja 2800 6.0 - 4.0 +3/-1/=6 60.00%
TTK.cirebonb1Y.st.4cpu_b 2800 - Bright-0.4a 2800 5.5 - 4.5 +5/-4/=1 55.00%
TTK.cirebonb1Y.st.4cpu_b 2800 - TogaII142JD-4cpu 2800 7.0 - 3.0 +4/-0/=6 70.00%
TTK.cirebonb1Y.st.4cpu_b 2800 - MP-x86-Inert---Thinker 5.4C 2800 3.0 - 7.0 +1/-5/=4 30.00%
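jpqy's approach of pooling short matches against many opponents can be put into numbers using the W/L/D columns of the table above. A rough sketch, assuming independent games and treating the eight opponents as a single pool (which ignores that they differ in strength); `score_and_margin` and `pooled` are hypothetical helpers using a 95% normal approximation.

```python
import math

# (opponent, wins, losses, draws) for TTK.cirebonb1Y, copied from the table above
MATCHES = [
    ("Stockfish_13_win32_ja", 4, 4, 2),
    ("Grapefruit 1.0 alpha 3", 3, 3, 4),
    ("MP-x86-Inert---Thinker 5.4D", 4, 3, 3),
    ("TogaII141SE6-4cpu", 1, 5, 4),
    ("Glaurung22_win32_ja", 3, 1, 6),
    ("Bright-0.4a", 5, 4, 1),
    ("TogaII142JD-4cpu", 4, 0, 6),
    ("MP-x86-Inert---Thinker 5.4C", 1, 5, 4),
]


def score_and_margin(w, l, d, z=1.96):
    """Score fraction and approximate 95% margin (normal approximation)."""
    n = w + l + d
    s = (w + 0.5 * d) / n
    var = (w * (1 - s) ** 2 + d * (0.5 - s) ** 2 + l * s ** 2) / n
    return s, z * math.sqrt(var / n)


def pooled(matches):
    """Sum wins, losses and draws over all opponents."""
    return tuple(sum(m[i] for m in matches) for i in (1, 2, 3))


for name, w, l, d in MATCHES:
    s, m = score_and_margin(w, l, d)
    print(f"{name:30s} {100 * s:5.1f}% +/- {100 * m:4.1f}%")

tw, tl, td = pooled(MATCHES)
s, m = score_and_margin(tw, tl, td)
label = f"pooled (+{tw}/-{tl}/={td})"
print(f"{label:30s} {100 * s:5.1f}% +/- {100 * m:4.1f}%")
```

Each 10-game match carries a margin of roughly 15 to 30 percentage points, while the pooled 80 games (+25/-25/=30, exactly 50%) narrow it to roughly ±9 points. Pooling helps, but the pooled figure is still far from a precise rating.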

JP.

Post by **Ryan Benitez** » Thu Jun 25, 2009 9:02 pm

beachknight wrote:
Stephen Ham wrote:
Ryan Benitez wrote:
bob wrote:
jpqy wrote: It's a pity that some people don't show any results, so this is better than nothing. 10 or 1000 games... if I let every engine play every other engine 10 times, I easily get to 1000 games.

You have people who always complain... just be happy to see results from people who spend their free time on it!!

JP.
Actually it is _not_ "better than nothing". Unless you consider a completely random number to be "better than nothing"...
Value my random numbers! I don't

Engine Score St
1: Fruit090624k2_64 12.5/18 110=11011=1=1=0=11
2: Stockfish_13_x64_ja 12.5/35 ··················
3: Fruit090624k5_64 10.0/17 =0=1=1=1====01==1
Hi Ryan,

Are your scores derived from games against the aforementioned Toga? Where can I obtain a 64-bit Fruit?

All the best,
Steve
I ask a similar question:

Where can I obtain a 64-bit MP Fruit?

Best,

I have come to learn that releasing a chess engine is almost always a bad idea. I like to retain the option of making a release, but I don't see a need to make one. When I release an engine I get praise from some, and I appreciate that, but it is not the reason I do what I do in computer chess. I also get insults and ridicule, which is somewhat annoying, and I also get to support the release and my bugs. One would think open source would be the solution, but the past has shown that I get 0 bug reports based on open source code. I also don't have thick enough skin to GPL the latest Fruit. That said, I prefer a source code release over a compiled engine release, but both are unlikely.

Post by **ernest** » Fri Jun 26, 2009 1:32 am

jpqy wrote: If I had to let them all run 1000 games, it would not be possible for me...

Come on Jean-Paul, with a good machine like yours, 100 games at 2'+1" is not impossible (it takes about 10 hours) and far more relevant...
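ernest's 10-hour figure can be reproduced with simple arithmetic. A back-of-the-envelope sketch, assuming games last about 60 moves per side, both engines use their full base time plus one increment per move, and only one game runs at a time; the function and its parameters are hypothetical.

```python
def match_hours(games, base_minutes, increment_seconds,
                moves_per_side=60, concurrent_games=1):
    """Rough wall-clock estimate for a match at a base+increment time control.

    Assumes both sides use their whole base time plus one increment per move,
    which makes this an upper-end estimate for games lasting
    `moves_per_side` moves each.
    """
    seconds_per_side = base_minutes * 60 + moves_per_side * increment_seconds
    seconds_per_game = 2 * seconds_per_side
    return games * seconds_per_game / concurrent_games / 3600.0


print(match_hours(100, 2, 1))  # 2'+1", 60 moves per side -> 10.0
```

Under these assumptions each game takes 2 × (120 + 60) = 360 seconds, so 100 games come to 36,000 seconds, or 10 hours.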

Post by **mcostalba** » Fri Jun 26, 2009 6:49 am

Ryan Benitez wrote: I have come to learn that releasing a chess engine is almost always a bad idea. I like to retain the option of making a release, but I don't see a need to make one. When I release an engine I get praise from some, and I appreciate that, but it is not the reason I do what I do in computer chess. I also get insults and ridicule, which is somewhat annoying, and I also get to support the release and my bugs. One would think open source would be the solution, but the past has shown that I get 0 bug reports based on open source code. I also don't have thick enough skin to GPL the latest Fruit. That said, I prefer a source code release over a compiled engine release, but both are unlikely.

My opinion is that an open source engine benefits from releasing often (3 or 4 times a year, to be clear).

This has the following advantages:

- Expose the engine to wider testing and get more valuable feedback than what you can get from internal testing.

- Let people talk about the engine, which can increase interest and attract potential co-developers.

- Raise the chance that something "unexpected" happens. After a release there is always something you didn't foresee, and that is normally where new advances come from.

- Drive testers crazy

There are also possible disadvantages:

- Let other open source developers take your ideas: this I don't care about at all, and I am even glad of it.

- Let other NOT open source developers (commercial or free) take your ideas: this I don't care about at all, but I am a bit less glad of it.

Post by **krazyken** » Fri Jun 26, 2009 7:04 pm

bob wrote:
krazyken wrote:
bob wrote:
yanquis1972 wrote: the results he posted are quite a bit better than nothing, obviously. you can glance at them & get a fairly good idea of TK's strength (i'd guess about 3000+ CCRL Elo). even if i'm off by a long shot, i'm going to be a lot closer than i would be if we just had random results, because chess is not random chaos. in fact it's much farther removed from chance than almost any game i can think of.

anyway, it's anyone's choice how to use their hardware & software, & by looking at the results posted & combining them with mine i can see that naum is probably not some kind of special poison for TK, but that its performance was what should be expected.
My point was that 10 games is worthless for determining anything. In a 1000-game match, you can probably find 10 games in a row where each side wins. If you trust 10-game results, that's up to you. I know the inaccuracy this involves.
Data points are never worthless. You are going to need much more than a 1000-game match before it becomes probable that both players get a streak of 10 wins. If the two players are equal, the chance of one of them getting 10 in a row is far less than 1%. The chance of both is far lower. True, 10 games has a large confidence interval, but it is far from worthless.
Sorry, but if the programs are within 100 (or even 200) Elo of each other, 10 games _is_ absolutely worthless for identifying which is best. Absolutely worthless...

BTW, as far as 10 in a row needing more than 1000 games? See "the birthday paradox". You don't even need 1000 to be reasonably sure...

A. You are changing your statement. I was responding to "10 games is worthless for determining anything"; now you are switching to "10 games _is_ absolutely worthless for identifying which is best", which is statistically a completely different question. The first is false; the second is frequently true, especially given the qualification you added to it.

B. Picking 2 people out of a group in the Birthday Paradox has nothing to do with the problem of finding a particular streak in a series. A simple estimate (strictly, an upper bound) for the probability of a streak of wins with independent trials is:

(N - x + 1)(p^x)

Where N is the number of trials, x is the length of the streak, and p is the probability of winning a single game. Depending on what p is (I slightly overstated the case before), the probability of finding a streak of 10 wins in 1000 games is close to 1%, not far less. The probability of both a streak of 10 wins and a streak of 10 losses is still far less, though.
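krazyken's numbers can be checked directly. A sketch assuming independent games with a fixed win probability p, where draws and losses both break a streak; p = 0.3 and p = 0.5 are illustrative values, not from the thread. The (N - x + 1) p^x formula counts length-x windows made up entirely of wins, so overlapping streaks are counted more than once; the dynamic-programming version below (a standard technique swapped in for comparison, not something from the thread) gives the exact probability.

```python
def streak_formula(n, x, p):
    """(N - x + 1) * p^x: expected number of length-x all-win windows.

    By the union bound this is an upper bound on the probability of at
    least one such streak.
    """
    return (n - x + 1) * p ** x


def streak_exact(n, x, p):
    """Exact probability of at least one run of x consecutive wins in n
    independent games, each won with probability p (draw or loss otherwise)."""
    # state[k]: probability that no run of x wins has happened yet and the
    # current run of consecutive wins has length k (0 <= k < x)
    state = [0.0] * x
    state[0] = 1.0
    for _ in range(n):
        new = [0.0] * x
        new[0] = (1 - p) * sum(state)   # a draw or loss resets the run
        for k in range(x - 1):
            new[k + 1] = p * state[k]   # a win extends the run
        # p * state[x - 1] completes a run of x, so that mass leaves `state`
        state = new
    return 1.0 - sum(state)


for p in (0.3, 0.5):
    print(f"p = {p}: formula {streak_formula(1000, 10, p):.4f}, "
          f"exact {streak_exact(1000, 10, p):.4f}")
```

For p = 0.3 both numbers stay below 1%, in line with the corrected remark; at p = 0.5, where long streaks overlap far more often, the formula noticeably overstates the exact probability.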
