Test

chessica
Posts: 979
Joined: Thu Aug 11, 2022 11:30 pm
Full name: Esmeralda Pinto

Test

Post by chessica »

MEA
A. Processor:
Brand: AMD Athlon(tm) II X4 630 Processor
Arch: X86_64
Cores: 4

B. EPD test set:
Filename: temere.epd

Rank	Engine	Rating	Top1	MaxTop1	Top1Rate	Score	MaxScore	ScoreRate	MoveTime(ms)	Hash(MB)	Threads
1	Leptir-28-02-23	2500	2809	4975	0.565	36687	49750	0.737	100	128	1
2	Stockfish-15.1	2500	2780	4975	0.559	36408	49750	0.732	100	128	1
3	crystal-5-kwk	2500	2762	4975	0.555	36378	49750	0.731	100	128	1
4	Berserk-11.1	2500	2727	4975	0.548	36322	49750	0.730	100	128	1
5	BlueMarlin-15.7	2500	2753	4975	0.553	36305	49750	0.730	100	128	1
6	Stockfish-11	2500	2716	4975	0.546	36109	49750	0.726	100	128	1
7	Brainlearn-25.2	2500	2713	4975	0.545	36004	49750	0.724	100	128	1
8	Stockfish_14.1	2500	2684	4975	0.539	35736	49750	0.718	100	128	1
9	Stockfish-16.1	2500	2704	4975	0.544	35631	49750	0.716	100	128	1
10	Stockfish-17	2500	2659	4975	0.534	34984	49750	0.703	100	128	1
11	Houdini_15a	2500	2411	4975	0.485	32864	49750	0.661	100	128	1
12	Fidelio-17-MPV	2500	2406	4975	0.484	32826	49750	0.660	100	128	1
13	Ippolit_051323	2500	2373	4975	0.477	32459	49750	0.652	100	128	1
14	WildCat_8	2500	1919	4975	0.386	26908	49750	0.541	100	128	1
15	Ruffian_2	2500	1883	4975	0.378	26704	49750	0.537	100	128	1
16	igel_1.2	2500	1847	4975	0.371	26268	49750	0.528	100	128	1
17	Rotor_0.8	2500	1852	4975	0.372	26209	49750	0.527	100	128	1
18	Brainlearn-28.1	2500	1658	4975	0.333	23974	49750	0.482	100	128	1
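
For reference, the two rate columns appear to be plain ratios: Top1Rate = Top1 / MaxTop1 and ScoreRate = Score / MaxScore. Since MaxScore / MaxTop1 = 49750 / 4975 = 10, each of the 4975 positions seems to be worth at most 10 points. A minimal Python sketch that reproduces the rates for three rows copied from the table (the helper name `rates` is made up for illustration and is not part of MEA):

```python
def rates(top1, max_top1, score, max_score):
    """Return (Top1Rate, ScoreRate), rounded to three decimals as in the table."""
    return round(top1 / max_top1, 3), round(score / max_score, 3)


# Columns: Top1, MaxTop1, Score, MaxScore (values copied from the table above).
rows = {
    "Leptir-28-02-23": (2809, 4975, 36687, 49750),
    "Stockfish-17": (2659, 4975, 34984, 49750),
    "Brainlearn-28.1": (1658, 4975, 23974, 49750),
}

for name, cols in rows.items():
    top1_rate, score_rate = rates(*cols)
    print(f"{name}: Top1Rate={top1_rate} ScoreRate={score_rate}")
```

Read this way, Top1Rate is how often the engine played the top-rated move, and ScoreRate is the share of all available points it collected.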
chessica
Posts: 979
Joined: Thu Aug 11, 2022 11:30 pm
Full name: Esmeralda Pinto

Re: Test

Post by chessica »

Now, what conclusions can or should be drawn from the results? I'm at a loss.
Rebel
Posts: 7409
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Test

Post by Rebel »

The "temere.epd" (which was shared here at the time) was not great.
90% of coding is debugging, the other 10% is writing bugs.
chessica
Posts: 979
Joined: Thu Aug 11, 2022 11:30 pm
Full name: Esmeralda Pinto

Re: Test

Post by chessica »

It may be that Temere is not balanced, but that affects all engines.

An example is shown here with a #3

viewtopic.php?p=968610#p968610

And the Stockfish versions show the same order here.
Rebel
Posts: 7409
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Test

Post by Rebel »

chessica wrote: Mon Sep 16, 2024 1:00 pm It may be that Temere is not balanced, but that affects all engines.
False explanation.

Do you want to learn something about MEA testing?
chessica
Posts: 979
Joined: Thu Aug 11, 2022 11:30 pm
Full name: Esmeralda Pinto

Re: Test

Post by chessica »

What you say is true, the composition of the positions is crucial.
Rebel
Posts: 7409
Joined: Thu Aug 18, 2011 12:04 pm
Full name: Ed Schröder

Re: Test

Post by Rebel »

chessica wrote: Sat Sep 21, 2024 7:52 pm What you say is true, the composition of the positions is crucial.
No, it's the quality of the best move in the "c0" tag of the EPDs. It has to be the actual best move; if it is not, the MEA results are unreliable.
chessica
Posts: 979
Joined: Thu Aug 11, 2022 11:30 pm
Full name: Esmeralda Pinto

Re: Test

Post by chessica »

I had assumed that. Likewise, if there are several good moves, they should all be included.
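
To make the point about the "c0" tag concrete, here is a rough Python sketch of how a position could be scored. It assumes the tag holds comma-separated `move=points` pairs, which should be checked against the actual EPD file. The function names `parse_c0` and `score_move` and the sample position are hypothetical and not taken from MEA or temere.epd.

```python
import re


def parse_c0(epd_line):
    """Extract {move: points} from a c0 "move=points, ..." tag (assumed layout)."""
    match = re.search(r'c0\s+"([^"]*)"', epd_line)
    if not match:
        return {}
    points = {}
    for item in match.group(1).split(","):
        move, sep, value = item.strip().partition("=")
        if sep and value.strip().isdigit():
            points[move.strip()] = int(value)
    return points


def score_move(epd_line, engine_move):
    """Points earned for the engine's move; 0 if the move is not listed in c0."""
    return parse_c0(epd_line).get(engine_move, 0)


# Hypothetical position and point values, for illustration only.
line = ('r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - '
        'bm Bb5; c0 "Bb5=10, Bc4=9, d4=7";')

print(score_move(line, "Bb5"))  # 10
print(score_move(line, "Bc4"))  # 9
print(score_move(line, "a3"))   # 0, move not listed
```

Under this scheme, an engine that finds a move stronger than anything listed in c0 still gets 0 points, and a wrongly rated top move rewards the wrong engines. That is why the quality of the c0 entries matters more than the mix of positions.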