On-line engine blitz (policy poll)


Should multiple engine copies be allowed in the monthly blitz tourney?

Under no condition: 12 votes (35%)
Just to make an even number of participants: 6 votes (18%)
Only if one of them is running on a Raspberry Pi: 8 votes (24%)
Already if one uses >8x as many cores: 2 votes (6%)
Even old ancestors are allowed on the same hardware: 6 votes (18%)

Total votes: 34

sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Wattage classes

Post by sje »

How about organizing an event with participants grouped by wattage requirements? A 10 W class would allow Raspberry Pi and BeagleBone hosts, a 25 W class would allow many notebooks, etc. Of course, entrants would need to observe the honor system.

----

I will enter Symbolic in one of your events after I add a few more items to the re-write and have it use XBoard protocol version two to test ICS operation. At present, the program is too weak to put up much of a fight:

2015-08-23 100 1m+1s vs FairyMax 57-25-18 elo+115
2015-08-23 100 1m+1s vs Fruit    07-89-04 elo-402
2015-08-24 100 1m+1s vs tscp     73-16-11 elo+225
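
As a rough check, the elo column above can be reproduced from the score lines. This is only a sketch: it assumes the three numbers are wins-losses-draws (the post does not spell that out) and uses the standard logistic Elo model. The function name `elo_diff` is made up for illustration.

```python
import math

def elo_diff(wins, losses, draws):
    """Elo difference implied by a match score under the logistic Elo model."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    if score <= 0.0 or score >= 1.0:
        raise ValueError("Elo difference is unbounded for a 0% or 100% score")
    return 400.0 * math.log10(score / (1.0 - score))

# Match results from the table above, read as (wins, losses, draws): an assumption.
results = {"FairyMax": (57, 25, 18), "Fruit": (7, 89, 4), "tscp": (73, 16, 11)}
for name, (w, l, d) in results.items():
    print(f"{name:9s} {elo_diff(w, l, d):+5.0f}")
```

Under that reading the three values round to +115, -402 and +225, which matches the table.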
Henk
Posts: 7210
Joined: Mon May 27, 2013 10:31 am

Re: On-line engine blitz (policy poll)

Post by Henk »

Also, if your program has a severe bug, it might be demoted to a lower group. Even once the bug is fixed, it stays in that lower group, so you have to wait two months instead of one to get your revenge.
Joost Buijs
Posts: 1562
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: On-line engine blitz (policy poll)

Post by Joost Buijs »

Henk wrote:
Joost Buijs wrote:
hgm wrote:
I think the problem is exactly the opposite: the participant strength varies over a very wide range, and pairing a top engine against an average one is basically a waste of time, as the outcome is 99% certain. But due to the comparatively large number of rounds the top engines run out of reasonable opponents pretty fast, (given the black/white restrictions), so that such pairings have to be made.
The last few tournaments there were at least 10 or 12 (top) engines of comparable strength, and about the same number of weaker engines.

I don't know anything about Swiss systems at all, but with this high number of participants would it not be possible to split the tournament into 2 groups, one for the stronger engines and one for the weaker ones?
This also gives you the opportunity to add some of the low level hardware engines to the 2nd group.
I guess it would not involve much extra work for the admin because most of it is handled by mamer anyway.

It seems logical to me that a Swiss tournament can only give a reasonable outcome with respect to ranking when the participants are of comparable strength.
If you only want to determine who is the strongest then it doesn't matter of course.
Even 2 groups may not be enough. But I also do not like to play against chess engines that give away pieces almost instantly. There is a difference even between badly playing and very badly playing engines.
Playing against engines that play giveaway chess is of no use at all.
I don't mind playing a weaker engine as long as it looks like a reasonable game of chess.

I saw on many occasions that the final Swiss ranking after the tournament has finished is way off from what you would expect.
For instance when you have engines with a higher number of points than engines which have a much higher TPR.
Maybe the mamer pairing algorithm is not up to standards, but I think this happens because the spread in playing strength is too large.
sje
Posts: 4675
Joined: Mon Mar 13, 2006 7:43 pm

Re: On-line engine blitz (policy poll)

Post by sje »

The first round of a Swiss event usually sorts the entrants into upper and lower halves fairly well. So having one big event vs two split events just means one extra round for a roughly equal confidence level of rank establishment.

For an Elo delta of 400 points, the lesser ranked entrant has an expectation of about 9.1%. But at a delta of 800 points, this decreases to just under 1%, and so such a pairing probably won't produce a very interesting game.
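
The two percentages follow from the usual logistic Elo expectation formula. A minimal sketch, with `expected_score` as an illustrative name:

```python
def expected_score(delta):
    """Expected score of the lower-rated side at a rating deficit of `delta` Elo."""
    return 1.0 / (1.0 + 10.0 ** (delta / 400.0))

for delta in (400, 800):
    print(f"{delta} Elo: {expected_score(delta):.2%}")
```

At 400 points this is 1/11 and at 800 points 1/101, so the drop from roughly 9% to just under 1% comes straight out of the formula.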
hgm
Posts: 27703
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: On-line engine blitz (policy poll)

Post by hgm »

Joost Buijs wrote:It seems logical to me that a Swiss tournament can only give a reasonable outcome with respect to ranking when the participants are of comparable strength.
If there is any truth in that, it is only because when all participants are of comparable strength, any ranking would be reasonable. And any pairing algorithm would then have that property.

In fact, Swiss is usually used when the strength range is huge. The first two rounds quickly sort the players by strength class, especially if the pairing uses seeding (as mamer does, based on pre-existing ratings). There is an 'accelerated pairing' algorithm that adds a virtual point to the top-half seeds in the first two rounds, so that you skip the totally predictable first top-half vs bottom-half round, and in the second round play the second quarter against the third. I don't think mamer uses that.
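
To make the difference concrete, here is a toy sketch of round-one pairing with and without the virtual point. It is not mamer's code; colour allocation, byes and later rounds are ignored, and `first_round` and the seed labels are invented for illustration.

```python
def first_round(seeds, accelerated=False):
    """Pair round one; `seeds` is ordered strongest first, count divisible by 4."""
    half = len(seeds) // 2
    # With the virtual point, the top-half seeds form their own score group;
    # without it, everybody starts in one group on zero points.
    groups = [seeds[:half], seeds[half:]] if accelerated else [seeds]
    pairings = []
    for group in groups:
        mid = len(group) // 2
        # Within a score group the upper half meets the lower half in seed order.
        pairings.extend(zip(group[:mid], group[mid:]))
    return pairings

seeds = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
print(first_round(seeds))                    # plain: S1-S5, S2-S6, S3-S7, S4-S8
print(first_round(seeds, accelerated=True))  # accelerated: S1-S3, S2-S4, S5-S7, S6-S8
```

With acceleration the first quarter meets the second and the third meets the fourth, so the lopsided top-half vs bottom-half games are skipped.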
Joost Buijs wrote:I saw on many occasions that the final Swiss ranking after the tournament has finished is way off from what you would expect.
For instance when you have engines with a higher number of points than engines which have a much higher TPR.
Maybe the mamer pairing algorithm is not up to standards, but I think this happens because the spread in playing strength is too large.
Mamer's pairing is quite advanced; it even uses back-tracking to get the best pairing. It is just intrinsic to the Swiss system that the upper and lower halves of the field cannot stay separated in score without also playing each other. If the best of the weaker half played only weaker opponents, it would soon have a score equal to or better than that of the best participant, who faces strong opposition all the time.

A problem is that there are actually very few participants in the intermediate range. A lot of 2700+, just 2 or 3 around 2400, and then a lot of 2000-. There are not enough middle-class engines to keep the weak ones in the tail, which means that occasionally strong engines will have to be paired with weak engines to keep the score of the latter down. This is why I thought including some more strong engines running on R.Pi-class hardware would help, as their Elo will fall in the intermediate range. (rpiFruit is still quite effective in beating micro-Max.)

TPR is actually a much more reliable indicator than score. If I had to choose another tournament form, I think I would go for doing a few rounds of Swiss first (say 4), then dividing up the field by TPR from the Swiss (say in groups of 6), and playing a round-robin in those. The problem is that this puts a much larger constraint on the number of players that can be conveniently handled. With Swiss it is enough that it is even. As a practical consideration, mamer is really written to do Swiss pairing, and will stick to that system in the early rounds of a round-robin, so that it often gets stuck in the penultimate round. (E.g. when with 2 rounds to go A-B, B-C and A-C have to be played; A, B and C all need two more games, but one of them will not be able to play.) It really would require a re-write of mamer to make it reliable for round-robin pairing, using a fixed schedule for that.
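
For reference, a fixed round-robin schedule of the kind meant here can be generated with the well-known circle method. This is a sketch only, not anything taken from mamer; colour assignment is left out and `round_robin` is an illustrative name.

```python
def round_robin(players):
    """Fixed round-robin schedule by the circle method; pads with a bye if odd."""
    players = list(players)
    if len(players) % 2:
        players.append(None)  # None marks a bye
    n = len(players)
    rounds = []
    for _ in range(n - 1):
        # Pair the i-th seat from the front with the i-th seat from the back.
        rounds.append([(players[i], players[n - 1 - i]) for i in range(n // 2)])
        # Keep the first player fixed and rotate all the others by one seat.
        players = [players[0]] + [players[-1]] + players[1:-1]
    return rounds

for number, games in enumerate(round_robin(["A", "B", "C", "D", "E", "F"]), start=1):
    print(number, games)
```

Because every round is fixed in advance, each player has exactly one game per round and each pair meets exactly once, so the A-B, B-C, A-C dead end cannot occur.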
Joost Buijs
Posts: 1562
Joined: Thu Jul 16, 2009 10:47 am
Location: Almere, The Netherlands

Re: On-line engine blitz (policy poll)

Post by Joost Buijs »

hgm wrote: A problem is that there are actually very few participants in the intermediate range. A lot of 2700+, just 2 or 3 around 2400, and then a lot of 2000-. There are not enough middle-class engines to keep the weak ones in the tail, which means that occasionally strong engines will have to be paired with weak engines to keep the score of the latter down.
Well, this is more or less what I meant, I didn't express myself clearly.
The strength should be distributed more evenly. Normally you would expect a Gaussian distribution, with most engines in the middle-class Elo region.

I did not expect the mamer pairing algorithm to be that advanced, at least not after taking a glimpse at the Lasker source, which is the biggest mess I have ever seen.
Robert Pope
Posts: 558
Joined: Sat Mar 25, 2006 8:27 pm

Re: On-line engine blitz (policy poll)

Post by Robert Pope »

hgm wrote:Is this a sensible policy, or should there be no objection to participation of such Raspberry Pi duplicates of other participants? Last tourney even rpiStockfish (which is a Stockfish running on Raspberry Pi) finished only in the lower half of the field, so it would not really affect the battle at the top. They would instead fall in the (now little populated) mid-section of the field if they were top engines, and in the sub-microMax tail if they were just average engines.

The advantage of lifting restrictions on such participation would be that we have more participants, of a wider range of strengths. During the last tourney there were 4 engines logged in that now were barred from joining:
NightmareA, Blieps (= Rookie on R.Pi), rpiFruit (because of FruitReloaded), and rpiTogaII (as derivative of Fruit). In the end I did join the latter to make an even number. rpiGlaurung would be excluded anyway because of rpiStockfish.

The danger is that allowing multiple copies is the first step on a sliding scale: 'weak hardware' will eventually get stronger as well. In the future there could be 8-core Raspberry Pis. And is an engine running single-threaded on a 2.4GHz i3 laptop sufficiently weaker than the same engine running on a 12-core i7? How much weaker does the hardware have to be to be qualified as a 'minor participant' that is allowed besides the top-of-the-line version?
I don't have a problem with both Stockfish and rpiStockfish - I think having a full range of strengths of competitors is good. As long as the hardware disadvantage is at least 100:1, they really are competing in a different league.

But I do love that the blitz tourney is primarily authors running their own engines, and I would hate to dilute that too much.
mvk
Posts: 589
Joined: Tue Jun 04, 2013 10:15 pm

Re: On-line engine blitz (policy poll)

Post by mvk »

For making the field even, adding a clone on weak hardware is a much more fun option than having a 'bye'. One can debate about the required strength. The traditional thinking is that it shouldn't be a spoiler for the top half of the pool, and it should not be guaranteed to finish last.

Given that, I feel that Bliep(C) is already too strong to make the field even and be a substitute for a 'bye'. It placed 16 out of 26 when we tried that. This was single-threaded, but on an RPI v2, which is not a slow computer at all. I can imagine it is very demotivating to lose to such a 'bye' replacement when you are normally playing in the tail of the tournament. You should have a reasonable expectation that you could beat it once in a while.

I also won't mind if you pick another program for that purpose. But I would prefer some program by an author who is online.

In case the poll is about more than making the field even, I also wouldn't mind if we add 10 more RPIs of various programs. The Pis are really good material for uniform platform contests, especially the v2. Personally, and in general, I don't care much about 'sliding scale' type of arguments, as they usually conceal some concern that needs to be made explicit so it can be discussed, or they are just invalid handwaving. A 'sliding scale' argument by itself is never valid, IMO. But that is just me.

Also, Glaurung and SF are already too different, I would welcome both, why not. After all, it is always fun to beat Glaurung. It is a reminder of where SF once was, and sometimes you need that motivation.

One final thing: Don't underestimate SF on a Pi. I expect it to finish in the top 5 when running on an RPI v2 with 4 cores. I think something is just plain wrong with the rpiStockfish(C) configuration on your server. Bliep(C) scores >50% against it, while it should score closer to 5% based on my experience.

In short: just do as you see fit. It is your tournament and it is a lot of fun, and it will stay like that if we don't overthink things :-)
[Account deleted]
bob
Posts: 20943
Joined: Mon Feb 27, 2006 7:30 pm
Location: Birmingham, AL

Re: On-line engine blitz (policy poll)

Post by bob »

BTW the idea about the "extra program" to avoid byes has always been a good idea. And I agree with one of the posters here (to an extent) where the added program needs to be at least in the bottom-half of the strength pool. You don't want one so weak that the games are meaningless and no more informative than a normal bye, but you don't want it strong enough that the beginner programs have no chance. I'd be happy with tscp, perhaps gerbil, or even gnuchess 5 or whatever, as the new programs will still have a chance of beating them.