I can't believe that so many people don't get it!

Discussion of anything and everything relating to chess playing software and machines.

Moderators: hgm, Rebel, chrisw

hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

They mean normal nodes per second. It is not easy to compare hardware that is very different. Depending on what feature you compare you can make one or the other look better. Stockfish would probably run 20 times slower on the hardware AlphaZero was using than on the hardware it got. It is irrelevant how much you would suffer on the hardware of your opponent, which is unsuitable for you.
TPUs are not more complex than x86 cores. But they do completely different things. So it is always possible to find something that one does and the other does not, or hardly does, to make it look better.
CheckersGuy
Posts: 273
Joined: Wed Aug 24, 2016 9:49 pm

Re: I can't believe that so many people don't get it!

Post by CheckersGuy »

Ed, do you even know what the TPUs are mostly used for? They are specialized chips for deep learning (the neural network that A0 uses).
"Evaluations per second" would make much more sense.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: I can't believe that so many people don't get it!

Post by Rebel »

hgm wrote:They mean normal nodes per second. It is not easy to compare hardware that is very different. Depending on what feature you compare you can make one or the other look better. Stockfish would probably run 20 times slower on the hardware AlphaZero was using than on the hardware it got. It is irrelevant how much you would suffer on the hardware of your opponent, which is unsuitable for you.
TPUs are not more complex than x86 cores. But they do completely different things. So it is always possible to find something that one does and the other does not, or hardly does, to make it look better.
That (the red) is a strange assumption. We already discussed that the second-generation TPU, besides floating point (FLOPS), is also integer (IPS) based.

So we can make a TERA comparison.

It's logical to assume that integer operations are not slower than floating-point ones, so 720 TFLOPS is at least 720 TIPS.

SF running on a real 64-core CPU (no hyperthreading) at 4 GHz would do 256 GIPS and thus only about 0.25 TIPS.

And even if SF (per your guess) ran 20 times slower on a TPU, it would do 720/20 = 36 TIPS, making SF run 144 times faster than on a 64-core PC at 4 GHz.
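For what it's worth, here is the arithmetic above as a small Python sketch. The one-integer-op-per-core-per-clock figure is an assumption for illustration, not a measured number:

Code: Select all

# Back-of-the-envelope version of the comparison above.
# Assumed (not from any spec): one integer op per core per clock cycle.
tpu_tflops = 720.0        # claimed peak for the 2nd-gen TPU setup, in tera-ops/s
cores, ghz = 64, 4.0      # the hypothetical Stockfish machine

cpu_tips = cores * ghz / 1000.0          # 256 GIPS = 0.256 TIPS
tpu_tips_effective = tpu_tflops / 20.0   # hgm's guessed 20x slowdown -> 36 TIPS

print(f"CPU: {cpu_tips:.3f} TIPS")                      # 0.256
print(f"TPU after /20: {tpu_tips_effective:.0f} TIPS")  # 36
print(f"ratio: {tpu_tips_effective / cpu_tips:.0f}x")   # ~141 (144 if you round down to 0.25)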

---------

OTOH we read statements from Google, likely about the first generation:
On production AI workloads that utilize neural network inference, the TPU is 15 times to 30 times faster than contemporary GPUs and CPUs, Google said.
I would say it's clear as mud.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: I can't believe that so many people don't get it!

Post by Rebel »

I guess the claimed 180 TFLOPS is based only on reg,reg (non-memory) operations; that would explain it.
mhull
Posts: 13447
Joined: Wed Mar 08, 2006 9:02 pm
Location: Dallas, Texas
Full name: Matthew Hull

Re: I can't believe that so many people don't get it!

Post by mhull »

Rebel wrote:
hgm wrote:They mean normal nodes per second. It is not easy to compare hardware that is very different. Depending on what feature you compare you can make one or the other look better. Stockfish would probably run 20 times slower on the hardware AlphaZero was using than on the hardware it got. It is irrelevant how much you would suffer on the hardware of your opponent, which is unsuitable for you.
TPUs are not more complex than x86 cores. But they do completely different things. So it is always possible to find something that one does and the other does not, or hardly does, to make it look better.
That (the red) is a strange assumption. We already discussed that the second-generation TPU, besides floating point (FLOPS), is also integer (IPS) based.

So we can make a TERA comparison.

It's logical to assume that integer operations are not slower than floating-point ones, so 720 TFLOPS is at least 720 TIPS.

SF running on a real 64-core CPU (no hyperthreading) at 4 GHz would do 256 GIPS and thus only about 0.25 TIPS.

And even if SF (per your guess) ran 20 times slower on a TPU, it would do 720/20 = 36 TIPS, making SF run 144 times faster than on a 64-core PC at 4 GHz.

---------

OTOH we read statements from Google, likely about the first generation:
On production AI workloads that utilize neural network inference, the TPU is 15 times to 30 times faster than contemporary GPUs and CPUs, Google said.
I would say it's clear as mud.
Well, I would rather compare the NPS of the MCTS. Or ask about the relative path length of the NN "position evaluation process" compared to the path length of Stockfish's static evaluation.

My hazardous guess is that the path length of the NN is quite a bit longer. So maybe it needs TIPS to keep up with its opponent's GIPS (if that makes any sense).
Matthew Hull
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

Rebel wrote: That (the red) is a strange assumption. We already discussed that the second-generation TPU, besides floating point (FLOPS), is also integer (IPS) based.
You have a completely wrong idea of what a TPU is. It has for instance no branch instructions. It can do integer operations, but only of a type where the CPU sends it an instruction, and that instruction then orders it to do 64K multiplications and additions, which it then does in a single (or perhaps a few) clock cycles. It is an array processor. Perhaps it would be possible to instruct it to use 1x1 arrays, but that would slow it down by a factor of 64K, making it enormously slower than an x86 CPU. For everything a TPU does, the CPU would have to send it an instruction in a memory-mapped or I/O-port-mapped control register, or several such registers. A TPU does not fetch its own instructions. This only makes sense if you can order the TPU to do thousands of operations with one instruction, like multiplying one array of 1000 numbers with another array of 1000 numbers element by element, and adding all the products.

So TPUs don't do anything if there is no CPU to tell them what to do. Array operations are completely useless for Stockfish, so it could only use the CPU core that AlphaZero uses to tell the TPUs what to do.
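To put the point in toy form (an illustration only, not the actual TPU programming model): one dispatched "array instruction" covers as much work as 64K separate scalar instructions.

Code: Select all

import numpy as np

# Toy contrast between one array operation and 64K scalar operations.
# 256 x 256 = 65536 elements, matching the "64K multiplications" above.
a = np.random.rand(256, 256)
b = np.random.rand(256, 256)

# "Array processor" style: one dispatched operation does all the
# multiplications plus the final summation.
acc = float(np.sum(a * b))

# "Scalar CPU" style: one multiply-accumulate at a time, 64K times.
acc_scalar = 0.0
for i in range(256):
    for j in range(256):
        acc_scalar += a[i, j] * b[i, j]

assert abs(acc - acc_scalar) < 1e-6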
So we can make a TERA comparison.

It's logical to assume that integer operations are not slower than floating-point ones, so 720 TFLOPS is at least 720 TIPS.

SF running on a real 64-core CPU (no hyperthreading) at 4 GHz would do 256 GIPS and thus only about 0.25 TIPS.

And even if SF (per your guess) ran 20 times slower on a TPU, it would do 720/20 = 36 TIPS, making SF run 144 times faster than on a 64-core PC at 4 GHz.

---------

OTOH we read statements from Google, likely about the first generation:
On production AI workloads that utilize neural network inference, the TPU is 15 times to 30 times faster than contemporary GPUs and CPUs, Google said.
I would say it's clear as mud.
Rebel
Posts: 6991
Joined: Thu Aug 18, 2011 12:04 pm

Re: I can't believe that so many people don't get it!

Post by Rebel »

hgm wrote:
Rebel wrote: That (the red) is a strange assumption. We already discussed that the second-generation TPU, besides floating point (FLOPS), is also integer (IPS) based.
You have a completely wrong idea of what a TPU is.
If you say so :lol:
hgm wrote: It has for instance no branch instructions.
What?!

From the linked article: "The TPU is programmable like a CPU or GPU. It isn't designed for just one neural network model; it executes CISC instructions on many networks (convolutional, LSTM models, and large, fully connected models). So it is still programmable, but uses a matrix as a primitive instead of a vector or scalar."
hgm
Posts: 27790
Joined: Fri Mar 10, 2006 10:06 am
Location: Amsterdam
Full name: H G Muller

Re: I can't believe that so many people don't get it!

Post by hgm »

Read the entire paper you linked to. In the figure caption it says "instructions are sent from the host interface to a queue. There is no looping." The text just below that says: "The TPU does not have any stored program; it simply executes instructions sent from the host." TPUs don't do anything unless they receive an instruction over the PCI bus that tells them what to do. So their instruction rate is extremely slow. But the instructions are extremely powerful, capable of doing 64K multiplications at once, so the number of arithmetic operations per second is still very high.

The sentence you quote doesn't contradict what I said, but doesn't mean what you think. Yes, the TPU is programmable, through a sequence of instructions. But it doesn't say it fetches these by itself.
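The trade-off is easy to put into numbers. The dispatch rate below is an assumed, illustrative figure, not a measured PCIe number:

Code: Select all

# Host-driven model: throughput = instruction rate x ops per instruction.
ops_per_instruction = 64 * 1024   # one matrix instruction ~ 64K multiply-adds
host_instr_per_sec = 10e6         # assumed host dispatch rate over the PCI bus

print(f"{ops_per_instruction * host_instr_per_sec / 1e12:.2f} tera-ops/s")  # 0.66

# With 1x1 "arrays" (the example above) the same instruction stream
# yields only the bare dispatch rate:
print(f"{host_instr_per_sec / 1e6:.0f} M ops/s")  # 10, i.e. 64K times slower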
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

Michael Sherwin wrote:
Lyudmil Tsvetkov wrote:
Michael Sherwin wrote:
corres wrote:
Michael Sherwin wrote: AlphaZ beat SF by the use of a 'simple trick' called a learn file with reinforcement learning. RomiChess demonstrated the same 'simple trick' 11 years ago, beating Rybka, the world's strongest chess engine at the time.
It has been established that A0 has a learn file in which it saves all its training games, storing wins, losses, draws and a percentage chance to win. RomiChess does exactly the same thing. Here is a record from Romi's learn file.
Record 1 sib 487 chd 2 fs 12 ts 28 t 0 f 0 d 15 s 0 score 17 w 283 L 264 d 191
Record Number
First Sibling Record
First Child Record
From Square
To Square
Type of Move
Flags
Depth
Status
Score, reinforcement learning rewards/penalties
White Wins
Black Wins
Draws
Store a million complete games that have been guided by the stats in the learn file, and tactics unlimited plies deep can be found, stored, and played back, or the search can be guided to find them. It is just a 'simple trick'.
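As a hypothetical sketch, that record maps onto a structure like the one below; the names come from the field list in this post, not from actual RomiChess source code.

Code: Select all

from dataclasses import dataclass

# Hypothetical mirror of the record dump above (not RomiChess source).
@dataclass
class LearnRecord:
    record_number: int   # "Record"
    first_sibling: int   # "sib" (presumably a link to an alternative move)
    first_child: int     # "chd" (presumably the first reply to this move)
    from_square: int     # "fs"
    to_square: int       # "ts"
    move_type: int       # "t"
    flags: int           # "f"
    depth: int           # "d"
    status: int          # "s"
    score: int           # reinforcement-learning rewards/penalties
    white_wins: int      # "w"
    black_wins: int      # "L"
    draws: int           # trailing "d"

# The sample record quoted above:
rec = LearnRecord(1, 487, 2, 12, 28, 0, 0, 15, 0, 17, 283, 264, 191)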
I put 'simple trick' in single quotes because it is a valid trick and not some swindle. If an engine is programmed to do this then more power to it! The wins are legit, and if an engine like SF, K or H loses because it doesn't have this type of learning, then tough cookies!
You are basically right.
But can you estimate the size of the learning file that would turn Romi into an engine with 3400 Elo?
It is a pity, but the DeepMind team did not give me any information about the amount of (programmable) memory AlphaZero uses for its neural network. I am afraid a Romi-type engine at 3400 Elo would need a much bigger memory for its learning file than AlphaZero has.
Moreover, a system based on a neural network is more flexible and effective than one using a learning file only.
I'm not sure what you are asking but I will give as much information as I can.

Romi's learn file is stored on the hard drive. It is modified on the hard drive. The only part of it that is brought into memory is the subtree of the current position if there is one. And that is stored in the hash table so no extra memory footprint is created.

Romi, being only a 2425 CCRL Elo engine, needs to learn a lot of good moves to win games against far stronger engines. A top engine can take advantage of much less learning, simply because one move may be all it needs. A top engine will show a positive learning curve much sooner.

"Moreover a system based on neural network is more flexible and effective than using a learning file only."

Romi does not use a learn file only. Technically there is no learning in a learn file; it is just data recording results. The real learning happens when the nodes are moved from the data tree to the hash table. The data moved into the hash is what allows the search to learn and hopefully play better moves. Those nodes moved into the hash are each a little nugget of accumulated knowledge that goes beyond the understanding of the eval and results in superhuman-looking play. If an engine that achieves 3800 Elo can play near-perfect chess, then RL may not help much. If instead the Elo ceiling is at 5000 or higher, then RL can produce giant gains in Elo given enough games. Romi's Elo gain is linear in the range of 1 to 1000 Elo over only 400 games, using only 10 starting positions against one opponent. That is 2.5 Elo per game. Against a humongous book and, IIRC, 6 top engines, Romi's Elo gain was 50 Elo per 5000 games.
If you are able to draw SF at 2425, why would Alpha not be able to beat SF at a somewhat higher level?
Actually, aren't they just 2500?

What is your score against SF after 100 games?
Without learning Romi is a 2400-level engine. With learning Romi's Elo would rise. The problem is that the only person who cared, died. And since then no organized tournament or rating agency allows Romi to learn while it plays. Romi would climb in the rating lists if it were allowed to use its learning.

Against multithreaded SF, which constantly changes its play, Romi cannot show any gain in 100 games unless the starting position is already highly favorable to Romi. Against single-threaded SF, if that means SF always plays the same, then I suspect Romi would win within 100 games. I have not tested that since the Glaurung days. I'll test it now.
I'm really burnt out on this subject, this place, and computer chess in general, so I had to take a couple of days off. It turns out that SF 8 varies its play even using just one thread. But I let the first 100-game match finish: Romi scored only 3 draws in 100 games, the first of them in game 42. Now the second 100-game match is underway. It is only game 25 and already Romi has 6 draws. I will update when the second hundred games are finished. So far it is -330 Elo for Romi, so about a 3,000 Elo performance in the first 25 games of the second 100.
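The learning mechanism described above (the learn-file subtree for the current position seeded into the hash table before the search) can be sketched as follows; the structure is assumed for illustration, not actual RomiChess code.

Code: Select all

from dataclasses import dataclass, field

# Assumed learn-tree node (illustration only, not RomiChess source).
@dataclass
class LearnNode:
    key: int          # hash key of the position after this move
    depth: int        # depth at which the stored score should count
    score: int        # eval plus accumulated rewards/penalties
    children: list = field(default_factory=list)

def seed_hash(node: LearnNode, table: dict) -> None:
    """Copy a learn-file subtree into a dict standing in for the hash table."""
    table[node.key] = (node.depth, node.score)
    for child in node.children:
        seed_hash(child, table)

# Toy usage: a two-ply subtree primes the table before the search starts,
# so the search sees the learned scores as if it had searched them itself.
root = LearnNode(0xABCD, 15, 17, [LearnNode(0x1234, 14, -5)])
tt = {}
seed_hash(root, tt)
print(tt)  # {43981: (15, 17), 4660: (14, -5)}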
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through
Michael Sherwin
Posts: 3196
Joined: Fri May 26, 2006 3:00 am
Location: WY, USA
Full name: Michael Sherwin

Re: I can't believe that so many people don't get it!

Post by Michael Sherwin »

Michael Sherwin wrote:
Michael Sherwin wrote:
Lyudmil Tsvetkov wrote:
Michael Sherwin wrote:
corres wrote:
Michael Sherwin wrote: AlphaZ beat SF by the use of a 'simple trick' called a learn file with reinforcement learning. RomiChess demonstrated the same 'simple trick' 11 years ago, beating Rybka, the world's strongest chess engine at the time.
It has been established that A0 has a learn file in which it saves all its training games, storing wins, losses, draws and a percentage chance to win. RomiChess does exactly the same thing. Here is a record from Romi's learn file.
Record 1 sib 487 chd 2 fs 12 ts 28 t 0 f 0 d 15 s 0 score 17 w 283 L 264 d 191
Record Number
First Sibling Record
First Child Record
From Square
To Square
Type of Move
Flags
Depth
Status
Score, reinforcement learning rewards/penalties
White Wins
Black Wins
Draws
Store a million complete games that have been guided by the stats in the learn file, and tactics unlimited plies deep can be found, stored, and played back, or the search can be guided to find them. It is just a 'simple trick'.
I put 'simple trick' in single quotes because it is a valid trick and not some swindle. If an engine is programmed to do this then more power to it! The wins are legit, and if an engine like SF, K or H loses because it doesn't have this type of learning, then tough cookies!
You are basically right.
But can you estimate the size of the learning file that would turn Romi into an engine with 3400 Elo?
It is a pity, but the DeepMind team did not give me any information about the amount of (programmable) memory AlphaZero uses for its neural network. I am afraid a Romi-type engine at 3400 Elo would need a much bigger memory for its learning file than AlphaZero has.
Moreover, a system based on a neural network is more flexible and effective than one using a learning file only.
I'm not sure what you are asking but I will give as much information as I can.

Romi's learn file is stored on the hard drive. It is modified on the hard drive. The only part of it that is brought into memory is the subtree of the current position if there is one. And that is stored in the hash table so no extra memory footprint is created.

Romi, being only a 2425 CCRL Elo engine, needs to learn a lot of good moves to win games against far stronger engines. A top engine can take advantage of much less learning, simply because one move may be all it needs. A top engine will show a positive learning curve much sooner.

"Moreover a system based on neural network is more flexible and effective than using a learning file only."

Romi does not use a learn file only. Technically there is no learning in a learn file; it is just data recording results. The real learning happens when the nodes are moved from the data tree to the hash table. The data moved into the hash is what allows the search to learn and hopefully play better moves. Those nodes moved into the hash are each a little nugget of accumulated knowledge that goes beyond the understanding of the eval and results in superhuman-looking play. If an engine that achieves 3800 Elo can play near-perfect chess, then RL may not help much. If instead the Elo ceiling is at 5000 or higher, then RL can produce giant gains in Elo given enough games. Romi's Elo gain is linear in the range of 1 to 1000 Elo over only 400 games, using only 10 starting positions against one opponent. That is 2.5 Elo per game. Against a humongous book and, IIRC, 6 top engines, Romi's Elo gain was 50 Elo per 5000 games.
If you are able to draw SF at 2425, why would Alpha not be able to beat SF at a somewhat higher level?
Actually, aren't they just 2500?

What is your score against SF after 100 games?
Without learning Romi is a 2400-level engine. With learning Romi's Elo would rise. The problem is that the only person who cared, died. And since then no organized tournament or rating agency allows Romi to learn while it plays. Romi would climb in the rating lists if it were allowed to use its learning.

Against multithreaded SF, which constantly changes its play, Romi cannot show any gain in 100 games unless the starting position is already highly favorable to Romi. Against single-threaded SF, if that means SF always plays the same, then I suspect Romi would win within 100 games. I have not tested that since the Glaurung days. I'll test it now.
I'm really burnt out on this subject, this place, and computer chess in general, so I had to take a couple of days off. It turns out that SF 8 varies its play even using just one thread. But I let the first 100-game match finish: Romi scored only 3 draws in 100 games, the first of them in game 42. Now the second 100-game match is underway. It is only game 25 and already Romi has 6 draws. I will update when the second hundred games are finished. So far it is -330 Elo for Romi, so about a 3,000 Elo performance in the first 25 games of the second 100.
After 42 games of the first 100, Romi had only 1 draw. After 42 games of the second 100, Romi has gotten 10 draws. SF 8 on 1 thread is rated 3422 on the 40/4 index. RomiChess is rated a whopping 2423 Elo. So after 142 games of training on the original position, Romi has performed over the last 45 games, with 11 draws, at 3422 - 330 = 3092 Elo. Here are the openings visited.

c15, c10, c49, c68, c66, a55, b58, c66, c48, c66, c66, c66, c66, c66, c66, c68, c65, c61, c65, c66, c66, a56, c66, c65, c66, c66, b51, c66, c66, b51, c66, c66, c65, c65, c65, c68, c65, c68, c66, c65, b51, c66, c66, c66, c65, b09, c41, c41, c66, c66, c66, c66, c63, c66, c66, c02, c66, c41, c46, e87, c66, c45, c26, c66, c65, c41, c66, c65, c65, c66, c84, c65, c41, c65, c65, c84, c41, c66, c66, c66, c41, c41, c66, a56, b51, c66, c41, c66, c66, c41, c66, c65, c65, c66, c66, c65, c66, c66, c66, c66, c66, c66, c15, c65, b51, c65, c66, c68, c66, c65, c66, b52, c65, c65, c65, b31, c84, c66, c65, c66, c66, c66, b51, c66, b51, c66, a43, c68, b44, c66, c66, b36, c65, c66, c65, c66, c66, c66, c66, c66, c65, c41, c66, c66, c61, c66, a44

(Draws ...)
c66 = Closed Berlin . . . . . . . . . .
c65 = Berlin, Anderson
c15 = Winawer, Alekhine .
c10 = French, Rubinstein
c49 = Four knights, Nimzovitch
c68 = Spanish exchange
a55 = Old Indian
b58 = Sicilian, Boleslavsky
c48 = Spanish, Classical
c61 = Spanish, Birds
a56 = Benoni, Czech
b51 = Sicilian, Bb5+ Nc6
b09 = Pirc, Austrian
c41 = Philidor, Berger Variation .
c63 = Spanish, Schliemann
c02 = French, Advance
c46 = Three Knights, Schlechter Variation
e87 = King's Indian, Samisch
c45 = Scotch, Tartakower
c26 = Vienna
c84 = Spanish, Closed Center Attack .
b52 = Sicilian, Bb5+ Bd7
b31 = Sicilian, Rossolimo 3 ... g6
a43 = Old Benoni, Schmidt
b44 = Sicilian, Taimanov
b36 = Maroczy Bind
a44 = Old Benoni, Czech

Now it is 157 games and 16 draws, for a 3422 - 315 = 3107 Elo performance.
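For reference, here is the standard logistic formula for a performance rating from a match score; the figures above look like a linear shortcut, and the formula below gives a somewhat lower number.

Code: Select all

import math

# Elo performance from a score against a single opponent.
def performance(opponent_elo: float, wins: int, draws: int, losses: int) -> float:
    games = wins + draws + losses
    p = (wins + 0.5 * draws) / games
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against 0% and 100% scores
    return opponent_elo + 400.0 * math.log10(p / (1.0 - p))

# 16 draws and no wins in 157 games against a 3422-rated opponent:
print(round(performance(3422, 0, 16, 141)))  # ~2914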

Is anyone who did not believe changing their mind yet?

Can the detractors see what 44 million games of training would do for RomiChess?
If you are on a sidewalk and the covid goes beep beep
Just step aside or you might have a bit of heat
Covid covid runs through the town all day
Can the people ever change their ways
Sherwin the covid's after you
Sherwin if it catches you you're through