Page 7 of 10

Re: SF-NNUE going forward...

Posted: Thu Jul 30, 2020 8:37 pm
by Dann Corbit
towforce wrote: Thu Jul 30, 2020 6:19 pm
towforce wrote: Thu Jul 30, 2020 5:33 pm2. if you're only going to be training NNs, and not, say, modelling stampeding yaks, you'd probably be better off buying a TPU (tensor processing unit - a bit like a GPU, but all arithmetic in low precision). Google sell their TPU under the "Coral Edge" brand for some reason, and you can buy it as either a board or a USB device - link

Apologies - those devices are for running a trained and compiled net - not for learning. :oops:

Could still be useful for running chess engines with large, trained nets.

The mini-gumstick version is fifty bucks for 4 TFlops.
Goodness, gracious.
$500 would be 40 TFlops.
$1000, 80 TFlops.
$10,000, 800 TFlops (closing in on a petaflop for $10,000).
I am impressed
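A quick sanity check of that price/performance scaling (assuming the quoted $50-for-4-TFlops figure holds and scales linearly, which real purchases may not):

```python
# Linear scaling of the quoted Coral Edge figure: $50 buys ~4 TFlops.
TFLOPS_PER_DOLLAR = 4 / 50  # 0.08 TFlops per dollar (quoted figure)

for budget in (50, 500, 1000, 10_000):
    print(f"${budget:>6}: {budget * TFLOPS_PER_DOLLAR:,.0f} TFlops")
```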

Re: SF-NNUE going forward...

Posted: Thu Jul 30, 2020 8:59 pm
by towforce
Dann Corbit wrote: Thu Jul 30, 2020 8:37 pm
The mini-gumstick version is fifty bucks for 4 TFlops.
Goodness, gracious.
$500 would be 40 TFlops.
$1000, 80 TFlops.
$10,000, 800 TFlops (closing in on a petaflop for $10,000).
I am impressed

Thank you - that's kind of you to say. 8-)

I know you prefer to own rather than rent, but from link, on GCE (Google Compute Engine), you can rent a 32-TPU v3 configuration: each TPU gives you 90 TFlops, for a total of 90*32 = 2880 TFlops.

So that's 3 pflops for $32 per hour! :shock:

Per cloud thread (link), as far as I can tell, your fastest option would be 184 petaflops (but remember you would only be able to do machine learning (ML) tasks because this is all low precision arithmetic).
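The rental arithmetic above, spelled out (the 90-TFlops-per-TPU and $32-per-hour figures are the quoted GCE numbers, not verified here):

```python
TFLOPS_PER_TPU_V3 = 90      # quoted per-TPU throughput
TPU_COUNT = 32              # the rentable 32-TPU v3 configuration
COST_PER_HOUR = 32          # dollars per hour, quoted GCE price

total_tflops = TFLOPS_PER_TPU_V3 * TPU_COUNT           # 2880 TFlops, ~2.9 PFlops
tflops_per_dollar_hour = total_tflops / COST_PER_HOUR  # 90 TFlops per dollar-hour
print(total_tflops, total_tflops / 1000, tflops_per_dollar_hour)
```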

Re: SF-NNUE going forward...

Posted: Fri Jul 31, 2020 12:34 am
by Zenmastur
chrisw wrote: Thu Jul 30, 2020 9:08 am Rough maths suggests he has at least 1000 cores to do that. If we call several = 5

5 billion fens a day = 50000 fens per second, which if he samples at 25% implies 12500 games per second. One game every 0.08 ms. For massively fast bullet that needs 1000 cores at least, I think.
1000 cores would allow for game in 80ms
The fastest I can get games to play without time losses is about 28 ms per ply. The average depth reached is 13-14 plies, which I think is much deeper than what he is using. I thought I read he was using an 8-ply search, but I could be wrong; I went back and tried to locate where I read this and couldn't find it. If it is true, that's a 5 to 6 ply difference in search depth. At a branching factor of 1.6, this implies I could get to 8 plies in about 2.8 milliseconds if I had the appropriate software. That's ~357 plies per second per core, or ~31M plies per day per core.
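The depth-scaling estimate in that paragraph can be checked numerically. The branching factor of 1.6 and the 28 ms-per-ply measurement are the post's own assumptions; using exactly 5 plies of difference gives slightly different rounding than the post's ~2.8 ms and ~357 figures:

```python
BRANCHING_FACTOR = 1.6
MS_AT_DEPTH_13 = 28.0   # measured: ~28 ms per ply at average depth 13-14
DEPTH_DROP = 5          # scaling from depth 13 down to depth 8

ms_at_depth_8 = MS_AT_DEPTH_13 / BRANCHING_FACTOR ** DEPTH_DROP  # ~2.7 ms
plies_per_second = 1000 / ms_at_depth_8                          # ~370 per core
plies_per_day = plies_per_second * 86_400                        # ~32M per core
print(round(ms_at_depth_8, 2), round(plies_per_second), f"{plies_per_day:.2e}")
```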

But I think this misses the point: there is little reason to produce so many nets per day. Two or three per week would be plenty and would allow each net to be tested more thoroughly. Having to store all those nets is a waste of space. Producing all the training data is fine; it's the number of releases that is excessive.

The ELO increase seems to have declined markedly after about 30B fens.

Regards,

Zenmastur

Re: SF-NNUE going forward...

Posted: Fri Jul 31, 2020 3:34 am
by Ovyron
mmt wrote: Thu Jul 30, 2020 7:51 am
ChickenLogic wrote: Thu Jul 30, 2020 4:48 am Very recently tttak added an arch that supports switching based on pieces left rather than gameply.
Having weights for each piece might be even better. But the best would be to have a small net trained to select the best evaluation net.
What I've seen is that NNUE is extremely sensitive to opening selection, I can mark one move red in the book and get 50 elo swings.

So maybe it'd be fruitful to train nets for specific openings, then train a net that picks other nets that are best in those pawn/pieces structures. Like "oh, this is a Sicilian, I'll use this net. Oh wait, we've transposed to a Najdorf, I'll switch to this killer net."
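A minimal sketch of the dispatch idea (the net filenames and piece-count thresholds below are hypothetical, and a real selector could itself be a small trained net rather than fixed thresholds):

```python
# Hypothetical dispatcher: pick an evaluation net by material left on the board.
# Every capture decrements the piece count, so tracking it is essentially free.
NETS_BY_MIN_PIECES = [
    (25, "opening.nnue"),     # 25+ pieces/pawns: opening-trained net
    (13, "middlegame.nnue"),  # 13-24: middlegame-trained net
    (0,  "endgame.nnue"),     # fewer than 13: endgame-trained net
]

def select_net(piece_count: int) -> str:
    """Return the net to use for a position with `piece_count` men on board."""
    for min_pieces, net in NETS_BY_MIN_PIECES:
        if piece_count >= min_pieces:
            return net
    return NETS_BY_MIN_PIECES[-1][1]

print(select_net(32))  # opening.nnue
print(select_net(10))  # endgame.nnue
```

Switching on piece count rather than on opening classification also sidesteps the transposition problem: two move orders reaching the same position always get the same net.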

Re: SF-NNUE going forward...

Posted: Fri Jul 31, 2020 4:12 am
by Zenmastur
Ovyron wrote: Fri Jul 31, 2020 3:34 am
mmt wrote: Thu Jul 30, 2020 7:51 am
ChickenLogic wrote: Thu Jul 30, 2020 4:48 am Very recently tttak added an arch that supports switching based on pieces left rather than gameply.
Having weights for each piece might be even better. But the best would be to have a small net trained to select the best evaluation net.
What I've seen is that NNUE is extremely sensitive to opening selection, I can mark one move red in the book and get 50 elo swings.

So maybe it'd be fruitful to train nets for specific openings, then train a net that picks other nets that are best in those pawn/pieces structures. Like "oh, this is a Sicilian, I'll use this net. Oh wait, we've transposed to a Najdorf, I'll switch to this killer net."
I've noticed some unusual and VERY interesting behaviors as well.

Re: SF-NNUE going forward...

Posted: Fri Jul 31, 2020 4:35 am
by Ovyron
Zenmastur wrote: Fri Jul 31, 2020 4:12 am I've noticed some unusual and VERY interesting behaviors as well.
Yeah, I had a record time of analysis obsolescence. Before NNUE I could count on an 11-month window (analysis from around September 2019 was becoming obsolete.) After NNUE, I can say that ALL my analysis with Stockfish dev became obsolete in one shot (because it's unreliable, the analysis can't be mixed; the backsolved wrong score of a single Stockfish-dev-analyzed line could mess everything up.)

I can finally say that all the positions I've analyzed have been nothing but wasted time now :mrgreen: - but NNUE has leveled the field, and draw rates have gone down, there has never been a better point to start from scratch!

Re: SF-NNUE going forward...

Posted: Fri Jul 31, 2020 12:15 pm
by Raphexon
chrisw wrote: Thu Jul 30, 2020 9:08 am
Zenmastur wrote: Thu Jul 30, 2020 3:30 am
ChickenLogic wrote: Thu Jul 30, 2020 12:30 am Haven't read all of this thread so excuse me if it was already mentioned: tttak's fork has a lot of different architectures to try out. I'm interested in the "Gameply40x4" one which effectively holds 4 NNs in one file which get used sequentially switching every 40 plies until the 4th is loaded. This sacrifices no speed over the regular halfkp-nets with the same input layer size. It'll require at least 4x more training data though.
Well, considering Sergio is generating 1,000,000,000 fens several times per day
Rough maths suggests he has at least 1000 cores to do that. If we call several = 5

5 billion fens a day = 50000 fens per second, which if he samples at 25% implies 12500 games per second. One game every 0.08 ms. For massively fast bullet that needs 1000 cores at least, I think.
1000 cores would allow for game in 80ms

the amount of training doesn't seem to be a huge problem. So what if only one new net is released every day or two on average? The nets are coming out so fast they can't be tested properly before the next one is ready for testing, so slowing down a bit on the releases wouldn't be a bad thing. He's released 15 nets in the last two days; this would be reduced to about 3 if each release takes 4 times as much data.

I think splitting it by number of plies played is a mistake. You can be in an endgame in 40 plies. Better to do it by the number of pieces and pawns left on the board: every capture decrements the piece count by one, so it's easy to keep track of. The other advantage is that the depth of search can be varied between opening, middlegame, endgame, TB positions, etc.
ChickenLogic wrote: Thu Jul 30, 2020 12:30 am With weaker nets this will introduce more eval instability of course but I think a "more correct" eval will benefit search more than eval instability will hurt it. Maybe it is also possible to make NNUE viable on really old hw with "Gameply40x4" nets with smaller input layers.
I think this concept of multiple nets will work better for SF than for Leela. The most obvious reason is that it takes less positions to saturate NNUE nets and thus we can experiment much more.
The nets will eventually saturate regardless of size, so weaker nets would only be a temporary condition.

Regards,

Zenmastur
1 fen = 1 position.
Not 1 game.

Re: SF-NNUE going forward...

Posted: Fri Jul 31, 2020 12:16 pm
by Raphexon
Zenmastur wrote: Fri Jul 31, 2020 12:34 am
chrisw wrote: Thu Jul 30, 2020 9:08 am Rough maths suggests he has at least 1000 cores to do that. If we call several = 5

5 billion fens a day = 50000 fens per second, which if he samples at 25% implies 12500 games per second. One game every 0.08 ms. For massively fast bullet that needs 1000 cores at least, I think.
1000 cores would allow for game in 80ms
The fastest I can get games to play without time losses is about 28 ms per ply. The average depth reached is 13-14 plies, which I think is much deeper than what he is using. I thought I read he was using an 8-ply search, but I could be wrong; I went back and tried to locate where I read this and couldn't find it. If it is true, that's a 5 to 6 ply difference in search depth. At a branching factor of 1.6, this implies I could get to 8 plies in about 2.8 milliseconds if I had the appropriate software. That's ~357 plies per second per core, or ~31M plies per day per core.

But I think this misses the point: there is little reason to produce so many nets per day. Two or three per week would be plenty and would allow each net to be tested more thoroughly. Having to store all those nets is a waste of space. Producing all the training data is fine; it's the number of releases that is excessive.

The ELO increase seems to have declined markedly after about 30B fens.

Regards,

Zenmastur
Sergio switched to multiPV depth 10 when (singlePV) depth 12 stopped getting results.

Re: SF-NNUE going forward...

Posted: Fri Jul 31, 2020 12:31 pm
by Zenmastur
Raphexon wrote: Fri Jul 31, 2020 12:16 pm
Zenmastur wrote: Fri Jul 31, 2020 12:34 am
chrisw wrote: Thu Jul 30, 2020 9:08 am Rough maths suggests he has at least 1000 cores to do that. If we call several = 5

5 billion fens a day = 50000 fens per second, which if he samples at 25% implies 12500 games per second. One game every 0.08 ms. For massively fast bullet that needs 1000 cores at least, I think.
1000 cores would allow for game in 80ms
The fastest I can get games to play without time losses is about 28 ms per ply. The average depth reached is 13-14 plies, which I think is much deeper than what he is using. I thought I read he was using an 8-ply search, but I could be wrong; I went back and tried to locate where I read this and couldn't find it. If it is true, that's a 5 to 6 ply difference in search depth. At a branching factor of 1.6, this implies I could get to 8 plies in about 2.8 milliseconds if I had the appropriate software. That's ~357 plies per second per core, or ~31M plies per day per core.

But I think this misses the point: there is little reason to produce so many nets per day. Two or three per week would be plenty and would allow each net to be tested more thoroughly. Having to store all those nets is a waste of space. Producing all the training data is fine; it's the number of releases that is excessive.

The ELO increase seems to have declined markedly after about 30B fens.

Regards,

Zenmastur
Sergio switched to multiPV depth 10 when (singlePV) depth 12 stopped getting results.
Well, it looks like he's overdue for a change of strategy, since there hasn't been any progress to speak of since net 0109. I'm running an 8,000-game match between 0109 and 0821, and after 3,700 games 0109 is in the lead.

Re: SF-NNUE going forward...

Posted: Wed Aug 05, 2020 2:04 pm
by Ovyron
Ovyron wrote: Fri Jul 31, 2020 4:35 am
Zenmastur wrote: Fri Jul 31, 2020 4:12 am I've noticed some unusual and VERY interesting behaviors as well.
Yeah, I had a record time of analysis obsolescence. Before NNUE I could count on an 11-month window (analysis from around September 2019 was becoming obsolete.) After NNUE, I can say that ALL my analysis with Stockfish dev became obsolete in one shot (because it's unreliable, the analysis can't be mixed; the backsolved wrong score of a single Stockfish-dev-analyzed line could mess everything up.)

I can finally say that all the positions I've analyzed have been nothing but wasted time now :mrgreen: - but NNUE has leveled the field, and draw rates have gone down, there has never been a better point to start from scratch!
UPDATE - The field hasn't stayed level at all; even people who do daily updates to their books have sunk in Elo and are struggling to recover it. Realistically I'm playing some 100 Elo higher than before NNUE, but my rating has sunk to 100 less than what it was before, because the bigger your hardware, the better NNUE gets: the people at the top improved on their already high performance, while the below-average performers can't hold up.

The skill level has greatly widened.

While before I could be out-searched by 10 depth and still survive, now being out-searched by 3 depth can be fatal. The score can be 0.30 now and 1.30 three depths later. The analysis of the first net I downloaded is already getting obsolete; I wonder how long the current one will last.

It's as if what NNUE did was steer the game into the most difficult positions it can find. If you think about it, that's like Contempt on steroids: once you're in those positions, every node searched counts. This might have been the point in history where hardware became more important than the book used.