SSD for table lookup
-
- Posts: 778
- Joined: Sat Jul 01, 2006 7:11 am
Re: SSD for table lookup
The simple way to do this is to use WDL inside the search and DTM at the root.
ernest wrote: OK Vincent, so you are speaking about WDL bases (BitBases, ShredderBases, ...) and better compression. I don't want to reopen that big debate, but nobody seems to have a clear idea of how to converge to mate with WDL bases alone.
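As an illustration of that split, here is a minimal C sketch, with probe_wdl() and probe_dtm() as hypothetical placeholders rather than any particular engine's API: inside the tree a WDL hit simply becomes a decisive score, and only the root move selection consults the (disk-based) DTM tables.

/* Sketch of the split: probe_wdl() is a small, RAM-resident bitbase lookup
   used everywhere in the tree; probe_dtm() is the big disk-based table,
   consulted only when choosing the root move.  Both are hypothetical
   placeholders, not any particular engine's API. */

#include <limits.h>

#define TB_WIN  30000                /* "tablebase win" score, below mate scores */

enum wdl { WDL_LOSS = -1, WDL_DRAW = 0, WDL_WIN = 1, WDL_UNKNOWN = 2 };

typedef struct Position Position;    /* the engine's position type (assumed) */

extern enum wdl probe_wdl(const Position *p);
extern int      probe_dtm(const Position *p, int *dtm);   /* 1 on success */

/* Inside the search: a WDL hit becomes a decisive score, shaded by ply so
   that shorter routes to the tablebase win are preferred. */
int tb_score_in_search(const Position *p, int ply)
{
    switch (probe_wdl(p)) {
    case WDL_WIN:  return  TB_WIN - ply;
    case WDL_LOSS: return -TB_WIN + ply;
    case WDL_DRAW: return  0;
    default:       return INT_MIN;   /* no table hit: search normally */
    }
}

/* At the root: when the position is a known win, rank the candidate moves
   by the DTM of the position they lead to, so the mate actually gets shorter. */
int root_move_rank(const Position *position_after_move)
{
    int dtm;
    if (probe_dtm(position_after_move, &dtm))
        return dtm;                  /* lower DTM = better for the winning side */
    return INT_MAX;                  /* DTM not available: rank it last */
}

The ply adjustment makes the search prefer shorter routes to the tablebase win, which is what lets the engine make progress between root DTM probes.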
-
- Posts: 2053
- Joined: Wed Mar 08, 2006 8:30 pm
Re: SSD for table lookup
So you still need the (huge) DTM bases... 

-
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: SSD for table lookup
You get more bonus points in tournaments if you announce the correct number of moves you need to mate your opponent?
ernest wrote: OK Vincent, so you are speaking about WDL bases (BitBases, ShredderBases, ...) and better compression. I don't want to reopen that big debate, but nobody seems to have a clear idea of how to converge to mate with WDL bases alone.
What is your argument? That we should drop nullmove from our search, as well as double nullmove, because it needs a few plies more to find a zugzwang?
This is a non-argument.
You realize I'm not doing mating extensions in Diep either, simply in order to search deeper?
I always wonder how some people find arguments to claim that producing bricks in an oval shape is better than square.
Even without EGTBs, today's engines play those deep endgames near perfectly.
In some tests of mine from maximin positions in the 5-men, every game was a mate in the optimal number of moves.
Now we can discuss this for a long time, but all titled chess players will tell you that in the end there are three possible results:
a) a win
b) a draw
c) a loss
However, not many people will download the 1.3 TB of 6-men with DTM. Yet many will soon have all the 6-men in WDL form.
So the alternatives you have in tournaments are:
a) use all 5-men in Nalimov format + KRPP vs KR
b) use all 6-men in WDL
At today's search depths and NPS rates, accessing the 6-men from disk is simply too slow. Or are you one of the guys who likes to study the 6-men in the EGTB editor that SMK built into Shredder Classic?
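To make the "search converges anyway" point concrete, here is a rough C sketch of how a WDL-only engine can keep making progress in a won endgame; all types and helper functions are hypothetical placeholders, not Diep's or anyone else's real code.

/* Rough sketch of making progress with WDL only: among the moves that keep
   the win according to the bitbase, prefer the ones that reset the 50-move
   counter or squeeze the defending king.  All types and helpers below are
   hypothetical placeholders.  The position is assumed to be a known win,
   so at least one move passes the WDL test. */

typedef struct Position Position;      /* engine position type (assumed) */
typedef int Move;                      /* encoded move (placeholder)     */

extern int wdl_after(const Position *p, Move m);   /* result for the mover: +1/0/-1 */
extern int is_capture(const Position *p, Move m);
extern int is_pawn_push(const Position *p, Move m);
extern int defender_king_edge_dist(const Position *p, Move m);
extern int legal_moves(const Position *p, Move *out);   /* fills out[], returns count */

Move pick_winning_move(const Position *p)
{
    Move moves[256];
    int n = legal_moves(p, moves);
    Move best = moves[0];
    int best_score = -1000000;

    for (int i = 0; i < n; i++) {
        if (wdl_after(p, moves[i]) != 1)       /* never throw the win away */
            continue;
        int score = 0;
        if (is_capture(p, moves[i]))   score += 100;   /* zeroing moves first */
        if (is_pawn_push(p, moves[i])) score += 50;
        score -= defender_king_edge_dist(p, moves[i]); /* drive the king to the edge */
        if (score > best_score) { best_score = score; best = moves[i]; }
    }
    return best;
}

The 50-move counter forces the zeroing moves eventually, and the normal alpha-beta search on top of this finds the actual mate once it is within reach.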
Vincent
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: SSD for table lookup
You overlook a _key_ point. You can get better compression, but not if you want to compress in "chunks" as we did when Eugene started this project. We wanted to decompress on the fly, without having to decompress the _entire_ file (a bit of a task for the 6-piece files for sure, and for some 5-piece files as well). So yes, you can compress better, but then you cannot decompress on the fly during the search without taking a far worse than 4x performance hit.
diep wrote: Oh come on, DTM only got used because they wanted to brag.
ernest wrote: Hi Vinvin,
Vinvin wrote: 5 pieces: Nalimov = 7500 MB; Shredderbase = 157 MB
Ratio = about 50
Yes, but Nalimov is... Nalimov (tablebases, with DTM), while the Shredderbase is a bitbase (only W/L/D info), so you cannot use that as the ratio for future tablebase compression.
Now, about Nalimov: let me be clear, I do not want to say he didn't put in a lot of effort, because he did, and within the given constraints he did it very well (assuming you don't mind bizarre template definitions that increase your executable size by half a megabyte). Never spit on someone who did the hard job before you. There was Thompson, then Edwards, then Nalimov.
Now it is the 21st century and we want to improve on it with something realistic. Using WDL is not new; it has been in use for nearly as long as I have lived.
Many chess engines have already been using WDL for some years now. Already in the 90s Mathias Feist told me he really wanted to switch Fritz to WDL as well. Of course, in reality you only know the names Rybka and Shredder and that's about it, but there really are a lot of engines out there, and a lot of them use WDL.
In checkers, both the American variant and 10x10 international draughts, they *always* used only win/draw/loss scores and never DTM, nor any of the other exotic forms such as DTC that were used initially.
The compression ratio between DTM at 1300 GB and a gigabyte, or some 20-40 gigabytes if you do it sloppily (see Shredder), is too huge a difference to ignore.
The difficult task is a customized generic compression that is really good and up to today's standards. Compression is a science in itself.
Compared to the old algorithms of the 80s and early 90s, Kadatch does a good job. Most people use his code. However, when I test on my EGTBs, better forms of (generic) compression are up to a factor 4 better.
Hear me: FACTOR 4.
Kadatch and Nalimov had reasons for everything they did. We tested all sorts of ideas and block sizes on my machines here at UAB before we decided on what was a "one-size-fits-all" approach everyone could live with.
Note that compression for Nalimov tables has different characteristics than for WDL. WDL allows much higher-order modelling (google PPM). The bigger the EGTBs, the more you can achieve there.
However, you must keep in mind that uncompressed in Diep format the 6-men are 1.05 TB,
which is already a lot smaller than the uncompressed Nalimov version, which I guess is at least 5 times bigger.
Ignoring a factor 5 would be impossible, let alone a factor 100 to 1000.
Even with default compression, Diep's 6-men are 34 GB compressed. That is without any special treatment yet, which we all know is possible.
Every EGTB I put some effort into has an effective compression ratio of a factor 5000 or more.
There is no 'perfect' solution to the problem, of course.
It is always possible to do better. If you consider how much time EGTBs altogether have eaten from me (writing 3 different generators, 4 different formats to store them in, joining the effort to spread them, and so on), then the number of Elo points EGTBs gain in today's computer chess is pretty small, whereas their effect in checkers and 10x10 draughts is pretty devastating. Maybe it will come to chess as well, who knows?
I do not know. I wouldn't dare to predict it, to be honest.
Vincent
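For readers who have never looked at probing code, the chunked scheme Bob describes boils down to this: compress fixed-size blocks independently, keep an index of their file offsets, and decompress only the one block a probe needs. A simplified C sketch, with hypothetical structures and a generic decompress() stand-in (not the actual Kadatch/Nalimov code):

/* Simplified sketch of block-wise table storage: the file holds the entries
   in chunks of BLOCK_SIZE bytes, each compressed independently, plus an
   index of file offsets.  A probe seeks to one chunk, decompresses just
   that chunk into a small cache and reads a single entry.  The structures
   and decompress() are stand-ins, not the real Kadatch/Nalimov code. */

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE 8192                 /* 8 KB of uncompressed entries per chunk */

typedef struct {
    FILE     *f;                        /* compressed table on disk              */
    uint64_t *block_offset;             /* nblocks+1 file offsets of the chunks  */
    uint32_t  nblocks;
    uint32_t  cached_block;             /* one-chunk cache; real code uses LRU   */
    uint8_t   cache[BLOCK_SIZE];
} Table;

/* stand-in for the real decompressor (Kadatch, zlib, ...) */
extern size_t decompress(const uint8_t *src, size_t srclen,
                         uint8_t *dst, size_t dstlen);

int probe(Table *t, uint64_t index)     /* index of the position in the table */
{
    uint32_t b = (uint32_t)(index / BLOCK_SIZE);
    if (b != t->cached_block) {
        size_t clen = (size_t)(t->block_offset[b + 1] - t->block_offset[b]);
        uint8_t *buf = malloc(clen);
        fseek(t->f, (long)t->block_offset[b], SEEK_SET);   /* real code: 64-bit seek */
        fread(buf, 1, clen, t->f);
        decompress(buf, clen, t->cache, BLOCK_SIZE);       /* only this one chunk */
        free(buf);
        t->cached_block = b;
    }
    return t->cache[index % BLOCK_SIZE];                   /* e.g. one DTM byte */
}

This is exactly the trade-off being argued about: a stronger compressor that needs larger (or solid) blocks would shrink the files further, but every probe would then pay for decompressing far more data.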
-
- Posts: 2053
- Joined: Wed Mar 08, 2006 8:30 pm
Re: SSD for table lookup
Agreed.
diep wrote: However, not many people will download the 1.3 TB of 6-men with DTM. Yet many will soon have all the 6-men in WDL form.
But I was simply saying that in some cases WDL alone will not allow you to converge to the win. Maybe we have to live with that...
-
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: SSD for table lookup
You're ignoring the main point, Bob.
bob wrote: You overlook a _key_ point. You can get better compression, but not if you want to compress in "chunks" as we did when Eugene started this project. We wanted to decompress on the fly, without having to decompress the _entire_ file. So yes, you can compress better, but then you cannot decompress on the fly during the search without taking a far worse than 4x performance hit. [...] Kadatch and Nalimov had reasons for everything they did. We tested all sorts of ideas and block sizes on my machines here at UAB before we decided on what was a "one-size-fits-all" approach everyone could live with.
Nalimov describes a position in 2 bytes (with DTM, which for the more complex EGTBs is nearly always needed). So you get sequences like
1, 2, 43, 54, 1, 13, 2, 3, 4, 7, etc.
That example has 10 entries using 20 bytes in total,
uncompressed.
The same 10 entries in WDL take 2 bytes uncompressed, and the value is simply 0xff.
Especially in the 4-vs-2 tables, the side with 4 pieces+pawns usually overpowers the side with 2, which means that nearly every position is 0xff, so to speak.
Now there are a lot of ways to build a decision tree, but the key point is:
you can compress that by a factor of 1000.
On the other hand, the DTM numbers are very difficult to compress well.
Kadatch really isn't doing much worse than the world's best compressors on the DTM numbers, as the 'luft' there is easy to compress.
WDL offers far more compression opportunities than DTM.
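As a toy illustration of why that is, assuming nothing about the real Shredder or Diep formats: pack the three values into 2 bits each and run-length encode, and a 4-vs-2 table that is almost entirely wins collapses into a handful of runs.

/* Toy illustration: 2 bits per WDL value plus run-length coding.
   A table that is nearly all wins collapses into a few (value, count) runs.
   This is far cruder than what real bitbases (Shredder, Diep, ...) do. */

#include <stdint.h>
#include <stddef.h>

enum { LOSS = 0, DRAW = 1, WIN = 2 };        /* 2 bits are enough */

/* pack: 4 WDL values per byte (a factor 8 vs. Nalimov's 2 bytes per entry) */
void pack_wdl(const uint8_t *wdl, size_t n, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if ((i & 3) == 0) out[i >> 2] = 0;
        out[i >> 2] |= (uint8_t)(wdl[i] << ((i & 3) * 2));
    }
}

/* run-length encode the raw WDL stream into (value, run length) pairs.
   Returns the number of runs; a 99%-win table yields very few of them. */
size_t rle_wdl(const uint8_t *wdl, size_t n, uint8_t *val, uint32_t *len)
{
    size_t runs = 0;
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j < n && wdl[j] == wdl[i]) j++;
        val[runs] = wdl[i];
        len[runs] = (uint32_t)(j - i);
        runs++;
        i = j;
    }
    return runs;
}

Real bitbases use much smarter modelling than this, but the principle is the same: long uniform stretches of "win" cost almost nothing.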
That shows. Some EGTBs easily reach a factor 5000 compression after some work. Look at Stefan too.
He gets all the 5-men in under 129 MB. In Nalimov format that is 7.5 GB *compressed*.
In order to look things up faster during the search (Stefan is a speed freak), he uses a format that stores it somewhat bigger, yet still very tiny.
So comparing the best of the two: it is 129 MB Shredder vs. 7500 MB Nalimov.
According to my calculator that is a factor 58.
So on top of the initial factor 10 size difference of the raw format (2 bytes per entry for Nalimov vs. 1.6 bits for WDL) he achieves a practical factor 5.8. In reality that additional compression is far more than a factor 5.8, of course, because for fast lookup he doesn't store each position in 1.6 bits but needs quite a bit more space to achieve that.
Nalimov's 5-men are 30 GB uncompressed and achieve a factor 4 compression.
Shredder, including the extra lookup overhead, achieves far more than 4 * 6 there ==> a factor 24.
Add to that the factor 10 size difference of the raw format.
So the total difference in that case comes to a factor of about 240.
We can use that as a bound to predict the maximum size of the 6-men, while we already know that the 6-men contain a lot more 'nonsense EGTBs' that compress extremely well.
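Spelled out with the round numbers quoted in this thread, the bookkeeping is roughly:

\begin{align*}
\text{Nalimov 5-men, raw} &\approx 30\ \text{GB} \quad (16\ \text{bits per position})\\
\text{WDL 5-men, raw} &\approx 30\ \text{GB} \times \tfrac{1.6}{16} \approx 3\ \text{GB}\\
\text{Nalimov compression} &\approx 30\ \text{GB} / 7.5\ \text{GB} \approx 4\times\\
\text{Shredder compression} &\approx 3\ \text{GB} / 0.129\ \text{GB} \approx 24\times\\
\text{overall ratio} &\approx 30\ \text{GB} / 0.129\ \text{GB} \approx 240\times
\end{align*}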
Note also that Kadatch achieves better compression on the 6-men, but only by a very little bit. From the sequence {1, 2, 43, 54, 1, 13, 2, 3, 4, 7} above you can easily understand why, whereas with WDL you get more ranges where the piece configuration allows brilliant compression. Yet the fact that Kadatch gains more on the 6-men, while still using 8 KB blocks,
is a clear demonstration of my point that with WDL the 6-men compress, relatively speaking, BETTER than the 5-men, depending of course on how much effort you put into it.
Knowing Stefan and his resources, I would guess they put a lot more effort into the 5-men than into the 6-men. Just calculating the 6-men took more time.
Please realize that GENERATING the EGTBs is a lot simpler than supercompressing them. Supercompressing them requires *far more CPU time* than generating all of them.
Really, a factor 1000+ more CPU time or so, but achieving it is relatively simple as the work is embarrassingly parallel, and I know Stefan has quite some resources at home.
Stefan's own time is the expensive thing here, not the CPU time.
On the other hand, Nalimov is an I/O-based format. It requires massive I/O.
Only recently are there hard drives that can hold the entire set at once.
My assumption is that in the future CPU speed will scale more easily than hard-drive storage, let alone I/O speed.
I/O speed, as we all know, doesn't scale. That was already demonstrated 10 years ago; it is still true and is not under discussion.
That also means the LATENCY of the I/O hasn't become significantly better. It used to be ten milliseconds or so; it is still several milliseconds now, whereas CPUs have become a factor 1000 faster over the same period.
USB sticks are not a valid comparison there, for two reasons:
a) they are more sensitive to errors
b) they can carry only tiny amounts of data compared to the 1.5 TB disks you can get cheaply now.
And even then, the fastest USB stick I tested had 1 millisecond latency.
Realize that I'm so used to thinking in nanosecond latencies that I even have trouble writing down milliseconds; I initially wrote it down as microseconds, which is a factor 1000 too fast. In short, a millisecond is a factor of a million slower than the nanosecond-scale latency of cached RAM.
That difference is way too huge. And forget about decent bandwidth: SATA2 is here now, and if you get 133 MB/s from the outer tracks of a disk, that's a lot.
On the inner tracks it's maybe 30 MB/s.
It is simply problematic to read EGTBs from disk during the search.
Let's face it: in Reykjavik in 2005 you had all the EGTBs you used in RAM, didn't you?
So Crafty played with whatever you could put in RAM, which was the 5-men, even though you had all the 6-men.
CPUs are ever more powerful now, but I/O latency hasn't improved and will not improve as fast as CPU power increases.
Maybe we should shout loudly that we want more RAM in our machines.
I did a calculation based on my own machines. It might not be such a great comparison, as I use old junk hardware.
My machine back in October 1996 had 80 MB of RAM.
It's 2009 now. I've got 2 GB of RAM in this box, and that registered ECC DDR RAM isn't cheap. DDR2 is a lot cheaper, and hopefully DDR3 will get cheap too.
If I won the lottery now, I would get myself a 4-socket AMD mainboard and fill it up entirely with 2 GB modules.
That would give, I guess, 32 or 64 GB of RAM.
Fully populated, the machines back then had 2 GB of RAM.
So the correct comparison is 2 GB then vs. 128 GB now. Note that I'm being optimistic, as in reality it is 64 GB, but we'll find a board that can handle 128 GB somehow.
So 2 GB vs. 128 GB it is. That's a factor 64.
The fastest PC box back then was a 4 x 200 MHz Pentium Pro: 800 MHz in total, or so.
The fastest box now is 24 x 2.6 GHz AMD,
each core a lot faster than the 200 MHz Pro back then.
Just based on clock speed the difference is already (24 * 2.6 GHz) / 0.8 GHz = 78.
There you go: a lot more than 64, and we haven't yet added up all kinds of other differences in today's cores. IPC is something like 2 times better now, so we already reach a factor of about 150.
If we expressed it in gigaflops the difference would be a lot bigger than 150, closer to 1000, and floating point is exactly what those CPUs have been optimized for lately.
Back then you needed monster hardware to get any floating-point speed. Now that's plain PC processors.
I/O improves really little compared to CPU power, yet CPU power keeps scaling while I/O remains a huge bottleneck to the CPU.
So my argument is that WDL is far superior in that respect.
To start with, even with some sort of naive caching, WDL already stores a factor 10 more positions in the same number of bytes. A HUGE ADVANTAGE.
In reality we keep it supercompressed in RAM. Nalimov caches things decompressed in RAM, using a 32-bit Kadatch compressor, which is why the files are chopped into chunks of 2 GB uncompressed, even though NTFS never had a problem with that limit.
So from a caching viewpoint you need 30 GB of RAM just for the 5-men, whereas Shredder needs 129 MB there, plus a small buffer of a few megabytes.
*That* is the comparison.
The difference is too big: it is like comparing a kiloton Nalimov H-bomb with a megaton nuke. Which will have more impact?
Vincent
-
- Posts: 1822
- Joined: Thu Mar 09, 2006 11:54 pm
- Location: The Netherlands
Re: SSD for table lookup
Not at all.
jwes wrote: The simple way to do this is to use WDL inside the search and DTM at the root.
Realize how many authors have WDL EGTBs by now, and they all mate their opponents. Have you ever seen one complaint?
Maybe what you missed is a paper describing how Chinook used WDL databases to play checkers in the 80s?
Maybe send Jonathan Schaeffer an email so he can explain to you how this problem was already solved 30 years ago?
Vincent
-
- Posts: 20943
- Joined: Mon Feb 27, 2006 7:30 pm
- Location: Birmingham, AL
Re: SSD for table lookup
diep wrote: You're ignoring the main point, Bob. [...] My argument is that WDL is far superior in that respect. To start with, even with naive caching, WDL already stores a factor 10 more positions in the same number of bytes. From a caching viewpoint you need 30 GB of RAM just for the 5-men, whereas Shredder needs 129 MB plus a small buffer of a few megabytes. *That* is the comparison. It is too big a difference.
I don't disagree that WDL tables compress better. But they have their own set of problems, in that you have to make progress and they don't tell you how, so you need something else. With DTM you don't need a thing, except that it would be nice to have a DTM/DTZ combination (discussed often) so that we get the best of both worlds and don't violate 50-move-rule considerations as we do now with DTM.
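A small C sketch of what such a combined probe could decide at a single node; probe_wdl_raw() and probe_dtz() are hypothetical names, with DTZ meaning "plies until the next zeroing move" and the halfmove clock counted in plies as usual:

/* Sketch of the DTM/DTZ idea: a tablebase "win" only counts if the winning
   side can reach the next zeroing move (capture or pawn move) before the
   50-move rule intervenes.  probe_wdl_raw() and probe_dtz() are hypothetical;
   the halfmove clock and DTZ are both counted in plies. */

enum result { LOSS = -1, DRAW = 0, WIN = 1 };

typedef struct Position Position;

extern int probe_wdl_raw(const Position *p);   /* +1/0/-1, ignores the 50-move rule */
extern int probe_dtz(const Position *p);       /* plies until the next zeroing move */

enum result tb_result(const Position *p, int halfmove_clock)
{
    int wdl = probe_wdl_raw(p);
    if (wdl == 0)
        return DRAW;

    /* 50 moves = 100 plies without a capture or pawn move */
    if (halfmove_clock + probe_dtz(p) >= 100)
        return DRAW;        /* "cursed win" / "blessed loss" under the 50-move rule */

    return wdl > 0 ? WIN : LOSS;
}

Once the game-theoretic result is pinned down this way, a DTM table (where it exists) can still be used to play or announce the shortest mate.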