LC0: support for windows mimalloc #1561

Discussion of anything and everything relating to chess playing software and machines.


AdminX
Posts: 6340
Joined: Mon Mar 13, 2006 2:34 pm
Location: Acworth, GA

LC0: support for windows mimalloc #1561

Post by AdminX »

Has anyone here who uses Windows tried this yet? I just located the two needed DLLs.

https://ci.appveyor.com/project/LeelaChessZero/lc0


"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
mar
Posts: 2555
Joined: Fri Nov 26, 2010 2:00 pm
Location: Czech Republic
Full name: Martin Sedlak

Re: LC0: support for windows mimalloc #1561

Post by mar »

Hmm, I'm surprised they use the heap for node allocation, since a node is a small fixed-size object.
I can imagine a simple custom pool allocator plus a small buffer for moves might do better, while still keeping the GC thread.
You could even have two node types (evasion vs. normal) and give evasion nodes a smaller "small buffer". Interesting...
Each heap allocation also has to be padded to the allocator's block size and carries some extra bookkeeping (something like 16 extra bytes) anyway.
Martin Sedlak
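A minimal sketch of the kind of fixed-size pool allocator suggested above, written in C. The `Node` layout, its `small_moves` inline buffer, `SLOTS_PER_CHUNK`, and the `pool_*` functions are hypothetical illustrations, not lc0's actual code. Nodes are carved out of large chunks, and freed nodes go onto an intrusive free list, so an allocation is just a pointer pop with no per-object header. This version is not thread-safe.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node layout for illustration only; not lc0's real Node. */
typedef struct Node {
    struct Node *parent;
    float value;
    uint32_t visits;
    uint16_t small_moves[8];  /* hypothetical inline "small buffer" for moves */
    uint8_t num_moves;
} Node;

/* A slot holds either a live Node or a link in the free list. */
typedef union Slot {
    Node node;
    union Slot *next_free;
} Slot;

#define SLOTS_PER_CHUNK 4096

typedef struct Chunk {
    struct Chunk *next;
    Slot slots[SLOTS_PER_CHUNK];
} Chunk;

typedef struct NodePool {
    Chunk *chunks;    /* every chunk ever allocated, released together */
    Slot *free_list;  /* singly linked list of currently free slots */
} NodePool;

/* One malloc per SLOTS_PER_CHUNK nodes instead of one per node. */
static int pool_grow(NodePool *p) {
    Chunk *c = malloc(sizeof *c);
    if (!c) return 0;
    c->next = p->chunks;
    p->chunks = c;
    /* Thread every slot of the new chunk onto the free list. */
    for (size_t i = 0; i < SLOTS_PER_CHUNK; i++) {
        c->slots[i].next_free = p->free_list;
        p->free_list = &c->slots[i];
    }
    return 1;
}

Node *pool_alloc(NodePool *p) {
    if (!p->free_list && !pool_grow(p)) return NULL;
    Slot *s = p->free_list;
    p->free_list = s->next_free;
    return &s->node;
}

void pool_free(NodePool *p, Node *n) {
    Slot *s = (Slot *)n;  /* a union member shares the union's address */
    s->next_free = p->free_list;
    p->free_list = s;
}

void pool_destroy(NodePool *p) {
    while (p->chunks) {
        Chunk *next = p->chunks->next;
        free(p->chunks);
        p->chunks = next;
    }
    p->free_list = NULL;
}
```

Start from `NodePool pool = {NULL, NULL};`, call `pool_alloc`/`pool_free` during search, and `pool_destroy` once at the end. Chunks are never returned to the system individually, which is usually acceptable for a search tree that is rebuilt or discarded as a whole.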
Sesse
Posts: 300
Joined: Mon Apr 30, 2018 11:51 pm

Re: LC0: support for windows mimalloc #1561

Post by Sesse »

The Windows allocator is notoriously bad, especially under high concurrency. That said, it's a bit odd that you'd need to allocate so much in the first place.

I've had good success with jemalloc on Windows in the past; I haven't heard of mimalloc.
Dann Corbit
Posts: 12540
Joined: Wed Mar 08, 2006 8:57 pm
Location: Redmond, WA USA

Re: LC0: support for windows mimalloc #1561

Post by Dann Corbit »

I had to write a suballocator once in the early 1990s.

It was a special situation where thousands and thousands of medium-sized blocks of memory (just under 32K each) were allocated and freed very rapidly.
So I made a linked list of megabyte-sized pools, and for each pool there was a bit vector recording which of its 32K blocks were in use. The free operation just cleared the block's bit to mark it as available (and did a memset of the block to zero, since it could contain sensitive data). The allocation operation would search the first pool for a free block (located by finding any int in the index not equal to 0xffffffff and then calling getbit()), then give that block's address to the requestor. If no free block was found, it would go on to the next pool in the linked list. If every pool in the list was full, it would allocate a new megabyte pool. The same sort of trick will not work so simply if the requested memory objects are of wildly different sizes.
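The scheme above can be sketched in C roughly as follows. This is a reconstruction under assumptions, not the original code: the names `sub_alloc`, `sub_free`, `sub_destroy_all`, and `first_clear_bit` (standing in for `getbit()`), the separate header struct per pool, and the address-range search in `sub_free` are all hypothetical. With 1 MB pools and 32K blocks, each pool holds exactly 32 blocks, so its index is a single 32-bit word.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SIZE  (1024u * 1024u)                /* one megabyte per pool */
#define BLOCK_SIZE (32u * 1024u)                  /* fixed 32K block per request */
#define BLOCKS_PER_POOL (POOL_SIZE / BLOCK_SIZE)  /* 32 */
#define INDEX_WORDS (BLOCKS_PER_POOL / 32)

_Static_assert(BLOCKS_PER_POOL % 32 == 0, "index words must be fully used");

typedef struct Pool {
    struct Pool *next;
    uint32_t in_use[INDEX_WORDS];  /* bit i set => block i is allocated */
    unsigned char *memory;         /* POOL_SIZE bytes */
} Pool;

static Pool *pools = NULL;  /* hypothetical global list of pools */

/* Lowest clear bit of a word known not to be 0xffffffff. */
static unsigned first_clear_bit(uint32_t word) {
    unsigned bit = 0;
    while (word & 1u) { word >>= 1; bit++; }
    return bit;
}

static Pool *new_pool(void) {
    Pool *p = calloc(1, sizeof *p);  /* zeroed index: every block free */
    if (!p) return NULL;
    p->memory = calloc(1, POOL_SIZE);
    if (!p->memory) { free(p); return NULL; }
    return p;
}

void *sub_alloc(size_t size) {
    if (size > BLOCK_SIZE) return NULL;  /* only one fixed block size */
    Pool **tail = &pools;
    for (Pool *p = pools; p; p = p->next) {
        for (unsigned w = 0; w < INDEX_WORDS; w++) {
            if (p->in_use[w] != 0xffffffffu) {  /* this word has a free block */
                unsigned i = w * 32 + first_clear_bit(p->in_use[w]);
                p->in_use[w] |= 1u << (i % 32);
                return p->memory + (size_t)i * BLOCK_SIZE;
            }
        }
        tail = &p->next;
    }
    /* Every pool is full: append a new one and hand out its first block. */
    Pool *p = new_pool();
    if (!p) return NULL;
    *tail = p;
    p->in_use[0] = 1u;
    return p->memory;
}

void sub_free(void *ptr) {
    uintptr_t addr = (uintptr_t)ptr;
    for (Pool *p = pools; p; p = p->next) {
        uintptr_t base = (uintptr_t)p->memory;
        if (addr >= base && addr < base + POOL_SIZE) {
            unsigned i = (unsigned)((addr - base) / BLOCK_SIZE);
            /* Scrub the block, since it may hold sensitive data. */
            memset(p->memory + (size_t)i * BLOCK_SIZE, 0, BLOCK_SIZE);
            p->in_use[i / 32] &= ~(1u << (i % 32));
            return;
        }
    }
}

void sub_destroy_all(void) {
    while (pools) {
        Pool *next = pools->next;
        free(pools->memory);
        free(pools);
        pools = next;
    }
}
```

Because every request gets the same 32K slot, finding a free block is just a scan for a word that is not all ones, and fragmentation cannot build up the way it does in a general-purpose heap.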

The system went from something that got slower and slower over time and crashed every so often to something very responsive that did not crash. Since it was a customer-support system for a large company, handling millions of customers calling in over the phone, that was pretty important, especially since all live and unsaved messages were lost when it crashed. Today we have memory-mapped database tables (most big vendors, such as MS and Oracle, support them, and dedicated in-memory database systems exist as well), so writing such a thing now would be silly.

If you benchmark malloc()/free() or new/delete, you might be astonished at how slow they are.
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.
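A minimal sketch of such a malloc()/free() benchmark in C, assuming a single thread and one fixed allocation size; `bench_malloc_free`, `N_LIVE`, and `ROUNDS` are hypothetical names. It measures CPU time with `clock()`, and the numbers depend heavily on the platform, the allocator, and the allocation pattern, so treat them only as a rough comparison.

```c
#include <stdlib.h>
#include <time.h>

#define N_LIVE 100000  /* blocks held live at once in each round */
#define ROUNDS 20      /* number of allocate/free rounds to time */

/* Times ROUNDS rounds of allocating N_LIVE blocks of `size` bytes and then
   freeing them all. Returns elapsed CPU seconds, or -1.0 on failure. */
double bench_malloc_free(size_t size) {
    if (size == 0) size = 1;  /* one byte is written into each block */
    void **ptrs = malloc(N_LIVE * sizeof *ptrs);
    if (!ptrs) return -1.0;
    volatile unsigned char sink = 0;  /* keeps the work from being optimized away */
    clock_t start = clock();
    for (int r = 0; r < ROUNDS; r++) {
        for (size_t i = 0; i < N_LIVE; i++) {
            ptrs[i] = malloc(size);
            if (!ptrs[i]) {  /* undo this round's allocations and give up */
                while (i--) free(ptrs[i]);
                free(ptrs);
                return -1.0;
            }
            ((unsigned char *)ptrs[i])[0] = (unsigned char)i;  /* touch the block */
        }
        /* Free even-indexed blocks first, then odd ones, so frees are
           interleaved rather than strictly in allocation order. */
        for (size_t i = 0; i < N_LIVE; i += 2) {
            sink ^= ((unsigned char *)ptrs[i])[0];
            free(ptrs[i]);
        }
        for (size_t i = 1; i < N_LIVE; i += 2) {
            sink ^= ((unsigned char *)ptrs[i])[0];
            free(ptrs[i]);
        }
    }
    clock_t end = clock();
    free(ptrs);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Dividing the returned time by `ROUNDS * N_LIVE` gives a rough per-allocation cost. Building the same program against a different allocator, or timing a fixed-size pool doing the same work, is one way to see how much the general-purpose heap costs.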