Has anyone here who uses Windows tried this yet? I just located the two needed DLLs.
https://ci.appveyor.com/project/LeelaChessZero/lc0
LC0: support for windows mimalloc #1561
-
- Posts: 6340
- Joined: Mon Mar 13, 2006 2:34 pm
- Location: Acworth, GA
LC0: support for windows mimalloc #1561
"Good decisions come from experience, and experience comes from bad decisions."
__________________________________________________________________
Ted Summers
-
- Posts: 2555
- Joined: Fri Nov 26, 2010 2:00 pm
- Location: Czech Republic
- Full name: Martin Sedlak
Re: LC0: support for windows mimalloc #1561
hmm, I'm surprised they use the heap for node allocation, since a node is a small fixed-size object.
I can imagine a simple custom pool allocator + a small buffer for moves might do better, while still keeping the GC thread
you could even have 2 node types (evasion vs normal) and have a smaller "small buffer" for evasion nodes. interesting...
each heap allocation requires some extra alignment to the block size + some extra bookkeeping, something like 16 extra bytes, anyway
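To make the pool idea concrete, here is a minimal sketch in C++. It is not lc0's code: `DemoNode`, `FixedPool` and the chunk size are hypothetical names and values chosen for illustration, it is single-threaded, and it leaves out the small move buffer and the two node types.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical stand-in for a search-tree node; not lc0's actual node type.
struct DemoNode {
    DemoNode* parent = nullptr;
    float value = 0.0f;
    unsigned visits = 0;
};

// Fixed-size pool: carves slots out of large chunks and recycles freed slots
// through an intrusive free list, so there is no per-object heap header.
// Assumption: single-threaded use; a real engine would need locking or
// per-thread pools.
template <typename T, std::size_t SlotsPerChunk = 1024>
class FixedPool {
    union Slot {
        Slot* next;                                   // used while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // used while the slot is live
    };

public:
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_list_ == nullptr) grow();
        Slot* slot = free_list_;
        free_list_ = slot->next;
        ++live_;
        // Construct the object in place inside the slot.
        return ::new (static_cast<void*>(slot->storage)) T(std::forward<Args>(args)...);
    }

    void destroy(T* object) {
        assert(object != nullptr && live_ > 0);
        object->~T();
        // The object sits at the start of its slot, so the addresses coincide.
        Slot* slot = reinterpret_cast<Slot*>(object);
        slot->next = free_list_;
        free_list_ = slot;
        --live_;
    }

    std::size_t live() const { return live_; }
    std::size_t chunks() const { return chunks_.size(); }

private:
    // Allocate one more chunk and thread all of its slots onto the free list.
    void grow() {
        chunks_.push_back(std::make_unique<Slot[]>(SlotsPerChunk));
        Slot* chunk = chunks_.back().get();
        for (std::size_t i = 0; i < SlotsPerChunk; ++i) {
            chunk[i].next = free_list_;
            free_list_ = &chunk[i];
        }
    }

    std::vector<std::unique_ptr<Slot[]>> chunks_;
    Slot* free_list_ = nullptr;
    std::size_t live_ = 0;
};
```

Usage would look like `FixedPool<DemoNode> pool; DemoNode* n = pool.create(); pool.destroy(n);`. A freed slot is handed out again by the next `create()`, and the heap is only touched once per chunk.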
Martin Sedlak
-
- Posts: 300
- Joined: Mon Apr 30, 2018 11:51 pm
Re: LC0: support for windows mimalloc #1561
The Windows allocator is notoriously bad, especially at high concurrency. That said, it's a bit odd you'd need to allocate so much in the first place.
I've had good success with jemalloc on Windows in the past; haven't heard of mimalloc.
-
- Posts: 12540
- Joined: Wed Mar 08, 2006 8:57 pm
- Location: Redmond, WA USA
Re: LC0: support for windows mimalloc #1561
I had to write a suballocator once in the early 1990s.
It was a special situation where thousands and thousands of medium-sized blocks (just under 32K) of memory were allocated and freed very rapidly.
So I made a linked list of megabyte-sized pools, and for each pool there was a bit vector recording which blocks were in use. The free operation just tagged the bit for the 32K block to say it was available (and did a memset of the block to zero, since it could contain sensitive data). The allocation operation would search the first pool for an open block (found by looking for any int in the bit vector not equal to 0xffffffff, then calling getbit()) and give its address to the requestor. If no open block was found, it would go to the next pool in the linked list. If all pools in the entire list were full, it would allocate a new megabyte pool. The same sort of trick will not work so simply if the requested memory objects are of wildly different sizes.
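A rough reconstruction of that scheme in C++, as a sketch only. The original code is not shown in this thread, so `BitmapSuballocator` and its member names are hypothetical; it assumes exactly 32 blocks of 32 KiB per 1 MiB pool (so one 32-bit word is the whole bit vector), uses a `std::vector` of pools where the original used a linked list, and is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

class BitmapSuballocator {
public:
    static constexpr std::size_t kBlockSize = 32 * 1024;
    static constexpr std::size_t kBlocksPerPool = 32;  // 32 x 32 KiB = 1 MiB
    static constexpr std::size_t kPoolSize = kBlockSize * kBlocksPerPool;

    // Hand out one block from the first pool with a clear bit, adding a new
    // pool only when every existing pool is full.
    void* allocate() {
        for (Pool& pool : pools_) {
            if (pool.used != 0xffffffffu) return take(pool);
        }
        pools_.emplace_back();
        return take(pools_.back());
    }

    // Zero the block (it may hold sensitive data) and clear its in-use bit.
    // Returns false if the pointer does not belong to any pool.
    bool release(void* ptr) {
        const auto address = reinterpret_cast<std::uintptr_t>(ptr);
        for (Pool& pool : pools_) {
            const auto base = reinterpret_cast<std::uintptr_t>(pool.memory.get());
            if (address >= base && address < base + kPoolSize) {
                const std::size_t bit = (address - base) / kBlockSize;
                std::memset(pool.memory.get() + bit * kBlockSize, 0, kBlockSize);
                pool.used &= ~(std::uint32_t{1} << bit);
                return true;
            }
        }
        return false;
    }

    std::size_t pool_count() const { return pools_.size(); }

private:
    struct Pool {
        std::unique_ptr<unsigned char[]> memory{new unsigned char[kPoolSize]()};
        std::uint32_t used = 0;  // bit i set means block i is in use
    };

    // Precondition: the pool has at least one clear bit.
    static void* take(Pool& pool) {
        assert(pool.used != 0xffffffffu);
        for (std::size_t bit = 0; bit < kBlocksPerPool; ++bit) {
            const std::uint32_t mask = std::uint32_t{1} << bit;
            if ((pool.used & mask) == 0) {
                pool.used |= mask;
                return pool.memory.get() + bit * kBlockSize;
            }
        }
        return nullptr;  // not reached when the precondition holds
    }

    std::vector<Pool> pools_;
};
```

Because every block has the same size, both operations reduce to bit tests and a pointer computation, which is why the trick does not carry over directly to requests of wildly different sizes.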
The system went from something that got slower and slower over time and crashed every so often to something very responsive that did not crash. Since it was a customer support system for a large company, with millions of customers calling in over the phone, that was pretty important, especially since all the live and unsaved messages were lost when it crashed. Today we have memory-mapped database tables (most big vendors like MS and Oracle support them, along with dedicated memory-based systems), so such a thing would be silly to write today.
If you do a benchmark of malloc()/free() or new/delete, you might be astonished at how slow it is.
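For anyone who wants to try such a benchmark, a minimal single-threaded sketch in C++ follows. The function name `time_malloc_free` and the allocation pattern (allocate everything, then free everything) are my own assumptions, and the numbers will vary a lot with the allocator, the OS, the block size and the number of threads, so treat any result as a rough indication only.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Times `count` malloc() calls followed by `count` free() calls for blocks of
// `size` bytes and returns the average nanoseconds per malloc/free pair.
double time_malloc_free(std::size_t count, std::size_t size) {
    assert(count > 0 && size > 0);
    std::vector<void*> blocks(count, nullptr);

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < count; ++i) {
        blocks[i] = std::malloc(size);
        if (blocks[i] != nullptr) {
            // Touch the block through a volatile pointer so the compiler
            // cannot optimize the allocation away.
            static_cast<volatile unsigned char*>(blocks[i])[0] = 1;
        }
    }
    for (void* block : blocks) {
        std::free(block);  // free(nullptr) is a no-op
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(count);
}
```

Calling something like `time_malloc_free(1000000, 64)` from several threads at once is where differences between allocators tend to show up most.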
Taking ideas is not a vice, it is a virtue. We have another word for this. It is called learning.
But sharing ideas is an even greater virtue. We have another word for this. It is called teaching.