I understand that requests to the same bank have to be served serially, but what I wonder about is to what extent the latencies overlap in time.

smatovic wrote: Sun Apr 02, 2023 10:28 pm

    Modern DRAMs have multiple banks to serve multiple memory requests in parallel. However, when two requests go to the same bank, they have to be served serially, exacerbating the high latency of off-chip memory. Adding more banks to the system to mitigate this problem incurs high system cost. Our goal in this work is to achieve the benefits of increasing the number of banks with a low-cost approach. To this end, we propose three new mechanisms, SALP-1, SALP-2, and MASA (Multitude of Activated Subarrays), to reduce the serialization of different requests that go to the same bank. The key observation exploited by our mechanisms is that a modern DRAM bank is implemented as a collection of subarrays that operate largely independently while sharing few global peripheral structures.
Suppose two cores access the same bank simultaneously. If one request takes 100 ns, does this mean that the other request has to wait 100 ns and then takes another 100 ns, giving 200 ns in total?
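
To make the question concrete, here is a minimal back-of-the-envelope sketch (Python, purely illustrative) of the two extremes: full serialization on a bank conflict versus full overlap across different banks. The 100 ns figure and the zero-overlap assumption come from the question itself, not from any real DRAM datasheet; actual controllers overlap parts of the command timing (precharge, activate, read), so the real same-bank penalty presumably lands somewhere between the two numbers below.

Code:

    # Purely illustrative timing model, not a DRAM simulator.
    # ACCESS_NS is the hypothetical 100 ns per-request latency from the question.
    ACCESS_NS = 100

    def same_bank_no_overlap(n_requests):
        # Worst case: every request conflicts in the same bank and nothing overlaps,
        # so total time grows linearly with the number of requests.
        return n_requests * ACCESS_NS

    def different_banks_full_overlap(n_requests):
        # Best case: each request hits its own bank and they proceed fully in
        # parallel, so total time is a single access latency regardless of count.
        return ACCESS_NS

    print(same_bank_no_overlap(2))          # 200 ns -- the scenario in the question
    print(different_banks_full_overlap(2))  # 100 ns

Whether real hardware ends up closer to the 100 ns or the 200 ns case is exactly the kind of serialization the quoted paper's subarray-level mechanisms aim to reduce.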