What is memory interleaving? Explain low-order m-way interleaving.
Memory Interleaving and Low-Order m-Way Interleaving
What is Memory Interleaving?
Memory interleaving is a technique used to increase main memory bandwidth by splitting memory into multiple independent modules (also called banks) and distributing consecutive addresses across these modules. By doing this, the system can overlap memory operations and serve multiple accesses in parallel, which is especially beneficial for sequential data access in high-performance processors.
Why Use Memory Interleaving?
- Improves effective memory bandwidth without changing the latency of each module.
- Supports parallel memory access, reducing stalls in pipelined CPUs.
- Works well with sequential access patterns like instruction fetch, array traversal, and cache line fills.
Key Terms
- m-way interleaving: Memory is split into m independent modules (banks).
- Bank/module: An independent memory unit with its own address and data path.
- Bank conflict: Two back-to-back requests targeting the same bank cause a stall until the bank becomes free.
Low-Order m-Way Interleaving: Concept
In low-order interleaving, the lowest-order address bits select the memory bank. This distributes consecutive word addresses across different banks in a round-robin fashion, enabling parallel access for sequential streams.
Core mapping for word-level interleaving:
bank = address mod m
row  = floor(address / m)
Meaning:
- Address 0 goes to bank 0, address 1 to bank 1, …, address m−1 to bank m−1, address m to bank 0 again, and so on.
- Consecutive addresses map to different banks, minimizing conflicts in sequential access.
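The mapping above can be sketched in a few lines of Python (a minimal illustration; the function name `low_order_map` and word addressing are assumptions of this sketch):

```python
def low_order_map(address: int, m: int) -> tuple[int, int]:
    """Map a word address to (bank, row) under low-order m-way interleaving."""
    bank = address % m    # low-order bits select the bank
    row = address // m    # remaining bits select the row within that bank
    return bank, row

# Consecutive addresses rotate through the banks round-robin:
for a in range(8):
    print(a, low_order_map(a, 4))
```

For m = 4, addresses 0..3 land in banks 0..3 of row 0, then 4..7 repeat the cycle in row 1.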
Step-by-Step Example (4-Way Low-Order Interleaving)
Assume word addressing and 4 banks (m = 4). The mapping is:
Address  Bank  Row
0        0     0
1        1     0
2        2     0
3        3     0
4        0     1
5        1     1
6        2     1
7        3     1
8        0     2
...      ...   ...
When the processor reads a stream of consecutive words (0,1,2,3,4,…), each request goes to a different bank in turn. While bank 0 is busy serving address 0, banks 1, 2, and 3 can start the next requests. This pipelining increases throughput.
Throughput Intuition
- If each bank has a busy time of Tm per access, a fully pipelined m-bank memory can ideally deliver about one word every Tm/m seconds (throughput ≈ m/Tm), assuming requests are perfectly distributed and the bus can keep up.
- Actual throughput is limited by whichever is slower: the rate at which the CPU issues requests or the rate the memory modules and bus can sustain.
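The pipelining intuition can be checked with a small simulation (a sketch under simplifying assumptions: a fixed bank busy time `t_bank`, one request issued per `t_issue`, and requests that wait only for their own bank):

```python
def stream_finish_time(addresses, m, t_bank, t_issue):
    """Time at which a stream of requests to m interleaved banks completes.

    t_bank : busy time of one bank per access (Tm)
    t_issue: interval at which the CPU issues requests
    Each request waits until its bank is free, then occupies it for t_bank.
    """
    free_at = [0.0] * m                 # time each bank becomes free
    finish = 0.0
    for i, a in enumerate(addresses):
        issue = i * t_issue             # earliest issue time for request i
        bank = a % m
        start = max(issue, free_at[bank])
        free_at[bank] = start + t_bank
        finish = max(finish, free_at[bank])
    return finish

# Sequential stream of 16 words, 4 banks, Tm = 4 cycles, one issue per cycle:
seq = stream_finish_time(range(16), m=4, t_bank=4.0, t_issue=1.0)   # 19.0
# Same 16 requests all forced into bank 0 (stride 4):
conflict = stream_finish_time(range(0, 64, 4), m=4, t_bank=4.0, t_issue=1.0)  # 64.0
```

The sequential stream approaches one word per Tm/m = 1 cycle after the startup delay, while the conflicting stream serializes to one word per Tm cycles.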
Block (Cache-Line) Interleaving
If transfers occur in blocks of B words (cache lines), you can interleave at block granularity:
block_number = floor(address / B)
bank = block_number mod m
This spreads consecutive cache lines across banks, which is useful for multi-word bursts while keeping each block contiguous within a bank.
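In code, the block-granularity mapping is a one-line change from the word-level version (again a sketch; `block_map` is an illustrative name):

```python
def block_map(address: int, m: int, B: int) -> int:
    """Bank selection when interleaving at block (cache-line) granularity."""
    block_number = address // B    # which B-word block the address falls in
    return block_number % m        # consecutive blocks rotate through the banks

# With B = 4 words per line and m = 4 banks:
# addresses 0..3 -> bank 0, 4..7 -> bank 1, ..., 16..19 -> bank 0 again
```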
Advantages
- High bandwidth for sequential access patterns (instruction fetch, vector operations, streaming data).
- Better overlap of memory operations; reduced pipeline stalls.
- Simple bank selection logic using low-order bits.
Limitations and Conflicts
- Stride-related conflicts: Access patterns with stride that is a multiple of m may repeatedly hit the same bank, reducing parallelism.
- Added controller complexity to schedule requests and avoid hot-spot banks.
- Does not reduce single-access latency; it improves throughput.
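The stride problem is easy to see by listing which bank each access hits (a minimal check, with m = 4 assumed):

```python
m = 4
# Unit stride: every bank is used, so accesses can proceed in parallel.
stride_1 = [a % m for a in range(0, 8, 1)]    # [0, 1, 2, 3, 0, 1, 2, 3]
# Stride equal to m: every access hits bank 0 and must serialize.
stride_4 = [a % m for a in range(0, 32, 4)]   # [0, 0, 0, 0, 0, 0, 0, 0]
```

A stride of 4 (e.g. walking one column of a 4-wide matrix stored row-major) defeats the interleaving entirely.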
Low-Order vs High-Order Interleaving (Quick Contrast)
- Low-order interleaving: Uses low address bits for bank selection; spreads consecutive addresses; best for sequential access.
- High-order interleaving: Uses high address bits; places large contiguous regions in the same bank; can be simpler for relocation but offers less benefit for sequential throughput.
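The contrast is just a question of which bit field selects the bank. A sketch, assuming m is a power of two and a fixed address width (both function names are illustrative):

```python
def low_order_bank(addr: int, bank_bits: int) -> int:
    """Low-order interleaving: the lowest bank_bits bits pick the bank."""
    return addr & ((1 << bank_bits) - 1)

def high_order_bank(addr: int, bank_bits: int, addr_bits: int) -> int:
    """High-order interleaving: the highest bank_bits bits pick the bank."""
    return addr >> (addr_bits - bank_bits)

# 4-bit addresses, 4 banks (2 bank bits). Address 13 = 0b1101:
# low-order selects bank 0b01 = 1; high-order selects bank 0b11 = 3.
```

Under low-order selection, consecutive addresses change banks every word; under high-order selection, a whole quarter of the address space sits in one bank.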
Exam-Friendly Summary
- Memory interleaving splits memory into m banks to enable parallel access.
- Low-order m-way interleaving maps bank = address mod m, distributing consecutive addresses across banks.
- Gives high bandwidth for sequential streams by pipelining accesses across banks.
- Main challenge: bank conflicts with certain strides; controller must schedule requests to exploit parallelism.
