CampusFlow

🔄 Direct Memory Access (DMA)

DMA is a hardware mechanism that allows peripheral devices to transfer data directly to and from main memory without involving the CPU, dramatically improving system throughput.

What is DMA?

ℹ️

Definition

Direct Memory Access (DMA) is a feature that enables I/O devices to access memory directly, bypassing the CPU. A dedicated DMA controller manages the data transfer, freeing the CPU to execute other tasks. This is essential for high-bandwidth devices like disk drives, graphics cards, and network interfaces.

CPU Overhead Reduction

Without DMA, the CPU copies every byte between device and memory. With DMA, the CPU only sets up the transfer and handles its completion.

Parallel Operation

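While the DMA controller transfers data, the CPU can continue executing instructions, so computation and I/O overlap. The overlap is reduced only when both need the memory bus at the same time.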

DMA Controller Functionality

ℹ️

DMA Controller Components

The DMA controller acts as a bus master that can read from and write to memory independently. It contains registers for source address, destination address, transfer count, control flags, and status.

Source Address Register

Points to the memory location where data is read from (for memory-to-I/O transfers).

Destination Address Register

Points to the memory location where data will be written (for I/O-to-memory transfers).

Transfer Count Register

Holds the number of bytes/words left to transfer. It is decremented after each transfer, and the transfer is complete when it reaches zero.

Control Register

Configures direction (read/write), transfer mode (burst/cycle steal), and enables interrupts on completion.
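The register set described above can be modelled in software to show how the count register drives completion. This is a minimal sketch, not a real controller: the `dma_regs` struct, the `CTRL_*` bit positions, and `dma_step` are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical control-register bits (not taken from any real controller). */
enum {
    CTRL_START   = 1u << 0,  /* begin the transfer */
    CTRL_DEV2MEM = 1u << 1,  /* direction: device -> memory */
    CTRL_BURST   = 1u << 2,  /* 1 = burst mode, 0 = cycle stealing */
    CTRL_IRQ_EN  = 1u << 3   /* raise an interrupt on completion */
};

/* Software model of one channel's registers. */
typedef struct {
    const uint8_t *src;   /* source address register */
    uint8_t       *dst;   /* destination address register */
    uint32_t       count; /* transfer count register (bytes remaining) */
    uint32_t       ctrl;  /* control register */
    int            done;  /* status: set when count reaches zero */
    int            irq;   /* status: set if a completion interrupt is raised */
} dma_regs;

/* Move one byte, advance both addresses, and decrement the count.
 * Returns 1 while bytes remain, 0 once the transfer is complete or idle. */
int dma_step(dma_regs *r) {
    assert(r != NULL);
    if (!(r->ctrl & CTRL_START) || r->count == 0) return 0;
    *r->dst++ = *r->src++;
    if (--r->count == 0) {
        r->done = 1;
        r->irq  = (r->ctrl & CTRL_IRQ_EN) != 0;
        r->ctrl &= ~(uint32_t)CTRL_START;  /* channel goes idle */
        return 0;
    }
    return 1;
}
```

Calling `dma_step` in a loop until it returns 0 mimics the hardware moving one byte per bus cycle; the CPU's only work is filling in the struct beforehand.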

DMA Transfer Modes

Burst Mode

DMA controller takes control of the bus and transfers the entire block without releasing it. Fast but blocks the CPU for the entire duration.

Pros: Highest transfer rate
Cons: CPU is halted for whole transfer

Cycle Stealing

DMA transfers one byte/word at a time, then releases the bus. Alternates bus access between CPU and DMA.

Pros: CPU stays responsive
Cons: Slower transfer, more overhead

Transparent Mode

DMA transfers only when CPU is not using the bus (e.g., during CPU internal operations). No CPU slowdown at all.

Pros: Zero CPU overhead
Cons: Transfer rate depends on CPU idle time
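The trade-offs between the three modes can be seen in a toy bus simulation. The sketch below assumes a simplified model in which one word moves per bus cycle and the CPU's bus demand follows a repeating pattern; `simulate_dma`, `bus_stats`, and the `max_cycles` safety bound are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DMA_BURST, DMA_CYCLE_STEAL, DMA_TRANSPARENT } dma_mode;

typedef struct {
    uint32_t cycles;        /* bus cycles simulated */
    uint32_t cpu_stalls;    /* cycles where the CPU wanted the bus but lost it */
    uint32_t longest_stall; /* longest run of consecutive CPU stalls */
    uint32_t words_left;    /* words still pending when the simulation ended */
} bus_stats;

/* cpu_wants[i % n] is nonzero when the CPU needs the bus in cycle i.
 * max_cycles bounds the run (transparent mode never finishes if the
 * CPU never leaves the bus idle). */
bus_stats simulate_dma(dma_mode mode, uint32_t words,
                       const uint8_t *cpu_wants, uint32_t n,
                       uint32_t max_cycles) {
    assert(cpu_wants != NULL && n > 0);
    bus_stats s = {0, 0, 0, words};
    uint32_t run = 0;
    int dma_had_last = 0;
    while (s.words_left > 0 && s.cycles < max_cycles) {
        int cpu = cpu_wants[s.cycles % n] != 0;
        int dma_gets;
        if (mode == DMA_BURST)            dma_gets = 1;  /* holds the bus */
        else if (mode == DMA_CYCLE_STEAL) dma_gets = !(cpu && dma_had_last);
        else                              dma_gets = !cpu;  /* idle cycles only */
        if (dma_gets) s.words_left--;
        if (dma_gets && cpu) {
            s.cpu_stalls++;
            if (++run > s.longest_stall) s.longest_stall = run;
        } else {
            run = 0;
        }
        dma_had_last = dma_gets;
        s.cycles++;
    }
    return s;
}
```

With a CPU that wants the bus every cycle, burst mode finishes soonest but stalls the CPU for the whole block, while cycle stealing takes longer and never stalls the CPU for more than one cycle in a row. Transparent mode never stalls the CPU but only progresses on idle cycles.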

DMA vs Programmed I/O Performance

Metric | Programmed I/O | DMA (Cycle Steal) | DMA (Burst)
CPU cycles per byte | 10-100 | 1-2 (setup amortized) | 0 (total block)
CPU utilization during 1 MB transfer | 100% | ~5% | 0%
Transfer time (1 MB, 100 MHz bus) | ~10 ms | ~0.5 ms | ~0.1 ms
System throughput impact | Severe | Minimal | Moderate
Best for | Byte-at-a-time devices | Medium-speed, always-on | High-speed, block devices
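The "CPU cycles per byte" row can be turned into a rough cost estimate. The sketch below counts only CPU instruction cycles (not bus cycles stolen by the controller), and the setup and interrupt costs passed in are assumed values for illustration rather than measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Programmed I/O: the CPU pays cycles_per_byte for every byte moved. */
uint64_t pio_cpu_cycles(uint64_t bytes, uint32_t cycles_per_byte) {
    return bytes * cycles_per_byte;
}

/* DMA: the CPU pays a fixed setup cost plus one completion interrupt,
 * regardless of the block size. */
uint64_t dma_cpu_cycles(uint32_t setup_cycles, uint32_t irq_cycles) {
    return (uint64_t)setup_cycles + irq_cycles;
}

/* Smallest block size (in bytes) at which programmed I/O costs the CPU
 * at least as much as the fixed DMA overhead. */
uint64_t dma_break_even_bytes(uint32_t setup_cycles, uint32_t irq_cycles,
                              uint32_t cycles_per_byte) {
    assert(cycles_per_byte > 0);
    uint64_t fixed = dma_cpu_cycles(setup_cycles, irq_cycles);
    return (fixed + cycles_per_byte - 1) / cycles_per_byte;  /* round up */
}
```

For example, at 10 cycles per byte a 1 MB (1,048,576-byte) programmed transfer costs about 10.5 million CPU cycles, whereas an assumed 200-cycle setup plus a 1000-cycle interrupt costs 1200 cycles, so DMA pays off for anything beyond roughly 120 bytes under these assumptions.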

DMA Channels and Arbitration

ℹ️

DMA Channels

A DMA controller typically supports multiple channels, each capable of managing an independent data transfer. Each channel has its own set of registers (source, destination, count, control). Arbitration logic decides which channel gets bus access when multiple channels request simultaneously.

Daisy Chaining

Devices are connected in series. Bus grant passes from one device to the next. Simple but can be unfair to devices farther from the CPU.

Independent Request

Each device has dedicated request/grant lines to the arbiter. More complex but allows programmable priority.
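The two arbitration schemes differ mainly in how the winner is chosen among simultaneous requests. As a minimal sketch, assuming four channels and a bitmask of pending requests (both assumptions, as are the function names), the selection logic might look like this:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHANNELS 4  /* assumed channel count for illustration */

/* Daisy chain: the grant ripples down the chain and stops at the first
 * requester. Position 0 is closest to the arbiter, so it always wins.
 * Returns the granted channel, or -1 if nothing is requesting. */
int daisy_chain_grant(uint8_t requests) {
    for (int ch = 0; ch < NUM_CHANNELS; ch++)
        if (requests & (1u << ch)) return ch;
    return -1;
}

/* Independent request: every channel has its own request line, and the
 * arbiter picks the requester with the highest programmed priority.
 * Ties go to the lower-numbered channel. */
int priority_grant(uint8_t requests, const uint8_t priority[NUM_CHANNELS]) {
    assert(priority != 0);
    int best = -1;
    for (int ch = 0; ch < NUM_CHANNELS; ch++) {
        if (!(requests & (1u << ch))) continue;
        if (best < 0 || priority[ch] > priority[best]) best = ch;
    }
    return best;
}
```

With channels 1 and 3 both requesting, the daisy chain always grants channel 1, which is the unfairness noted above. The priority-based arbiter can be reprogrammed at run time to favor either one.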

Bus Mastering

💡

What is Bus Mastering?

A bus master is a device that can initiate and control bus transactions. The CPU is the default bus master, but the DMA controller can become a bus master to perform transfers. Modern high-performance devices (like NVMe SSDs and GPUs) can act as bus masters themselves.

CPU vs DMA Data Path

Without DMA (Programmed I/O)
Device → (read by) CPU → (written to) Memory
✗ CPU copies every byte and cannot do other work
With DMA
CPU → setup → DMA Ctrl ↔ reads/writes ↔ Memory
✓ CPU sets up transfer, then works on other tasks in parallel

Code Example: DMA Transfer Concept

c

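#include <stdint.h>

// DMA controller registers (memory-mapped; addresses are illustrative)
#define DMA_SRC    ((volatile uint32_t *)0xFFFFF000)
#define DMA_DST    ((volatile uint32_t *)0xFFFFF004)
#define DMA_CNT    ((volatile uint32_t *)0xFFFFF008)
#define DMA_CTRL   ((volatile uint32_t *)0xFFFFF00C)
#define DMA_STATUS ((volatile uint32_t *)0xFFFFF010)

// Control register bits
#define DMA_START  0x1
#define DMA_READ   0x2   // Device → Memory
#define DMA_WRITE  0x4   // Memory → Device
#define DMA_IRQ_EN 0x8

// Programmed-I/O device ports and ready flag.
// Assumed values for illustration; the real ones are device-specific.
#define DEV_STATUS 0x300
#define DEV_DATA   0x301
#define READY      0x01

// Port-read primitive, assumed to be supplied by the platform
// (for example <sys/io.h> on Linux/x86).
extern uint8_t inb(uint16_t port);

// DMA transfer setup: program the registers, then start the channel.
// dir is DMA_READ or DMA_WRITE. Assumes 32-bit bus addresses.
void dma_transfer(const void *src, void *dst, uint32_t count, uint32_t dir) {
    *DMA_SRC = (uint32_t)(uintptr_t)src;
    *DMA_DST = (uint32_t)(uintptr_t)dst;
    *DMA_CNT = count;
    // Write the control register last: setting DMA_START begins the transfer
    *DMA_CTRL = dir | DMA_IRQ_EN | DMA_START;
}

// Without DMA: CPU copies every byte (slow)
void programmed_io_transfer(uint8_t *buf, uint32_t count) {
    for (uint32_t i = 0; i < count; i++) {
        while (!(inb(DEV_STATUS) & READY)) { }  // Poll until a byte is ready
        buf[i] = inb(DEV_DATA);                 // Read byte
        // CPU is tied up for every byte, which is costly for large transfers
    }
}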
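The listing above starts a transfer but does not show how software learns that it has finished. A driver would normally rely on the completion interrupt; as a simpler alternative, the sketch below polls a status register with a bounded number of attempts. The `DMA_STATUS_DONE` and `DMA_STATUS_ERROR` bits are hypothetical, since the status register layout is not specified here, and the register is passed in as a pointer so the function can be exercised without real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status bits; the actual layout is controller-specific. */
#define DMA_STATUS_DONE  0x1u
#define DMA_STATUS_ERROR 0x2u

/* Poll a status register until the controller reports completion or an
 * error, or until the poll budget runs out.
 * Returns 0 on completion, -1 on error, -2 on timeout. */
int dma_wait_done(volatile const uint32_t *status, uint32_t max_polls) {
    assert(status != NULL);
    for (uint32_t i = 0; i < max_polls; i++) {
        uint32_t s = *status;  /* volatile read: re-fetched every iteration */
        if (s & DMA_STATUS_ERROR) return -1;
        if (s & DMA_STATUS_DONE)  return 0;
    }
    return -2;
}
```

On real hardware the call might look like `dma_wait_done(DMA_STATUS, 1000000)`, with the bound chosen to match the expected transfer time.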

DMA Transfer Flow

CPU Setup → DMA Grant → Data Transfer → Count Done → DMA IRQ

💡

Real-World DMA Example

A Gigabit Ethernet card receiving data: the NIC buffers an incoming packet and asserts a DMA request. The DMA controller transfers the packet (up to 1500 bytes of payload for a standard frame) directly to main memory. The CPU is interrupted once per packet rather than once per byte, and only has to process data that is already in memory.

Interview Questions

What is the difference between DMA and programmed I/O?

In programmed I/O, the CPU is involved in every byte transferred — it reads from the device and writes to memory. In DMA, a dedicated controller handles transfers directly between device and memory. DMA requires setup overhead but transfers each byte without CPU involvement, making it vastly more efficient for large or high-speed transfers.

Explain the three DMA transfer modes with examples.

1) Burst mode: DMA transfers the entire block without releasing the bus. Example: a disk controller reading a full sector.
2) Cycle stealing: DMA transfers one word, then releases the bus. Example: a sound card continuously streaming audio.
3) Transparent mode: DMA transfers only when the CPU is not using the system bus, for instance while it is busy with internal operations. Example: background memory refresh.

What happens during a DMA cycle steal?

The DMA controller requests bus access via HOLD signal. The CPU finishes the current bus cycle, asserts HLDA (Hold Acknowledge), and tri-states its bus lines. The DMA controller performs one bus transfer (read device → write memory, or vice versa), then deasserts HOLD. The CPU resumes. This steals one bus cycle from the CPU.
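The handshake described above can be traced with a small signal-level model. This is a sketch under simplifying assumptions: the CPU completes any in-flight bus cycle in one clock, and the names `bus_signals`, `cpu_clock`, and `dma_steal_one` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int hold;         /* HOLD: DMA controller is requesting the bus */
    int hlda;         /* HLDA: CPU has acknowledged and floated its bus lines */
    int cpu_in_cycle; /* CPU is in the middle of a bus cycle */
} bus_signals;

/* CPU side, evaluated once per clock: finish the current bus cycle first,
 * then make HLDA follow HOLD (grant when asserted, reclaim when released). */
void cpu_clock(bus_signals *b) {
    if (b->cpu_in_cycle) { b->cpu_in_cycle = 0; return; }
    b->hlda = b->hold;
}

/* DMA side: steal exactly one bus cycle to move one byte.
 * Returns the number of clocks spent waiting for HLDA. */
int dma_steal_one(bus_signals *b, const uint8_t *src, uint8_t *dst) {
    assert(b != NULL && src != NULL && dst != NULL);
    int waited = 0;
    b->hold = 1;                       /* request the bus */
    while (!b->hlda) { cpu_clock(b); waited++; }
    *dst = *src;                       /* one transfer while owning the bus */
    b->hold = 0;                       /* release the bus */
    cpu_clock(b);                      /* CPU sees HOLD low and drops HLDA */
    return waited;
}
```

If the CPU is mid-cycle when HOLD is asserted, the model waits one extra clock before the grant, which mirrors the "finishes the current bus cycle" step.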

How does scatter-gather DMA work?

Scatter-gather DMA uses a list of buffer descriptors (source, destination, length tuples) in memory. The DMA controller processes each descriptor sequentially, automatically chaining transfers across multiple non-contiguous buffers. This eliminates the need for the CPU to copy data into a single contiguous buffer. Widely used in network adapters and storage controllers.
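The descriptor chain can be illustrated with a software stand-in for the controller. In this sketch, `sg_desc` and `sg_run` are hypothetical names, `memcpy` stands in for the hardware transfer, and a `NULL` next pointer is assumed to mark the end of the list; real controllers define their own descriptor layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One buffer descriptor: where to read, where to write, and how much. */
typedef struct sg_desc {
    const void           *src;
    void                 *dst;
    size_t                len;
    const struct sg_desc *next;  /* NULL terminates the chain */
} sg_desc;

/* Walk the chain as the controller would, performing each transfer in
 * order. Returns the total number of bytes moved. */
size_t sg_run(const sg_desc *d) {
    size_t total = 0;
    for (; d != NULL; d = d->next) {
        assert(d->src != NULL && d->dst != NULL);
        memcpy(d->dst, d->src, d->len);
        total += d->len;
    }
    return total;
}
```

A network driver could, for instance, describe a packet header and its payload as two descriptors pointing at separate buffers, and the controller would emit them back to back without the CPU first merging them.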