Patterns in Practice: Hardware Design
Design patterns aren’t limited to software. The same fundamental challenges — decoupling, flow control, error handling, resource arbitration — appear in hardware, and engineers solve them with structurally identical patterns. The key difference: hardware operates at nanosecond granularity with deterministic timing, spatial parallelism, and physical enforcement guarantees.
Bus Protocols & Interconnects
Each entry below maps a hardware concept to the software pattern(s) it embodies, explains how the pattern manifests, and notes key differences from the software version.
PCIe Data Poisoning — Poison Pill + Null Object + Chain of Responsibility
- When a component detects an uncorrectable error (e.g., an ECC failure while reading memory), it sets the EP (Error/Poisoned) bit in the TLP header
- The poisoned TLP is forwarded through the fabric, not dropped; switches pass the poison bit along without acting on it
- Only the final endpoint inspects it — logs via AER, discards payload
- Poison Pill: marked as sentinel “this is invalid data”
- Null Object: same structure as valid packet, but semantically void
- Chain of Responsibility: propagates through switch chain until endpoint handles it
- Key difference from software: In software, Poison Pill typically terminates a consumer. In PCIe, it’s advisory — endpoint may log as Non-Fatal and continue operating.
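A minimal sketch of the endpoint-side decision in Verilog. The signal names (`tlp_valid`, `tlp_ep`, `tlp_data`) are illustrative assumptions, not PCIe spec signals; the point is only that poison is advisory, logged rather than fatal:

```verilog
// Illustrative endpoint logic: poisoned TLPs are logged, not fatal.
// Signal names (tlp_valid, tlp_ep, tlp_data) are hypothetical.
module poison_endpoint (
    input  wire        clk,
    input  wire        tlp_valid,
    input  wire        tlp_ep,        // EP (Error/Poisoned) bit from TLP header
    input  wire [31:0] tlp_data,
    output reg         log_poison,    // pulse to an AER-style error logger
    output reg  [31:0] payload,
    output reg         payload_valid
);
    always @(posedge clk) begin
        log_poison    <= tlp_valid &&  tlp_ep;   // advisory: log and continue
        payload_valid <= tlp_valid && !tlp_ep;   // discard poisoned payload
        if (tlp_valid && !tlp_ep)
            payload <= tlp_data;
    end
endmodule
```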
PCIe Credit-Based Flow Control — Token Bucket + Back-Pressure
- Receivers advertise credits (buffer slots): 1 data credit = 16 bytes, 1 header credit = 1 TLP
- Credits tracked independently for Posted (P), Non-Posted (NP), and Completion (CPL) traffic
- Transmitter cannot send without confirming sufficient credits
- Credits returned via periodic UpdateFC DLLPs
- Key difference: software token buckets rate-limit; PCIe credits track buffer capacity. Three independent flow-control domains (P/NP/CPL) are more granular than most software schemes.
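A toy credit gate for a single traffic class, assuming a free-running granted-credit counter fed by received UpdateFC DLLPs; real implementations track header and data credits separately for each of P, NP, and CPL. Names and widths are assumptions:

```verilog
// Illustrative single-channel credit gate (real PCIe tracks P/NP/CPL
// header and data credits separately; widths and names are assumed).
module credit_gate #(parameter CRED_W = 8) (
    input  wire              clk, rst_n,
    input  wire [CRED_W-1:0] credits_granted,  // running total from UpdateFC
    input  wire              tx_request,
    output wire              tx_allow
);
    reg [CRED_W-1:0] credits_consumed;
    // Transmit only while the consumed count stays behind the advertised
    // limit; counters wrap, so the comparison is modular as in the spec.
    assign tx_allow = tx_request &&
                      ((credits_granted - credits_consumed) != {CRED_W{1'b0}});
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)        credits_consumed <= {CRED_W{1'b0}};
        else if (tx_allow) credits_consumed <= credits_consumed + 1'b1;
    end
endmodule
```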
PCIe TLP Routing — Strategy + Router
- Three routing methods: address-based (memory/IO), ID-based (Bus/Device/Function), implicit (upstream/downstream)
- Switch decodes routing type from TLP header and forwards accordingly
- Key difference: the routing strategy is fixed by the protocol spec and encoded in the TLP header, not selected at runtime
AXI Valid/Ready Handshake — Producer-Consumer + Back-Pressure
- Source asserts VALID when data available; sink asserts READY when it can accept
- Transfer occurs only when both VALID && READY on same clock edge
- Critical rule: Source must NOT wait for READY before asserting VALID (deadlock prevention)
- 5 separate channels (Write Addr, Write Data, Write Response, Read Addr, Read Data) = Command-Query Separation
- Key difference: The no-wait rule on VALID has no software equivalent — it’s a hardware-specific deadlock prevention constraint.
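A sketch of a source-side channel driver that obeys the rule. `have_data` stands in for whatever upstream condition makes a beat available (an assumption for the example); note that `valid` is never a function of `ready`:

```verilog
// Minimal AXI-style channel source. Per the rule above, VALID is driven
// from data availability alone and is never gated on READY.
module axi_source (
    input  wire        clk, rst_n,
    input  wire        have_data,     // upstream "data available" (assumed)
    input  wire [31:0] data_in,
    input  wire        ready,         // sink back-pressure
    output reg         valid,
    output reg  [31:0] data
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            valid <= 1'b0;
        end else if (!valid || ready) begin
            // Load a new beat when the channel is empty or the current
            // beat was just accepted (valid && ready on this edge).
            valid <= have_data;
            if (have_data) data <= data_in;
        end
        // When valid && !ready, valid and data are held stable until the
        // sink accepts the beat, as AXI requires.
    end
endmodule
```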
CXL Memory Pooling — Object Pool + Facade + Proxy
- CXL 2.0: switches enable multiple hosts to share a pool of memory devices
- Object Pool: hosts dynamically allocate/deallocate memory segments from shared pool
- Facade: unified “coherent memory access” regardless of location (host DRAM, device HBM, pooled CXL)
- Proxy: CXL.cache lets devices coherently cache host memory; host coherency manager ensures consistency
Chip Architecture
CPU Pipeline — Pipes and Filters
- Classic RISC 5-stage: Fetch → Decode → Execute → Memory → Writeback
- Each stage is a filter; pipeline registers are pipes
- Multiple instructions in-flight simultaneously (spatial parallelism)
- Key difference from software: Hardware hazards (data, control, structural) require forwarding, stalling, and speculation. Each stage executes in exactly one clock cycle. Software filters have variable processing time.
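A toy three-stage pipeline; the per-stage operations are placeholders, but the structure (a register between each pair of combinational "filters") is the pattern itself, with one result in flight per stage every cycle and hazard logic omitted:

```verilog
// Pipes-and-filters in silicon: each registered assignment is a stage,
// the pipeline registers (s1, s2) are the pipes between them.
module toy_pipeline (
    input  wire       clk,
    input  wire [7:0] in,
    output reg  [7:0] out
);
    reg [7:0] s1, s2;
    always @(posedge clk) begin
        s1  <= in + 8'd1;   // stage 1: placeholder "decode"
        s2  <= s1 << 1;     // stage 2: placeholder "execute"
        out <= s2 ^ 8'hFF;  // stage 3: placeholder "writeback"
    end
endmodule
```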
Cache Coherence (MESI/MOESI) — State Pattern + Observer + Mediator
- Each cache line has states: Modified, Exclusive, Shared, Invalid (+ Owned in MOESI)
- State Pattern: transitions triggered by local CPU ops and bus snoop events
- Observer (Pub-Sub): snoop bus broadcasts writes; all cores monitoring that address update state
- Mediator: directory-based coherence (NUMA) uses central directory mediating between cores
- Operates at nanosecond granularity with dedicated snoop filter hardware
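A simplified next-state function for one MESI line, written as the hardware equivalent of a State-pattern transition table. The event encoding is an assumption; data movement and the Exclusive-vs-Shared fill decision are omitted:

```verilog
// Sketch of MESI next-state logic for one cache line.
module mesi_next_state (
    input  wire [1:0] state,                    // current line state
    input  wire       cpu_read, cpu_write,      // local core events
    input  wire       snoop_read, snoop_write,  // events observed on the bus
    output reg  [1:0] next
);
    localparam I = 2'd0, S = 2'd1, E = 2'd2, M = 2'd3;
    always @* begin
        next = state;  // default: no event, no transition
        case (state)
            I: if (cpu_write)        next = M;  // read-for-ownership
               else if (cpu_read)    next = S;  // E if no other sharer (omitted)
            S: if (cpu_write)        next = M;  // upgrade; peers invalidate
               else if (snoop_write) next = I;
            E: if (cpu_write)        next = M;  // silent upgrade, no bus traffic
               else if (snoop_read)  next = S;
               else if (snoop_write) next = I;
            M: if (snoop_read)       next = S;  // supply dirty data, demote
               else if (snoop_write) next = I;
        endcase
    end
endmodule
```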
Memory Hierarchy — Cache + Proxy + Chain of Responsibility
- L1 → L2 → L3 → DRAM → Disk: faster/smaller caches in front of slower/larger
- Each level acts as transparent proxy for the level below
- On miss, request propagates down the chain until satisfied
- The software “cache pattern” is literally named after this hardware concept
Interrupt Handling — Observer + Chain of Responsibility + Command
- Devices assert interrupt lines; interrupt controller (APIC, GIC) prioritizes and delivers to CPU
- CPU looks up IDT (Interrupt Descriptor Table), dispatches to registered handler
- Shared interrupt lines: handlers form a chain, each checking “is this for me?”
- Key difference: truly asynchronous and preemptive — forcibly suspends current execution
DMA — Command + Future
- CPU programs DMA descriptor (source, dest, size, direction) — a serialized command object
- DMA engine executes autonomously; signals completion via interrupt (async result / Future)
- Scatter-Gather DMA = linked list of descriptors = Composite Command
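As a sketch, a scatter-gather descriptor can be modeled as a SystemVerilog packed struct. The field layout below is hypothetical, since every real DMA engine defines its own widths and ordering:

```verilog
package dma_pkg;
    // Hypothetical scatter-gather descriptor (field layout is illustrative).
    typedef struct packed {
        logic [63:0] src_addr;   // source bus address
        logic [63:0] dst_addr;   // destination bus address
        logic [31:0] length;     // transfer size in bytes
        logic        dir;        // 0 = mem-to-dev, 1 = dev-to-mem
        logic [63:0] next_desc;  // pointer to next descriptor: Composite Command
        logic        last;       // end of chain
    } dma_desc_t;
endpackage
```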
IOMMU/MMU — Proxy + Adapter + Decorator
- MMU translates CPU virtual → physical addresses via page tables
- IOMMU does the same for device DMA — intercepts and translates device-issued addresses
- Proxy: transparent interposition between requestor and memory
- Adapter: resolves address space mismatch in virtualized environments
- Decorator: access control (R/W/X permissions) layered on translation
Register Renaming — Flyweight
- Physical registers are shared intrinsic state
- Logical register names (R3, R7) are extrinsic keys
- Register rename table (RAT) dynamically maps logical → physical
- Eliminates false dependencies (WAW/WAR hazards) — dynamic Flyweight allocation
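A stripped-down RAT sketch; free-list management, checkpointing, and recovery are omitted, and the widths are illustrative:

```verilog
// Flyweight-style rename table: logical register numbers (extrinsic keys)
// index a RAT that maps them onto shared physical registers.
module rat #(parameter LOG_N = 32, PHYS_W = 7) (
    input  wire              clk, rst_n,
    input  wire              alloc_en,      // rename a destination register
    input  wire [4:0]        logical_dst,   // architectural name, e.g. R3
    input  wire [PHYS_W-1:0] new_phys,      // fresh tag from free list (assumed)
    input  wire [4:0]        logical_src,
    output wire [PHYS_W-1:0] phys_src       // current mapping for a source read
);
    reg [PHYS_W-1:0] map [0:LOG_N-1];
    integer i;
    assign phys_src = map[logical_src];     // extrinsic key -> shared physical reg
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            for (i = 0; i < LOG_N; i = i + 1)
                map[i] <= i;                // identity mapping at reset
        else if (alloc_en)
            map[logical_dst] <= new_phys;   // fresh target removes WAW/WAR deps
    end
endmodule
```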
Clock Domain Crossing — Adapter + Bridge + Producer-Consumer
- Two-FF synchronizer for single-bit signals (metastability settling)
- Async FIFO for multi-bit: dual-port RAM with Gray-coded pointers crossing domains
- Adapter: translates signals between clock domains
- Bridge: decouples domains so they vary independently
- Producer-Consumer: async FIFO is a bounded buffer between clock domains
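The single-bit case is small enough to show in full; this is the canonical two-flip-flop synchronizer from standard CDC practice (the Gray-coded async FIFO for multi-bit data is omitted here):

```verilog
// Two-flip-flop synchronizer for a single-bit CDC signal. The first
// stage gets a full destination-clock cycle to resolve metastability
// before the value is consumed.
module sync_2ff (
    input  wire clk_dst, rst_n,
    input  wire async_in,   // signal from the other clock domain
    output wire sync_out
);
    reg meta, stable;
    always @(posedge clk_dst or negedge rst_n) begin
        if (!rst_n) {stable, meta} <= 2'b00;
        else        {stable, meta} <= {meta, async_in};
    end
    assign sync_out = stable;
endmodule
```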
Digital Logic Design Patterns
Finite State Machines — State Pattern (1:1 Mapping)
- Moore (output = state) and Mealy (output = state + input) machines
- One transition per clock cycle via combinational next-state logic + registers
- Can be exhaustively verified across all state/input combinations (a level of coverage rarely achieved in software)
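A minimal Moore machine, here a hypothetical 1-0-1 sequence detector, showing the split into a combinational next-state function and a state register, with the output a function of state alone:

```verilog
// Moore FSM sketch: one transition per clock, output = f(state).
module seq101_moore (
    input  wire clk, rst_n, din,
    output wire hit
);
    localparam [1:0] IDLE = 2'd0, GOT1 = 2'd1, GOT10 = 2'd2, GOT101 = 2'd3;
    reg [1:0] state, next;
    always @* begin                      // next-state logic (combinational)
        case (state)
            IDLE:    next = din ? GOT1   : IDLE;
            GOT1:    next = din ? GOT1   : GOT10;
            GOT10:   next = din ? GOT101 : IDLE;
            GOT101:  next = din ? GOT1   : GOT10;  // overlap: reuse suffix
            default: next = IDLE;
        endcase
    end
    always @(posedge clk or negedge rst_n)  // state register
        if (!rst_n) state <= IDLE;
        else        state <= next;
    assign hit = (state == GOT101);      // Moore: output depends on state only
endmodule
```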
FIFO Buffers — Queue / Producer-Consumer
- Circular buffer with read/write pointers
- Hardware adds `full`, `empty`, `almost_full`, `almost_empty` status signals
- Ubiquitous: between pipeline stages, at clock domain crossings, in network interfaces
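A minimal synchronous FIFO sketch, assuming `DEPTH` is `2**AW` and omitting the `almost_full`/`almost_empty` thresholds:

```verilog
// Circular buffer with read/write pointers plus the status flags
// hardware adds on top of a software queue. DEPTH must equal 2**AW.
module sync_fifo #(parameter W = 8, DEPTH = 16, AW = 4) (
    input  wire         clk, rst_n,
    input  wire         wr_en, rd_en,
    input  wire [W-1:0] din,
    output reg  [W-1:0] dout,
    output wire         full, empty
);
    reg [W-1:0] mem [0:DEPTH-1];
    reg [AW:0]  wptr, rptr;  // one extra MSB disambiguates full vs empty
    assign empty = (wptr == rptr);
    assign full  = (wptr == {~rptr[AW], rptr[AW-1:0]});
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            wptr <= 0;
            rptr <= 0;
        end else begin
            if (wr_en && !full) begin
                mem[wptr[AW-1:0]] <= din;   // circular write
                wptr <= wptr + 1'b1;
            end
            if (rd_en && !empty) begin
                dout <= mem[rptr[AW-1:0]];  // circular read
                rptr <= rptr + 1'b1;
            end
        end
    end
endmodule
```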
Arbiters — Mediator
- Resolves contention for shared resources (bus, memory port, crossbar)
- Types: Fixed Priority, Round-Robin, Weighted Round-Robin, Lottery
- Single-cycle decisions via combinational logic (priority encoders, rotating masks)
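A fixed-priority arbiter reduces to a single line of combinational logic, which is why the decision fits in one cycle:

```verilog
// Fixed-priority arbiter: grant = lowest set request bit, resolved
// purely combinationally within the cycle.
module prio_arbiter #(parameter N = 4) (
    input  wire [N-1:0] req,
    output wire [N-1:0] grant
);
    // Classic trick: x & (~x + 1) isolates the least-significant 1 bit,
    // i.e. the highest-priority requester.
    assign grant = req & (~req + 1'b1);
endmodule
```

Round-robin variants add a rotating mask in front of the same priority logic so the last winner drops to lowest priority.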
MUX/DEMUX — Strategy / Router
- MUX: select signal chooses which input to route to output (Strategy selection)
- DEMUX: routes single input to one of N outputs (Router / dispatch)
- Essentially a hardware switch statement
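Written out, the correspondence is direct; the `case` below is the literal hardware switch statement:

```verilog
// 4-to-1 MUX: the select "strategy" picks which input drives the output.
module mux4 #(parameter W = 8) (
    input  wire [1:0]   sel,
    input  wire [W-1:0] in0, in1, in2, in3,
    output reg  [W-1:0] out
);
    always @* begin
        case (sel)
            2'd0: out = in0;
            2'd1: out = in1;
            2'd2: out = in2;
            2'd3: out = in3;
        endcase
    end
endmodule
```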
Parameterized Modules — Generics / Templates
- Verilog `parameter` / VHDL `generic` for configurable instantiation
- Example: `module fifo #(parameter DATA_WIDTH=8, DEPTH=16)`
- Resolved at synthesis time (like C++ templates) — produces physically distinct hardware
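Two instantiations of the `sync_fifo` sketch above with different parameter overrides; synthesis elaborates each into physically distinct hardware, much as a C++ compiler stamps out separate template instantiations:

```verilog
// Same source text, two different parameter sets, two distinct circuits.
module fifo_pair (
    input  wire        clk, rst_n,
    input  wire        wr8, rd8, wr32, rd32,
    input  wire [7:0]  din8,
    input  wire [31:0] din32,
    output wire [7:0]  dout8,
    output wire [31:0] dout32,
    output wire        full8, empty8, full32, empty32
);
    // 8-bit x 16-deep instance
    sync_fifo #(.W(8), .DEPTH(16), .AW(4)) u_byte (
        .clk(clk), .rst_n(rst_n), .wr_en(wr8), .rd_en(rd8),
        .din(din8), .dout(dout8), .full(full8), .empty(empty8));
    // 32-bit x 64-deep instance
    sync_fifo #(.W(32), .DEPTH(64), .AW(6)) u_word (
        .clk(clk), .rst_n(rst_n), .wr_en(wr32), .rd_en(rd32),
        .din(din32), .dout(dout32), .full(full32), .empty(empty32));
endmodule
```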
IP Core Reuse — Component + Template Method
- Pre-designed, pre-verified blocks (UART, PCIe controller, DDR PHY)
- Standardized interfaces (AXI, APB) for plug-and-play integration
- Parameterizable with configurable hooks (interrupts, DMA callbacks)
Hardware Security
Hardware Root of Trust — Singleton (Immutable)
- Immutable component: ROM, fused keys, tamper-resistant module
- Contains cryptographic keys and verification logic for entire trust chain
- Examples: Intel CSME, AMD PSP, ARM TrustZone, Apple Secure Enclave, Google Titan
- Key difference: software singletons can be patched or mocked. HRoT is physically immutable.
Secure Boot — Chain of Responsibility + Builder
- Sequential verification: HRoT → bootloader stage 0 → stage 1 → kernel → drivers
- Each stage cryptographically authenticates the next before transferring control
- Failure at any stage halts the entire system (much more drastic than software CoR)
Hardware Isolation (TrustZone/SGX) — Bulkhead / Sandbox
- ARM TrustZone: two execution worlds (Secure, Normal) enforced by hardware
- Intel SGX: hardware-encrypted memory enclaves
- Bus fabric physically refuses transactions from wrong security domain
- Key difference: software sandboxes rely on OS enforcement (bypassable). Hardware isolation is physical.
Side-Channel Mitigations — Constant-Time + Decorator + Bulkhead
- Constant-time execution: crypto ops take the same number of cycles regardless of input
- Power noise injection: dummy operations mask power signatures (Decorator)
- Cache partitioning: separate cache ways per security domain (Bulkhead)
Quick Reference
| Hardware Concept | Pattern(s) | Key Insight |
|---|---|---|
| PCIe Data Poisoning | Poison Pill, Null Object, CoR | Corrupted data forwarded, not dropped; endpoint decides |
| PCIe Credit Flow | Token Bucket, Back-Pressure | Typed credits (P/NP/CPL) more granular than software |
| AXI Handshake | Producer-Consumer, Back-Pressure | VALID must not wait for READY (no software equivalent) |
| CXL Memory Pool | Object Pool, Facade, Proxy | Hardware memory pool with coherency |
| CPU Pipeline | Pipes and Filters | Spatial parallelism; hazards unique to hardware |
| Cache Coherence | State, Observer, Mediator | Snoop bus = pub-sub; directory = mediator |
| Memory Hierarchy | Cache, Proxy, CoR | Software “cache pattern” named after this |
| Interrupts | Observer, CoR, Command | Truly async/preemptive unlike software |
| DMA | Command, Future | Descriptor = command; completion = async result |
| MMU/IOMMU | Proxy, Adapter, Decorator | Transparent translation + access control |
| Register Renaming | Flyweight | Physical regs = shared state; logical = extrinsic key |
| Clock Domain Crossing | Adapter, Bridge | Async FIFO decouples independent domains |
| FSM | State Pattern | 1:1 mapping; one cycle per transition |
| Arbiters | Mediator | Single-cycle via combinational logic |
| Parameterized Modules | Generics/Templates | Resolved at synthesis time |
| Root of Trust | Singleton (immutable) | Physically immutable, unlike software |
| Secure Boot | CoR, Builder | Failure halts entire system |
| TrustZone/SGX | Bulkhead, Sandbox | Bus fabric enforces isolation |
References
| Topic | Resource | Link |
|---|---|---|
| PCIe Data Poisoning | OCP Poison White Paper | opencompute.org |
| PCIe Data Poisoning | Intel Error Reporting | intel.com |
| PCIe Flow Control | Intel Credit Handling | intel.com |
| PCIe AER | Linux AER HOWTO | kernel.org |
| AXI/AMBA | ARM AXI Specification | developer.arm.com |
| AXI Handshake | VHDLwhiz AXI Guide | vhdlwhiz.com |
| CXL | CXL Consortium | computeexpresslink.org |
| CXL | Rambus CXL Overview | rambus.com |
| CPU Pipeline | Berkeley Pipeline Pattern | berkeley.edu |
| Cache Coherence | MESI Protocol | Wikipedia |
| Cache Coherence | Coherence Primer | Wikipedia |
| Memory Hierarchy | Memory Hierarchy | Wikipedia |
| Interrupts | APIC | Wikipedia |
| DMA | DMA Overview | Wikipedia |
| IOMMU | IOMMU Overview | Wikipedia |
| Clock Domain Crossing | Verilog Pro CDC | verilogpro.com |
| Arbiters | Arbiter Design Styles | Paper (PDF) |
| NoC | Network on Chip | Wikipedia |
| Root of Trust | Rambus HRoT | rambus.com |
| Secure Boot | Cloudflare Secure Boot | blog.cloudflare.com |
| TrustZone | ARM TrustZone | developer.arm.com |
| Rust RAII | Rust by Example | doc.rust-lang.org |
| ECC | ECC Memory | Wikipedia |
| Watchdog | Watchdog Best Practices | memfault.com |