Chapter 21: RAS Features

Reliability, Availability, and Serviceability - WHEA, APEI, and error handling.


Overview

When to Use RAS Features

Use RAS features when you need to:

  • Handle hardware errors (CPU, memory, PCIe) gracefully
  • Log errors for post-mortem analysis and reporting
  • Support error injection for validation testing
  • Meet enterprise/server reliability requirements

| Scenario | APEI Table | Purpose |
|---|---|---|
| Error source enumeration | HEST | Define CPU, memory, PCIe error sources |
| Boot error recording | BERT | Report errors from previous boot |
| Persistent error storage | ERST | Non-volatile error log |
| Error injection testing | EINJ | Trigger synthetic errors |
| Error severity handling | HEST | Correctable vs fatal classification |

RAS Feature Selection:

| Platform Type | RAS Level | Features |
|---|---|---|
| Client | Minimal | Basic MCA, no APEI |
| Workstation | Moderate | MCA, ECC reporting |
| Server | Full | Complete APEI, WHEA, hot-plug |
| Mission-critical | Maximum | Redundancy, live migration |

Who Implements RAS:

| Role | RAS Tasks |
|---|---|
| Silicon vendor | MCA, AER, error detection logic |
| Platform developer | APEI tables, error handlers |
| BIOS engineer | Error handling integration |
| OS vendor | WHEA consumer, error reporting |
| Reliability engineer | EINJ testing, validation |

Key RAS Concepts:

  • WHEA: Windows Hardware Error Architecture
  • MCA/MCE: Machine Check Architecture/Exception
  • AER: Advanced Error Reporting (PCIe)
  • ECC: Error Correcting Code (memory)
  • Correctable vs Uncorrectable: Severity determines handling

RAS Architecture

RAS features ensure system reliability through error detection, logging, and recovery:

flowchart TB
    subgraph Hardware Errors
        MCA[Machine Check<br/>CPU Errors]
        AER[PCIe AER<br/>Bus Errors]
        ECC[Memory ECC<br/>RAM Errors]
        THERMAL[Thermal<br/>Overtemperature]
    end

    subgraph UEFI RAS
        HEST[HEST Table<br/>Error Sources]
        HANDLER[Error Handlers]
        BERT[BERT Table<br/>Boot Errors]
        ERST[ERST Table<br/>Error Storage]
    end

    subgraph Reporting
        WHEA[WHEA Records]
        SEL[BMC SEL]
        OS_LOG[OS Error Log]
    end

    MCA --> HANDLER
    AER --> HANDLER
    ECC --> HANDLER
    THERMAL --> HANDLER
    HEST --> HANDLER
    HANDLER --> BERT
    HANDLER --> ERST
    HANDLER --> WHEA
    WHEA --> SEL
    WHEA --> OS_LOG

    style HANDLER fill:#e74c3c,color:#fff
    style WHEA fill:#3498db,color:#fff
    style HEST fill:#2ecc71,color:#fff

APEI Tables

| Table | Full Name | Purpose |
|---|---|---|
| HEST | Hardware Error Source Table | Describes error sources |
| BERT | Boot Error Record Table | Boot-time errors |
| ERST | Error Record Serialization Table | Persistent error storage |
| EINJ | Error Injection Table | Error injection for testing |

Error Severity Levels

| Severity | Action | Example |
|---|---|---|
| Corrected | Log only | Single-bit ECC |
| Recoverable | OS handles | PCIe non-fatal |
| Fatal | System halt | Uncorrectable memory |
| Informational | Advisory | Predictive failure |
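
The severity field in an APEI/CPER record is what drives the firmware's response. A minimal dispatch sketch, assuming the EFI_ACPI_6_2_ERROR_SEVERITY_* encodings from Acpi62.h (0 = recoverable, 1 = fatal, 2 = corrected); LogError, NotifyOsRecovery, and RequestSystemHalt are hypothetical platform helpers:

//
// Map APEI/CPER error severity to a platform action.
// LogError/NotifyOsRecovery/RequestSystemHalt are hypothetical helpers.
//
EFI_STATUS
DispatchBySeverity (
  IN UINT32  ErrorSeverity,
  IN VOID    *ErrorData,
  IN UINTN   ErrorDataSize
  )
{
  switch (ErrorSeverity) {
    case EFI_ACPI_6_2_ERROR_SEVERITY_CORRECTED:
      //
      // Corrected by hardware - log only and continue
      //
      LogError(ErrorData, ErrorDataSize);
      return EFI_SUCCESS;

    case EFI_ACPI_6_2_ERROR_SEVERITY_RECOVERABLE:
      //
      // Uncorrected but contained - let the OS attempt recovery
      //
      LogError(ErrorData, ErrorDataSize);
      NotifyOsRecovery(ErrorData, ErrorDataSize);
      return EFI_SUCCESS;

    case EFI_ACPI_6_2_ERROR_SEVERITY_FATAL:
      //
      // No safe recovery - halt after logging
      //
      LogError(ErrorData, ErrorDataSize);
      RequestSystemHalt();
      return EFI_DEVICE_ERROR;

    default:
      //
      // Informational/none - advisory logging only
      //
      LogError(ErrorData, ErrorDataSize);
      return EFI_SUCCESS;
  }
}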

RAS Error Types and Handling

Understanding the different categories of RAS errors and their specific handling requirements is critical for platform developers. Each error type has unique detection mechanisms, containment strategies, and recovery options.

Error Type Categories

flowchart TB
    subgraph System["System/Processor Errors"]
        MCA[Machine Check<br/>Architecture]
        CACHE[Cache Errors]
        TLB[TLB Errors]
        BUS_SYS[Internal Bus<br/>Errors]
    end

    subgraph Memory["Memory Errors"]
        ECC_SE[Single-bit ECC<br/>Correctable]
        ECC_ME[Multi-bit ECC<br/>Uncorrectable]
        PATROL[Patrol Scrub<br/>Errors]
        ADDR[Address/Command<br/>Parity]
    end

    subgraph Peripheral["Peripheral/PCIe Errors"]
        AER_CE[AER Correctable<br/>Bad TLP, Bad DLLP]
        AER_UE[AER Uncorrectable<br/>Poisoned TLP]
        AER_FE[AER Fatal<br/>ECRC, Link Down]
        DMA[DMA Errors]
    end

    MCA --> HANDLER[Error Handler]
    CACHE --> HANDLER
    TLB --> HANDLER
    BUS_SYS --> HANDLER
    ECC_SE --> HANDLER
    ECC_ME --> HANDLER
    PATROL --> HANDLER
    ADDR --> HANDLER
    AER_CE --> HANDLER
    AER_UE --> HANDLER
    AER_FE --> HANDLER
    DMA --> HANDLER

    style System fill:#e74c3c,color:#fff
    style Memory fill:#3498db,color:#fff
    style Peripheral fill:#2ecc71,color:#fff

System/Processor Errors

System errors originate from CPU cores, caches, and internal processor buses. They are detected via Machine Check Architecture (MCA) and require immediate attention due to potential data corruption.

| Error Type | Source | Detection | Severity | Handling |
|---|---|---|---|---|
| Cache ECC | L1/L2/L3 cache | MCA bank | Correctable/Fatal | Log, possible core isolation |
| TLB Parity | Translation Lookaside Buffer | MCA bank | Fatal | Core reset required |
| Internal Bus | QPI/UPI/Infinity Fabric | MCA bank | Fatal | System halt |
| Microcode | Instruction decode | MCA bank | Fatal | Immediate halt |
| Register File | CPU registers | MCA bank | Fatal | Core offline |

System Error Handling Characteristics:

//
// System/Processor Error Handling
//
typedef struct {
  UINT32    BankNumber;           // MCA bank that detected error
  UINT64    McaStatus;            // MC_STATUS register
  UINT64    McaAddress;           // MC_ADDR register (faulting address)
  UINT64    McaMisc;              // MC_MISC register (additional info)
  BOOLEAN   Overflow;             // Multiple errors pending
  BOOLEAN   Uncorrected;          // UC bit - uncorrectable
  BOOLEAN   EnabledError;         // EN bit - error was enabled
  BOOLEAN   ProcessorContextCorrupt;  // PCC bit - context corrupted
} SYSTEM_ERROR_INFO;

EFI_STATUS
HandleSystemError (
  IN SYSTEM_ERROR_INFO  *ErrorInfo
  )
{
  //
  // Check if processor context is corrupted (PCC bit)
  // If PCC=1, no reliable recovery is possible
  //
  if (ErrorInfo->ProcessorContextCorrupt) {
    //
    // FATAL: System must halt - context cannot be trusted
    //
    LogFatalError(ERROR_TYPE_PROCESSOR, ErrorInfo);
    return EFI_DEVICE_ERROR;  // Signal system halt
  }

  //
  // Uncorrected but recoverable (UC=1, PCC=0)
  // OS can attempt recovery (e.g., kill affected process)
  //
  if (ErrorInfo->Uncorrected) {
    CreateWheaRecord(ERROR_TYPE_PROCESSOR,
                     ErrorInfo->McaAddress,
                     ErrorInfo,
                     sizeof(SYSTEM_ERROR_INFO));
    //
    // Signal OS for SRAR (Software Recoverable Action Required)
    //
    return EFI_WARN_STALE_DATA;
  }

  //
  // Corrected error - log and continue
  //
  LogCorrectableError(ERROR_TYPE_PROCESSOR, ErrorInfo);
  IncrementCeCounter(ErrorInfo->BankNumber);

  //
  // Check threshold - too many correctable errors may indicate
  // impending failure
  //
  if (CheckCeThreshold(ErrorInfo->BankNumber)) {
    TriggerPredictiveFailureAlert(ErrorInfo->BankNumber);
  }

  return EFI_SUCCESS;
}

Key System Error Differences:

  • Immediacy: System errors often require immediate handling (SMI/NMI)
  • Context corruption: PCC bit indicates if processor state can be trusted
  • Recovery scope: May require core isolation or full system reset
  • WHEA section: gEfiProcessorGenericErrorSectionGuid

Memory Errors

Memory errors occur in DRAM, memory controllers, or memory buses. ECC (Error Correcting Code) is the primary detection mechanism. Memory errors are the most common RAS events in server environments.

| Error Type | Source | Detection | Severity | Handling |
|---|---|---|---|---|
| Single-bit ECC | DRAM cell | Memory controller | Correctable | Log, continue |
| Multi-bit ECC | DRAM cells | Memory controller | Uncorrectable | Page offline |
| Address parity | Command/Address bus | Memory controller | Fatal | Channel offline |
| Patrol scrub | Background scrubbing | Memory controller | Correctable | Log, schedule replacement |
| Memory mirror failover | Redundant channel | Memory controller | Recoverable | Failover to mirror |
| DIMM thermal | Over-temperature | Thermal sensor | Recoverable | Throttle or offline |

Memory Error Handling Characteristics:

//
// Memory Error Handling
//
typedef struct {
  UINT64    PhysicalAddress;      // Faulting physical address
  UINT64    PhysicalAddressMask;  // Mask for affected range
  UINT16    Node;                 // NUMA node
  UINT16    Card;                 // Memory card/riser
  UINT16    Module;               // DIMM slot
  UINT16    Bank;                 // Bank within DIMM
  UINT16    Row;                  // Row address
  UINT16    Column;               // Column address
  UINT8     BitPosition;          // Failed bit (if single-bit)
  UINT8     ErrorType;            // Single-bit, multi-bit, etc.
  BOOLEAN   Corrected;            // Was error corrected by ECC?
} MEMORY_ERROR_INFO;

EFI_STATUS
HandleMemoryError (
  IN MEMORY_ERROR_INFO  *ErrorInfo
  )
{
  //
  // Memory error handling differs based on correctability
  //

  if (ErrorInfo->Corrected) {
    //
    // CORRECTABLE: Single-bit ECC corrected by hardware
    // - Log for trend analysis
    // - Track per-DIMM CE counts
    // - No immediate action required
    //
    LogCorrectableMemoryError(ErrorInfo);
    IncrementDimmCeCount(ErrorInfo->Node, ErrorInfo->Card, ErrorInfo->Module);

    //
    // Check CE threshold for predictive failure
    // (e.g., 24 CEs in 24 hours = likely DIMM failure)
    //
    if (CheckDimmCeThreshold(ErrorInfo->Node, ErrorInfo->Card, ErrorInfo->Module)) {
      //
      // Request proactive DIMM replacement
      //
      TriggerPredictiveFailure(ErrorInfo);
      NotifyBmcDimmFailing(ErrorInfo);
    }

    return EFI_SUCCESS;
  }

  //
  // UNCORRECTABLE: Multi-bit error - cannot recover data
  //
  CreateWheaRecord(ERROR_TYPE_MEMORY,
                   ErrorInfo->PhysicalAddress,
                   ErrorInfo,
                   sizeof(MEMORY_ERROR_INFO));

  //
  // Memory-specific recovery options:
  //

  //
  // Option 1: Page offline (if OS supports memory hot-remove)
  // Mark page as bad so OS won't use it
  //
  if (SupportsPageOffline()) {
    RequestPageOffline(ErrorInfo->PhysicalAddress);
    return EFI_SUCCESS;  // Recoverable
  }

  //
  // Option 2: Memory mirroring failover
  //
  if (IsMemoryMirrored(ErrorInfo->Node, ErrorInfo->Card)) {
    FailoverToMirror(ErrorInfo);
    return EFI_SUCCESS;  // Recoverable via redundancy
  }

  //
  // Option 3: Rank sparing
  //
  if (HasSpareRank(ErrorInfo->Node, ErrorInfo->Card)) {
    ActivateSpareRank(ErrorInfo);
    return EFI_SUCCESS;  // Recoverable via sparing
  }

  //
  // No recovery possible - signal fatal
  //
  return EFI_DEVICE_ERROR;
}

Key Memory Error Differences:

  • Gradual degradation: Memory errors often increase over time before failure
  • Physical location: Errors map to specific DIMM/rank/bank for replacement
  • Recovery options: Page offline, mirroring, rank sparing, DIMM replacement
  • Threshold tracking: CE counts predict impending UE failures
  • WHEA section: gEfiPlatformMemoryErrorSectionGuid

Post Package Repair (PPR)

Post Package Repair (PPR) is a DRAM technology that allows faulty memory rows to be replaced with spare rows within the DRAM package itself. This enables in-field repair without physical DIMM replacement.

PPR is defined in JEDEC standards (DDR4/DDR5) and provides two repair modes:

| PPR Type | Name | Persistence | Usage | Limitations |
|---|---|---|---|---|
| sPPR | Soft PPR | Until power cycle | Runtime repair | Lost on reboot, limited repairs |
| hPPR | Hard PPR | Permanent | Manufacturing/BIOS | One-time per row, irreversible |

When to Use PPR:

| Scenario | PPR Type | Rationale |
|---|---|---|
| Repeated CE on same row | sPPR first | Test repair before committing |
| Confirmed bad row | hPPR | Permanent fix, survives reboot |
| Runtime error mitigation | sPPR | Quick fix, evaluate later |
| DIMM qualification | hPPR | Factory repair of weak cells |
| Pre-boot repair | hPPR | Fix known bad rows at POST |

flowchart LR
    subgraph Detection
        CE[Correctable<br/>Errors] --> TRACK[Track Row<br/>Address]
        TRACK --> THRESH{Threshold<br/>Exceeded?}
    end

    subgraph Repair
        THRESH -->|Yes| SPPR[Apply Soft PPR<br/>sPPR]
        SPPR --> VERIFY{Errors<br/>Continue?}
        VERIFY -->|No| MONITOR[Monitor<br/>Continue]
        VERIFY -->|Yes| HPPR[Apply Hard PPR<br/>hPPR]
        HPPR --> PERMANENT[Permanent<br/>Repair]
    end

    THRESH -->|No| MONITOR

    style SPPR fill:#3498db,color:#fff
    style HPPR fill:#e74c3c,color:#fff
    style PERMANENT fill:#2ecc71,color:#fff

PPR Implementation:

//
// Post Package Repair (PPR) Implementation
//

//
// PPR mode select bits (illustrative; the actual MR4 bit positions differ
// by DRAM generation, so consult the JEDEC DDR4/DDR5 specification)
//
#define MR4_PPR_SOFT_MODE      BIT5    // sPPR mode select (illustrative)
#define MR4_PPR_HARD_MODE      BIT4    // hPPR mode select (illustrative)

//
// PPR Guard Key sequence (vendor-specific, example)
//
#define PPR_GUARD_KEY_0        0x0F
#define PPR_GUARD_KEY_1        0xF0
#define PPR_GUARD_KEY_2        0x55
#define PPR_GUARD_KEY_3        0xAA

typedef struct {
  UINT8     Socket;
  UINT8     Channel;
  UINT8     Dimm;
  UINT8     Rank;
  UINT8     BankGroup;
  UINT8     Bank;
  UINT32    Row;              // Failing row address
  UINT8     PprType;          // 0=sPPR, 1=hPPR
  BOOLEAN   GuardKeyRequired; // Vendor-specific protection
} PPR_ADDRESS;

typedef enum {
  PprTypeSoft = 0,            // Soft PPR - temporary
  PprTypeHard = 1             // Hard PPR - permanent
} PPR_TYPE;

typedef enum {
  PprSuccess = 0,
  PprResourceExhausted,       // No spare rows available
  PprAlreadyRepaired,         // Row already repaired (hPPR)
  PprNotSupported,            // DIMM doesn't support PPR
  PprGuardKeyFailed,          // Guard key sequence failed
  PprVerifyFailed             // Post-repair verification failed
} PPR_STATUS;

//
// Check if DIMM supports PPR
//
BOOLEAN
IsPprSupported (
  IN UINT8  Socket,
  IN UINT8  Channel,
  IN UINT8  Dimm,
  OUT BOOLEAN *SoftPprSupported,
  OUT BOOLEAN *HardPprSupported
  )
{
  SPD_DATA *Spd;

  //
  // Read SPD to check PPR support
  // DDR4: SPD byte 9 (bits 7:6)
  // DDR5: SPD byte 9
  //
  Spd = GetDimmSpd(Socket, Channel, Dimm);
  if (Spd == NULL) {
    return FALSE;
  }

  //
  // Check soft PPR support
  //
  *SoftPprSupported = (Spd->ModuleType.Bits.SoftPpr == 1);

  //
  // Check hard PPR support (one row per bank group)
  //
  *HardPprSupported = (Spd->ModuleType.Bits.HardPpr == 1);

  return (*SoftPprSupported || *HardPprSupported);
}

//
// Get available PPR resources
//
EFI_STATUS
GetPprResources (
  IN  UINT8   Socket,
  IN  UINT8   Channel,
  IN  UINT8   Dimm,
  OUT UINT32  *SpareRowsAvailable,
  OUT UINT32  *SpareRowsUsed
  )
{
  //
  // Query memory controller for PPR resource status
  // This is platform/silicon-specific
  //
  *SpareRowsAvailable = ReadMcPprSpareCount(Socket, Channel, Dimm);
  *SpareRowsUsed = ReadMcPprUsedCount(Socket, Channel, Dimm);

  return EFI_SUCCESS;
}

//
// Execute Soft PPR (runtime, temporary)
//
PPR_STATUS
ExecuteSoftPpr (
  IN PPR_ADDRESS  *Address
  )
{
  PPR_STATUS Status;

  DEBUG((DEBUG_INFO, "sPPR: Socket%d Ch%d Dimm%d Rank%d BG%d Bank%d Row0x%x\n",
    Address->Socket, Address->Channel, Address->Dimm,
    Address->Rank, Address->BankGroup, Address->Bank, Address->Row));

  //
  // 1. Enter PPR mode via MR4
  //
  WriteModeRegister(Address, MR4, MR4_PPR_SOFT_MODE);

  //
  // 2. Issue guard key sequence (if required by vendor)
  //
  if (Address->GuardKeyRequired) {
    WriteModeRegister(Address, MR0, PPR_GUARD_KEY_0);
    WriteModeRegister(Address, MR0, PPR_GUARD_KEY_1);
    WriteModeRegister(Address, MR0, PPR_GUARD_KEY_2);
    WriteModeRegister(Address, MR0, PPR_GUARD_KEY_3);
  }

  //
  // 3. Activate the failing row (WRA command to target row)
  //
  IssueActivate(Address->Socket, Address->Channel, Address->Dimm,
                Address->Rank, Address->BankGroup, Address->Bank,
                Address->Row);

  //
  // 4. Wait for tPPR (PPR time, typically 1-2us for sPPR)
  //
  MicroSecondDelay(2);

  //
  // 5. Exit PPR mode
  //
  WriteModeRegister(Address, MR4, 0);

  //
  // 6. Verify repair by reading the row
  //
  Status = VerifyPprRepair(Address);
  if (Status != PprSuccess) {
    DEBUG((DEBUG_ERROR, "sPPR verification failed\n"));
    return PprVerifyFailed;
  }

  //
  // Log successful repair
  //
  LogPprEvent(Address, PprTypeSoft, PprSuccess);

  return PprSuccess;
}

//
// Execute Hard PPR (permanent, requires special handling)
//
PPR_STATUS
ExecuteHardPpr (
  IN PPR_ADDRESS  *Address
  )
{
  UINT32 SpareAvailable;
  UINT32 SpareUsed;
  PPR_STATUS Status;

  DEBUG((DEBUG_WARN, "hPPR: Socket%d Ch%d Dimm%d Rank%d BG%d Bank%d Row0x%x\n",
    Address->Socket, Address->Channel, Address->Dimm,
    Address->Rank, Address->BankGroup, Address->Bank, Address->Row));

  //
  // 1. Check if spare rows are available
  //
  GetPprResources(Address->Socket, Address->Channel, Address->Dimm,
                  &SpareAvailable, &SpareUsed);

  if (SpareAvailable == 0) {
    DEBUG((DEBUG_ERROR, "hPPR: No spare rows available\n"));
    return PprResourceExhausted;
  }

  //
  // 2. CRITICAL: hPPR requires system in specific state
  //    - Typically done during POST or maintenance window
  //    - Requires write leveling to be disabled
  //    - May require elevated voltage (tPPR_H timing)
  //
  PrepareForHardPpr(Address);

  //
  // 3. Enter hPPR mode via MR4
  //
  WriteModeRegister(Address, MR4, MR4_PPR_HARD_MODE);

  //
  // 4. Issue guard key sequence (always required for hPPR)
  //
  WriteModeRegister(Address, MR0, PPR_GUARD_KEY_0);
  WriteModeRegister(Address, MR0, PPR_GUARD_KEY_1);
  WriteModeRegister(Address, MR0, PPR_GUARD_KEY_2);
  WriteModeRegister(Address, MR0, PPR_GUARD_KEY_3);

  //
  // 5. Activate the failing row
  //
  IssueActivate(Address->Socket, Address->Channel, Address->Dimm,
                Address->Rank, Address->BankGroup, Address->Bank,
                Address->Row);

  //
  // 6. Wait for tPPR_H (hPPR time, typically 250ms-1s for eFuse programming)
  //
  MilliSecondDelay(500);

  //
  // 7. Exit PPR mode
  //
  WriteModeRegister(Address, MR4, 0);

  //
  // 8. Restore normal operation
  //
  RestoreAfterHardPpr(Address);

  //
  // 9. Verify repair
  //
  Status = VerifyPprRepair(Address);
  if (Status != PprSuccess) {
    DEBUG((DEBUG_ERROR, "hPPR verification failed - spare row may be bad\n"));
    return PprVerifyFailed;
  }

  //
  // 10. Record repair in persistent storage (for inventory tracking)
  //
  RecordHardPprInNvram(Address);
  LogPprEvent(Address, PprTypeHard, PprSuccess);

  DEBUG((DEBUG_INFO, "hPPR: Success - Spare rows remaining: %d\n",
    SpareAvailable - 1));

  return PprSuccess;
}

//
// Integrate PPR with memory error handling
//
EFI_STATUS
HandleMemoryErrorWithPpr (
  IN MEMORY_ERROR_INFO  *ErrorInfo
  )
{
  PPR_ADDRESS PprAddr;
  PPR_STATUS PprStatus;
  BOOLEAN SoftSupported, HardSupported;

  //
  // Check if this row has excessive correctable errors
  //
  if (!ShouldAttemptPpr(ErrorInfo)) {
    return HandleMemoryError(ErrorInfo);  // Standard handling
  }

  //
  // Build PPR address from error info
  //
  PprAddr.Socket = ErrorInfo->Node;
  PprAddr.Channel = GetChannelFromAddress(ErrorInfo->PhysicalAddress);
  PprAddr.Dimm = ErrorInfo->Module;
  PprAddr.Rank = GetRankFromAddress(ErrorInfo->PhysicalAddress);
  PprAddr.BankGroup = GetBankGroupFromAddress(ErrorInfo->PhysicalAddress);
  PprAddr.Bank = ErrorInfo->Bank;
  PprAddr.Row = ErrorInfo->Row;

  //
  // Check PPR support
  //
  if (!IsPprSupported(PprAddr.Socket, PprAddr.Channel, PprAddr.Dimm,
                      &SoftSupported, &HardSupported)) {
    DEBUG((DEBUG_WARN, "PPR not supported on this DIMM\n"));
    return HandleMemoryError(ErrorInfo);
  }

  //
  // Try soft PPR first (reversible, faster)
  //
  if (SoftSupported) {
    PprStatus = ExecuteSoftPpr(&PprAddr);
    if (PprStatus == PprSuccess) {
      //
      // Monitor for continued errors - if they persist,
      // may need hPPR on next boot
      //
      SchedulePprFollowUp(&PprAddr);
      return EFI_SUCCESS;
    }
  }

  //
  // If sPPR failed or not supported, schedule hPPR for next boot
  // (hPPR typically requires controlled environment - POST)
  //
  if (HardSupported) {
    ScheduleHardPprOnNextBoot(&PprAddr);
    DEBUG((DEBUG_INFO, "hPPR scheduled for next boot\n"));
  }

  //
  // Fall back to standard memory error handling
  //
  return HandleMemoryError(ErrorInfo);
}

PPR Comparison Table:

| Aspect | Soft PPR (sPPR) | Hard PPR (hPPR) |
|---|---|---|
| Persistence | Lost on power cycle | Permanent (eFuse) |
| Timing | ~2µs | ~250ms-1s |
| When to use | Runtime, testing | POST, confirmed failures |
| Reversible | Yes | No |
| Spare rows | Unlimited reuse | One per bank group |
| Guard key | Optional | Required |
| System state | Normal operation | Controlled (POST) |
| Risk level | Low | Moderate (irreversible) |

PPR Best Practices:

PPR Strategy:

  1. Track row-level CE counts (not just DIMM-level)
  2. Apply sPPR first to verify repair effectiveness
  3. Schedule hPPR during maintenance windows
  4. Record all hPPR operations for DIMM lifetime tracking
  5. Consider DIMM replacement if hPPR resources exhausted

Leaky Bucket Error Counting

The leaky bucket algorithm is a rate-limiting technique for error threshold management. It prevents false positives from transient error bursts while still detecting the persistent degradation patterns that indicate impending hardware failure.

The leaky bucket works by:

  1. Incrementing a counter when errors occur
  2. Decrementing (leaking) the counter over time
  3. Triggering action when the counter exceeds a threshold
flowchart LR
    subgraph Input
        CE1[CE Event] --> BUCKET
        CE2[CE Event] --> BUCKET
        CE3[CE Event] --> BUCKET
    end

    subgraph Bucket["Leaky Bucket"]
        BUCKET[Counter<br/>Value] --> LEAK[Time-based<br/>Leak]
        LEAK --> BUCKET
    end

    subgraph Output
        BUCKET --> CHECK{Above<br/>Threshold?}
        CHECK -->|Yes| ACTION[Trigger<br/>Action]
        CHECK -->|No| WAIT[Continue<br/>Monitoring]
    end

    style BUCKET fill:#3498db,color:#fff
    style ACTION fill:#e74c3c,color:#fff

Why Leaky Bucket vs Simple Counting:

| Scenario | Simple Counter | Leaky Bucket | Correct Action |
|---|---|---|---|
| 10 CEs in 1 second, then none | Threshold hit | Counter drains | No action (transient) |
| 1 CE per hour for 24 hours | Below threshold | Steady accumulation | Action needed (degrading) |
| Burst during stress test | False positive | Drains after test | No action (expected) |
| Gradual increase over weeks | May miss pattern | Detects trend | Action needed |

Leaky Bucket Implementation:

//
// Leaky Bucket Error Counter Implementation
//

typedef struct {
  UINT32    Count;              // Current error count (bucket level)
  UINT32    Threshold;          // Action threshold
  UINT32    LeakRate;           // Decrements per leak interval
  UINT32    LeakIntervalMs;     // Time between leaks (milliseconds)
  UINT64    LastLeakTimestamp;  // Last leak time (TSC or timer)
  UINT32    IncrementValue;     // How much each error adds
  UINT32    MaxCount;           // Bucket capacity (ceiling)
  BOOLEAN   ThresholdReached;   // Latched threshold indicator
} LEAKY_BUCKET;

//
// Initialize a leaky bucket
//
VOID
InitializeLeakyBucket (
  OUT LEAKY_BUCKET  *Bucket,
  IN  UINT32        Threshold,
  IN  UINT32        LeakRate,
  IN  UINT32        LeakIntervalMs,
  IN  UINT32        IncrementValue,
  IN  UINT32        MaxCount
  )
{
  ZeroMem(Bucket, sizeof(LEAKY_BUCKET));

  Bucket->Threshold = Threshold;
  Bucket->LeakRate = LeakRate;
  Bucket->LeakIntervalMs = LeakIntervalMs;
  Bucket->IncrementValue = IncrementValue;
  Bucket->MaxCount = MaxCount;
  Bucket->LastLeakTimestamp = GetCurrentTimestampMs();
}

//
// Apply time-based leak to the bucket
//
VOID
ApplyLeak (
  IN OUT LEAKY_BUCKET  *Bucket
  )
{
  UINT64 CurrentTime;
  UINT64 ElapsedMs;
  UINT32 LeakAmount;

  CurrentTime = GetCurrentTimestampMs();
  ElapsedMs = CurrentTime - Bucket->LastLeakTimestamp;

  //
  // Calculate how many leak intervals have passed
  //
  if (ElapsedMs >= Bucket->LeakIntervalMs) {
    LeakAmount = (UINT32)(ElapsedMs / Bucket->LeakIntervalMs) * Bucket->LeakRate;

    //
    // Apply leak (don't go below zero)
    //
    if (Bucket->Count > LeakAmount) {
      Bucket->Count -= LeakAmount;
    } else {
      Bucket->Count = 0;
    }

    //
    // Update last leak time (account for partial intervals)
    //
    Bucket->LastLeakTimestamp = CurrentTime -
                                (ElapsedMs % Bucket->LeakIntervalMs);
  }
}

//
// Record an error in the leaky bucket
// Returns TRUE if threshold exceeded
//
BOOLEAN
LeakyBucketRecordError (
  IN OUT LEAKY_BUCKET  *Bucket,
  IN     UINT32        ErrorWeight  OPTIONAL  // 0 = use default IncrementValue
  )
{
  UINT32 Increment;

  //
  // First, apply any pending leak based on elapsed time
  //
  ApplyLeak(Bucket);

  //
  // Add error to bucket
  //
  Increment = (ErrorWeight != 0) ? ErrorWeight : Bucket->IncrementValue;
  Bucket->Count += Increment;

  //
  // Cap at maximum (bucket overflow protection)
  //
  if (Bucket->Count > Bucket->MaxCount) {
    Bucket->Count = Bucket->MaxCount;
  }

  //
  // Check threshold
  //
  if (Bucket->Count >= Bucket->Threshold) {
    Bucket->ThresholdReached = TRUE;
    return TRUE;
  }

  return FALSE;
}

//
// Query current bucket status
//
VOID
GetLeakyBucketStatus (
  IN  LEAKY_BUCKET  *Bucket,
  OUT UINT32        *CurrentCount,
  OUT UINT32        *PercentFull,
  OUT BOOLEAN       *ThresholdReached
  )
{
  //
  // Apply leak before reporting status
  //
  ApplyLeak((LEAKY_BUCKET *)Bucket);

  *CurrentCount = Bucket->Count;
  *PercentFull = (Bucket->Count * 100) / Bucket->Threshold;
  *ThresholdReached = Bucket->ThresholdReached;
}

//
// Reset bucket (after repair action taken)
//
VOID
ResetLeakyBucket (
  IN OUT LEAKY_BUCKET  *Bucket
  )
{
  Bucket->Count = 0;
  Bucket->ThresholdReached = FALSE;
  Bucket->LastLeakTimestamp = GetCurrentTimestampMs();
}

Per-Component Leaky Buckets:

//
// Maintain separate leaky buckets for different error sources
//

//
// Memory: Per-DIMM and per-Row buckets
//
typedef struct {
  LEAKY_BUCKET  DimmBucket;       // DIMM-level CE tracking
  LEAKY_BUCKET  RowBuckets[MAX_TRACKED_ROWS];  // Row-level for PPR
  UINT32        RowAddresses[MAX_TRACKED_ROWS];
  UINT32        TrackedRowCount;
} DIMM_ERROR_TRACKING;

DIMM_ERROR_TRACKING mDimmTracking[MAX_SOCKET][MAX_CHANNEL][MAX_DIMM];

//
// Initialize memory error tracking with recommended thresholds
//
VOID
InitializeMemoryErrorTracking (
  VOID
  )
{
  UINT32 Socket, Channel, Dimm;

  for (Socket = 0; Socket < MAX_SOCKET; Socket++) {
    for (Channel = 0; Channel < MAX_CHANNEL; Channel++) {
      for (Dimm = 0; Dimm < MAX_DIMM; Dimm++) {
        //
        // DIMM-level bucket: 24 CEs in 24 hours triggers alert
        // Leak 1 CE per hour
        //
        InitializeLeakyBucket(
          &mDimmTracking[Socket][Channel][Dimm].DimmBucket,
          24,           // Threshold: 24 CEs
          1,            // Leak rate: 1 CE
          3600000,      // Leak interval: 1 hour (3600000 ms)
          1,            // Each CE adds 1
          48            // Max capacity: 48 (2x threshold)
        );

        //
        // Row-level buckets: 8 CEs on same row triggers PPR
        // Leak 1 CE per 4 hours (more aggressive for row-level)
        //
        for (UINT32 i = 0; i < MAX_TRACKED_ROWS; i++) {
          InitializeLeakyBucket(
            &mDimmTracking[Socket][Channel][Dimm].RowBuckets[i],
            8,            // Threshold: 8 CEs on same row
            1,            // Leak rate: 1 CE
            14400000,     // Leak interval: 4 hours
            1,            // Each CE adds 1
            16            // Max capacity
          );
        }
      }
    }
  }
}

//
// Handle correctable memory error with leaky bucket
//
EFI_STATUS
HandleCorrectableMemoryError (
  IN MEMORY_ERROR_INFO  *ErrorInfo
  )
{
  DIMM_ERROR_TRACKING *Tracking;
  BOOLEAN DimmThreshold, RowThreshold;
  UINT32 RowIndex;

  Tracking = &mDimmTracking[ErrorInfo->Node]
                           [GetChannelFromDimm(ErrorInfo->Module)]
                           [ErrorInfo->Module];

  //
  // Record in DIMM-level bucket
  //
  DimmThreshold = LeakyBucketRecordError(&Tracking->DimmBucket, 0);

  if (DimmThreshold) {
    DEBUG((DEBUG_WARN, "DIMM CE threshold exceeded: Socket%d Ch%d Dimm%d\n",
      ErrorInfo->Node, GetChannelFromDimm(ErrorInfo->Module),
      ErrorInfo->Module));

    //
    // Trigger predictive failure alert
    //
    TriggerPredictiveFailure(ErrorInfo);
    NotifyBmcDimmFailing(ErrorInfo);

    //
    // Reset bucket after taking action
    //
    ResetLeakyBucket(&Tracking->DimmBucket);
  }

  //
  // Record in row-level bucket (for PPR decision)
  //
  RowIndex = FindOrAddRowTracking(Tracking, ErrorInfo->Row);
  if (RowIndex < MAX_TRACKED_ROWS) {
    RowThreshold = LeakyBucketRecordError(&Tracking->RowBuckets[RowIndex], 0);

    if (RowThreshold) {
      DEBUG((DEBUG_WARN, "Row CE threshold exceeded: Row 0x%x\n",
        ErrorInfo->Row));

      //
      // Attempt PPR on this row
      //
      AttemptPprOnRow(ErrorInfo);

      //
      // Reset row bucket after PPR attempt
      //
      ResetLeakyBucket(&Tracking->RowBuckets[RowIndex]);
    }
  }

  //
  // Log the CE regardless of threshold
  //
  LogCorrectableMemoryError(ErrorInfo);

  return EFI_SUCCESS;
}

//
// PCIe: Per-device leaky bucket
//
typedef struct {
  UINT16        Segment;
  UINT8         Bus;
  UINT8         Device;
  UINT8         Function;
  LEAKY_BUCKET  CeBucket;         // Correctable error tracking
  LEAKY_BUCKET  LinkRetrainBucket; // Link retrain tracking
} PCIE_DEVICE_TRACKING;

//
// Initialize PCIe device error tracking
// Different thresholds for different error types
//
VOID
InitializePcieDeviceTracking (
  IN  UINT16              Segment,
  IN  UINT8               Bus,
  IN  UINT8               Device,
  IN  UINT8               Function,
  OUT PCIE_DEVICE_TRACKING *Tracking
  )
{
  Tracking->Segment = Segment;
  Tracking->Bus = Bus;
  Tracking->Device = Device;
  Tracking->Function = Function;

  //
  // Correctable errors: 100 in 1 hour before link degradation warning
  //
  InitializeLeakyBucket(
    &Tracking->CeBucket,
    100,          // Threshold
    10,           // Leak 10 per interval
    360000,       // Leak interval: 6 minutes
    1,            // Each CE adds 1
    200           // Max capacity
  );

  //
  // Link retrains: 5 in 10 minutes indicates unstable link
  //
  InitializeLeakyBucket(
    &Tracking->LinkRetrainBucket,
    5,            // Threshold
    1,            // Leak 1 per interval
    120000,       // Leak interval: 2 minutes
    1,            // Each retrain adds 1
    10            // Max capacity
  );
}

Leaky Bucket Configuration Guidelines:

| Error Source | Threshold | Leak Rate | Leak Interval | Rationale |
|---|---|---|---|---|
| DIMM CE | 24 | 1 | 1 hour | Industry standard (24 in 24h) |
| Row CE | 8 | 1 | 4 hours | Aggressive for PPR trigger |
| Processor CE | 10 | 1 | 1 hour | Lower tolerance for CPU |
| PCIe CE | 100 | 10 | 6 minutes | Higher tolerance, noisier |
| Link Retrain | 5 | 1 | 2 minutes | Link stability check |

Tuning Tips:

  • Higher threshold = fewer false positives, may miss degradation
  • Lower threshold = more sensitive, more false positives
  • Faster leak = requires sustained errors to trigger
  • Slower leak = catches spread-out patterns better
  • Start with industry standards, adjust based on fleet telemetry

Peripheral/PCIe Errors

Peripheral errors occur on I/O devices, PCIe buses, and device DMA operations. Advanced Error Reporting (AER) is the primary detection mechanism for PCIe devices.

| Error Type | Source | Detection | Severity | Handling |
|---|---|---|---|---|
| Bad TLP | Transaction layer | AER | Correctable | Log, device continues |
| Bad DLLP | Data link layer | AER | Correctable | Log, retransmit |
| Replay timeout | Link retrain | AER | Correctable | Log, link retrain |
| Poisoned TLP | Data corruption | AER | Uncorrectable | Device reset |
| Completion timeout | Request timeout | AER | Uncorrectable | Device reset |
| ECRC error | End-to-end CRC | AER | Uncorrectable | Device reset |
| Surprise link down | Hot-remove/failure | AER | Fatal | Device offline |
| DL protocol error | Link layer | AER | Fatal | Link down |

Peripheral Error Handling Characteristics:

//
// Peripheral/PCIe Error Handling
//
typedef struct {
  UINT16    SegmentNumber;        // PCIe segment
  UINT8     BusNumber;            // PCIe bus
  UINT8     DeviceNumber;         // PCIe device
  UINT8     FunctionNumber;       // PCIe function
  UINT32    UncorrectableStatus;  // AER uncorrectable status
  UINT32    CorrectableStatus;    // AER correctable status
  UINT32    HeaderLog[4];         // TLP header log
  UINT16    DeviceId;             // Device ID
  UINT16    VendorId;             // Vendor ID
  UINT8     DeviceSerialNumber[8]; // Device serial (if available)
} PCIE_ERROR_INFO;

EFI_STATUS
HandlePcieError (
  IN PCIE_ERROR_INFO  *ErrorInfo
  )
{
  //
  // PCIe error handling based on error type
  //

  //
  // CORRECTABLE ERRORS: Log and continue
  // - Bad TLP, Bad DLLP, Replay Timer, Receiver Error
  // Hardware handles recovery via retransmission
  //
  if (ErrorInfo->CorrectableStatus != 0) {
    LogCorrectablePcieError(ErrorInfo);

    //
    // Check for excessive correctable errors (link degradation)
    //
    if (CheckPcieCeThreshold(ErrorInfo)) {
      TriggerLinkDegradationWarning(ErrorInfo);
    }

    //
    // Clear correctable status
    //
    ClearAerCorrectableStatus(ErrorInfo);
    return EFI_SUCCESS;
  }

  //
  // UNCORRECTABLE NON-FATAL: Device-specific recovery
  // - Poisoned TLP, Completion Timeout, Unexpected Completion
  //
  if (IsNonFatalUncorrectable(ErrorInfo->UncorrectableStatus)) {
    CreateWheaRecord(ERROR_TYPE_PCIE,
                     0,  // No specific address
                     ErrorInfo,
                     sizeof(PCIE_ERROR_INFO));

    //
    // Device-specific recovery options:
    //

    //
    // Option 1: Function Level Reset (FLR)
    // Preferred for devices that support it
    //
    if (SupportsFunctionLevelReset(ErrorInfo)) {
      PerformFunctionLevelReset(ErrorInfo);
      ReinitializeDevice(ErrorInfo);
      return EFI_SUCCESS;  // Recovered via FLR
    }

    //
    // Option 2: Secondary Bus Reset
    // Resets device and all downstream devices
    //
    if (!IsRootPort(ErrorInfo)) {
      PerformSecondaryBusReset(ErrorInfo);
      ReinitializeDevice(ErrorInfo);
      return EFI_SUCCESS;  // Recovered via bus reset
    }

    //
    // Option 3: Hot-reset link
    //
    PerformHotReset(ErrorInfo);
    return EFI_SUCCESS;
  }

  //
  // FATAL ERRORS: Device/link offline
  // - Surprise Down, DL Protocol Error, Flow Control Protocol
  //
  LogFatalPcieError(ErrorInfo);
  CreateWheaRecord(ERROR_TYPE_PCIE, 0, ErrorInfo, sizeof(PCIE_ERROR_INFO));

  //
  // Contain the error - isolate the device
  //
  DisableDeviceIo(ErrorInfo);
  MarkDeviceOffline(ErrorInfo);

  //
  // Notify OS for device removal
  //
  NotifyDeviceRemoval(ErrorInfo);

  return EFI_DEVICE_ERROR;
}

Key Peripheral Error Differences:

  • Device isolation: Errors can be contained to specific devices
  • Hot-plug support: Devices can be reset/replaced without system reboot
  • Recovery hierarchy: FLR → Bus Reset → Hot Reset → Device Offline
  • Link-level vs device-level: Some errors affect the link, others just the device
  • WHEA section: gEfiPcieErrorSectionGuid

Error Type Comparison Summary

| Aspect | System/Processor | Memory | Peripheral/PCIe |
|---|---|---|---|
| Detection | MCA banks, MCE | Memory controller, ECC | AER capability |
| Scope | Core/socket | DIMM/rank/page | Device/link |
| Isolation | Core offline | Page offline, sparing | Device offline |
| Hot-repair | Rarely | DIMM replacement | Device hot-swap |
| Containment | Difficult | Moderate | Excellent |
| Recovery | Limited | Good (redundancy) | Good (reset) |
| WHEA section | Processor Generic | Platform Memory | PCIe Error |
| Typical CE rate | Low | High | Moderate |

Error Handling Decision Flow

flowchart TB
    START[Error Detected] --> TYPE{Error<br/>Type?}

    TYPE -->|System/Processor| SYS_PCC{PCC Bit<br/>Set?}
    SYS_PCC -->|Yes| SYS_FATAL[System Halt<br/>Context Corrupt]
    SYS_PCC -->|No| SYS_UC{Uncorrected?}
    SYS_UC -->|Yes| SYS_SRAR[Signal OS<br/>SRAR Recovery]
    SYS_UC -->|No| SYS_LOG[Log CE<br/>Check Threshold]

    TYPE -->|Memory| MEM_UC{Uncorrected?}
    MEM_UC -->|No| MEM_LOG[Log CE<br/>Track DIMM]
    MEM_UC -->|Yes| MEM_MIRROR{Mirrored?}
    MEM_MIRROR -->|Yes| MEM_FAILOVER[Failover to<br/>Mirror]
    MEM_MIRROR -->|No| MEM_SPARE{Spare<br/>Available?}
    MEM_SPARE -->|Yes| MEM_SPARING[Activate<br/>Spare Rank]
    MEM_SPARE -->|No| MEM_PAGE[Request Page<br/>Offline]

    TYPE -->|Peripheral| PCIE_SEV{Severity?}
    PCIE_SEV -->|Correctable| PCIE_LOG[Log Error<br/>Clear Status]
    PCIE_SEV -->|Non-Fatal| PCIE_FLR{Supports<br/>FLR?}
    PCIE_FLR -->|Yes| PCIE_RESET[Function<br/>Level Reset]
    PCIE_FLR -->|No| PCIE_BUS[Secondary<br/>Bus Reset]
    PCIE_SEV -->|Fatal| PCIE_OFFLINE[Device<br/>Offline]

    style SYS_FATAL fill:#e74c3c,color:#fff
    style MEM_FAILOVER fill:#2ecc71,color:#fff
    style MEM_SPARING fill:#2ecc71,color:#fff
    style PCIE_RESET fill:#3498db,color:#fff

Initialization

HEST (Hardware Error Source Table)

#include <IndustryStandard/Acpi62.h>

//
// HEST describes all hardware error sources in the system
//

#pragma pack(1)

typedef struct {
  EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_HEADER  Header;
  // Error source structures follow
} MY_HEST_TABLE;

//
// Generic Hardware Error Source (GHES)
//
typedef struct {
  UINT16  Type;                       // 9 = Generic Hardware Error Source
  UINT16  SourceId;
  UINT16  RelatedSourceId;
  UINT8   Flags;
  UINT8   Enabled;
  UINT32  NumberOfRecordsToPreallocate;
  UINT32  MaxSectionsPerRecord;
  UINT32  MaxRawDataLength;
  EFI_ACPI_6_2_GENERIC_ADDRESS_STRUCTURE ErrorStatusAddress;
  EFI_ACPI_6_2_HARDWARE_ERROR_NOTIFICATION_STRUCTURE Notify;
  UINT32  ErrorStatusBlockLength;
} ACPI_HEST_GENERIC_HARDWARE_ERROR_SOURCE;

#pragma pack()

EFI_STATUS
BuildHestTable (
  OUT MY_HEST_TABLE  **Hest,
  OUT UINTN          *HestSize
  )
{
  MY_HEST_TABLE *Table;
  UINTN Size;
  ACPI_HEST_GENERIC_HARDWARE_ERROR_SOURCE *Ghes;

  //
  // Calculate size
  //
  Size = sizeof(EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_HEADER) +
         sizeof(ACPI_HEST_GENERIC_HARDWARE_ERROR_SOURCE);

  Table = AllocateZeroPool(Size);
  if (Table == NULL) {
    return EFI_OUT_OF_RESOURCES;
  }

  //
  // Fill header
  //
  Table->Header.Header.Signature = EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_SIGNATURE;
  Table->Header.Header.Length = (UINT32)Size;
  Table->Header.Header.Revision = EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_REVISION;
  CopyMem(Table->Header.Header.OemId, "OEMID ", 6);
  Table->Header.Header.OemTableId = SIGNATURE_64('H','E','S','T','T','B','L',' ');
  Table->Header.ErrorSourceCount = 1;

  //
  // Add Generic Hardware Error Source
  //
  Ghes = (ACPI_HEST_GENERIC_HARDWARE_ERROR_SOURCE *)(&Table->Header + 1);
  Ghes->Type = EFI_ACPI_6_2_GENERIC_HARDWARE_ERROR;
  Ghes->SourceId = 0;
  Ghes->Enabled = 1;
  Ghes->NumberOfRecordsToPreallocate = 1;
  Ghes->MaxSectionsPerRecord = 1;

  //
  // Error status address (where firmware writes error info)
  //
  Ghes->ErrorStatusAddress.AddressSpaceId = EFI_ACPI_6_2_SYSTEM_MEMORY;
  Ghes->ErrorStatusAddress.RegisterBitWidth = 64;
  Ghes->ErrorStatusAddress.Address = AllocateErrorStatusBlock();

  //
  // Notification method
  //
  Ghes->Notify.Type = EFI_ACPI_6_2_HARDWARE_ERROR_NOTIFICATION_SCI;
  Ghes->Notify.Length = sizeof(EFI_ACPI_6_2_HARDWARE_ERROR_NOTIFICATION_STRUCTURE);

  //
  // Calculate checksum
  //
  Table->Header.Header.Checksum = CalculateChecksum((UINT8 *)Table, Size);

  *Hest = Table;
  *HestSize = Size;

  return EFI_SUCCESS;
}
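
Building the table is only half the job; it still has to be published so the OS can find it. A usage sketch with EFI_ACPI_TABLE_PROTOCOL (InstallAcpiTable copies the buffer into ACPI memory, so the local allocation can be freed afterward):

#include <Protocol/AcpiTable.h>

EFI_STATUS
InstallHestTable (
  VOID
  )
{
  EFI_STATUS               Status;
  EFI_ACPI_TABLE_PROTOCOL  *AcpiTable;
  MY_HEST_TABLE            *Hest;
  UINTN                    HestSize;
  UINTN                    TableKey;

  Status = gBS->LocateProtocol(&gEfiAcpiTableProtocolGuid, NULL, (VOID **)&AcpiTable);
  if (EFI_ERROR(Status)) {
    return Status;
  }

  Status = BuildHestTable(&Hest, &HestSize);
  if (EFI_ERROR(Status)) {
    return Status;
  }

  //
  // The protocol copies the table, so the local buffer can be released
  //
  Status = AcpiTable->InstallAcpiTable(AcpiTable, Hest, HestSize, &TableKey);

  FreePool(Hest);
  return Status;
}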

Error Handler Registration

//
// Register platform error handler (typically in SMM)
//
typedef
EFI_STATUS
(EFIAPI *PLATFORM_ERROR_HANDLER)(
  IN UINT32  ErrorType,
  IN UINT64  ErrorAddress,
  IN VOID    *ErrorData,
  IN UINTN   ErrorDataSize
  );

PLATFORM_ERROR_HANDLER mErrorHandler;

EFI_STATUS
RegisterErrorHandler (
  IN PLATFORM_ERROR_HANDLER  Handler
  )
{
  mErrorHandler = Handler;
  return EFI_SUCCESS;
}

//
// Error handler implementation
//
EFI_STATUS
EFIAPI
PlatformErrorHandler (
  IN UINT32  ErrorType,
  IN UINT64  ErrorAddress,
  IN VOID    *ErrorData,
  IN UINTN   ErrorDataSize
  )
{
  //
  // 1. Log error to persistent storage (ERST)
  //
  LogErrorToErst(ErrorType, ErrorAddress, ErrorData, ErrorDataSize);

  //
  // 2. Send error to BMC (if available)
  //
  ReportErrorToBmc(ErrorType, ErrorAddress);

  //
  // 3. Create WHEA error record
  //
  CreateWheaRecord(ErrorType, ErrorAddress, ErrorData, ErrorDataSize);

  //
  // 4. Determine recovery action
  //
  if (ErrorType == ERROR_TYPE_FATAL) {
    //
    // Fatal error - system halt required
    //
    return EFI_DEVICE_ERROR;
  }

  //
  // Correctable/recoverable - continue
  //
  return EFI_SUCCESS;
}
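
On most server platforms this handler runs in SMM, invoked from the platform's error SMI. A hedged sketch of the wiring using the PI SMM SmiHandlerRegister service; ExtractErrorInfoFromHardware is a hypothetical, silicon-specific helper that reads the error banks/registers:

EFI_STATUS
EFIAPI
RasSmiHandler (
  IN EFI_HANDLE  DispatchHandle,
  IN CONST VOID  *Context         OPTIONAL,
  IN OUT VOID    *CommBuffer      OPTIONAL,
  IN OUT UINTN   *CommBufferSize  OPTIONAL
  )
{
  UINT32  ErrorType;
  UINT64  ErrorAddress;
  UINT8   ErrorData[256];
  UINTN   ErrorDataSize;

  //
  // Silicon-specific: identify which error source signaled the SMI
  //
  ErrorDataSize = sizeof(ErrorData);
  if (ExtractErrorInfoFromHardware(&ErrorType, &ErrorAddress, ErrorData, &ErrorDataSize)) {
    PlatformErrorHandler(ErrorType, ErrorAddress, ErrorData, ErrorDataSize);
  }

  return EFI_SUCCESS;
}

EFI_STATUS
RegisterRasSmiHandler (
  VOID
  )
{
  EFI_HANDLE  Handle;

  //
  // Root SMI handler for illustration; a production driver would register
  // with the specific error-source SMI dispatch protocol instead
  //
  return gSmst->SmiHandlerRegister(RasSmiHandler, NULL, &Handle);
}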

Configuration

BERT (Boot Error Record Table)

//
// BERT reports errors that occurred during boot
//

#pragma pack(1)

typedef struct {
  EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_HEADER  Header;
} MY_BERT_TABLE;

typedef struct {
  UINT32  BlockStatus;
  UINT32  RawDataOffset;
  UINT32  RawDataLength;
  UINT32  DataLength;
  UINT32  ErrorSeverity;
  // Generic Error Data Entry follows
} BOOT_ERROR_REGION;

#pragma pack()

EFI_STATUS
CreateBootErrorRecord (
  IN UINT32  ErrorType,
  IN UINT8   *ErrorData,
  IN UINTN   ErrorDataSize
  )
{
  BOOT_ERROR_REGION *ErrorRegion;
  EFI_ACPI_6_2_GENERIC_ERROR_DATA_ENTRY_STRUCTURE *DataEntry;

  //
  // Allocate boot error region
  //
  ErrorRegion = AllocateReservedPool(
                  sizeof(BOOT_ERROR_REGION) +
                  sizeof(EFI_ACPI_6_2_GENERIC_ERROR_DATA_ENTRY_STRUCTURE) +
                  ErrorDataSize
                );

  if (ErrorRegion == NULL) {
    return EFI_OUT_OF_RESOURCES;
  }

  //
  // Fill error region
  //
  ErrorRegion->BlockStatus = 1;  // Uncorrectable error present
  ErrorRegion->ErrorSeverity = EFI_ACPI_6_2_ERROR_SEVERITY_FATAL;
  ErrorRegion->DataLength = sizeof(EFI_ACPI_6_2_GENERIC_ERROR_DATA_ENTRY_STRUCTURE) +
                            ErrorDataSize;

  //
  // Fill data entry
  //
  DataEntry = (EFI_ACPI_6_2_GENERIC_ERROR_DATA_ENTRY_STRUCTURE *)(ErrorRegion + 1);
  // Set SectionType GUID based on error type
  DataEntry->ErrorDataLength = (UINT32)ErrorDataSize;
  DataEntry->ErrorSeverity = EFI_ACPI_6_2_ERROR_SEVERITY_FATAL;

  //
  // Copy error data
  //
  CopyMem(DataEntry + 1, ErrorData, ErrorDataSize);

  //
  // Update BERT table pointer
  //
  UpdateBertTable(ErrorRegion);

  return EFI_SUCCESS;
}
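
UpdateBertTable above is platform glue. A minimal sketch of what it might look like, assuming the Acpi62.h BERT definitions; GetBootErrorRegionSize and InstallAcpiTableHelper are hypothetical helpers (the latter wraps the EFI_ACPI_TABLE_PROTOCOL call shown in the HEST section):

EFI_STATUS
UpdateBertTable (
  IN BOOT_ERROR_REGION  *ErrorRegion
  )
{
  MY_BERT_TABLE  *Bert;

  Bert = AllocateZeroPool(sizeof(MY_BERT_TABLE));
  if (Bert == NULL) {
    return EFI_OUT_OF_RESOURCES;
  }

  Bert->Header.Header.Signature = EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_SIGNATURE;
  Bert->Header.Header.Length    = sizeof(MY_BERT_TABLE);
  Bert->Header.Header.Revision  = EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_REVISION;

  //
  // Point the BERT at the reserved-memory boot error region filled in above
  //
  Bert->Header.BootErrorRegionLength = GetBootErrorRegionSize(ErrorRegion);
  Bert->Header.BootErrorRegion       = (UINT64)(UINTN)ErrorRegion;

  Bert->Header.Header.Checksum = CalculateChecksum((UINT8 *)Bert, sizeof(MY_BERT_TABLE));

  //
  // Publish via EFI_ACPI_TABLE_PROTOCOL (see InstallHestTable earlier)
  //
  return InstallAcpiTableHelper(Bert, sizeof(MY_BERT_TABLE));
}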

ERST (Error Record Serialization Table)

//
// ERST provides persistent storage for error records
//

//
// ERST Actions
//
#define ERST_BEGIN_WRITE_OPERATION        0x00
#define ERST_BEGIN_READ_OPERATION         0x01
#define ERST_BEGIN_CLEAR_OPERATION        0x02
#define ERST_END_OPERATION                0x03
#define ERST_SET_RECORD_OFFSET            0x04
#define ERST_EXECUTE_OPERATION            0x05
#define ERST_CHECK_BUSY_STATUS            0x06
#define ERST_GET_COMMAND_STATUS           0x07
#define ERST_GET_RECORD_IDENTIFIER        0x08
#define ERST_SET_RECORD_IDENTIFIER        0x09
#define ERST_GET_RECORD_COUNT             0x0A
#define ERST_BEGIN_DUMMY_WRITE            0x0B
#define ERST_GET_ERROR_LOG_ADDRESS_RANGE  0x0D
#define ERST_GET_ERROR_LOG_ADDRESS_LENGTH 0x0E
#define ERST_GET_ERROR_LOG_ADDRESS_ATTR   0x0F
#define ERST_EXECUTE_TIMINGS              0x10

//
// ERST serialization entry
//
typedef struct {
  UINT8   SerializationAction;
  UINT8   Instruction;
  UINT8   Flags;
  UINT8   Reserved;
  EFI_ACPI_6_2_GENERIC_ADDRESS_STRUCTURE RegisterRegion;
  UINT64  Value;
  UINT64  Mask;
} ACPI_ERST_SERIALIZATION_INSTRUCTION_ENTRY;

EFI_STATUS
PersistErrorRecord (
  IN UINT64  RecordId,
  IN VOID    *ErrorRecord,
  IN UINTN   RecordSize
  )
{
  //
  // Implementation uses ERST actions:
  // 1. BEGIN_WRITE_OPERATION
  // 2. SET_RECORD_IDENTIFIER
  // 3. Write data to error log address range
  // 4. EXECUTE_OPERATION
  // 5. CHECK_BUSY_STATUS until complete
  // 6. GET_COMMAND_STATUS to verify success
  // 7. END_OPERATION
  //

  return EFI_SUCCESS;
}
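
A hedged sketch of the write sequence outlined above. ExecuteErstAction and GetErrorLogAddressRange are hypothetical helpers: the first walks the ERST serialization instruction entries for one action and returns any value the action reads back, the second wraps the GET_ERROR_LOG_ADDRESS_RANGE/LENGTH actions:

EFI_STATUS
PersistErrorRecordSketch (
  IN UINT64  RecordId,
  IN VOID    *ErrorRecord,
  IN UINTN   RecordSize
  )
{
  UINT64  LogAddress;
  UINT64  LogLength;
  UINT64  CommandStatus;

  //
  // Locate the error log address range exposed by the platform
  //
  GetErrorLogAddressRange(&LogAddress, &LogLength);
  if (RecordSize > LogLength) {
    return EFI_BAD_BUFFER_SIZE;
  }

  ExecuteErstAction(ERST_BEGIN_WRITE_OPERATION, 0);
  ExecuteErstAction(ERST_SET_RECORD_IDENTIFIER, RecordId);

  //
  // Stage the CPER record in the error log address range
  //
  CopyMem((VOID *)(UINTN)LogAddress, ErrorRecord, RecordSize);

  ExecuteErstAction(ERST_EXECUTE_OPERATION, 0);
  while (ExecuteErstAction(ERST_CHECK_BUSY_STATUS, 0) != 0) {
    MicroSecondDelay(100);
  }

  CommandStatus = ExecuteErstAction(ERST_GET_COMMAND_STATUS, 0);
  ExecuteErstAction(ERST_END_OPERATION, 0);

  return (CommandStatus == 0) ? EFI_SUCCESS : EFI_DEVICE_ERROR;
}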

EINJ (Error Injection)

//
// EINJ enables error injection for RAS testing
//

//
// EINJ Actions
//
#define EINJ_BEGIN_INJECTION_OPERATION  0x00
#define EINJ_GET_TRIGGER_ERROR_ACTION   0x01
#define EINJ_SET_ERROR_TYPE             0x02
#define EINJ_GET_ERROR_TYPE             0x03
#define EINJ_END_OPERATION              0x04
#define EINJ_EXECUTE_OPERATION          0x05
#define EINJ_CHECK_BUSY_STATUS          0x06
#define EINJ_GET_COMMAND_STATUS         0x07
#define EINJ_SET_ERROR_TYPE_WITH_ADDR   0x08

//
// Error types for injection
//
#define EINJ_ERROR_PROCESSOR_CORRECTABLE         0x00000001
#define EINJ_ERROR_PROCESSOR_UNCORRECTABLE       0x00000002
#define EINJ_ERROR_PROCESSOR_FATAL               0x00000004
#define EINJ_ERROR_MEMORY_CORRECTABLE            0x00000008
#define EINJ_ERROR_MEMORY_UNCORRECTABLE          0x00000010
#define EINJ_ERROR_MEMORY_FATAL                  0x00000020
#define EINJ_ERROR_PCIE_CORRECTABLE              0x00000040
#define EINJ_ERROR_PCIE_UNCORRECTABLE            0x00000080
#define EINJ_ERROR_PCIE_FATAL                    0x00000100
#define EINJ_ERROR_PLATFORM_CORRECTABLE          0x00000200
#define EINJ_ERROR_PLATFORM_UNCORRECTABLE        0x00000400
#define EINJ_ERROR_PLATFORM_FATAL                0x00000800

EFI_STATUS
InjectError (
  IN UINT32  ErrorType,
  IN UINT64  Address  OPTIONAL
  )
{
  //
  // Error injection process:
  // 1. BEGIN_INJECTION_OPERATION
  // 2. SET_ERROR_TYPE or SET_ERROR_TYPE_WITH_ADDR
  // 3. EXECUTE_OPERATION
  // 4. GET_TRIGGER_ERROR_ACTION (if needed)
  // 5. Execute trigger action
  // 6. END_OPERATION
  //

  DEBUG((DEBUG_INFO, "Injecting error type 0x%x at 0x%lx\n", ErrorType, Address));

  return EFI_SUCCESS;
}
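
A hedged sketch of the same sequence for injection, using a hypothetical ExecuteEinjAction helper (same idea as the ERST helper above); SetErrorTypeWithAddress and RunTriggerErrorActionTable are also hypothetical:

EFI_STATUS
InjectErrorSketch (
  IN UINT32  ErrorType,
  IN UINT64  Address  OPTIONAL
  )
{
  UINT64  SupportedTypes;
  UINT64  CommandStatus;

  //
  // Confirm the platform can inject this error type
  //
  SupportedTypes = ExecuteEinjAction(EINJ_GET_ERROR_TYPE, 0);
  if ((SupportedTypes & ErrorType) == 0) {
    return EFI_UNSUPPORTED;
  }

  ExecuteEinjAction(EINJ_BEGIN_INJECTION_OPERATION, 0);

  if (Address != 0) {
    SetErrorTypeWithAddress(ErrorType, Address);   // Builds the SET_ERROR_TYPE_WITH_ADDRESS structure
  } else {
    ExecuteEinjAction(EINJ_SET_ERROR_TYPE, ErrorType);
  }

  ExecuteEinjAction(EINJ_EXECUTE_OPERATION, 0);
  while (ExecuteEinjAction(EINJ_CHECK_BUSY_STATUS, 0) != 0) {
    MicroSecondDelay(100);
  }

  CommandStatus = ExecuteEinjAction(EINJ_GET_COMMAND_STATUS, 0);

  //
  // Some platforms require executing the trigger error action table
  // after EXECUTE_OPERATION to actually consume the injected error
  //
  RunTriggerErrorActionTable();

  ExecuteEinjAction(EINJ_END_OPERATION, 0);

  return (CommandStatus == 0) ? EFI_SUCCESS : EFI_DEVICE_ERROR;
}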

Porting Guide

Platform RAS Configuration

#
# Platform DSC file - RAS configuration
#

[PcdsFixedAtBuild]
  # Enable WHEA support
  gEfiMdeModulePkgTokenSpaceGuid.PcdWheaSupport|TRUE

  # Error log size
  gPlatformPkgTokenSpaceGuid.PcdErrorLogSize|0x10000

  # Enable error injection (debug only)
  gPlatformPkgTokenSpaceGuid.PcdEnableEinj|TRUE

[Components]
  # RAS infrastructure
  $(PLATFORM_PKG)/Ras/WheaDxe/WheaDxe.inf
  $(PLATFORM_PKG)/Ras/HestDxe/HestDxe.inf
  $(PLATFORM_PKG)/Ras/BertDxe/BertDxe.inf
  $(PLATFORM_PKG)/Ras/ErstDxe/ErstDxe.inf

  # SMM error handler
  $(PLATFORM_PKG)/Ras/RasSmm/RasSmm.inf

MCA (Machine Check Architecture)

//
// Machine Check Architecture error handling
//

#include <Register/Intel/Msr.h>

//
// Read Machine Check Bank registers
//
EFI_STATUS
ReadMcaBanks (
  VOID
  )
{
  UINT32 McaBankCount;
  UINT32 Bank;
  UINT64 McgCap;
  UINT64 McStatus;
  UINT64 McAddr;
  UINT64 McMisc;

  //
  // Get number of MCA banks
  //
  McgCap = AsmReadMsr64(MSR_IA32_MCG_CAP);
  McaBankCount = (UINT32)(McgCap & 0xFF);

  DEBUG((DEBUG_INFO, "MCA Banks: %d\n", McaBankCount));

  for (Bank = 0; Bank < McaBankCount; Bank++) {
    McStatus = AsmReadMsr64(MSR_IA32_MC0_STATUS + Bank * 4);

    if (McStatus & BIT63) {  // Valid bit
      McAddr = AsmReadMsr64(MSR_IA32_MC0_ADDR + Bank * 4);
      McMisc = AsmReadMsr64(MSR_IA32_MC0_MISC + Bank * 4);

      DEBUG((DEBUG_ERROR, "MCA Bank %d Error:\n", Bank));
      DEBUG((DEBUG_ERROR, "  Status: 0x%016lx\n", McStatus));
      DEBUG((DEBUG_ERROR, "  Address: 0x%016lx\n", McAddr));
      DEBUG((DEBUG_ERROR, "  Misc: 0x%016lx\n", McMisc));

      //
      // Process error based on MCA bank type
      //
      ProcessMcaError(Bank, McStatus, McAddr, McMisc);

      //
      // Clear error (write 0 to status)
      //
      AsmWriteMsr64(MSR_IA32_MC0_STATUS + Bank * 4, 0);
    }
  }

  return EFI_SUCCESS;
}
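
ProcessMcaError is left to the platform; the essential step is decoding the architectural MCi_STATUS bits and feeding the result into the handler shown earlier in this chapter. A sketch (bit positions per the Intel SDM: 63=VAL, 62=OVER, 61=UC, 60=EN, 59=MISCV, 58=ADDRV, 57=PCC):

EFI_STATUS
ProcessMcaError (
  IN UINT32  Bank,
  IN UINT64  McStatus,
  IN UINT64  McAddr,
  IN UINT64  McMisc
  )
{
  SYSTEM_ERROR_INFO  ErrorInfo;

  ZeroMem(&ErrorInfo, sizeof(ErrorInfo));
  ErrorInfo.BankNumber = Bank;
  ErrorInfo.McaStatus  = McStatus;
  ErrorInfo.McaMisc    = McMisc;

  //
  // Decode the architectural status bits
  //
  ErrorInfo.Overflow                = ((McStatus & BIT62) != 0);
  ErrorInfo.Uncorrected             = ((McStatus & BIT61) != 0);
  ErrorInfo.EnabledError            = ((McStatus & BIT60) != 0);
  ErrorInfo.ProcessorContextCorrupt = ((McStatus & BIT57) != 0);

  //
  // MC_ADDR is only valid when ADDRV (bit 58) is set
  //
  if ((McStatus & BIT58) != 0) {
    ErrorInfo.McaAddress = McAddr;
  }

  return HandleSystemError(&ErrorInfo);
}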

PCIe AER (Advanced Error Reporting)

//
// PCIe Advanced Error Reporting
//

#include <IndustryStandard/PciExpress31.h>

EFI_STATUS
HandlePcieAerError (
  IN UINT8  Bus,
  IN UINT8  Device,
  IN UINT8  Function
  )
{
  UINT32 UncorrectableStatus;
  UINT32 CorrectableStatus;
  PCI_EXPRESS_EXTENDED_CAPABILITIES_ADVANCED_ERROR_REPORTING *Aer;

  //
  // Find AER capability
  //
  Aer = FindPcieCapability(Bus, Device, Function,
                           PCI_EXPRESS_EXTENDED_CAPABILITY_ADVANCED_ERROR_REPORTING_ID);

  if (Aer == NULL) {
    return EFI_NOT_FOUND;
  }

  //
  // Read error status
  //
  UncorrectableStatus = Aer->UncorrectableErrorStatus;
  CorrectableStatus = Aer->CorrectableErrorStatus;

  if (UncorrectableStatus != 0) {
    DEBUG((DEBUG_ERROR, "PCIe Uncorrectable Error on %02x:%02x.%x: 0x%08x\n",
      Bus, Device, Function, UncorrectableStatus));

    //
    // Log error
    //
    LogPcieError(Bus, Device, Function, UncorrectableStatus, FALSE);

    //
    // Clear status
    //
    Aer->UncorrectableErrorStatus = UncorrectableStatus;
  }

  if (CorrectableStatus != 0) {
    DEBUG((DEBUG_INFO, "PCIe Correctable Error on %02x:%02x.%x: 0x%08x\n",
      Bus, Device, Function, CorrectableStatus));

    LogPcieError(Bus, Device, Function, CorrectableStatus, TRUE);
    Aer->CorrectableErrorStatus = CorrectableStatus;
  }

  return EFI_SUCCESS;
}

WHEA Error Records

Creating WHEA Records

//
// WHEA (Windows Hardware Error Architecture) compatible error records
//

#include <Guid/Cper.h>

EFI_STATUS
CreateWheaRecord (
  IN UINT32                ErrorType,
  IN UINT64                ErrorAddress,
  IN VOID                  *ErrorData,
  IN UINTN                 ErrorDataSize
  )
{
  EFI_COMMON_ERROR_RECORD_HEADER *CperHeader;
  EFI_ERROR_SECTION_DESCRIPTOR *SectionDesc;
  VOID *SectionData;
  UINTN RecordSize;

  //
  // Calculate record size
  //
  RecordSize = sizeof(EFI_COMMON_ERROR_RECORD_HEADER) +
               sizeof(EFI_ERROR_SECTION_DESCRIPTOR) +
               ErrorDataSize;

  CperHeader = AllocateZeroPool(RecordSize);
  if (CperHeader == NULL) {
    return EFI_OUT_OF_RESOURCES;
  }

  //
  // Fill CPER header
  //
  CperHeader->SignatureStart = EFI_ERROR_RECORD_SIGNATURE_START;
  CperHeader->Revision = EFI_ERROR_RECORD_REVISION;
  CperHeader->SignatureEnd = EFI_ERROR_RECORD_SIGNATURE_END;
  CperHeader->SectionCount = 1;
  CperHeader->ErrorSeverity = GetErrorSeverity(ErrorType);
  CperHeader->ValidationBits = 0;
  CperHeader->RecordLength = (UINT32)RecordSize;

  //
  // Generate unique record ID
  //
  CperHeader->RecordID = GenerateRecordId();

  //
  // Get current time
  //
  GetTime(&CperHeader->TimeStamp);

  //
  // Fill section descriptor
  //
  SectionDesc = (EFI_ERROR_SECTION_DESCRIPTOR *)(CperHeader + 1);
  SectionDesc->SectionOffset = sizeof(EFI_COMMON_ERROR_RECORD_HEADER) +
                               sizeof(EFI_ERROR_SECTION_DESCRIPTOR);
  SectionDesc->SectionLength = (UINT32)ErrorDataSize;
  SectionDesc->Revision = EFI_ERROR_SECTION_REVISION;
  SectionDesc->SectionFlags = 0;

  //
  // Set section type based on error
  //
  switch (ErrorType) {
    case ERROR_TYPE_MEMORY:
      CopyGuid(&SectionDesc->SectionType, &gEfiPlatformMemoryErrorSectionGuid);
      break;
    case ERROR_TYPE_PCIE:
      CopyGuid(&SectionDesc->SectionType, &gEfiPcieErrorSectionGuid);
      break;
    case ERROR_TYPE_PROCESSOR:
      CopyGuid(&SectionDesc->SectionType, &gEfiProcessorGenericErrorSectionGuid);
      break;
    default:
      CopyGuid(&SectionDesc->SectionType, &gEfiFirmwareErrorSectionGuid);
  }

  //
  // Copy section data
  //
  SectionData = (VOID *)((UINT8 *)CperHeader + SectionDesc->SectionOffset);
  CopyMem(SectionData, ErrorData, ErrorDataSize);

  //
  // Persist record
  //
  PersistErrorRecord(CperHeader->RecordID, CperHeader, RecordSize);

  FreePool(CperHeader);

  return EFI_SUCCESS;
}

Example: RAS Status Display

/** @file
  RAS Status Display
**/

#include <Uefi.h>
#include <Library/UefiLib.h>
#include <Library/UefiBootServicesTableLib.h>
#include <IndustryStandard/Acpi62.h>

EFI_STATUS
EFIAPI
UefiMain (
  IN EFI_HANDLE        ImageHandle,
  IN EFI_SYSTEM_TABLE  *SystemTable
  )
{
  EFI_STATUS Status;
  VOID *Table;
  EFI_ACPI_6_2_ROOT_SYSTEM_DESCRIPTION_POINTER *Rsdp;

  Print(L"=== RAS Configuration ===\n\n");

  //
  // Find ACPI tables
  //
  Status = EfiGetSystemConfigurationTable(&gEfiAcpi20TableGuid, (VOID **)&Rsdp);
  if (EFI_ERROR(Status)) {
    Print(L"ACPI tables not found\n");
    return Status;
  }

  //
  // Check for HEST
  //
  Status = FindAcpiTable(EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_SIGNATURE, &Table);
  if (!EFI_ERROR(Status)) {
    EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_HEADER *Hest = Table;
    Print(L"HEST (Hardware Error Sources):\n");
    Print(L"  Error Source Count: %d\n", Hest->ErrorSourceCount);
    Print(L"  Table Length: %d bytes\n\n", Hest->Header.Length);
  } else {
    Print(L"HEST: Not present\n\n");
  }

  //
  // Check for BERT
  //
  Status = FindAcpiTable(EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_SIGNATURE, &Table);
  if (!EFI_ERROR(Status)) {
    EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_HEADER *Bert = Table;
    Print(L"BERT (Boot Error Record):\n");
    Print(L"  Error Region Length: %d bytes\n", Bert->BootErrorRegionLength);
    Print(L"  Error Region: 0x%016lx\n\n", Bert->BootErrorRegion);
  } else {
    Print(L"BERT: Not present (no boot errors)\n\n");
  }

  //
  // Check for ERST
  //
  Status = FindAcpiTable(EFI_ACPI_6_2_ERROR_RECORD_SERIALIZATION_TABLE_SIGNATURE, &Table);
  if (!EFI_ERROR(Status)) {
    EFI_ACPI_6_2_ERROR_RECORD_SERIALIZATION_TABLE_HEADER *Erst = Table;
    Print(L"ERST (Error Serialization):\n");
    Print(L"  Serialization Header Length: %d\n", Erst->SerializationHeaderLength);
    Print(L"  Instruction Entry Count: %d\n\n", Erst->InstructionEntryCount);
  } else {
    Print(L"ERST: Not present\n\n");
  }

  //
  // Check for EINJ
  //
  Status = FindAcpiTable(EFI_ACPI_6_2_ERROR_INJECTION_TABLE_SIGNATURE, &Table);
  if (!EFI_ERROR(Status)) {
    Print(L"EINJ (Error Injection): Present\n");
    Print(L"  (Error injection available for testing)\n\n");
  } else {
    Print(L"EINJ: Not present\n\n");
  }

  Print(L"Press any key to exit...\n");
  {
    EFI_INPUT_KEY Key;
    UINTN Index;
    gBS->WaitForEvent(1, &gST->ConIn->WaitForKey, &Index);
    gST->ConIn->ReadKeyStroke(gST->ConIn, &Key);
  }

  return EFI_SUCCESS;
}
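
FindAcpiTable is not a standard library call; a minimal sketch that locates each table by walking the XSDT from the ACPI 2.0+ RSDP (assumes a valid 64-bit XSDT, as on any modern platform):

EFI_STATUS
FindAcpiTable (
  IN  UINT32  Signature,
  OUT VOID    **Table
  )
{
  EFI_STATUS                                    Status;
  EFI_ACPI_6_2_ROOT_SYSTEM_DESCRIPTION_POINTER  *Rsdp;
  EFI_ACPI_DESCRIPTION_HEADER                   *Xsdt;
  EFI_ACPI_DESCRIPTION_HEADER                   *Entry;
  UINT64                                        *EntryPtr;
  UINTN                                         EntryCount;
  UINTN                                         Index;

  Status = EfiGetSystemConfigurationTable(&gEfiAcpi20TableGuid, (VOID **)&Rsdp);
  if (EFI_ERROR(Status)) {
    return Status;
  }

  Xsdt       = (EFI_ACPI_DESCRIPTION_HEADER *)(UINTN)Rsdp->XsdtAddress;
  EntryCount = (Xsdt->Length - sizeof(EFI_ACPI_DESCRIPTION_HEADER)) / sizeof(UINT64);
  EntryPtr   = (UINT64 *)(Xsdt + 1);

  for (Index = 0; Index < EntryCount; Index++) {
    Entry = (EFI_ACPI_DESCRIPTION_HEADER *)(UINTN)EntryPtr[Index];
    if ((Entry != NULL) && (Entry->Signature == Signature)) {
      *Table = Entry;
      return EFI_SUCCESS;
    }
  }

  return EFI_NOT_FOUND;
}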

Specification Reference

  • ACPI 6.4 Specification: Chapter 18 - APEI
  • UEFI Specification: Platform Error Handling
  • WHEA Specification: Windows Hardware Error Architecture

Summary

  1. HEST describes hardware error sources
  2. BERT captures boot-time errors
  3. ERST provides persistent error storage
  4. EINJ enables error injection testing
  5. MCA/AER handle CPU and PCIe errors
  6. WHEA records standardize error reporting

Next Steps


Server Focus: RAS features are primarily for server and enterprise platforms where reliability is critical.

