Chapter 21: RAS Features
Reliability, Availability, and Serviceability - WHEA, APEI, and error handling.
Overview
When to Use RAS Features
Use RAS features when you need to:
- Handle hardware errors (CPU, memory, PCIe) gracefully
- Log errors for post-mortem analysis and reporting
- Support error injection for validation testing
- Meet enterprise/server reliability requirements
| Scenario | APEI Table | Purpose |
|---|---|---|
| Error source enumeration | HEST | Define CPU, memory, PCIe error sources |
| Boot error recording | BERT | Report errors from previous boot |
| Persistent error storage | ERST | Non-volatile error log |
| Error injection testing | EINJ | Trigger synthetic errors |
| Error severity handling | HEST | Correctable vs fatal classification |
RAS Feature Selection:
| Platform Type | RAS Level | Features |
|---|---|---|
| Client | Minimal | Basic MCA, no APEI |
| Workstation | Moderate | MCA, ECC reporting |
| Server | Full | Complete APEI, WHEA, hot-plug |
| Mission-critical | Maximum | Redundancy, live migration |
Who Implements RAS:
| Role | RAS Tasks |
|---|---|
| Silicon vendor | MCA, AER, error detection logic |
| Platform developer | APEI tables, error handlers |
| BIOS engineer | Error handling integration |
| OS vendor | WHEA consumer, error reporting |
| Reliability engineer | EINJ testing, validation |
Key RAS Concepts:
- WHEA: Windows Hardware Error Architecture
- MCA/MCE: Machine Check Architecture/Exception
- AER: Advanced Error Reporting (PCIe)
- ECC: Error Correcting Code (memory)
- Correctable vs Uncorrectable: Severity determines handling
RAS Architecture
RAS features ensure system reliability through error detection, logging, and recovery:
flowchart TB
subgraph Hardware Errors
MCA[Machine Check<br/>CPU Errors]
AER[PCIe AER<br/>Bus Errors]
ECC[Memory ECC<br/>RAM Errors]
THERMAL[Thermal<br/>Overtemperature]
end
subgraph UEFI RAS
HEST[HEST Table<br/>Error Sources]
HANDLER[Error Handlers]
BERT[BERT Table<br/>Boot Errors]
ERST[ERST Table<br/>Error Storage]
end
subgraph Reporting
WHEA[WHEA Records]
SEL[BMC SEL]
OS_LOG[OS Error Log]
end
MCA --> HANDLER
AER --> HANDLER
ECC --> HANDLER
THERMAL --> HANDLER
HEST --> HANDLER
HANDLER --> BERT
HANDLER --> ERST
HANDLER --> WHEA
WHEA --> SEL
WHEA --> OS_LOG
style HANDLER fill:#e74c3c,color:#fff
style WHEA fill:#3498db,color:#fff
style HEST fill:#2ecc71,color:#fff
APEI Tables
| Table | Full Name | Purpose |
|---|---|---|
| HEST | Hardware Error Source Table | Describes error sources |
| BERT | Boot Error Record Table | Errors from the previous boot |
| ERST | Error Record Serialization Table | Persistent error storage |
| EINJ | Error Injection Table | Error injection for testing |
Error Severity Levels
| Severity | Action | Example |
|---|---|---|
| Corrected | Log only | Single-bit ECC |
| Recoverable | OS handles | PCIe non-fatal |
| Fatal | System halt | Uncorrectable memory |
| Informational | Advisory | Predictive failure |
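These severities map onto the CPER generic error severity values defined in MdePkg's Guid/Cper.h. A minimal sketch of that mapping, where the RAS_ACTION enum is an illustrative platform-defined type rather than a standard one:
#include <Guid/Cper.h>
typedef enum {
  RasActionLogOnly,      // Corrected - record and continue
  RasActionOsRecovery,   // Recoverable - hand off to the OS
  RasActionSystemHalt,   // Fatal - stop before data is corrupted
  RasActionAdvisory      // Informational - predictive/advisory only
} RAS_ACTION;
RAS_ACTION
MapCperSeverityToAction (
  IN UINT32  CperSeverity
  )
{
  switch (CperSeverity) {
    case EFI_GENERIC_ERROR_CORRECTED:
      return RasActionLogOnly;        // e.g. single-bit ECC
    case EFI_GENERIC_ERROR_RECOVERABLE:
      return RasActionOsRecovery;     // e.g. PCIe non-fatal
    case EFI_GENERIC_ERROR_FATAL:
      return RasActionSystemHalt;     // e.g. uncorrectable memory
    case EFI_GENERIC_ERROR_NONE:
    default:
      return RasActionAdvisory;       // informational only
  }
}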
RAS Error Types and Handling
Understanding the different categories of RAS errors and their specific handling requirements is critical for platform developers. Each error type has unique detection mechanisms, containment strategies, and recovery options.
Error Type Categories
flowchart TB
subgraph System["System/Processor Errors"]
MCA[Machine Check<br/>Architecture]
CACHE[Cache Errors]
TLB[TLB Errors]
BUS_SYS[Internal Bus<br/>Errors]
end
subgraph Memory["Memory Errors"]
ECC_SE[Single-bit ECC<br/>Correctable]
ECC_ME[Multi-bit ECC<br/>Uncorrectable]
PATROL[Patrol Scrub<br/>Errors]
ADDR[Address/Command<br/>Parity]
end
subgraph Peripheral["Peripheral/PCIe Errors"]
AER_CE[AER Correctable<br/>Bad TLP, Bad DLLP]
AER_UE[AER Uncorrectable<br/>Poisoned TLP]
AER_FE[AER Fatal<br/>ECRC, Link Down]
DMA[DMA Errors]
end
MCA --> HANDLER[Error Handler]
CACHE --> HANDLER
TLB --> HANDLER
BUS_SYS --> HANDLER
ECC_SE --> HANDLER
ECC_ME --> HANDLER
PATROL --> HANDLER
ADDR --> HANDLER
AER_CE --> HANDLER
AER_UE --> HANDLER
AER_FE --> HANDLER
DMA --> HANDLER
style System fill:#e74c3c,color:#fff
style Memory fill:#3498db,color:#fff
style Peripheral fill:#2ecc71,color:#fff
System/Processor Errors
System errors originate from CPU cores, caches, and internal processor buses. They are detected via Machine Check Architecture (MCA) and require immediate attention due to potential data corruption.
| Error Type | Source | Detection | Severity | Handling |
|---|---|---|---|---|
| Cache ECC | L1/L2/L3 cache | MCA bank | Correctable/Fatal | Log, possible core isolation |
| TLB Parity | Translation Lookaside Buffer | MCA bank | Fatal | Core reset required |
| Internal Bus | QPI/UPI/Infinity Fabric | MCA bank | Fatal | System halt |
| Microcode | Instruction decode | MCA bank | Fatal | Immediate halt |
| Register File | CPU registers | MCA bank | Fatal | Core offline |
System Error Handling Characteristics:
//
// System/Processor Error Handling
//
typedef struct {
UINT32 BankNumber; // MCA bank that detected error
UINT64 McaStatus; // MC_STATUS register
UINT64 McaAddress; // MC_ADDR register (faulting address)
UINT64 McaMisc; // MC_MISC register (additional info)
BOOLEAN Overflow; // Multiple errors pending
BOOLEAN Uncorrected; // UC bit - uncorrectable
BOOLEAN EnabledError; // EN bit - error was enabled
BOOLEAN ProcessorContextCorrupt; // PCC bit - context corrupted
} SYSTEM_ERROR_INFO;
EFI_STATUS
HandleSystemError (
IN SYSTEM_ERROR_INFO *ErrorInfo
)
{
//
// Check if processor context is corrupted (PCC bit)
// If PCC=1, no reliable recovery is possible
//
if (ErrorInfo->ProcessorContextCorrupt) {
//
// FATAL: System must halt - context cannot be trusted
//
LogFatalError(ERROR_TYPE_PROCESSOR, ErrorInfo);
return EFI_DEVICE_ERROR; // Signal system halt
}
//
// Uncorrected but recoverable (UC=1, PCC=0)
// OS can attempt recovery (e.g., kill affected process)
//
if (ErrorInfo->Uncorrected) {
CreateWheaRecord(ERROR_TYPE_PROCESSOR,
ErrorInfo->McaAddress,
ErrorInfo,
sizeof(SYSTEM_ERROR_INFO));
//
// Signal OS for SRAR (Software Recoverable Action Required)
//
return EFI_WARN_STALE_DATA;
}
//
// Corrected error - log and continue
//
LogCorrectableError(ERROR_TYPE_PROCESSOR, ErrorInfo);
IncrementCeCounter(ErrorInfo->BankNumber);
//
// Check threshold - too many correctable errors may indicate
// impending failure
//
if (CheckCeThreshold(ErrorInfo->BankNumber)) {
TriggerPredictiveFailureAlert(ErrorInfo->BankNumber);
}
return EFI_SUCCESS;
}
Key System Error Differences:
- Immediacy: System errors often require immediate handling (SMI/NMI)
- Context corruption: PCC bit indicates if processor state can be trusted
- Recovery scope: May require core isolation or full system reset
- WHEA section: gEfiProcessorGenericErrorSectionGuid
Memory Errors
Memory errors occur in DRAM, memory controllers, or memory buses. ECC (Error Correcting Code) is the primary detection mechanism. Memory errors are the most common RAS events in server environments.
| Error Type | Source | Detection | Severity | Handling |
|---|---|---|---|---|
| Single-bit ECC | DRAM cell | Memory controller | Correctable | Log, continue |
| Multi-bit ECC | DRAM cells | Memory controller | Uncorrectable | Page offline |
| Address parity | Command/Address bus | Memory controller | Fatal | Channel offline |
| Patrol scrub | Background scrubbing | Memory controller | Correctable | Log, schedule replacement |
| Memory mirror failover | Redundant channel | Memory controller | Recoverable | Failover to mirror |
| DIMM thermal | Over-temperature | Thermal sensor | Recoverable | Throttle or offline |
Memory Error Handling Characteristics:
//
// Memory Error Handling
//
typedef struct {
UINT64 PhysicalAddress; // Faulting physical address
UINT64 PhysicalAddressMask; // Mask for affected range
UINT16 Node; // NUMA node
UINT16 Card; // Memory card/riser
UINT16 Module; // DIMM slot
UINT16 Bank; // Bank within DIMM
UINT16 Row; // Row address
UINT16 Column; // Column address
UINT8 BitPosition; // Failed bit (if single-bit)
UINT8 ErrorType; // Single-bit, multi-bit, etc.
BOOLEAN Corrected; // Was error corrected by ECC?
} MEMORY_ERROR_INFO;
EFI_STATUS
HandleMemoryError (
IN MEMORY_ERROR_INFO *ErrorInfo
)
{
//
// Memory error handling differs based on correctability
//
if (ErrorInfo->Corrected) {
//
// CORRECTABLE: Single-bit ECC corrected by hardware
// - Log for trend analysis
// - Track per-DIMM CE counts
// - No immediate action required
//
LogCorrectableMemoryError(ErrorInfo);
IncrementDimmCeCount(ErrorInfo->Node, ErrorInfo->Card, ErrorInfo->Module);
//
// Check CE threshold for predictive failure
// (e.g., 24 CEs in 24 hours = likely DIMM failure)
//
if (CheckDimmCeThreshold(ErrorInfo->Node, ErrorInfo->Card, ErrorInfo->Module)) {
//
// Request proactive DIMM replacement
//
TriggerPredictiveFailure(ErrorInfo);
NotifyBmcDimmFailing(ErrorInfo);
}
return EFI_SUCCESS;
}
//
// UNCORRECTABLE: Multi-bit error - cannot recover data
//
CreateWheaRecord(ERROR_TYPE_MEMORY,
ErrorInfo->PhysicalAddress,
ErrorInfo,
sizeof(MEMORY_ERROR_INFO));
//
// Memory-specific recovery options:
//
//
// Option 1: Page offline (if OS supports memory hot-remove)
// Mark page as bad so OS won't use it
//
if (SupportsPageOffline()) {
RequestPageOffline(ErrorInfo->PhysicalAddress);
return EFI_SUCCESS; // Recoverable
}
//
// Option 2: Memory mirroring failover
//
if (IsMemoryMirrored(ErrorInfo->Node, ErrorInfo->Card)) {
FailoverToMirror(ErrorInfo);
return EFI_SUCCESS; // Recoverable via redundancy
}
//
// Option 3: Rank sparing
//
if (HasSpareRank(ErrorInfo->Node, ErrorInfo->Card)) {
ActivateSpareRank(ErrorInfo);
return EFI_SUCCESS; // Recoverable via sparing
}
//
// No recovery possible - signal fatal
//
return EFI_DEVICE_ERROR;
}
Key Memory Error Differences:
- Gradual degradation: Memory errors often increase over time before failure
- Physical location: Errors map to specific DIMM/rank/bank for replacement
- Recovery options: Page offline, mirroring, rank sparing, DIMM replacement
- Threshold tracking: CE counts predict impending UE failures
- WHEA section: gEfiPlatformMemoryErrorSectionGuid
Post Package Repair (PPR)
Post Package Repair (PPR) is a DRAM technology that allows faulty memory rows to be replaced with spare rows within the DRAM package itself. This enables in-field repair without physical DIMM replacement.
PPR is defined in JEDEC standards (DDR4/DDR5) and provides two repair modes:
| PPR Type | Name | Persistence | Usage | Limitations |
|---|---|---|---|---|
| sPPR | Soft PPR | Until power cycle | Runtime repair | Lost on reboot, limited repairs |
| hPPR | Hard PPR | Permanent | Manufacturing/BIOS | One-time per row, irreversible |
When to Use PPR:
| Scenario | PPR Type | Rationale |
|---|---|---|
| Repeated CE on same row | sPPR first | Test repair before committing |
| Confirmed bad row | hPPR | Permanent fix, survives reboot |
| Runtime error mitigation | sPPR | Quick fix, evaluate later |
| DIMM qualification | hPPR | Factory repair of weak cells |
| Pre-boot repair | hPPR | Fix known bad rows at POST |
flowchart LR
subgraph Detection
CE[Correctable<br/>Errors] --> TRACK[Track Row<br/>Address]
TRACK --> THRESH{Threshold<br/>Exceeded?}
end
subgraph Repair
THRESH -->|Yes| SPPR[Apply Soft PPR<br/>sPPR]
SPPR --> VERIFY{Errors<br/>Continue?}
VERIFY -->|No| MONITOR[Monitor<br/>Continue]
VERIFY -->|Yes| HPPR[Apply Hard PPR<br/>hPPR]
HPPR --> PERMANENT[Permanent<br/>Repair]
end
THRESH -->|No| MONITOR
style SPPR fill:#3498db,color:#fff
style HPPR fill:#e74c3c,color:#fff
style PERMANENT fill:#2ecc71,color:#fff
PPR Implementation:
//
// Post Package Repair (PPR) Implementation
//
//
// PPR Mode Register definitions (DDR4/DDR5)
//
#define MR4_PPR_SOFT_MODE BIT5 // sPPR mode select
#define MR4_PPR_HARD_MODE BIT4 // hPPR mode select
//
// PPR Guard Key sequence (vendor-specific, example)
//
#define PPR_GUARD_KEY_0 0x0F
#define PPR_GUARD_KEY_1 0xF0
#define PPR_GUARD_KEY_2 0x55
#define PPR_GUARD_KEY_3 0xAA
typedef struct {
UINT8 Socket;
UINT8 Channel;
UINT8 Dimm;
UINT8 Rank;
UINT8 BankGroup;
UINT8 Bank;
UINT32 Row; // Failing row address
UINT8 PprType; // 0=sPPR, 1=hPPR
BOOLEAN GuardKeyRequired; // Vendor-specific protection
} PPR_ADDRESS;
typedef enum {
PprTypeSoft = 0, // Soft PPR - temporary
PprTypeHard = 1 // Hard PPR - permanent
} PPR_TYPE;
typedef enum {
PprSuccess = 0,
PprResourceExhausted, // No spare rows available
PprAlreadyRepaired, // Row already repaired (hPPR)
PprNotSupported, // DIMM doesn't support PPR
PprGuardKeyFailed, // Guard key sequence failed
PprVerifyFailed // Post-repair verification failed
} PPR_STATUS;
//
// Check if DIMM supports PPR
//
BOOLEAN
IsPprSupported (
IN UINT8 Socket,
IN UINT8 Channel,
IN UINT8 Dimm,
OUT BOOLEAN *SoftPprSupported,
OUT BOOLEAN *HardPprSupported
)
{
SPD_DATA *Spd;
//
// Read SPD to check PPR support
// DDR4: SPD byte 9 (bits 7:6)
// DDR5: SPD byte 9
//
Spd = GetDimmSpd(Socket, Channel, Dimm);
if (Spd == NULL) {
return FALSE;
}
//
// Check soft PPR support
//
*SoftPprSupported = (Spd->ModuleType.Bits.SoftPpr == 1);
//
// Check hard PPR support (one row per bank group)
//
*HardPprSupported = (Spd->ModuleType.Bits.HardPpr == 1);
return (*SoftPprSupported || *HardPprSupported);
}
//
// Get available PPR resources
//
EFI_STATUS
GetPprResources (
IN UINT8 Socket,
IN UINT8 Channel,
IN UINT8 Dimm,
OUT UINT32 *SpareRowsAvailable,
OUT UINT32 *SpareRowsUsed
)
{
//
// Query memory controller for PPR resource status
// This is platform/silicon-specific
//
*SpareRowsAvailable = ReadMcPprSpareCount(Socket, Channel, Dimm);
*SpareRowsUsed = ReadMcPprUsedCount(Socket, Channel, Dimm);
return EFI_SUCCESS;
}
//
// Execute Soft PPR (runtime, temporary)
//
PPR_STATUS
ExecuteSoftPpr (
IN PPR_ADDRESS *Address
)
{
PPR_STATUS Status;
DEBUG((DEBUG_INFO, "sPPR: Socket%d Ch%d Dimm%d Rank%d BG%d Bank%d Row0x%x\n",
Address->Socket, Address->Channel, Address->Dimm,
Address->Rank, Address->BankGroup, Address->Bank, Address->Row));
//
// 1. Enter PPR mode via MR4
//
WriteModeRegister(Address, MR4, MR4_PPR_SOFT_MODE);
//
// 2. Issue guard key sequence (if required by vendor)
//
if (Address->GuardKeyRequired) {
WriteModeRegister(Address, MR0, PPR_GUARD_KEY_0);
WriteModeRegister(Address, MR0, PPR_GUARD_KEY_1);
WriteModeRegister(Address, MR0, PPR_GUARD_KEY_2);
WriteModeRegister(Address, MR0, PPR_GUARD_KEY_3);
}
//
// 3. Activate the failing row (WRA command to target row)
//
IssueActivate(Address->Socket, Address->Channel, Address->Dimm,
Address->Rank, Address->BankGroup, Address->Bank,
Address->Row);
//
// 4. Wait for tPPR (PPR time, typically 1-2us for sPPR)
//
MicroSecondDelay(2);
//
// 5. Exit PPR mode
//
WriteModeRegister(Address, MR4, 0);
//
// 6. Verify repair by reading the row
//
Status = VerifyPprRepair(Address);
if (Status != PprSuccess) {
DEBUG((DEBUG_ERROR, "sPPR verification failed\n"));
return PprVerifyFailed;
}
//
// Log successful repair
//
LogPprEvent(Address, PprTypeSoft, PprSuccess);
return PprSuccess;
}
//
// Execute Hard PPR (permanent, requires special handling)
//
PPR_STATUS
ExecuteHardPpr (
IN PPR_ADDRESS *Address
)
{
UINT32 SpareAvailable;
UINT32 SpareUsed;
PPR_STATUS Status;
DEBUG((DEBUG_WARN, "hPPR: Socket%d Ch%d Dimm%d Rank%d BG%d Bank%d Row0x%x\n",
Address->Socket, Address->Channel, Address->Dimm,
Address->Rank, Address->BankGroup, Address->Bank, Address->Row));
//
// 1. Check if spare rows are available
//
GetPprResources(Address->Socket, Address->Channel, Address->Dimm,
&SpareAvailable, &SpareUsed);
if (SpareAvailable == 0) {
DEBUG((DEBUG_ERROR, "hPPR: No spare rows available\n"));
return PprResourceExhausted;
}
//
// 2. CRITICAL: hPPR requires system in specific state
// - Typically done during POST or maintenance window
// - Requires write leveling to be disabled
// - May require elevated voltage (tPPR_H timing)
//
PrepareForHardPpr(Address);
//
// 3. Enter hPPR mode via MR4
//
WriteModeRegister(Address, MR4, MR4_PPR_HARD_MODE);
//
// 4. Issue guard key sequence (always required for hPPR)
//
WriteModeRegister(Address, MR0, PPR_GUARD_KEY_0);
WriteModeRegister(Address, MR0, PPR_GUARD_KEY_1);
WriteModeRegister(Address, MR0, PPR_GUARD_KEY_2);
WriteModeRegister(Address, MR0, PPR_GUARD_KEY_3);
//
// 5. Activate the failing row
//
IssueActivate(Address->Socket, Address->Channel, Address->Dimm,
Address->Rank, Address->BankGroup, Address->Bank,
Address->Row);
//
// 6. Wait for tPPR_H (hPPR time, typically 250ms-1s for eFuse programming)
//
MicroSecondDelay(500 * 1000);
//
// 7. Exit PPR mode
//
WriteModeRegister(Address, MR4, 0);
//
// 8. Restore normal operation
//
RestoreAfterHardPpr(Address);
//
// 9. Verify repair
//
Status = VerifyPprRepair(Address);
if (Status != PprSuccess) {
DEBUG((DEBUG_ERROR, "hPPR verification failed - spare row may be bad\n"));
return PprVerifyFailed;
}
//
// 10. Record repair in persistent storage (for inventory tracking)
//
RecordHardPprInNvram(Address);
LogPprEvent(Address, PprTypeHard, PprSuccess);
DEBUG((DEBUG_INFO, "hPPR: Success - Spare rows remaining: %d\n",
SpareAvailable - 1));
return PprSuccess;
}
//
// Integrate PPR with memory error handling
//
EFI_STATUS
HandleMemoryErrorWithPpr (
IN MEMORY_ERROR_INFO *ErrorInfo
)
{
PPR_ADDRESS PprAddr;
PPR_STATUS PprStatus;
BOOLEAN SoftSupported, HardSupported;
//
// Check if this row has excessive correctable errors
//
if (!ShouldAttemptPpr(ErrorInfo)) {
return HandleMemoryError(ErrorInfo); // Standard handling
}
//
// Build PPR address from error info
//
PprAddr.Socket = ErrorInfo->Node;
PprAddr.Channel = GetChannelFromAddress(ErrorInfo->PhysicalAddress);
PprAddr.Dimm = ErrorInfo->Module;
PprAddr.Rank = GetRankFromAddress(ErrorInfo->PhysicalAddress);
PprAddr.BankGroup = GetBankGroupFromAddress(ErrorInfo->PhysicalAddress);
PprAddr.Bank = ErrorInfo->Bank;
PprAddr.Row = ErrorInfo->Row;
//
// Check PPR support
//
if (!IsPprSupported(PprAddr.Socket, PprAddr.Channel, PprAddr.Dimm,
&SoftSupported, &HardSupported)) {
DEBUG((DEBUG_WARN, "PPR not supported on this DIMM\n"));
return HandleMemoryError(ErrorInfo);
}
//
// Try soft PPR first (reversible, faster)
//
if (SoftSupported) {
PprStatus = ExecuteSoftPpr(&PprAddr);
if (PprStatus == PprSuccess) {
//
// Monitor for continued errors - if they persist,
// may need hPPR on next boot
//
SchedulePprFollowUp(&PprAddr);
return EFI_SUCCESS;
}
}
//
// If sPPR failed or not supported, schedule hPPR for next boot
// (hPPR typically requires controlled environment - POST)
//
if (HardSupported) {
ScheduleHardPprOnNextBoot(&PprAddr);
DEBUG((DEBUG_INFO, "hPPR scheduled for next boot\n"));
}
//
// Fall back to standard memory error handling
//
return HandleMemoryError(ErrorInfo);
}
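ScheduleHardPprOnNextBoot above is platform-specific. One possible sketch persists the pending repair address in a non-volatile UEFI variable that memory initialization consumes early in the next POST; the variable name and vendor GUID are hypothetical, and gRT comes from UefiRuntimeServicesTableLib:
#define PENDING_HPPR_VARIABLE  L"PendingHardPpr"
//
// Hypothetical vendor GUID owning the pending-repair variable
//
EFI_GUID  mPendingPprGuid = {
  0x7a1b2c3d, 0x4e5f, 0x6071, { 0x82, 0x93, 0xa4, 0xb5, 0xc6, 0xd7, 0xe8, 0xf9 }
};
EFI_STATUS
ScheduleHardPprOnNextBoot (
  IN PPR_ADDRESS  *Address
  )
{
  //
  // Persist the failing row so memory init can apply hPPR at the start of
  // the next POST, before the DIMM carries live data. A real implementation
  // would maintain a list of pending repairs rather than a single entry.
  //
  return gRT->SetVariable(
                PENDING_HPPR_VARIABLE,
                &mPendingPprGuid,
                EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS,
                sizeof(PPR_ADDRESS),
                Address
                );
}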
PPR Comparison Table:
| Aspect | Soft PPR (sPPR) | Hard PPR (hPPR) |
|---|---|---|
| Persistence | Lost on power cycle | Permanent (eFuse) |
| Timing | ~2µs | ~250ms-1s |
| When to use | Runtime, testing | POST, confirmed failures |
| Reversible | Yes | No |
| Spare rows | Unlimited reuse | One per bank group |
| Guard key | Optional | Required |
| System state | Normal operation | Controlled (POST) |
| Risk level | Low | Moderate (irreversible) |
PPR Best Practices:
PPR Strategy:
- Track row-level CE counts (not just DIMM-level)
- Apply sPPR first to verify repair effectiveness
- Schedule hPPR during maintenance windows
- Record all hPPR operations for DIMM lifetime tracking
- Consider DIMM replacement if hPPR resources exhausted
Leaky Bucket Error Counting
The leaky bucket algorithm is a rate-limiting technique for error threshold management. It prevents false positives from transient error bursts while still detecting the persistent degradation patterns that indicate hardware failure.
The leaky bucket works by:
- Incrementing a counter when errors occur
- Decrementing (leaking) the counter over time
- Triggering action when the counter exceeds a threshold
flowchart LR
subgraph Input
CE1[CE Event] --> BUCKET
CE2[CE Event] --> BUCKET
CE3[CE Event] --> BUCKET
end
subgraph Bucket["Leaky Bucket"]
BUCKET[Counter<br/>Value] --> LEAK[Time-based<br/>Leak]
LEAK --> BUCKET
end
subgraph Output
BUCKET --> CHECK{Above<br/>Threshold?}
CHECK -->|Yes| ACTION[Trigger<br/>Action]
CHECK -->|No| WAIT[Continue<br/>Monitoring]
end
style BUCKET fill:#3498db,color:#fff
style ACTION fill:#e74c3c,color:#fff
Why Leaky Bucket vs Simple Counting:
| Scenario | Simple Counter | Leaky Bucket | Correct Action |
|---|---|---|---|
| 10 CEs in 1 second, then none | Threshold hit | Counter drains | No action (transient) |
| 1 CE per hour for 24 hours | Below threshold | Steady accumulation | Action needed (degrading) |
| Burst during stress test | False positive | Drains after test | No action (expected) |
| Gradual increase over weeks | May miss pattern | Detects trend | Action needed |
Leaky Bucket Implementation:
//
// Leaky Bucket Error Counter Implementation
//
typedef struct {
UINT32 Count; // Current error count (bucket level)
UINT32 Threshold; // Action threshold
UINT32 LeakRate; // Decrements per leak interval
UINT32 LeakIntervalMs; // Time between leaks (milliseconds)
UINT64 LastLeakTimestamp; // Last leak time (TSC or timer)
UINT32 IncrementValue; // How much each error adds
UINT32 MaxCount; // Bucket capacity (ceiling)
BOOLEAN ThresholdReached; // Latched threshold indicator
} LEAKY_BUCKET;
//
// Initialize a leaky bucket
//
VOID
InitializeLeakyBucket (
OUT LEAKY_BUCKET *Bucket,
IN UINT32 Threshold,
IN UINT32 LeakRate,
IN UINT32 LeakIntervalMs,
IN UINT32 IncrementValue,
IN UINT32 MaxCount
)
{
ZeroMem(Bucket, sizeof(LEAKY_BUCKET));
Bucket->Threshold = Threshold;
Bucket->LeakRate = LeakRate;
Bucket->LeakIntervalMs = LeakIntervalMs;
Bucket->IncrementValue = IncrementValue;
Bucket->MaxCount = MaxCount;
Bucket->LastLeakTimestamp = GetCurrentTimestampMs();
}
//
// Apply time-based leak to the bucket
//
VOID
ApplyLeak (
IN OUT LEAKY_BUCKET *Bucket
)
{
UINT64 CurrentTime;
UINT64 ElapsedMs;
UINT32 LeakAmount;
CurrentTime = GetCurrentTimestampMs();
ElapsedMs = CurrentTime - Bucket->LastLeakTimestamp;
//
// Calculate how many leak intervals have passed
//
if (ElapsedMs >= Bucket->LeakIntervalMs) {
LeakAmount = (UINT32)(ElapsedMs / Bucket->LeakIntervalMs) * Bucket->LeakRate;
//
// Apply leak (don't go below zero)
//
if (Bucket->Count > LeakAmount) {
Bucket->Count -= LeakAmount;
} else {
Bucket->Count = 0;
}
//
// Update last leak time (account for partial intervals)
//
Bucket->LastLeakTimestamp = CurrentTime -
(ElapsedMs % Bucket->LeakIntervalMs);
}
}
//
// Record an error in the leaky bucket
// Returns TRUE if threshold exceeded
//
BOOLEAN
LeakyBucketRecordError (
IN OUT LEAKY_BUCKET *Bucket,
IN UINT32 ErrorWeight OPTIONAL // 0 = use default IncrementValue
)
{
UINT32 Increment;
//
// First, apply any pending leak based on elapsed time
//
ApplyLeak(Bucket);
//
// Add error to bucket
//
Increment = (ErrorWeight != 0) ? ErrorWeight : Bucket->IncrementValue;
Bucket->Count += Increment;
//
// Cap at maximum (bucket overflow protection)
//
if (Bucket->Count > Bucket->MaxCount) {
Bucket->Count = Bucket->MaxCount;
}
//
// Check threshold
//
if (Bucket->Count >= Bucket->Threshold) {
Bucket->ThresholdReached = TRUE;
return TRUE;
}
return FALSE;
}
//
// Query current bucket status
//
VOID
GetLeakyBucketStatus (
IN LEAKY_BUCKET *Bucket,
OUT UINT32 *CurrentCount,
OUT UINT32 *PercentFull,
OUT BOOLEAN *ThresholdReached
)
{
//
// Apply leak before reporting status
//
ApplyLeak((LEAKY_BUCKET *)Bucket);
*CurrentCount = Bucket->Count;
*PercentFull = (Bucket->Count * 100) / Bucket->Threshold;
*ThresholdReached = Bucket->ThresholdReached;
}
//
// Reset bucket (after repair action taken)
//
VOID
ResetLeakyBucket (
IN OUT LEAKY_BUCKET *Bucket
)
{
Bucket->Count = 0;
Bucket->ThresholdReached = FALSE;
Bucket->LastLeakTimestamp = GetCurrentTimestampMs();
}
Per-Component Leaky Buckets:
//
// Maintain separate leaky buckets for different error sources
//
//
// Memory: Per-DIMM and per-Row buckets
//
typedef struct {
LEAKY_BUCKET DimmBucket; // DIMM-level CE tracking
LEAKY_BUCKET RowBuckets[MAX_TRACKED_ROWS]; // Row-level for PPR
UINT32 RowAddresses[MAX_TRACKED_ROWS];
UINT32 TrackedRowCount;
} DIMM_ERROR_TRACKING;
DIMM_ERROR_TRACKING mDimmTracking[MAX_SOCKET][MAX_CHANNEL][MAX_DIMM];
//
// Initialize memory error tracking with recommended thresholds
//
VOID
InitializeMemoryErrorTracking (
VOID
)
{
UINT32 Socket, Channel, Dimm;
for (Socket = 0; Socket < MAX_SOCKET; Socket++) {
for (Channel = 0; Channel < MAX_CHANNEL; Channel++) {
for (Dimm = 0; Dimm < MAX_DIMM; Dimm++) {
//
// DIMM-level bucket: 24 CEs in 24 hours triggers alert
// Leak 1 CE per hour
//
InitializeLeakyBucket(
&mDimmTracking[Socket][Channel][Dimm].DimmBucket,
24, // Threshold: 24 CEs
1, // Leak rate: 1 CE
3600000, // Leak interval: 1 hour (3600000 ms)
1, // Each CE adds 1
48 // Max capacity: 48 (2x threshold)
);
//
// Row-level buckets: 8 CEs on same row triggers PPR
// Leak 1 CE per 4 hours (more aggressive for row-level)
//
for (UINT32 i = 0; i < MAX_TRACKED_ROWS; i++) {
InitializeLeakyBucket(
&mDimmTracking[Socket][Channel][Dimm].RowBuckets[i],
8, // Threshold: 8 CEs on same row
1, // Leak rate: 1 CE
14400000, // Leak interval: 4 hours
1, // Each CE adds 1
16 // Max capacity
);
}
}
}
}
}
//
// Handle correctable memory error with leaky bucket
//
EFI_STATUS
HandleCorrectableMemoryError (
IN MEMORY_ERROR_INFO *ErrorInfo
)
{
DIMM_ERROR_TRACKING *Tracking;
BOOLEAN DimmThreshold, RowThreshold;
UINT32 RowIndex;
Tracking = &mDimmTracking[ErrorInfo->Node]
[GetChannelFromDimm(ErrorInfo->Module)]
[ErrorInfo->Module];
//
// Record in DIMM-level bucket
//
DimmThreshold = LeakyBucketRecordError(&Tracking->DimmBucket, 0);
if (DimmThreshold) {
DEBUG((DEBUG_WARN, "DIMM CE threshold exceeded: Socket%d Ch%d Dimm%d\n",
ErrorInfo->Node, GetChannelFromDimm(ErrorInfo->Module),
ErrorInfo->Module));
//
// Trigger predictive failure alert
//
TriggerPredictiveFailure(ErrorInfo);
NotifyBmcDimmFailing(ErrorInfo);
//
// Reset bucket after taking action
//
ResetLeakyBucket(&Tracking->DimmBucket);
}
//
// Record in row-level bucket (for PPR decision)
//
RowIndex = FindOrAddRowTracking(Tracking, ErrorInfo->Row);
if (RowIndex < MAX_TRACKED_ROWS) {
RowThreshold = LeakyBucketRecordError(&Tracking->RowBuckets[RowIndex], 0);
if (RowThreshold) {
DEBUG((DEBUG_WARN, "Row CE threshold exceeded: Row 0x%x\n",
ErrorInfo->Row));
//
// Attempt PPR on this row
//
AttemptPprOnRow(ErrorInfo);
//
// Reset row bucket after PPR attempt
//
ResetLeakyBucket(&Tracking->RowBuckets[RowIndex]);
}
}
//
// Log the CE regardless of threshold
//
LogCorrectableMemoryError(ErrorInfo);
return EFI_SUCCESS;
}
//
// PCIe: Per-device leaky bucket
//
typedef struct {
UINT16 Segment;
UINT8 Bus;
UINT8 Device;
UINT8 Function;
LEAKY_BUCKET CeBucket; // Correctable error tracking
LEAKY_BUCKET LinkRetrainBucket; // Link retrain tracking
} PCIE_DEVICE_TRACKING;
//
// Initialize PCIe device error tracking
// Different thresholds for different error types
//
VOID
InitializePcieDeviceTracking (
IN UINT16 Segment,
IN UINT8 Bus,
IN UINT8 Device,
IN UINT8 Function,
OUT PCIE_DEVICE_TRACKING *Tracking
)
{
Tracking->Segment = Segment;
Tracking->Bus = Bus;
Tracking->Device = Device;
Tracking->Function = Function;
//
// Correctable errors: 100 in 1 hour before link degradation warning
//
InitializeLeakyBucket(
&Tracking->CeBucket,
100, // Threshold
10, // Leak 10 per interval
360000, // Leak interval: 6 minutes
1, // Each CE adds 1
200 // Max capacity
);
//
// Link retrains: 5 in 10 minutes indicates unstable link
//
InitializeLeakyBucket(
&Tracking->LinkRetrainBucket,
5, // Threshold
1, // Leak 1 per interval
120000, // Leak interval: 2 minutes
1, // Each retrain adds 1
10 // Max capacity
);
}
Leaky Bucket Configuration Guidelines:
| Error Source | Threshold | Leak Rate | Leak Interval | Rationale |
|---|---|---|---|---|
| DIMM CE | 24 | 1 | 1 hour | Industry standard (24 in 24h) |
| Row CE | 8 | 1 | 4 hours | Aggressive for PPR trigger |
| Processor CE | 10 | 1 | 1 hour | Lower tolerance for CPU |
| PCIe CE | 100 | 10 | 6 minutes | Higher tolerance, noisier |
| Link Retrain | 5 | 1 | 2 minutes | Link stability check |
Tuning Tips:
- Higher threshold = fewer false positives, may miss degradation
- Lower threshold = more sensitive, more false positives
- Faster leak = requires sustained errors to trigger
- Slower leak = catches spread-out patterns better
- Start with industry standards, adjust based on fleet telemetry
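The interaction of threshold and leak rate can also be reasoned about numerically. A hypothetical helper (not part of the implementation above) estimates how long a steady arrival rate takes to trip a bucket; with the DIMM settings above, 2 CEs per hour nets +1 per hour and reaches the threshold of 24 in roughly a day, while 1 CE per hour never triggers:
//
// Estimate hours until the bucket reaches its threshold for a steady
// arrival rate. Returns MAX_UINT32 if the leak keeps pace with arrivals.
//
UINT32
EstimateHoursToThreshold (
  IN LEAKY_BUCKET  *Bucket,
  IN UINT32        ErrorsPerHour
  )
{
  UINT32  LeakPerHour;
  UINT32  NetPerHour;
  //
  // Convert the leak configuration to an hourly rate
  //
  LeakPerHour = (Bucket->LeakRate * 3600000) / Bucket->LeakIntervalMs;
  if (ErrorsPerHour <= LeakPerHour) {
    return MAX_UINT32;    // Never triggers - drains as fast as it fills
  }
  NetPerHour = ErrorsPerHour - LeakPerHour;
  return (Bucket->Threshold - Bucket->Count + NetPerHour - 1) / NetPerHour;
}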
Peripheral/PCIe Errors
Peripheral errors occur on I/O devices, PCIe buses, and device DMA operations. Advanced Error Reporting (AER) is the primary detection mechanism for PCIe devices.
| Error Type | Source | Detection | Severity | Handling |
|---|---|---|---|---|
| Bad TLP | Transaction layer | AER | Correctable | Log, device continues |
| Bad DLLP | Data link layer | AER | Correctable | Log, retransmit |
| Replay timeout | Link retrain | AER | Correctable | Log, link retrain |
| Poisoned TLP | Data corruption | AER | Uncorrectable | Device reset |
| Completion timeout | Request timeout | AER | Uncorrectable | Device reset |
| ECRC error | End-to-end CRC | AER | Uncorrectable | Device reset |
| Surprise link down | Hot-remove/failure | AER | Fatal | Device offline |
| DL protocol error | Link layer | AER | Fatal | Link down |
Peripheral Error Handling Characteristics:
//
// Peripheral/PCIe Error Handling
//
typedef struct {
UINT16 SegmentNumber; // PCIe segment
UINT8 BusNumber; // PCIe bus
UINT8 DeviceNumber; // PCIe device
UINT8 FunctionNumber; // PCIe function
UINT32 UncorrectableStatus; // AER uncorrectable status
UINT32 CorrectableStatus; // AER correctable status
UINT32 HeaderLog[4]; // TLP header log
UINT16 DeviceId; // Device ID
UINT16 VendorId; // Vendor ID
UINT8 DeviceSerialNumber[8]; // Device serial (if available)
} PCIE_ERROR_INFO;
EFI_STATUS
HandlePcieError (
IN PCIE_ERROR_INFO *ErrorInfo
)
{
//
// PCIe error handling based on error type
//
//
// CORRECTABLE ERRORS: Log and continue
// - Bad TLP, Bad DLLP, Replay Timer, Receiver Error
// Hardware handles recovery via retransmission
//
if (ErrorInfo->CorrectableStatus != 0) {
LogCorrectablePcieError(ErrorInfo);
//
// Check for excessive correctable errors (link degradation)
//
if (CheckPcieCeThreshold(ErrorInfo)) {
TriggerLinkDegradationWarning(ErrorInfo);
}
//
// Clear correctable status
//
ClearAerCorrectableStatus(ErrorInfo);
return EFI_SUCCESS;
}
//
// UNCORRECTABLE NON-FATAL: Device-specific recovery
// - Poisoned TLP, Completion Timeout, Unexpected Completion
//
if (IsNonFatalUncorrectable(ErrorInfo->UncorrectableStatus)) {
CreateWheaRecord(ERROR_TYPE_PCIE,
0, // No specific address
ErrorInfo,
sizeof(PCIE_ERROR_INFO));
//
// Device-specific recovery options:
//
//
// Option 1: Function Level Reset (FLR)
// Preferred for devices that support it
//
if (SupportsFunctionLevelReset(ErrorInfo)) {
PerformFunctionLevelReset(ErrorInfo);
ReinitializeDevice(ErrorInfo);
return EFI_SUCCESS; // Recovered via FLR
}
//
// Option 2: Secondary Bus Reset
// Resets device and all downstream devices
//
if (!IsRootPort(ErrorInfo)) {
PerformSecondaryBusReset(ErrorInfo);
ReinitializeDevice(ErrorInfo);
return EFI_SUCCESS; // Recovered via bus reset
}
//
// Option 3: Hot-reset link
//
PerformHotReset(ErrorInfo);
return EFI_SUCCESS;
}
//
// FATAL ERRORS: Device/link offline
// - Surprise Down, DL Protocol Error, Flow Control Protocol
//
LogFatalPcieError(ErrorInfo);
CreateWheaRecord(ERROR_TYPE_PCIE, 0, ErrorInfo, sizeof(PCIE_ERROR_INFO));
//
// Contain the error - isolate the device
//
DisableDeviceIo(ErrorInfo);
MarkDeviceOffline(ErrorInfo);
//
// Notify OS for device removal
//
NotifyDeviceRemoval(ErrorInfo);
return EFI_DEVICE_ERROR;
}
Key Peripheral Error Differences:
- Device isolation: Errors can be contained to specific devices
- Hot-plug support: Devices can be reset/replaced without system reboot
- Recovery hierarchy: FLR → Bus Reset → Hot Reset → Device Offline
- Link-level vs device-level: Some errors affect the link, others just the device
- WHEA section: gEfiPcieErrorSectionGuid
Error Type Comparison Summary
| Aspect | System/Processor | Memory | Peripheral/PCIe |
|---|---|---|---|
| Detection | MCA banks, MCE | Memory controller, ECC | AER capability |
| Scope | Core/socket | DIMM/rank/page | Device/link |
| Isolation | Core offline | Page offline, sparing | Device offline |
| Hot-repair | Rarely | DIMM replacement | Device hot-swap |
| Containment | Difficult | Moderate | Excellent |
| Recovery | Limited | Good (redundancy) | Good (reset) |
| WHEA section | Processor Generic | Platform Memory | PCIe Error |
| Typical CE rate | Low | High | Moderate |
Error Handling Decision Flow
flowchart TB
START[Error Detected] --> TYPE{Error<br/>Type?}
TYPE -->|System/Processor| SYS_PCC{PCC Bit<br/>Set?}
SYS_PCC -->|Yes| SYS_FATAL[System Halt<br/>Context Corrupt]
SYS_PCC -->|No| SYS_UC{Uncorrected?}
SYS_UC -->|Yes| SYS_SRAR[Signal OS<br/>SRAR Recovery]
SYS_UC -->|No| SYS_LOG[Log CE<br/>Check Threshold]
TYPE -->|Memory| MEM_UC{Uncorrected?}
MEM_UC -->|No| MEM_LOG[Log CE<br/>Track DIMM]
MEM_UC -->|Yes| MEM_MIRROR{Mirrored?}
MEM_MIRROR -->|Yes| MEM_FAILOVER[Failover to<br/>Mirror]
MEM_MIRROR -->|No| MEM_SPARE{Spare<br/>Available?}
MEM_SPARE -->|Yes| MEM_SPARING[Activate<br/>Spare Rank]
MEM_SPARE -->|No| MEM_PAGE[Request Page<br/>Offline]
TYPE -->|Peripheral| PCIE_SEV{Severity?}
PCIE_SEV -->|Correctable| PCIE_LOG[Log Error<br/>Clear Status]
PCIE_SEV -->|Non-Fatal| PCIE_FLR{Supports<br/>FLR?}
PCIE_FLR -->|Yes| PCIE_RESET[Function<br/>Level Reset]
PCIE_FLR -->|No| PCIE_BUS[Secondary<br/>Bus Reset]
PCIE_SEV -->|Fatal| PCIE_OFFLINE[Device<br/>Offline]
style SYS_FATAL fill:#e74c3c,color:#fff
style MEM_FAILOVER fill:#2ecc71,color:#fff
style MEM_SPARING fill:#2ecc71,color:#fff
style PCIE_RESET fill:#3498db,color:#fff
Initialization
HEST (Hardware Error Source Table)
#include <IndustryStandard/Acpi62.h>
//
// HEST describes all hardware error sources in the system
//
#pragma pack(1)
typedef struct {
EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_HEADER Header;
// Error source structures follow
} MY_HEST_TABLE;
//
// Generic Hardware Error Source (GHES)
//
typedef struct {
UINT16 Type; // 9 = Generic Hardware Error Source
UINT16 SourceId;
UINT16 RelatedSourceId;
UINT8 Flags;
UINT8 Enabled;
UINT32 NumberOfRecordsToPreallocate;
UINT32 MaxSectionsPerRecord;
UINT32 MaxRawDataLength;
EFI_ACPI_6_2_GENERIC_ADDRESS_STRUCTURE ErrorStatusAddress;
EFI_ACPI_6_2_HARDWARE_ERROR_NOTIFICATION_STRUCTURE Notify;
UINT32 ErrorStatusBlockLength;
} ACPI_HEST_GENERIC_HARDWARE_ERROR_SOURCE;
#pragma pack()
EFI_STATUS
BuildHestTable (
OUT MY_HEST_TABLE **Hest,
OUT UINTN *HestSize
)
{
MY_HEST_TABLE *Table;
UINTN Size;
ACPI_HEST_GENERIC_HARDWARE_ERROR_SOURCE *Ghes;
//
// Calculate size
//
Size = sizeof(EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_HEADER) +
sizeof(ACPI_HEST_GENERIC_HARDWARE_ERROR_SOURCE);
Table = AllocateZeroPool(Size);
if (Table == NULL) {
return EFI_OUT_OF_RESOURCES;
}
//
// Fill header
//
Table->Header.Header.Signature = EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_SIGNATURE;
Table->Header.Header.Length = (UINT32)Size;
Table->Header.Header.Revision = EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_REVISION;
CopyMem(Table->Header.Header.OemId, "OEMID ", 6);
Table->Header.Header.OemTableId = SIGNATURE_64('H','E','S','T','T','B','L',' ');
Table->Header.ErrorSourceCount = 1;
//
// Add Generic Hardware Error Source
//
Ghes = (ACPI_HEST_GENERIC_HARDWARE_ERROR_SOURCE *)(&Table->Header + 1);
Ghes->Type = EFI_ACPI_6_2_GENERIC_HARDWARE_ERROR;
Ghes->SourceId = 0;
Ghes->Enabled = 1;
Ghes->NumberOfRecordsToPreallocate = 1;
Ghes->MaxSectionsPerRecord = 1;
//
// Error status address (where firmware writes error info)
//
Ghes->ErrorStatusAddress.AddressSpaceId = EFI_ACPI_6_2_SYSTEM_MEMORY;
Ghes->ErrorStatusAddress.RegisterBitWidth = 64;
Ghes->ErrorStatusAddress.Address = AllocateErrorStatusBlock();
//
// Notification method
//
Ghes->Notify.Type = EFI_ACPI_6_2_HARDWARE_ERROR_NOTIFICATION_SCI;
Ghes->Notify.Length = sizeof(EFI_ACPI_6_2_HARDWARE_ERROR_NOTIFICATION_STRUCTURE);
//
// Calculate checksum
//
Table->Header.Header.Checksum = CalculateChecksum((UINT8 *)Table, Size);
*Hest = Table;
*HestSize = Size;
return EFI_SUCCESS;
}
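Once built, the table must be published so the OS can discover it. A minimal sketch using the standard EFI_ACPI_TABLE_PROTOCOL, which copies the table, fixes up the checksum, and links it into the XSDT:
#include <Protocol/AcpiTable.h>
EFI_STATUS
InstallHestTable (
  IN MY_HEST_TABLE  *Hest,
  IN UINTN          HestSize
  )
{
  EFI_STATUS               Status;
  EFI_ACPI_TABLE_PROTOCOL  *AcpiTable;
  UINTN                    TableKey;
  Status = gBS->LocateProtocol(&gEfiAcpiTableProtocolGuid, NULL, (VOID **)&AcpiTable);
  if (EFI_ERROR(Status)) {
    return Status;
  }
  return AcpiTable->InstallAcpiTable(AcpiTable, Hest, HestSize, &TableKey);
}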
Error Handler Registration
//
// Register platform error handler (typically in SMM)
//
typedef
EFI_STATUS
(EFIAPI *PLATFORM_ERROR_HANDLER)(
IN UINT32 ErrorType,
IN UINT64 ErrorAddress,
IN VOID *ErrorData,
IN UINTN ErrorDataSize
);
PLATFORM_ERROR_HANDLER mErrorHandler;
EFI_STATUS
RegisterErrorHandler (
IN PLATFORM_ERROR_HANDLER Handler
)
{
mErrorHandler = Handler;
return EFI_SUCCESS;
}
//
// Error handler implementation
//
EFI_STATUS
EFIAPI
PlatformErrorHandler (
IN UINT32 ErrorType,
IN UINT64 ErrorAddress,
IN VOID *ErrorData,
IN UINTN ErrorDataSize
)
{
//
// 1. Log error to persistent storage (ERST)
//
LogErrorToErst(ErrorType, ErrorAddress, ErrorData, ErrorDataSize);
//
// 2. Send error to BMC (if available)
//
ReportErrorToBmc(ErrorType, ErrorAddress);
//
// 3. Create WHEA error record
//
CreateWheaRecord(ErrorType, ErrorAddress, ErrorData, ErrorDataSize);
//
// 4. Determine recovery action
//
if (ErrorType == ERROR_TYPE_FATAL) {
//
// Fatal error - system halt required
//
return EFI_DEVICE_ERROR;
}
//
// Correctable/recoverable - continue
//
return EFI_SUCCESS;
}
Configuration
BERT (Boot Error Record Table)
//
// BERT reports unhandled errors from the previous boot to the OS
//
#pragma pack(1)
typedef struct {
EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_HEADER Header;
} MY_BERT_TABLE;
typedef struct {
UINT32 BlockStatus;
UINT32 RawDataOffset;
UINT32 RawDataLength;
UINT32 DataLength;
UINT32 ErrorSeverity;
// Generic Error Data Entry follows
} BOOT_ERROR_REGION;
#pragma pack()
EFI_STATUS
CreateBootErrorRecord (
IN UINT32 ErrorType,
IN UINT8 *ErrorData,
IN UINTN ErrorDataSize
)
{
BOOT_ERROR_REGION *ErrorRegion;
EFI_ACPI_6_2_GENERIC_ERROR_DATA_ENTRY_STRUCTURE *DataEntry;
//
// Allocate boot error region
//
ErrorRegion = AllocateReservedPool(
sizeof(BOOT_ERROR_REGION) +
sizeof(EFI_ACPI_6_2_GENERIC_ERROR_DATA_ENTRY_STRUCTURE) +
ErrorDataSize
);
if (ErrorRegion == NULL) {
return EFI_OUT_OF_RESOURCES;
}
//
// Fill error region
//
ErrorRegion->BlockStatus = 1; // Uncorrectable error present
ErrorRegion->ErrorSeverity = EFI_ACPI_6_2_ERROR_SEVERITY_FATAL;
ErrorRegion->DataLength = sizeof(EFI_ACPI_6_2_GENERIC_ERROR_DATA_ENTRY_STRUCTURE) +
ErrorDataSize;
//
// Fill data entry
//
DataEntry = (EFI_ACPI_6_2_GENERIC_ERROR_DATA_ENTRY_STRUCTURE *)(ErrorRegion + 1);
// Set SectionType GUID based on error type
DataEntry->ErrorDataLength = (UINT32)ErrorDataSize;
DataEntry->ErrorSeverity = EFI_ACPI_6_2_ERROR_SEVERITY_FATAL;
//
// Copy error data
//
CopyMem(DataEntry + 1, ErrorData, ErrorDataSize);
//
// Update BERT table pointer
//
UpdateBertTable(ErrorRegion);
return EFI_SUCCESS;
}
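UpdateBertTable is left to the platform; one possible sketch builds the BERT header around the reserved boot error region and publishes it with EFI_ACPI_TABLE_PROTOCOL, as with the HEST earlier in this chapter:
EFI_STATUS
UpdateBertTable (
  IN BOOT_ERROR_REGION  *ErrorRegion
  )
{
  EFI_STATUS               Status;
  EFI_ACPI_TABLE_PROTOCOL  *AcpiTable;
  UINTN                    TableKey;
  MY_BERT_TABLE            Bert;
  ZeroMem(&Bert, sizeof(Bert));
  Bert.Header.Header.Signature = EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_SIGNATURE;
  Bert.Header.Header.Length    = sizeof(MY_BERT_TABLE);
  Bert.Header.Header.Revision  = EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_REVISION;
  CopyMem(Bert.Header.Header.OemId, "OEMID ", 6);
  //
  // Point the table at the reserved-memory error region filled above
  //
  Bert.Header.BootErrorRegionLength = sizeof(BOOT_ERROR_REGION) + ErrorRegion->DataLength;
  Bert.Header.BootErrorRegion       = (UINT64)(UINTN)ErrorRegion;
  //
  // Publish the table; the protocol copies it and fixes the checksum
  //
  Status = gBS->LocateProtocol(&gEfiAcpiTableProtocolGuid, NULL, (VOID **)&AcpiTable);
  if (EFI_ERROR(Status)) {
    return Status;
  }
  return AcpiTable->InstallAcpiTable(AcpiTable, &Bert, sizeof(Bert), &TableKey);
}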
ERST (Error Record Serialization Table)
//
// ERST provides persistent storage for error records
//
//
// ERST Actions
//
#define ERST_BEGIN_WRITE_OPERATION 0x00
#define ERST_BEGIN_READ_OPERATION 0x01
#define ERST_BEGIN_CLEAR_OPERATION 0x02
#define ERST_END_OPERATION 0x03
#define ERST_SET_RECORD_OFFSET 0x04
#define ERST_EXECUTE_OPERATION 0x05
#define ERST_CHECK_BUSY_STATUS 0x06
#define ERST_GET_COMMAND_STATUS 0x07
#define ERST_GET_RECORD_IDENTIFIER 0x08
#define ERST_SET_RECORD_IDENTIFIER 0x09
#define ERST_GET_RECORD_COUNT 0x0A
#define ERST_BEGIN_DUMMY_WRITE 0x0B
#define ERST_GET_ERROR_LOG_ADDRESS_RANGE 0x0D
#define ERST_GET_ERROR_LOG_ADDRESS_LENGTH 0x0E
#define ERST_GET_ERROR_LOG_ADDRESS_ATTR 0x0F
#define ERST_EXECUTE_TIMINGS 0x10
//
// ERST serialization entry
//
typedef struct {
UINT8 SerializationAction;
UINT8 Instruction;
UINT8 Flags;
UINT8 Reserved;
EFI_ACPI_6_2_GENERIC_ADDRESS_STRUCTURE RegisterRegion;
UINT64 Value;
UINT64 Mask;
} ACPI_ERST_SERIALIZATION_INSTRUCTION_ENTRY;
EFI_STATUS
PersistErrorRecord (
IN UINT64 RecordId,
IN VOID *ErrorRecord,
IN UINTN RecordSize
)
{
//
// Implementation uses ERST actions:
// 1. BEGIN_WRITE_OPERATION
// 2. SET_RECORD_IDENTIFIER
// 3. Write data to error log address range
// 4. EXECUTE_OPERATION
// 5. CHECK_BUSY_STATUS until complete
// 6. GET_COMMAND_STATUS to verify success
// 7. END_OPERATION
//
return EFI_SUCCESS;
}
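The backing store behind those actions is platform-defined. As a simple illustration only, a firmware-internal helper could persist each CPER record in a non-volatile variable; production ERST implementations typically use a dedicated flash region driven by an SMM handler, and gPlatformErrorRecordGuid here is a hypothetical vendor GUID:
#include <Library/PrintLib.h>
#include <Library/UefiRuntimeServicesTableLib.h>
extern EFI_GUID  gPlatformErrorRecordGuid;   // hypothetical vendor GUID
EFI_STATUS
WriteErrorRecordToStore (
  IN UINT64  RecordId,
  IN VOID    *ErrorRecord,
  IN UINTN   RecordSize
  )
{
  CHAR16  VariableName[32];
  //
  // One variable per record ID; a real backing store adds capacity
  // management, wear leveling, and enumeration for ERST read operations
  //
  UnicodeSPrint(VariableName, sizeof(VariableName), L"ErrRec%016lx", RecordId);
  return gRT->SetVariable(
                VariableName,
                &gPlatformErrorRecordGuid,
                EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS,
                RecordSize,
                ErrorRecord
                );
}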
EINJ (Error Injection)
//
// EINJ enables error injection for RAS testing
//
//
// EINJ Actions
//
#define EINJ_BEGIN_INJECTION_OPERATION 0x00
#define EINJ_GET_TRIGGER_ERROR_ACTION 0x01
#define EINJ_SET_ERROR_TYPE 0x02
#define EINJ_GET_ERROR_TYPE 0x03
#define EINJ_END_OPERATION 0x04
#define EINJ_EXECUTE_OPERATION 0x05
#define EINJ_CHECK_BUSY_STATUS 0x06
#define EINJ_GET_COMMAND_STATUS 0x07
#define EINJ_SET_ERROR_TYPE_WITH_ADDR 0x08
//
// Error types for injection
//
#define EINJ_ERROR_PROCESSOR_CORRECTABLE 0x00000001
#define EINJ_ERROR_PROCESSOR_UNCORRECTABLE 0x00000002
#define EINJ_ERROR_PROCESSOR_FATAL 0x00000004
#define EINJ_ERROR_MEMORY_CORRECTABLE 0x00000008
#define EINJ_ERROR_MEMORY_UNCORRECTABLE 0x00000010
#define EINJ_ERROR_MEMORY_FATAL 0x00000020
#define EINJ_ERROR_PCIE_CORRECTABLE 0x00000040
#define EINJ_ERROR_PCIE_UNCORRECTABLE 0x00000080
#define EINJ_ERROR_PCIE_FATAL 0x00000100
#define EINJ_ERROR_PLATFORM_CORRECTABLE 0x00000200
#define EINJ_ERROR_PLATFORM_UNCORRECTABLE 0x00000400
#define EINJ_ERROR_PLATFORM_FATAL 0x00000800
EFI_STATUS
InjectError (
IN UINT32 ErrorType,
IN UINT64 Address OPTIONAL
)
{
//
// Error injection process:
// 1. BEGIN_INJECTION_OPERATION
// 2. SET_ERROR_TYPE or SET_ERROR_TYPE_WITH_ADDR
// 3. EXECUTE_OPERATION
// 4. GET_TRIGGER_ERROR_ACTION (if needed)
// 5. Execute trigger action
// 6. END_OPERATION
//
DEBUG((DEBUG_INFO, "Injecting error type 0x%x at 0x%lx\n", ErrorType, Address));
return EFI_SUCCESS;
}
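A validation flow built on EINJ then looks roughly like the sketch below, where AllocateTestBuffer and WaitForErrorRecord are hypothetical test helpers: inject a correctable memory error at a firmware-owned address, consume it, and confirm the platform logged a matching record.
EFI_STATUS
ValidateCorrectableMemoryPath (
  VOID
  )
{
  EFI_STATUS  Status;
  UINT64      TestAddress;
  //
  // Use a dedicated, firmware-owned buffer so the injected error cannot
  // touch OS or firmware state
  //
  TestAddress = AllocateTestBuffer();
  Status = InjectError(EINJ_ERROR_MEMORY_CORRECTABLE, TestAddress);
  if (EFI_ERROR(Status)) {
    return Status;
  }
  //
  // Consume the location so the injected error is actually signaled
  //
  (VOID)*(volatile UINT8 *)(UINTN)TestAddress;
  //
  // Confirm the error surfaced as a logged record (ERST / BMC SEL / WHEA)
  //
  return WaitForErrorRecord(ERROR_TYPE_MEMORY, TestAddress, 1000);
}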
Porting Guide
Platform RAS Configuration
#
# Platform DSC file - RAS configuration
#
[PcdsFixedAtBuild]
# Enable WHEA support
gEfiMdeModulePkgTokenSpaceGuid.PcdWheaSupport|TRUE
# Error log size
gPlatformPkgTokenSpaceGuid.PcdErrorLogSize|0x10000
# Enable error injection (debug only)
gPlatformPkgTokenSpaceGuid.PcdEnableEinj|TRUE
[Components]
# RAS infrastructure
$(PLATFORM_PKG)/Ras/WheaDxe/WheaDxe.inf
$(PLATFORM_PKG)/Ras/HestDxe/HestDxe.inf
$(PLATFORM_PKG)/Ras/BertDxe/BertDxe.inf
$(PLATFORM_PKG)/Ras/ErstDxe/ErstDxe.inf
# SMM error handler
$(PLATFORM_PKG)/Ras/RasSmm/RasSmm.inf
MCA (Machine Check Architecture)
//
// Machine Check Architecture error handling
//
#include <Register/Intel/Msr.h>
//
// Read Machine Check Bank registers
//
EFI_STATUS
ReadMcaBanks (
VOID
)
{
UINT32 McaBankCount;
UINT32 Bank;
UINT64 McgCap;
UINT64 McStatus;
UINT64 McAddr;
UINT64 McMisc;
//
// Get number of MCA banks
//
McgCap = AsmReadMsr64(MSR_IA32_MCG_CAP);
McaBankCount = (UINT32)(McgCap & 0xFF);
DEBUG((DEBUG_INFO, "MCA Banks: %d\n", McaBankCount));
for (Bank = 0; Bank < McaBankCount; Bank++) {
McStatus = AsmReadMsr64(MSR_IA32_MC0_STATUS + Bank * 4);
if (McStatus & BIT63) { // Valid bit
McAddr = AsmReadMsr64(MSR_IA32_MC0_ADDR + Bank * 4);
McMisc = AsmReadMsr64(MSR_IA32_MC0_MISC + Bank * 4);
DEBUG((DEBUG_ERROR, "MCA Bank %d Error:\n", Bank));
DEBUG((DEBUG_ERROR, " Status: 0x%016lx\n", McStatus));
DEBUG((DEBUG_ERROR, " Address: 0x%016lx\n", McAddr));
DEBUG((DEBUG_ERROR, " Misc: 0x%016lx\n", McMisc));
//
// Process error based on MCA bank type
//
ProcessMcaError(Bank, McStatus, McAddr, McMisc);
//
// Clear error (write 0 to status)
//
AsmWriteMsr64(MSR_IA32_MC0_STATUS + Bank * 4, 0);
}
}
return EFI_SUCCESS;
}
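ProcessMcaError can translate the architectural MCi_STATUS bits into the SYSTEM_ERROR_INFO structure used earlier in this chapter. A sketch of that decode, with bit positions per the Intel SDM (VAL=63, OVER=62, UC=61, EN=60, PCC=57):
VOID
ProcessMcaError (
  IN UINT32  Bank,
  IN UINT64  McStatus,
  IN UINT64  McAddr,
  IN UINT64  McMisc
  )
{
  SYSTEM_ERROR_INFO  ErrorInfo;
  ErrorInfo.BankNumber              = Bank;
  ErrorInfo.McaStatus               = McStatus;
  ErrorInfo.McaAddress              = McAddr;
  ErrorInfo.McaMisc                 = McMisc;
  ErrorInfo.Overflow                = (BOOLEAN)((McStatus & BIT62) != 0);  // OVER
  ErrorInfo.Uncorrected             = (BOOLEAN)((McStatus & BIT61) != 0);  // UC
  ErrorInfo.EnabledError            = (BOOLEAN)((McStatus & BIT60) != 0);  // EN
  ErrorInfo.ProcessorContextCorrupt = (BOOLEAN)((McStatus & BIT57) != 0);  // PCC
  //
  // Route to the handler shown in the error-type section
  //
  HandleSystemError(&ErrorInfo);
}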
PCIe AER (Advanced Error Reporting)
//
// PCIe Advanced Error Reporting
//
#include <IndustryStandard/PciExpress31.h>
EFI_STATUS
HandlePcieAerError (
IN UINT8 Bus,
IN UINT8 Device,
IN UINT8 Function
)
{
UINT32 UncorrectableStatus;
UINT32 CorrectableStatus;
PCI_EXPRESS_EXTENDED_CAPABILITIES_ADVANCED_ERROR_REPORTING *Aer;
//
// Find AER capability
//
Aer = FindPcieCapability(Bus, Device, Function,
PCI_EXPRESS_EXTENDED_CAPABILITY_ADVANCED_ERROR_REPORTING_ID);
if (Aer == NULL) {
return EFI_NOT_FOUND;
}
//
// Read error status
//
UncorrectableStatus = Aer->UncorrectableErrorStatus;
CorrectableStatus = Aer->CorrectableErrorStatus;
if (UncorrectableStatus != 0) {
DEBUG((DEBUG_ERROR, "PCIe Uncorrectable Error on %02x:%02x.%x: 0x%08x\n",
Bus, Device, Function, UncorrectableStatus));
//
// Log error
//
LogPcieError(Bus, Device, Function, UncorrectableStatus, FALSE);
//
// Clear status
//
Aer->UncorrectableErrorStatus = UncorrectableStatus;
}
if (CorrectableStatus != 0) {
DEBUG((DEBUG_INFO, "PCIe Correctable Error on %02x:%02x.%x: 0x%08x\n",
Bus, Device, Function, CorrectableStatus));
LogPcieError(Bus, Device, Function, CorrectableStatus, TRUE);
Aer->CorrectableErrorStatus = CorrectableStatus;
}
return EFI_SUCCESS;
}
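FindPcieCapability hides the extended-capability walk. A sketch of locating a capability by scanning the list that starts at config offset 0x100, here returning the register offset and using MdePkg's PciExpressLib rather than a mapped structure pointer:
#include <Library/PciExpressLib.h>
//
// Extended capability header: [15:0] ID, [19:16] version, [31:20] next offset
//
UINT32
FindPcieExtCapabilityOffset (
  IN UINT8   Bus,
  IN UINT8   Device,
  IN UINT8   Function,
  IN UINT16  CapabilityId
  )
{
  UINT32  Offset;
  UINT32  Header;
  Offset = 0x100;
  while (Offset != 0) {
    Header = PciExpressRead32(PCI_EXPRESS_LIB_ADDRESS(Bus, Device, Function, Offset));
    if (Header == 0 || Header == 0xFFFFFFFF) {
      break;                                // Empty list or absent device
    }
    if ((Header & 0xFFFF) == CapabilityId) {
      return Offset;                        // Found the requested capability
    }
    Offset = (Header >> 20) & 0xFFC;        // Next pointer is DWORD-aligned
  }
  return 0;                                 // Not found
}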
WHEA Error Records
Creating WHEA Records
//
// WHEA (Windows Hardware Error Architecture) compatible error records
//
#include <Guid/Cper.h>
EFI_STATUS
CreateWheaRecord (
IN UINT32 ErrorType,
IN UINT64 ErrorAddress,
IN VOID *ErrorData,
IN UINTN ErrorDataSize
)
{
EFI_COMMON_ERROR_RECORD_HEADER *CperHeader;
EFI_ERROR_SECTION_DESCRIPTOR *SectionDesc;
VOID *SectionData;
UINTN RecordSize;
//
// Calculate record size
//
RecordSize = sizeof(EFI_COMMON_ERROR_RECORD_HEADER) +
sizeof(EFI_ERROR_SECTION_DESCRIPTOR) +
ErrorDataSize;
CperHeader = AllocateZeroPool(RecordSize);
if (CperHeader == NULL) {
return EFI_OUT_OF_RESOURCES;
}
//
// Fill CPER header
//
CperHeader->SignatureStart = EFI_ERROR_RECORD_SIGNATURE_START;
CperHeader->Revision = EFI_ERROR_RECORD_REVISION;
CperHeader->SignatureEnd = EFI_ERROR_RECORD_SIGNATURE_END;
CperHeader->SectionCount = 1;
CperHeader->ErrorSeverity = GetErrorSeverity(ErrorType);
CperHeader->ValidationBits = 0;
CperHeader->RecordLength = (UINT32)RecordSize;
//
// Generate unique record ID
//
CperHeader->RecordID = GenerateRecordId();
//
// Fill the CPER timestamp (EFI_ERROR_TIME_STAMP uses BCD-encoded fields,
// so a platform helper converts the gRT->GetTime() output)
//
FillCperTimestamp(&CperHeader->TimeStamp);
//
// Fill section descriptor
//
SectionDesc = (EFI_ERROR_SECTION_DESCRIPTOR *)(CperHeader + 1);
SectionDesc->SectionOffset = sizeof(EFI_COMMON_ERROR_RECORD_HEADER) +
sizeof(EFI_ERROR_SECTION_DESCRIPTOR);
SectionDesc->SectionLength = (UINT32)ErrorDataSize;
SectionDesc->Revision = EFI_ERROR_SECTION_REVISION;
SectionDesc->SectionFlags = 0;
//
// Set section type based on error
//
switch (ErrorType) {
case ERROR_TYPE_MEMORY:
CopyGuid(&SectionDesc->SectionType, &gEfiPlatformMemoryErrorSectionGuid);
break;
case ERROR_TYPE_PCIE:
CopyGuid(&SectionDesc->SectionType, &gEfiPcieErrorSectionGuid);
break;
case ERROR_TYPE_PROCESSOR:
CopyGuid(&SectionDesc->SectionType, &gEfiProcessorGenericErrorSectionGuid);
break;
default:
CopyGuid(&SectionDesc->SectionType, &gEfiFirmwareErrorSectionGuid);
}
//
// Copy section data
//
SectionData = (VOID *)((UINT8 *)CperHeader + SectionDesc->SectionOffset);
CopyMem(SectionData, ErrorData, ErrorDataSize);
//
// Persist record
//
PersistErrorRecord(CperHeader->RecordID, CperHeader, RecordSize);
FreePool(CperHeader);
return EFI_SUCCESS;
}
Example: RAS Status Display
/** @file
RAS Status Display
**/
#include <Uefi.h>
#include <Library/UefiLib.h>
#include <Library/UefiBootServicesTableLib.h>
#include <IndustryStandard/Acpi62.h>
#include <Guid/Acpi.h>
EFI_STATUS
EFIAPI
UefiMain (
IN EFI_HANDLE ImageHandle,
IN EFI_SYSTEM_TABLE *SystemTable
)
{
EFI_STATUS Status;
VOID *Table;
EFI_ACPI_6_2_ROOT_SYSTEM_DESCRIPTION_POINTER *Rsdp;
Print(L"=== RAS Configuration ===\n\n");
//
// Find ACPI tables
//
Status = EfiGetSystemConfigurationTable(&gEfiAcpi20TableGuid, (VOID **)&Rsdp);
if (EFI_ERROR(Status)) {
Print(L"ACPI tables not found\n");
return Status;
}
//
// Check for HEST
//
Status = FindAcpiTable(EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_SIGNATURE, &Table);
if (!EFI_ERROR(Status)) {
EFI_ACPI_6_2_HARDWARE_ERROR_SOURCE_TABLE_HEADER *Hest = Table;
Print(L"HEST (Hardware Error Sources):\n");
Print(L" Error Source Count: %d\n", Hest->ErrorSourceCount);
Print(L" Table Length: %d bytes\n\n", Hest->Header.Length);
} else {
Print(L"HEST: Not present\n\n");
}
//
// Check for BERT
//
Status = FindAcpiTable(EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_SIGNATURE, &Table);
if (!EFI_ERROR(Status)) {
EFI_ACPI_6_2_BOOT_ERROR_RECORD_TABLE_HEADER *Bert = Table;
Print(L"BERT (Boot Error Record):\n");
Print(L" Error Region Length: %d bytes\n", Bert->BootErrorRegionLength);
Print(L" Error Region: 0x%016lx\n\n", Bert->BootErrorRegion);
} else {
Print(L"BERT: Not present (no boot errors)\n\n");
}
//
// Check for ERST
//
Status = FindAcpiTable(EFI_ACPI_6_2_ERROR_RECORD_SERIALIZATION_TABLE_SIGNATURE, &Table);
if (!EFI_ERROR(Status)) {
EFI_ACPI_6_2_ERROR_RECORD_SERIALIZATION_TABLE_HEADER *Erst = Table;
Print(L"ERST (Error Serialization):\n");
Print(L" Serialization Header Length: %d\n", Erst->SerializationHeaderLength);
Print(L" Instruction Entry Count: %d\n\n", Erst->InstructionEntryCount);
} else {
Print(L"ERST: Not present\n\n");
}
//
// Check for EINJ
//
Status = FindAcpiTable(EFI_ACPI_6_2_ERROR_INJECTION_TABLE_SIGNATURE, &Table);
if (!EFI_ERROR(Status)) {
Print(L"EINJ (Error Injection): Present\n");
Print(L" (Error injection available for testing)\n\n");
} else {
Print(L"EINJ: Not present\n\n");
}
Print(L"Press any key to exit...\n");
{
EFI_INPUT_KEY Key;
UINTN Index;
gBS->WaitForEvent(1, &gST->ConIn->WaitForKey, &Index);
gST->ConIn->ReadKeyStroke(gST->ConIn, &Key);
}
return EFI_SUCCESS;
}
Specification Reference
- ACPI 6.4 Specification: Chapter 18 - APEI
- UEFI Specification: Platform Error Handling
- WHEA Specification: Windows Hardware Error Architecture
Summary
- HEST describes hardware error sources
- BERT captures boot-time errors
- ERST provides persistent error storage
- EINJ enables error injection testing
- MCA/AER handle CPU and PCIe errors
- WHEA records standardize error reporting
Next Steps
- Chapter 22: eSPI Interface - Enhanced SPI
- Chapter 23: ARM UEFI - ARM development
Server Focus: RAS features are primarily for server and enterprise platforms where reliability is critical.