AMD Debug & Management Guide

Configure AMD EPYC system management and debug capabilities on OpenBMC.

Table of Contents

  1. Overview
  2. AMD Management Interfaces
    1. SB-RMI (Sideband Remote Management Interface)
    2. SB-TSI (Sideband Thermal Sensor Interface)
    3. HDT (Hardware Debug Tool)
  3. APML Setup
    1. Build-Time Configuration (Yocto)
    2. Kernel Modules
    3. Device Tree Configuration
  4. APML Library Usage
    1. Installation
    2. Reading CPU Temperature
    3. Power Management
    4. RAS (Reliability, Availability, Serviceability)
    5. Mailbox Commands
  5. D-Bus Integration
    1. Sensor Integration
    2. Power Monitoring
  6. AMD OpenBMC Platform
    1. AMDESE OpenBMC
    2. Supported Platforms
  7. HDT Debug Access
    1. HDT Overview
    2. BMC JTAG Mux
    3. Commercial Solutions
  8. Crash Dump Collection
    1. RAS Error Logging
    2. BERT (Boot Error Record Table)
    3. Collecting Debug Data
  9. Troubleshooting
    1. APML Not Working
    2. Temperature Reading Fails
    3. Command Timeouts
    4. Permission Issues
  10. Security Considerations
    1. Access Control
    2. Audit Logging
    3. Production Recommendations
  11. References
  12. Deep Dive
    1. APML Communication Architecture
    2. SB-RMI Mailbox Protocol
    3. SB-TSI Temperature Sensing
    4. AMD MCA (Machine Check Architecture) Error Handling
    5. Source Code References

Overview

AMD EPYC platforms provide system management and debug capabilities through APML (Advanced Platform Management Link) and HDT (Hardware Debug Tool). This guide covers the BMC integration for AMD server platforms.

---
title: AMD Platform Management Architecture
---
flowchart TB
    subgraph bmcapps["BMC Applications"]
        direction LR
        telemetry["Telemetry<br/>Monitoring"]
        ras["RAS/Error<br/>Handling"]
        powermgmt["Power Mgmt<br/>Control"]
    end

    subgraph apml["APML Library"]
        apmllib["apml_sbrmi / apml_sbtsi drivers"]
    end

    subgraph bus["I2C / I3C Bus"]
        i2cbus["Communication Bus"]
    end

    subgraph cpu["AMD EPYC Processor"]
        direction LR
        sbrmi["SB-RMI<br/>(Remote Mgmt I/F)"]
        sbtsi["SB-TSI<br/>(Thermal Sensor)"]
        hdt["HDT Port<br/>(Hardware Debug)"]
    end

    bmcapps --> apml
    apml --> bus
    bus --> cpu
ASCII-art version (for comparison)
+----------------------------------------------------------------------+
|                  AMD Platform Management Architecture                |
+----------------------------------------------------------------------+
|                                                                      |
|  +-----------------------------------------------------------------+ |
|  |                       BMC Applications                          | |
|  |                                                                 | |
|  |   +-------------+  +-------------+  +-------------+             | |
|  |   | Telemetry   |  | RAS/Error   |  | Power Mgmt  |             | |
|  |   | Monitoring  |  | Handling    |  | Control     |             | |
|  |   +------+------+  +------+------+  +------+------+             | |
|  +---------+-----------------+----------------+--------------------+ |
|            |                 |                |                      |
|  +---------+-----------------+----------------+--------------------+ |
|  |                      APML Library                               | |
|  |            (apml_sbrmi / apml_sbtsi drivers)                    | |
|  +---------+-----------------+----------------+--------------------+ |
|            |                 |                |                      |
|  +---------+-----------------+----------------+--------------------+ |
|  |                    I2C / I3C Bus                                | |
|  +-----------------------------------------------------------------+ |
|            |                                                         |
|  +---------+-------------------------------------------------------+ |
|  |                    AMD EPYC Processor                           | |
|  |                                                                 | |
|  |   +-------------+  +-------------+  +-------------+             | |
|  |   | SB-RMI      |  | SB-TSI      |  | HDT Port    |             | |
|  |   | (Remote     |  | (Thermal    |  | (Hardware   |             | |
|  |   |  Mgmt I/F)  |  |  Sensor)    |  |  Debug)     |             | |
|  |   +-------------+  +-------------+  +-------------+             | |
|  +-----------------------------------------------------------------+ |
+----------------------------------------------------------------------+

AMD Management Interfaces

SB-RMI (Sideband Remote Management Interface)

Provides system management commands:

Function Description
Power Management RAPL power limits, throttling
RAS Machine check, error reporting
Mailbox Commands CPU configuration, CPUID
Performance Core boost, C-states

SB-TSI (Sideband Thermal Sensor Interface)

Provides thermal monitoring:

Function Description
CPU Temperature Die temperature reading
Thermal Alerts High/low temperature thresholds
Thermal Throttling PROCHOT status

HDT (Hardware Debug Tool)

Low-level debug access (requires special hardware/NDA):

Function Description
JTAG Access CPU debug via JTAG chain
Run Control Halt, step, breakpoints
Register Access MSR, GPR access
Trace Instruction trace

APML Setup

Build-Time Configuration (Yocto)

# In your machine .conf or local.conf

# Include APML support
IMAGE_INSTALL:append = " \
    apml-modules \
    apml-library \
    esmi-oob-library \
"

# Kernel configuration
KERNEL_MODULE_AUTOLOAD:append = " apml_sbrmi apml_sbtsi"

Kernel Modules

# Load APML modules
modprobe apml_sbrmi
modprobe apml_sbtsi

# Verify modules loaded
lsmod | grep apml

# Check I2C devices
i2cdetect -l
i2cdetect -y <bus>  # Find SB-RMI/SB-TSI addresses

Device Tree Configuration

&i2c3 {
    status = "okay";

    /* SB-TSI - Thermal sensor */
    sbtsi@4c {
        compatible = "amd,sbtsi";
        reg = <0x4c>;
    };

    /* SB-RMI - Remote management */
    sbrmi@3c {
        compatible = "amd,sbrmi";
        reg = <0x3c>;
    };
};

APML Library Usage

Installation

# Check APML library
ls /usr/lib/libapml*

# Check APML tools
which esmi_oob_tool

Reading CPU Temperature

# Via hwmon (SB-TSI driver)
cat /sys/class/hwmon/hwmon*/temp1_input

# Via APML library tool
esmi_oob_tool -s 0 --showtemprange

Power Management

# Read current power
esmi_oob_tool -s 0 --showpower

# Read power limit
esmi_oob_tool -s 0 --showpowerlimit

# Set power limit (in milliwatts)
esmi_oob_tool -s 0 --setpowerlimit 200000

# Read TDP
esmi_oob_tool -s 0 --showtdp

RAS (Reliability, Availability, Serviceability)

# Read MCA (Machine Check Architecture) status
esmi_oob_tool -s 0 --showmcastatus

# Read DIMM temperature
esmi_oob_tool -s 0 --showdimmtemp

# Read DIMM power
esmi_oob_tool -s 0 --showdimmpower

# Read DIMM thermal sensor
esmi_oob_tool -s 0 --showdimmthermal

Mailbox Commands

# Read CPUID
esmi_oob_tool -s 0 --showcpuid

# Read processor info
esmi_oob_tool -s 0 --showprocinfo

# Read boost limit
esmi_oob_tool -s 0 --showboostlimit

# Set boost limit per core
esmi_oob_tool -s 0 --setboostlimit <core> <limit>

D-Bus Integration

Sensor Integration

APML sensors integrate with OpenBMC’s sensor framework:

# Temperature sensors from SB-TSI appear in D-Bus
busctl tree xyz.openbmc_project.HwmonTempSensor

# Example: CPU temperature
busctl get-property xyz.openbmc_project.HwmonTempSensor \
    /xyz/openbmc_project/sensors/temperature/CPU0_Temp \
    xyz.openbmc_project.Sensor.Value Value

Power Monitoring

# Power sensors via APML
busctl tree xyz.openbmc_project.Sensor

# Socket power reading
busctl get-property xyz.openbmc_project.Sensor \
    /xyz/openbmc_project/sensors/power/CPU0_Power \
    xyz.openbmc_project.Sensor.Value Value

AMD OpenBMC Platform

AMDESE OpenBMC

AMD maintains an OpenBMC distribution for EPYC platforms:

# Clone AMD's OpenBMC
git clone https://github.com/AMDESE/OpenBMC.git

# Build for Genoa platform
cd OpenBMC
. setup onyx  # or other platform
bitbake obmc-phosphor-image

Supported Platforms

Platform Codename Socket
Onyx Genoa SP5
Quartz Genoa SP5
Ruby Turin SP5
Chalupa Genoa SP5

HDT Debug Access

HDT provides deep hardware access. Full documentation requires AMD NDA. Use only in controlled debug environments.

HDT Overview

HDT (Hardware Debug Tool) is AMD’s JTAG-based debug interface:

  • Accessed via dedicated debug header or BMC mux
  • Requires specialized debug tools (AMD HDT software or third-party)
  • Provides run-control (halt, step, breakpoints)
  • Full register and memory access

BMC JTAG Mux

Some platforms support JTAG muxing through BMC:

# Check for JTAG mux support
cat /sys/kernel/debug/gpio | grep -i jtag

# JTAG mux control (platform-specific)
gpioset gpiochip0 <jtag_mux_gpio>=1  # Route JTAG to BMC

Commercial Solutions

For at-scale debug, commercial tools provide BMC-embedded debug:

  • ASSET InterTech ScanWorks Embedded Diagnostics (SED)
  • Lauterbach TRACE32 with remote access
  • AMD HDT software (under NDA)

Crash Dump Collection

RAS Error Logging

# Machine Check errors are logged to SEL
ipmitool sel list | grep -i "Machine Check"

# View via Redfish
curl -k -u root:0penBmc \
    https://localhost/redfish/v1/Systems/system/LogServices/EventLog/Entries

BERT (Boot Error Record Table)

# BERT records are available via ACPI
cat /sys/firmware/acpi/tables/BERT

# Decoded in system logs
dmesg | grep -i "BERT"
journalctl | grep -i "hardware error"

Collecting Debug Data

# Collect system state dump
dreport -d /tmp/debug -n amd_debug -t user

# Include APML data
esmi_oob_tool -s 0 --showallinfo > /tmp/apml_state.txt

# Read MCA banks
esmi_oob_tool -s 0 --showmcastatus > /tmp/mca_status.txt

Troubleshooting

APML Not Working

# Check I2C bus connectivity
i2cdetect -y 3  # Replace with correct bus

# Verify kernel modules
lsmod | grep apml

# Check dmesg for errors
dmesg | grep -i "sbrmi\|sbtsi\|apml"

# Verify device nodes
ls -la /dev/sbrmi* /dev/sbtsi*

Temperature Reading Fails

# Check hwmon path
ls /sys/class/hwmon/

# Find correct hwmon device
for h in /sys/class/hwmon/hwmon*; do
    name=$(cat $h/name 2>/dev/null)
    echo "$h: $name"
done

# Read directly
cat /sys/class/hwmon/hwmon<N>/temp1_input

Command Timeouts

# APML commands may timeout under heavy load
# Increase timeout in library calls

# Check SMBus errors
dmesg | grep -i "i2c\|smbus"

# Retry with delay
sleep 1 && esmi_oob_tool -s 0 --showpower

Permission Issues

# APML requires root or proper group membership
ls -la /dev/sbrmi0 /dev/sbtsi0

# Add user to appropriate group
usermod -aG i2c <username>

Security Considerations

Access Control

# Restrict APML access to administrators only
chmod 600 /dev/sbrmi* /dev/sbtsi*

# Use D-Bus policy for access control
# /etc/dbus-1/system.d/apml.conf

Audit Logging

# Log APML access
# Integrate with phosphor-logging for audit trail

Production Recommendations

  • Disable HDT access in production (via fuses or GPIO)
  • Restrict APML to essential operations
  • Monitor for anomalous power/thermal changes
  • Log all RAS events

References


Deep Dive

This section provides detailed technical information for developers who want to understand AMD APML internals and SB-RMI/SB-TSI protocols.

APML Communication Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                      APML Sideband Interface Architecture                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  BMC                                        AMD EPYC Processor              │
│  ┌────────────────────────────┐             ┌────────────────────────────┐  │
│  │                            │             │                            │  │
│  │  ┌──────────────────────┐  │  I2C/I3C   │  ┌──────────────────────┐  │  │
│  │  │  APML Library        │  │  ──────────│──│  System Management   │  │  │
│  │  │  (esmi_oob_library)  │  │  Bus       │  │  Unit (SMU)          │  │  │
│  │  └──────────┬───────────┘  │             │  └──────────────────────┘  │  │
│  │             │              │             │             │              │  │
│  │  ┌──────────┴───────────┐  │             │  ┌──────────┴───────────┐  │  │
│  │  │  Kernel Driver       │  │             │  │  Sideband Interfaces │  │  │
│  │  │  ┌─────────────────┐ │  │             │  │                      │  │  │
│  │  │  │  apml_sbrmi.ko  │ │  │   I2C      │  │  SB-RMI (0x3C)       │  │  │
│  │  │  │  /dev/sbrmi0    │─│──│───────────>│──│  Remote Management   │  │  │
│  │  │  └─────────────────┘ │  │             │  │  Interface           │  │  │
│  │  │  ┌─────────────────┐ │  │             │  │                      │  │  │
│  │  │  │  apml_sbtsi.ko  │ │  │   I2C      │  │  SB-TSI (0x4C)       │  │  │
│  │  │  │  /dev/sbtsi0    │─│──│───────────>│──│  Thermal Sensor      │  │  │
│  │  │  └─────────────────┘ │  │             │  │  Interface           │  │  │
│  │  └──────────────────────┘  │             │  └──────────────────────┘  │  │
│  │                            │             │                            │  │
│  └────────────────────────────┘             └────────────────────────────┘  │
│                                                                             │
│  I2C Bus Configuration:                                                     │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  Address  │ Interface │ Function                                       │ │
│  │  ─────────┼───────────┼─────────────────────────────────────────────── │ │
│  │  0x3C     │ SB-RMI    │ Remote management, mailbox commands             │ │
│  │  0x4C     │ SB-TSI    │ Thermal sensor, temperature alerts              │ │
│  │  0x3D     │ SB-RMI    │ Secondary socket (Socket 1)                     │ │
│  │  0x4D     │ SB-TSI    │ Secondary socket (Socket 1)                     │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  I2C Speed: Standard mode (100 kHz) or Fast mode (400 kHz)                  │
│  Pull-up: 2.2kΩ to 3.3V                                                     │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

SB-RMI Mailbox Protocol

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SB-RMI Mailbox Command Sequence                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Mailbox Architecture:                                                      │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  SMU (System Management Unit) processes mailbox commands               │ │
│  │  BMC writes command/data, waits for SMU to process, reads response     │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  Mailbox Transaction Flow:                                                  │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  Step 1: Check mailbox not busy                                        │ │
│  │  ┌──────────────────────────────────────────────────────────────────┐  │ │
│  │  │  Read SB_RMI_STATUS (0x02)                                       │  │ │
│  │  │  Wait for SwAlertSts bit to clear                                │  │ │
│  │  └──────────────────────────────────────────────────────────────────┘  │ │
│  │                              │                                         │ │
│  │                              v                                         │ │
│  │  Step 2: Write input arguments (if any)                                │ │
│  │  ┌──────────────────────────────────────────────────────────────────┐  │ │
│  │  │  Write to InBndMsg registers (0x50-0x5F)                         │  │ │
│  │  │  Up to 16 bytes of input data                                    │  │ │
│  │  └──────────────────────────────────────────────────────────────────┘  │ │
│  │                              │                                         │ │
│  │                              v                                         │ │
│  │  Step 3: Write command                                                 │ │
│  │  ┌──────────────────────────────────────────────────────────────────┐  │ │
│  │  │  Write command ID to SB_RMI_SW_MAILBOX_CMD (0x3B)                │  │ │
│  │  │  This triggers SMU to process the command                        │  │ │
│  │  └──────────────────────────────────────────────────────────────────┘  │ │
│  │                              │                                         │ │
│  │                              v                                         │ │
│  │  Step 4: Wait for completion                                           │ │
│  │  ┌──────────────────────────────────────────────────────────────────┐  │ │
│  │  │  Poll SB_RMI_STATUS for SwAlertSts bit                           │  │ │
│  │  │  Timeout after ~100ms                                            │  │ │
│  │  └──────────────────────────────────────────────────────────────────┘  │ │
│  │                              │                                         │ │
│  │                              v                                         │ │
│  │  Step 5: Read response                                                 │ │
│  │  ┌──────────────────────────────────────────────────────────────────┐  │ │
│  │  │  Read OutBndMsg registers (0x60-0x6F)                            │  │ │
│  │  │  Read SB_RMI_SW_MAILBOX_STATUS (0x3F) for result code            │  │ │
│  │  └──────────────────────────────────────────────────────────────────┘  │ │
│  │                                                                        │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  Common SB-RMI Mailbox Commands:                                            │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  Command ID │ Name               │ Description                         │ │
│  │  ───────────┼────────────────────┼───────────────────────────────────── │ │
│  │  0x01       │ ReadPkgPowerLimit  │ Read socket power limit (TDP cap)   │ │
│  │  0x02       │ WritePkgPowerLimit │ Set socket power limit              │ │
│  │  0x03       │ ReadPkgPowerConsumption │ Read current power draw        │ │
│  │  0x04       │ ReadDDRBandwidth   │ Read memory bandwidth usage         │ │
│  │  0x05       │ ReadBoostLimit     │ Read boost frequency limit          │ │
│  │  0x06       │ WriteBoostLimit    │ Set boost frequency limit           │ │
│  │  0x10       │ ReadMcaData        │ Read Machine Check registers        │ │
│  │  0x11       │ ReadDimmTemp       │ Read DIMM temperature               │ │
│  │  0x12       │ ReadDimmPower      │ Read DIMM power consumption         │ │
│  │  0x20       │ ReadCpuId          │ Read CPU family/model/stepping      │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

SB-TSI Temperature Sensing

┌─────────────────────────────────────────────────────────────────────────────┐
│                    SB-TSI Thermal Interface Protocol                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Register Map:                                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  Offset │ Register          │ Access │ Description                     │ │
│  │  ───────┼───────────────────┼────────┼──────────────────────────────── │ │
│  │  0x00   │ CPU_TEMP_INT      │ R      │ CPU temp integer part (°C)      │ │
│  │  0x01   │ STATUS            │ R      │ Status and alert flags          │ │
│  │  0x02   │ CONFIG            │ R/W    │ Configuration register          │ │
│  │  0x03   │ UPDATE_RATE       │ R/W    │ Temperature update rate         │ │
│  │  0x04   │ HI_TEMP_INT       │ R/W    │ High temp threshold integer     │ │
│  │  0x05   │ LO_TEMP_INT       │ R/W    │ Low temp threshold integer      │ │
│  │  0x10   │ CPU_TEMP_DEC      │ R      │ CPU temp decimal part           │ │
│  │  0x11   │ CPU_TEMP_OFFSET_INT │ R/W  │ Temperature offset integer      │ │
│  │  0x13   │ HI_TEMP_DEC       │ R/W    │ High temp threshold decimal     │ │
│  │  0x14   │ LO_TEMP_DEC       │ R/W    │ Low temp threshold decimal      │ │
│  │  0x19   │ TIMEOUT_CONFIG    │ R/W    │ Alert timeout configuration     │ │
│  │  0xFE   │ VENDOR_ID         │ R      │ Vendor ID (0x00 = AMD)          │ │
│  │  0xFF   │ DEVICE_ID         │ R      │ Device ID                       │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  Temperature Reading:                                                       │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  Temperature = CPU_TEMP_INT + (CPU_TEMP_DEC × 0.125)                   │ │
│  │                                                                        │ │
│  │  CPU_TEMP_DEC bits[7:5] represent 0.125°C increments:                  │ │
│  │    000 = 0.000°C                                                       │ │
│  │    001 = 0.125°C                                                       │ │
│  │    010 = 0.250°C                                                       │ │
│  │    011 = 0.375°C                                                       │ │
│  │    100 = 0.500°C                                                       │ │
│  │    101 = 0.625°C                                                       │ │
│  │    110 = 0.750°C                                                       │ │
│  │    111 = 0.875°C                                                       │ │
│  │                                                                        │ │
│  │  Example: INT=55, DEC=0x40 (010 in bits[7:5])                          │ │
│  │           Temperature = 55 + 0.250 = 55.25°C                           │ │
│  │                                                                        │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  Threshold Alert Mechanism:                                                 │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                                                                        │ │
│  │  Temperature                                                           │ │
│  │       ^                                                                │ │
│  │       │    ┌─── High Threshold (HI_TEMP_INT + HI_TEMP_DEC)             │ │
│  │       │    │    ALERT_H bit set when temp >= threshold                 │ │
│  │  100°C├────┼───────────────────────────────────────────                │ │
│  │       │    │                                                           │ │
│  │       │    │   ~~~~ Actual Temperature ~~~~                            │ │
│  │   75°C├────┤                                                           │ │
│  │       │    │                                                           │ │
│  │       │    └─── Low Threshold (LO_TEMP_INT + LO_TEMP_DEC)              │ │
│  │       │         ALERT_L bit set when temp <= threshold                 │ │
│  │   50°C├─────────────────────────────────────────────────               │ │
│  │       │                                                                │ │
│  │       └──────────────────────────────────────────────> Time            │ │
│  │                                                                        │ │
│  │  STATUS Register (0x01) flags:                                         │ │
│  │    Bit 0: ALERT_L - Temperature at or below low threshold              │ │
│  │    Bit 1: ALERT_H - Temperature at or above high threshold             │ │
│  │    Bit 7: Busy - Temperature conversion in progress                    │ │
│  │                                                                        │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

AMD MCA (Machine Check Architecture) Error Handling

┌─────────────────────────────────────────────────────────────────────────────┐
│                    AMD MCA Error Collection via APML                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  MCA Bank Layout (AMD Zen Architecture):                                    │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  Bank │ Description                    │ Error Types                   │ │
│  │  ─────┼────────────────────────────────┼─────────────────────────────── │ │
│  │  0    │ Load-Store Unit                │ Cache/TLB errors              │ │
│  │  1    │ Instruction Fetch Unit         │ IC/IT errors                  │ │
│  │  2    │ L2 Cache Unit                  │ L2 tag/data errors            │ │
│  │  3    │ Decode Unit                    │ Microcode/instruction errors  │ │
│  │  4    │ Reserved                       │                               │ │
│  │  5    │ Execution Unit                 │ FP/INT execution errors       │ │
│  │  6    │ Floating Point Unit            │ FP register file errors       │ │
│  │  7-15 │ Core-specific banks            │ Varies by generation          │ │
│  │  16+  │ Uncore banks (UMC, DF, etc.)  │ Memory, fabric errors         │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  Reading MCA via APML Mailbox:                                              │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  Command: ReadMcaData (0x10)                                           │ │
│  │                                                                        │ │
│  │  Input:                                                                │ │
│  │    InBndMsg[0]: Core/Thread ID                                         │ │
│  │    InBndMsg[1]: MCA Bank number                                        │ │
│  │    InBndMsg[2]: Register offset (0=STATUS, 1=ADDR, 2=MISC, etc.)       │ │
│  │                                                                        │ │
│  │  Output:                                                               │ │
│  │    OutBndMsg[0-7]: 64-bit register value                               │ │
│  │                                                                        │ │
│  │  Example reading MC0_STATUS:                                           │ │
│  │    esmi_oob_tool -s 0 --readmsr --core 0 --bank 0 --offset 0           │ │
│  │                                                                        │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  MCA_STATUS Register Format (AMD Zen):                                      │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  Bit   │ Field    │ Description                                        │ │
│  │  ──────┼──────────┼─────────────────────────────────────────────────── │ │
│  │  63    │ Val      │ Error information valid                            │ │
│  │  62    │ Overflow │ Multiple errors, information may be lost           │ │
│  │  61    │ UC       │ Uncorrected error                                  │ │
│  │  60    │ En       │ Error reporting enabled                            │ │
│  │  59    │ MiscV    │ MCA_MISC register valid                            │ │
│  │  58    │ AddrV    │ MCA_ADDR register valid                            │ │
│  │  57    │ PCC      │ Processor context corrupt                          │ │
│  │  56    │ TCC      │ Task context corrupt                               │ │
│  │  55    │ Poison   │ Poisoned data consumed                             │ │
│  │  54    │ Deferred │ Deferred error (reported later)                    │ │
│  │  47:32 │ ExtErrCode │ Extended error code                              │ │
│  │  31:16 │ ModelErrCode │ Model-specific error code                      │ │
│  │  15:0  │ ErrCode  │ Error code type                                    │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  Error Types (ErrCode bits 15:0):                                           │ │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  Pattern     │ Type              │ Description                         │ │
│  │  ────────────┼───────────────────┼──────────────────────────────────── │ │
│  │  0000 0000   │ No error          │                                     │ │
│  │  0000 0001   │ Unclassified      │ Internal parity/protocol error      │ │
│  │  0000 0101   │ Internal parity   │ Core internal error                 │ │
│  │  000x xxxx   │ TLB errors        │ TLB parity/multimatch               │ │
│  │  001x xxxx   │ Memory errors     │ Cache hierarchy errors              │ │
│  │  010x xxxx   │ Bus/Interconnect  │ Data fabric/IF errors               │ │
│  │  1xxx xxxx   │ Memory controller │ DRAM/UMC errors                     │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Source Code References

Component Repository Key Files
APML Kernel Modules amd/apml_modules sbrmi/apml_sbrmi.c, sbtsi/apml_sbtsi.c
E-SMI OOB Library amd/esmi_oob_library src/esmi_oob.c, src/apml.c
E-SMI In-Band Library amd/esmi_ib_library src/e_smi.c
AMDESE OpenBMC AMDESE/OpenBMC meta-amd/
hwmon sbtsi driver linux/drivers/hwmon drivers/hwmon/sbtsi_temp.c

Platform: AMD EPYC 7002 (Rome), 7003 (Milan), 9004 (Genoa), and newer platforms. APML requires Family 17h (Zen) or later processors.


Back to top

OpenBMC Guide Tutorial is not affiliated with the OpenBMC project. Content is provided for educational purposes.

This site uses Just the Docs, a documentation theme for Jekyll.