Watchdog Guide

Configure the host watchdog timer on OpenBMC.

Table of Contents

  1. Overview
  2. Setup & Configuration
    1. Build-Time Configuration
    2. Configuration Paths
    3. Customizing Watchdog Defaults via Yocto
    4. Expiration Actions
  3. Configuring Watchdog
    1. Via D-Bus
    2. Via IPMI
    3. Via Redfish
  4. Watchdog Operation
  5. Troubleshooting
  6. Deep Dive
    1. Watchdog Timer State Machine
    2. Timer Implementation Architecture
    3. IPMI Watchdog Timer Commands
    4. Recovery Action Sequence
    5. Source Code Reference
  7. Examples
  8. References

Overview

phosphor-watchdog is a BMC service that monitors host health. The host must reset ("kick") the timer periodically; if it stops doing so, the timer expires and the configured recovery action runs.

flowchart TB
    subgraph Watchdog["Watchdog Architecture"]
        direction TB
        
        subgraph HostSystem["Host System"]
            WatchdogAgent["Watchdog<br/>Agent"]
        end
        
        subgraph PhosphorWatchdog["phosphor-watchdog"]
            direction TB
            subgraph Properties["Properties"]
                direction LR
                Enabled["Enabled<br/>(bool)"]
                Interval["Interval<br/>(msec)"]
            end
            ExpireAction["ExpireAction<br/>(None/HardReset/PowerOff/PowerCycle)"]
        end
        
        StateMgr["State Manager<br/>(Execute recovery action)"]
        
        WatchdogAgent -->|"Periodic 'kick'"| PhosphorWatchdog
        PhosphorWatchdog -->|"Timeout expired"| StateMgr
    end

Setup & Configuration

Build-Time Configuration

# Include watchdog
IMAGE_INSTALL:append = " phosphor-watchdog"

# Configure Meson options
EXTRA_OEMESON:append:pn-phosphor-watchdog = " \
    -Ddefault-action=HardReset \
    -Ddefault-timeout=300 \
"

Configuration Paths

phosphor-watchdog is configured primarily through Meson build options and runtime D-Bus properties. There is no standalone JSON config file.

Configuration     Where                                           Description
Default timeout   Meson -Ddefault-timeout=300                     Initial countdown in seconds
Default action    Meson -Ddefault-action=HardReset                Action on expiration
Service template  /lib/systemd/system/phosphor-watchdog@.service  systemd unit template
D-Bus policy      /etc/dbus-1/system.d/phosphor-watchdog.conf     D-Bus access control
Runtime state     D-Bus properties (no file)                      Managed via D-Bus at runtime

Customizing Watchdog Defaults via Yocto

To change the default timeout or action for your platform:

# meta-myplatform/recipes-phosphor/watchdog/
# └── phosphor-watchdog_%.bbappend

cat > phosphor-watchdog_%.bbappend << 'EOF'
# Set platform-specific defaults
EXTRA_OEMESON:append = " \
    -Ddefault-action=PowerCycle \
    -Ddefault-timeout=600 \
"
EOF

To add a custom systemd override for the watchdog service:

# meta-myplatform/recipes-phosphor/watchdog/
# ├── files/
# │   └── override.conf
# └── phosphor-watchdog_%.bbappend

cat > phosphor-watchdog_%.bbappend << 'EOF'
FILESEXTRAPATHS:prepend := "${THISDIR}/files:"
SRC_URI += "file://override.conf"

do_install:append() {
    install -d ${D}${systemd_system_unitdir}/phosphor-watchdog@watchdog-host0.service.d
    install -m 0644 ${WORKDIR}/override.conf \
        ${D}${systemd_system_unitdir}/phosphor-watchdog@watchdog-host0.service.d/
}
EOF

Expiration Actions

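Action      Description
None        Log the event only; take no recovery action
HardReset   Reset the host immediately, without a graceful shutdown
PowerOff    Power off the system and leave it off
PowerCycle  Power off, then power back on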
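
The ExpireAction D-Bus property takes a fully qualified enum string, not the short action name. Here is a minimal sketch of a helper that expands and validates the short names. The function name `watchdog_action_enum` is made up for illustration, and the prefix is assumed to be `xyz.openbmc_project.State.Watchdog.Action.`.

```shell
# Hypothetical helper: expand a short action name into the D-Bus enum string.
watchdog_action_enum() {
    case "$1" in
        None|HardReset|PowerOff|PowerCycle)
            printf 'xyz.openbmc_project.State.Watchdog.Action.%s\n' "$1"
            ;;
        *)
            # Anything else is rejected so a typo never reaches D-Bus.
            printf 'unknown watchdog action: %s\n' "$1" >&2
            return 1
            ;;
    esac
}

watchdog_action_enum PowerCycle
```

Rejecting unknown names early gives a clearer error than the D-Bus failure that a misspelled enum string would otherwise produce.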

Configuring Watchdog

Via D-Bus

# Enable watchdog
busctl set-property xyz.openbmc_project.Watchdog \
    /xyz/openbmc_project/watchdog/host0 \
    xyz.openbmc_project.State.Watchdog \
    Enabled b true

# Set timeout (milliseconds; 300000 = 5 minutes)
busctl set-property xyz.openbmc_project.Watchdog \
    /xyz/openbmc_project/watchdog/host0 \
    xyz.openbmc_project.State.Watchdog \
    Interval t 300000

# Set expiration action
busctl set-property xyz.openbmc_project.Watchdog \
    /xyz/openbmc_project/watchdog/host0 \
    xyz.openbmc_project.State.Watchdog \
    ExpireAction s "xyz.openbmc_project.State.Watchdog.Action.HardReset"
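
The Interval property is defined in milliseconds in phosphor-dbus-interfaces, so a timeout chosen in seconds has to be scaled before it is written. The sketch below does that conversion and prints the matching busctl command instead of running it. The function names are hypothetical.

```shell
# Convert whole seconds to the millisecond value used by Interval.
secs_to_interval_ms() {
    local secs="$1"
    case "$secs" in
        ''|*[!0-9]*)
            echo "timeout must be a non-negative whole number of seconds" >&2
            return 1
            ;;
    esac
    # Strip leading zeros so the shell does not read the value as octal.
    while [ "${#secs}" -gt 1 ] && [ "${secs#0}" != "$secs" ]; do
        secs="${secs#0}"
    done
    echo $(( $secs * 1000 ))
}

# Print (rather than run) the busctl command for a timeout given in seconds.
print_set_interval_cmd() {
    local ms
    ms="$(secs_to_interval_ms "$1")" || return 1
    printf 'busctl set-property %s %s %s Interval t %s\n' \
        xyz.openbmc_project.Watchdog \
        /xyz/openbmc_project/watchdog/host0 \
        xyz.openbmc_project.State.Watchdog \
        "$ms"
}

print_set_interval_cmd 300
```

Printing the command first makes it easy to review the value before piping the line to `sh` on the BMC.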

Via IPMI

# Get watchdog status
ipmitool mc watchdog get

# Set watchdog timeout
ipmitool mc watchdog set timeout 300

# Set timeout action
ipmitool mc watchdog set action hard_reset

# Reset (kick) watchdog
ipmitool mc watchdog reset

Via Redfish

# Configure watchdog via HostWatchdogTimer
curl -k -u root:0penBmc -X PATCH \
    -H "Content-Type: application/json" \
    -d '{
        "HostWatchdogTimer": {
            "FunctionEnabled": true,
            "TimeoutAction": "ResetSystem",
            "WarningAction": "None"
        }
    }' \
    https://localhost/redfish/v1/Systems/system
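
Redfish names its timeout actions differently from the watchdog's ExpireAction values. The sketch below builds a PATCH body from an ExpireAction short name. The correspondence (HardReset to ResetSystem, PowerOff to PowerDown) is an assumption based on the Redfish enum names, not something this guide confirms, and both function names are hypothetical.

```shell
# Map a watchdog ExpireAction short name to a Redfish TimeoutAction value.
# The pairing is assumed; check it against your Redfish service.
redfish_timeout_action() {
    case "$1" in
        None)       echo "None" ;;
        HardReset)  echo "ResetSystem" ;;
        PowerOff)   echo "PowerDown" ;;
        PowerCycle) echo "PowerCycle" ;;
        *) echo "unknown watchdog action: $1" >&2; return 1 ;;
    esac
}

# Build a compact JSON body for PATCH /redfish/v1/Systems/system.
watchdog_patch_body() {
    local action
    action="$(redfish_timeout_action "$1")" || return 1
    printf '{"HostWatchdogTimer":{"FunctionEnabled":true,"TimeoutAction":"%s"}}\n' "$action"
}

watchdog_patch_body PowerOff
```

The output can be passed to curl with `-d "$(watchdog_patch_body PowerOff)"`.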

Watchdog Operation

Host Boot → Watchdog Enabled → Host kicks watchdog periodically
                                    ↓
                            Timeout expires?
                                    ↓
                            Execute ExpireAction
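
The periodic kick in this flow can be sketched as a small host-side loop. This is an illustration, not the agent that real hosts run. `KICK_CMD`, `kick_period`, and `kick_loop` are hypothetical names, and kicking at half the timeout is a common rule of thumb rather than a requirement.

```shell
# Command used for one kick; override KICK_CMD to test without ipmitool.
KICK_CMD="${KICK_CMD:-ipmitool mc watchdog reset}"

# Kick period: half the timeout, but never less than one second.
kick_period() {
    local timeout_s="$1" period
    period=$(( timeout_s / 2 ))
    [ "$period" -ge 1 ] || period=1
    echo "$period"
}

# Kick COUNT times, sleeping PERIOD seconds between kicks.
# Returns non-zero at the first failed kick.
kick_loop() {
    local period="$1" count="$2" i=0
    while [ "$i" -lt "$count" ]; do
        $KICK_CMD || { echo "kick failed after $i successful kicks" >&2; return 1; }
        i=$(( i + 1 ))
        [ "$i" -lt "$count" ] && sleep "$period"
    done
    return 0
}
```

For a 300-second timeout, `kick_loop "$(kick_period 300)" 10` would kick ten times, 150 seconds apart. If the loop stops, for example because the host hangs, the countdown reaches zero and the ExpireAction runs.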

Troubleshooting

# Check watchdog service
systemctl status phosphor-watchdog@watchdog-host0

# View watchdog state
busctl introspect xyz.openbmc_project.Watchdog \
    /xyz/openbmc_project/watchdog/host0
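
When scripting checks, it helps to read single properties instead of the full introspection output. `busctl get-property` prints a type signature followed by the value (for example `t 300000` or `b true`). The sketch below strips that prefix. `busctl_value` and `watchdog_get` are hypothetical helper names.

```shell
# Strip the leading type signature (and quotes, for strings) from one
# line of `busctl get-property` output.
busctl_value() {
    local line="$1" value
    value="${line#* }"      # drop the signature and the space after it
    value="${value#\"}"     # drop an opening quote, if any
    value="${value%\"}"     # drop a closing quote, if any
    printf '%s\n' "$value"
}

# Read one watchdog property; needs busctl, so run this on the BMC.
watchdog_get() {
    busctl_value "$(busctl get-property xyz.openbmc_project.Watchdog \
        /xyz/openbmc_project/watchdog/host0 \
        xyz.openbmc_project.State.Watchdog "$1")"
}
```

On a BMC, `watchdog_get Enabled` and `watchdog_get TimeRemaining` then show whether the timer is armed and how long remains before it expires.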

Deep Dive

Advanced implementation details for watchdog timer developers.

Watchdog Timer State Machine

---
title: Watchdog Timer State Machine
---
stateDiagram-v2
    [*] --> DISABLED

    DISABLED --> RUNNING : Enable=true

    RUNNING --> RUNNING : Timer Reset (Kick)
    RUNNING --> EXPIRED : TimeRemaining == 0

    EXPIRED --> DISABLED : Disable
    EXPIRED --> ActionNone : Action: None
    EXPIRED --> ActionReset : Action: Reset

    state ActionNone {
        [*] --> LogEvent
        LogEvent --> ReturnDisabled
        ReturnDisabled --> [*]
    }

    state ActionReset {
        [*] --> PowerCycle
        PowerCycle --> StateManager
        StateManager --> [*]
    }

    note right of DISABLED : Enabled=false
    note right of RUNNING : Enabled=true\nTimeRemaining counting down
    note right of EXPIRED : Execute ExpireAction

Timer Properties:

Property         Description
Interval         Total countdown time (milliseconds)
TimeRemaining    Time left before expiration (milliseconds)
Enabled          Whether the timer is active
ExpireAction     What to do on timeout (None, HardReset, PowerOff, PowerCycle)
CurrentTimerUse  BIOS_FRB2, BIOS_POST, OS_LOAD, SMS_OS, OEM

ASCII-art version (for comparison):

┌────────────────────────────────────────────────────────────────────────────┐
│                      Watchdog Timer State Machine                          │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│                          ┌─────────────────────────────────────┐           │
│                          │                                     │           │
│              Enable=true │                                     │ Timer     │
│            ┌─────────────┴───────────┐                         │ Reset     │
│            │                         │                         │ (Kick)    │
│            v                         │                         │           │
│     ┌──────────────┐           ┌─────┴──────────┐              │           │
│     │   DISABLED   │           │    RUNNING     │<─────────────┘           │
│     │              │           │                │                          │
│     │  Enabled=    │  Enable   │  Enabled=true  │                          │
│     │  false       │──────────>│  TimeRemaining │                          │
│     │              │           │  counting down │                          │
│     └──────────────┘           └───────┬────────┘                          │
│            ^                           │                                   │
│            │                           │ TimeRemaining == 0                │
│            │                           v                                   │
│            │                   ┌───────────────┐                           │
│            │                   │   EXPIRED     │                           │
│            │       Disable     │               │                           │
│            └───────────────────┤  Execute      │                           │
│                                │  ExpireAction │                           │
│                                └───────┬───────┘                           │
│                                        │                                   │
│                    ┌───────────────────┴───────────────────┐               │
│                    │                                       │               │
│                    v                                       v               │
│          ┌─────────────────┐                    ┌─────────────────┐        │
│          │  Action: None   │                    │ Action: Reset   │        │
│          │                 │                    │                 │        │
│          │  Log event only │                    │ Power cycle     │        │
│          │  Return to      │                    │ Request via     │        │
│          │  DISABLED       │                    │ State Manager   │        │
│          └─────────────────┘                    └─────────────────┘        │
│                                                                            │
│  TIMER PROPERTIES:                                                         │
│  ─────────────────                                                         │
│                                                                            │
│  Interval:      Total countdown time (milliseconds)                        │
│  TimeRemaining: Current time left (decrements each tick)                   │
│  Enabled:       Whether timer is active                                    │
│  ExpireAction:  What to do on timeout (None, HardReset, PowerOff, etc.)    │
│  CurrentTimerUse: BIOS_FRB2, BIOS_POST, OS_LOAD, SMS_OS, OEM               │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
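
The state machine above can be expressed as a small transition function. This is a sketch for reasoning about the states, not the service's implementation. The event names are made up, and the RUNNING to DISABLED transition on `disable` is an addition that the diagram does not draw.

```shell
# next_state STATE EVENT -> prints the next state.
# States: DISABLED, RUNNING, EXPIRED. Events: enable, disable, kick, timeout.
next_state() {
    case "$1:$2" in
        DISABLED:enable)  echo RUNNING ;;
        RUNNING:kick)     echo RUNNING ;;
        RUNNING:timeout)  echo EXPIRED ;;
        RUNNING:disable)  echo DISABLED ;;   # assumed; not drawn in the diagram
        EXPIRED:disable)  echo DISABLED ;;
        *)                echo "$1" ;;       # inapplicable events change nothing
    esac
}

# Feed a sequence of events through the machine and print the final state.
run_events() {
    local state=DISABLED event
    for event in "$@"; do
        state="$(next_state "$state" "$event")"
    done
    echo "$state"
}

run_events enable kick kick timeout   # prints EXPIRED
```

Treating inapplicable events as no-ops mirrors the diagram: a kick while DISABLED, for instance, does not start the timer.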

Timer Implementation Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                    Phosphor-Watchdog Timer Implementation                  │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  HOST AGENT (BIOS/OS)                                                      │
│  ────────────────────                                                      │
│        │                                                                   │
│        │ IPMI: Set Watchdog Timer (0x24)                                   │
│        │       Reset Watchdog Timer (0x22)                                 │
│        v                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    phosphor-ipmi-host                               │   │
│  │                                                                     │   │
│  │  ipmid_watchdog.cpp                                                 │   │
│  │  ├── ipmi_watchdog_set()   → D-Bus properties                       │   │
│  │  └── ipmi_watchdog_reset() → D-Bus ResetTimeRemaining()             │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│        │                                                                   │
│        │ D-Bus                                                             │
│        v                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    phosphor-watchdog                                │   │
│  │                                                                     │   │
│  │  D-Bus Object: /xyz/openbmc_project/watchdog/host0                  │   │
│  │  Service:      xyz.openbmc_project.Watchdog                         │   │
│  │                                                                     │   │
│  │  ┌───────────────────────────────────────────────────────────────┐  │   │
│  │  │  Watchdog Class (watchdog.cpp)                                │  │   │
│  │  │                                                               │  │   │
│  │  │  Properties:                                                  │  │   │
│  │  │    Enabled: bool              ExpireAction: enum              │  │   │
│  │  │    Interval: uint64_t (ms)    TimeRemaining: uint64_t         │  │   │
│  │  │    CurrentTimerUse: enum      TimerUseExpirationFlags: byte   │  │   │
│  │  │                                                               │  │   │
│  │  │  Timer Loop:                                                  │  │   │
│  │  │    while (enabled && timeRemaining > 0) {                     │  │   │
│  │  │        timerfd_settime(fd, 0, &timerspec);                    │  │   │
│  │  │        epoll_wait(epfd, events, 1, -1);                       │  │   │
│  │  │        timeRemaining -= elapsed;                              │  │   │
│  │  │    }                                                          │  │   │
│  │  │    if (timeRemaining == 0) doExpiration();                    │  │   │
│  │  └───────────────────────────────────────────────────────────────┘  │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│        │                                                                   │
│        │ Expiration triggers D-Bus method call                             │
│        v                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                    phosphor-state-manager                           │   │
│  │                                                                     │   │
│  │  D-Bus Object: /xyz/openbmc_project/state/host0                     │   │
│  │                                                                     │   │
│  │  Transitions requested on watchdog expiration:                      │   │
│  │    RequestedHostTransition = Off         (PowerOff action)          │   │
│  │    RequestedHostTransition = Reboot      (HardReset action)         │   │
│  │    RequestedHostTransition = Off then On (PowerCycle action)        │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
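
The IPMI layer counts in 100 ms units, while the D-Bus Interval property is defined in milliseconds, so whatever bridges the two has to scale the value. The sketch below shows that arithmetic only. The function names are hypothetical and the code is not taken from the IPMI daemon's source.

```shell
# IPMI initial countdown (100 ms units) -> Interval (milliseconds).
countdown_to_ms() {
    echo $(( $1 * 100 ))
}

# Interval (milliseconds) -> IPMI countdown. Rounds up so the timer never
# fires early, and caps at the 16-bit maximum the IPMI field can hold.
ms_to_countdown() {
    local count
    count=$(( ($1 + 99) / 100 ))
    [ "$count" -le 65535 ] || count=65535
    echo "$count"
}

countdown_to_ms 3000   # prints 300000
```

The 16-bit cap means an IPMI-programmed timeout tops out at 6553.5 seconds; longer intervals can only be set through D-Bus.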

IPMI Watchdog Timer Commands

┌────────────────────────────────────────────────────────────────────────────┐
│                      IPMI Watchdog Protocol Details                        │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  SET WATCHDOG TIMER (0x24)                                                 │
│  ─────────────────────────                                                 │
│                                                                            │
│  Request:                                                                  │
│  ┌─────────┬──────────────────────────────────────────────────────────┐    │
│  │ Byte    │ Description                                              │    │
│  ├─────────┼──────────────────────────────────────────────────────────┤    │
│  │ 1       │ Timer Use:                                               │    │
│  │         │   [2:0] Timer Use (1=BIOS_FRB2, 2=BIOS_POST,             │    │
│  │         │         3=OS_LOAD, 4=SMS_OS, 5=OEM)                      │    │
│  │         │   [6]   Don't stop timer on Set (1=don't stop)           │    │
│  │         │   [7]   Don't log (0=log, 1=don't log)                   │    │
│  ├─────────┼──────────────────────────────────────────────────────────┤    │
│  │ 2       │ Timer Actions:                                           │    │
│  │         │   [2:0] Timeout action (0=None, 1=HardReset,             │    │
│  │         │         2=PowerDown, 3=PowerCycle)                       │    │
│  │         │   [6:4] Pre-timeout interrupt (0=None, 1=SMI,            │    │
│  │         │         2=NMI, 3=Messaging)                              │    │
│  ├─────────┼──────────────────────────────────────────────────────────┤    │
│  │ 3       │ Pre-timeout interval (seconds)                           │    │
│  ├─────────┼──────────────────────────────────────────────────────────┤    │
│  │ 4       │ Timer Use Expiration Flags Clear                         │    │
│  │         │   [1] BIOS_FRB2  [2] BIOS_POST  [3] OS_LOAD              │    │
│  │         │   [4] SMS_OS     [5] OEM                                 │    │
│  ├─────────┼──────────────────────────────────────────────────────────┤    │
│  │ 5-6     │ Initial countdown value (100ms units, LSB first)         │    │
│  └─────────┴──────────────────────────────────────────────────────────┘    │
│                                                                            │
│  GET WATCHDOG TIMER (0x25)                                                 │
│  ─────────────────────────                                                 │
│                                                                            │
│  Response:                                                                 │
│  ┌─────────┬──────────────────────────────────────────────────────────┐    │
│  │ Byte    │ Description                                              │    │
│  ├─────────┼──────────────────────────────────────────────────────────┤    │
│  │ 1       │ Completion code                                          │    │
│  │ 2       │ Timer Use (same format as set)                           │    │
│  │ 3       │ Timer Actions (same format as set)                       │    │
│  │ 4       │ Pre-timeout interval                                     │    │
│  │ 5       │ Timer Use Expiration Flags (set on expiration)           │    │
│  │ 6-7     │ Initial countdown value                                  │    │
│  │ 8-9     │ Present countdown value (current time remaining)         │    │
│  └─────────┴──────────────────────────────────────────────────────────┘    │
│                                                                            │
│  RESET WATCHDOG TIMER (0x22)                                               │
│  ───────────────────────────                                               │
│  No data bytes - resets countdown to initial value                         │
│  Returns error 0x80 if timer not initialized (never set)                   │
│                                                                            │
│  Example ipmitool Commands:                                                │
│  ──────────────────────────                                                │
│                                                                            │
│  # Set 5-minute timer with hard reset action, OS watchdog                  │
│  ipmitool raw 0x06 0x24 0x44 0x01 0x00 0x10 0xB8 0x0B                      │
│  #                      │    │    │    │    └─────┘                        │
│  #                      │    │    │    │    3000 = 5 min (100ms units)     │
│  #                      │    │    │    └─ Clear flags                      │
│  #                      │    │    └────── No pre-timeout                   │
│  #                      │    └─────────── HardReset action                 │
│  #                      └──────────────── SMS_OS timer use                 │
│                                                                            │
│  # Kick (reset) watchdog                                                   │
│  ipmitool raw 0x06 0x22                                                    │
│                                                                            │
│  # Get current status                                                      │
│  ipmitool raw 0x06 0x25                                                    │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
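
The Set Watchdog Timer request can be assembled programmatically instead of by hand. The sketch below encodes timer use, action, and a timeout in seconds into the six data bytes. `set_watchdog_bytes` is a hypothetical name. It leaves the don't-log and don't-stop bits clear, so the first byte for SMS_OS is 0x04 rather than the 0x44 used in the example above, and it sets no pre-timeout interrupt.

```shell
# Build the six data bytes for Set Watchdog Timer (NetFn 0x06, Cmd 0x24).
# Args: TIMER_USE (1-5)  ACTION (0-3)  TIMEOUT_SECONDS
set_watchdog_bytes() {
    local use="$1" action="$2" secs="$3" count
    if [ "$use" -lt 1 ] || [ "$use" -gt 5 ] ||
       [ "$action" -lt 0 ] || [ "$action" -gt 3 ]; then
        echo "timer use must be 1-5 and action 0-3" >&2
        return 1
    fi
    count=$(( secs * 10 ))                 # seconds -> 100 ms units
    if [ "$count" -gt 65535 ]; then
        echo "timeout too large for a 16-bit countdown" >&2
        return 1
    fi
    # Bytes: use, action, pre-timeout (0), flag to clear (bit = timer use),
    # then the countdown with the least significant byte first.
    printf '0x%02x 0x%02x 0x00 0x%02x 0x%02x 0x%02x\n' \
        "$use" "$action" \
        "$(( 1 << use ))" \
        "$(( count & 0xff ))" \
        "$(( count >> 8 ))"
}

set_watchdog_bytes 4 1 300   # prints 0x04 0x01 0x00 0x10 0xb8 0x0b
```

The result can be appended to `ipmitool raw 0x06 0x24`. Clearing the expiration flag for the timer use being armed avoids reading a stale flag after the next expiration.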

Recovery Action Sequence

┌────────────────────────────────────────────────────────────────────────────┐
│                     Watchdog Expiration Recovery Sequence                  │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  WATCHDOG EXPIRES (TimeRemaining reaches 0)                                │
│           │                                                                │
│           v                                                                │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  1. Log Expiration Event                                            │   │
│  │     ─────────────────────                                           │   │
│  │     sd_journal_print(LOG_ERR, "Watchdog expired: %s",               │   │
│  │                      getCurrentTimerUseString());                   │   │
│  │                                                                     │   │
│  │     // Set expiration flag for this timer use                       │   │
│  │     timerUseExpirationFlags |= (1 << currentTimerUse);              │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│           │                                                                │
│           v                                                                │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  2. Generate SEL Entry                                              │   │
│  │     ─────────────────────                                           │   │
│  │     IPMI SEL Event:                                                 │   │
│  │       Sensor Type: 0x23 (Watchdog 2)                                │   │
│  │       Event Type:  0x6F (Sensor-specific)                           │   │
│  │       Event Data1: 0x00 (Timer expired)                             │   │
│  │       Event Data2: Current timer use                                │   │
│  │       Event Data3: Timeout action                                   │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│           │                                                                │
│           v                                                                │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  3. Execute Expiration Action                                       │   │
│  │     ──────────────────────────                                      │   │
│  │                                                                     │   │
│  │     switch (expireAction) {                                         │   │
│  │                                                                     │   │
│  │     case None:                                                      │   │
│  │         // Just log, no action                                      │   │
│  │         break;                                                      │   │
│  │                                                                     │   │
│  │     case HardReset:                                                 │   │
│  │         // Immediate power cycle                                    │   │
│  │         host.RequestedHostTransition(Host::Transition::Reboot);     │   │
│  │         // GPIO chassis power control                               │   │
│  │         break;                                                      │   │
│  │                                                                     │   │
│  │     case PowerOff:                                                  │   │
│  │         // Power off without restart                                │   │
│  │         host.RequestedHostTransition(Host::Transition::Off);        │   │
│  │         break;                                                      │   │
│  │                                                                     │   │
│  │     case PowerCycle:                                                │   │
│  │         // Power off, wait, power on                                │   │
│  │         host.RequestedHostTransition(Host::Transition::Off);        │   │
│  │         sleep(power_cycle_delay);  // e.g., 10 seconds              │   │
│  │         host.RequestedHostTransition(Host::Transition::On);         │   │
│  │         break;                                                      │   │
│  │     }                                                               │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│           │                                                                │
│           v                                                                │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  4. GPIO Power Sequencing (for HardReset)                           │   │
│  │     ────────────────────────────────────                            │   │
│  │                                                                     │   │
│  │     Time ─────────────────────────────────────────>                 │   │
│  │                                                                     │   │
│  │     POWER_BUTTON   ────┐               ┌────────                    │   │
│  │     (assert)           └───────────────┘                            │   │
│  │                         Hold for       Release                      │   │
│  │                         4+ seconds     (force off)                  │   │
│  │                                                                     │   │
│  │     POWER_GOOD     ────┐                       ┌────                │   │
│  │     (monitor)          └───────────────────────┘                    │   │
│  │                              Power off         Power restored       │   │
│  │                                                                     │   │
│  │     POST_COMPLETE  ────┐                               ┌────        │   │
│  │     (monitor)          └───────────────────────────────┘            │   │
│  │                              Cleared                   Set by BIOS  │   │
│  │                                                                     │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
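
Step 1 records the expiration by setting bit `currentTimerUse` in the flags byte, so the byte returned by Get Watchdog Timer can be decoded back into timer-use names. The sketch below follows that `1 << currentTimerUse` convention. `decode_expiration_flags` is a hypothetical name.

```shell
# Decode a Timer Use Expiration Flags byte into timer-use names.
# Bit N of the byte corresponds to timer use N (1=BIOS_FRB2 ... 5=OEM).
decode_expiration_flags() {
    local flags="$1" bit name out=""
    for bit in 1 2 3 4 5; do
        case "$bit" in
            1) name=BIOS_FRB2 ;;
            2) name=BIOS_POST ;;
            3) name=OS_LOAD ;;
            4) name=SMS_OS ;;
            5) name=OEM ;;
        esac
        # Test the bit and append the name if it is set.
        if [ $(( ($flags >> $bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "${out:-none}"
}

decode_expiration_flags 0x10   # prints SMS_OS
```

A non-empty result after a reboot is a quick way to tell which phase (BIOS, OS load, or runtime) the host was in when the watchdog fired.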

Source Code Reference

Key implementation files in phosphor-watchdog:

File                                    Description
watchdog.cpp                            Main watchdog class with timer loop and expiration handling
watchdog.hpp                            Watchdog class definition with D-Bus property implementations
watchdog_main.cpp                       Service entry point and D-Bus object creation
phosphor-host-ipmid/ipmid_watchdog.cpp  IPMI command handlers (Set/Get/Reset)

Examples

Working examples are available in the examples/watchdog directory:

  • watchdog-config.json - Watchdog timer configuration
  • watchdog-test.sh - Test script for watchdog functionality

References


Tested on: OpenBMC master, QEMU romulus

