Performance Optimization Guide
Profile and optimize boot time, memory usage, and runtime performance on resource-constrained BMC hardware.
Table of Contents
- Overview
- BMC Resource Constraints
- Boot Time Profiling
- Runtime Memory Profiling
- Optimization Strategies
- Kernel Configuration Optimization
- Optimization Workflow
- Troubleshooting
- Quick Reference
- References
Overview
BMC systems-on-chip (SoCs) operate under tight resource constraints. A typical AST2600 provides a dual-core ARM Cortex-A7 at 1.2 GHz with 512 MB–1 GB of DDR4 RAM and 32–128 MB of SPI NOR flash. Every megabyte of RAM and every second of boot time matters because the BMC must be operational before the host system can power on.
This guide walks you through profiling boot time, tracking runtime memory consumption, and applying concrete optimization strategies to keep your OpenBMC image within budget.
Key concepts covered:
- Hardware resource budgets and how to allocate them across services
- Boot time profiling with systemd-analyze and critical-chain analysis
- Runtime memory profiling with /proc/smaps, RSS tracking, and leak detection
- Optimization strategies: lazy initialization, D-Bus batching, Python-to-C++ migration
- Kernel configuration tuning to reduce image size and memory footprint
Optimization work should be guided by measurements, not assumptions. Always profile before and after each change to verify the impact.
BMC Resource Constraints
Hardware Budgets by SoC Generation
| Resource | AST2500 (G5) | AST2600 (G6) | Notes |
|---|---|---|---|
| CPU | ARM1176 single-core 800 MHz | Cortex-A7 dual-core 1.2 GHz | G6 adds SMP |
| RAM | 256–512 MB DDR3 | 512 MB–1 GB DDR4 | Shared with video engine |
| SPI Flash | 32–64 MB | 32–128 MB | Holds kernel + rootfs + U-Boot |
| Root FS | ~20 MB (typical) | ~30 MB (typical) | Compressed squashfs or UBI |
Typical Memory Budget
On a running AST2600 system with a standard OpenBMC image:
┌──────────────────────────────────────────────────────────┐
│ Typical RAM Usage (~512 MB system) │
├──────────────────────────────────────────────────────────┤
│ │
│ Kernel + modules + slab caches ~60--80 MB │
│ Core BMC services ~80--120 MB │
│ ├── bmcweb (Redfish) ~15--25 MB RSS │
│ ├── entity-manager ~8--15 MB RSS │
│ ├── ipmid ~10--15 MB RSS │
│ ├── phosphor-logging ~8--12 MB RSS │
│ └── Other services (~20 daemons) ~40--55 MB RSS │
│ systemd + journald + tmpfs ~25--50 MB │
│ Free / available ~260--350 MB │
│ │
└──────────────────────────────────────────────────────────┘
These numbers are approximate and vary by image configuration, number of sensors, and host platform complexity. Always measure your specific build.
Per-Service Memory Targets
| Service Category | Target RSS | Concern Threshold |
|---|---|---|
| Small daemon (state-manager, led-manager) | < 10 MB | > 15 MB |
| Medium daemon (entity-manager, ipmid) | < 15 MB | > 25 MB |
| Large daemon (bmcweb, phosphor-logging) | < 25 MB | > 40 MB |
| Python services (if any) | < 20 MB | > 30 MB |
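To check a running image against these targets, a short script can flag any service whose RSS crosses its concern threshold. This is a minimal sketch; the service names and limits below are illustrative and should be tuned to your platform's budget:
#!/bin/bash
# check-mem-targets.sh - warn when a service exceeds its concern threshold (kB)
# Service list and limits are examples; adjust to your platform
declare -A limit=(
    [bmcweb]=40960
    [phosphor-log-manager]=40960
    [entity-manager]=25600
    [ipmid]=25600
)
for svc in "${!limit[@]}"; do
    pid=$(pidof "$svc" 2>/dev/null | awk '{print $1}')
    [ -z "$pid" ] && continue
    rss=$(awk '/VmRSS/{print $2}' "/proc/$pid/status")
    if [ "$rss" -gt "${limit[$svc]}" ]; then
        printf "WARN %-22s RSS=%s kB (limit %s kB)\n" "$svc" "$rss" "${limit[$svc]}"
    fi
done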
Boot Time Profiling
A BMC that boots slowly delays host power-on and extends maintenance windows. Reaching the Redfish endpoint within 30–60 seconds of power application is a common target.
systemd-analyze and Critical Chain
# Total boot time breakdown
systemd-analyze
# Startup finished in 3.512s (kernel) + 18.234s (userspace) = 21.746s
# Show the critical boot path (longest dependency chain)
systemd-analyze critical-chain
# multi-user.target @18.102s
# └─phosphor-bmc-state-ready.target @18.050s
# └─mapper-wait@-xyz-openbmc_project-state-bmc.service @17.892s +145ms
# └─xyz.openbmc_project.State.BMC.service @15.234s +2.651s
# └─phosphor-mapper.service @12.891s +2.310s
# └─dbus.service @3.456s +120ms
Each line shows the service, its activation timestamp (@), and how long it took to start (+).
Blame Analysis
# Show services ordered by startup time
systemd-analyze blame
# 4.210s phosphor-read-eeprom@...service
# 3.891s entity-manager.service
# 2.651s xyz.openbmc_project.State.BMC.service
# 2.310s phosphor-mapper.service
# 1.892s bmcweb.service
Focus on the top 5–10 services in the blame list. Optimizing a service that takes 100 ms has negligible impact compared to one that takes 4 seconds.
SVG Boot Chart and Kernel Time
# Generate visual boot chart (copy to host for viewing)
systemd-analyze plot > /tmp/boot-chart.svg
scp -P 2222 root@localhost:/tmp/boot-chart.svg .
# Measure kernel boot time from dmesg timestamps
dmesg | grep "Run /sbin/init"
# [ 3.456789] Run /sbin/init as init process
Automating Boot Time Tracking
#!/bin/bash
# boot-metrics.sh - Run on the BMC after each boot
echo "=== Boot Metrics $(date) ==="
systemd-analyze
echo "--- Critical Chain ---"
systemd-analyze critical-chain --no-pager
echo "--- Top 15 Blame ---"
systemd-analyze blame --no-pager | head -15
echo "--- Memory ---"
free -m
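Note that systemd-analyze only reports once startup has fully finished; run too early, it fails with "Bootup is not yet finished". A small wrapper like the sketch below, assuming the script is installed on the BMC and the log path is writable and persistent (both are examples), waits for boot to complete before sampling:
# Wait for startup to complete, then append metrics to a persistent log
# (script and log locations are examples, not part of a standard image)
systemctl is-system-running --wait >/dev/null
/usr/local/bin/boot-metrics.sh >> /var/lib/boot-metrics.log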
Runtime Memory Profiling
Per-Process RSS Tracking
Resident Set Size (RSS) measures the actual physical memory a process occupies:
# List all processes sorted by RSS (descending)
ps aux --sort=-rss | head -20
# Track RSS of key OpenBMC services
for svc in bmcweb phosphor-log-manager entity-manager ipmid; do
pid=$(pidof $svc 2>/dev/null)
if [ -n "$pid" ]; then
rss=$(awk '/VmRSS/{print $2}' /proc/$pid/status)
printf "%-30s PID=%-6s RSS=%s kB\n" "$svc" "$pid" "$rss"
fi
done
Detailed Memory Maps with smaps
The /proc/<pid>/smaps_rollup file distinguishes shared libraries from private heap:
pid=$(pidof bmcweb)
cat /proc/$pid/smaps_rollup
# Rss: 18432 kB
# Pss: 12288 kB <-- Proportional share (accounts for sharing)
# Private_Dirty: 8704 kB <-- Heap + private data (primary target)
# Shared_Clean: 8192 kB <-- Shared library text
| Field | Meaning | Optimization Target |
|---|---|---|
| Rss | Total physical memory | Reduce overall footprint |
| Pss | Proportional share | More accurate for shared libs |
| Private_Dirty | Heap + writable private data | Primary optimization target |
| Shared_Clean | Shared library code pages | Reduce by linking fewer libs |
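Because Pss splits shared pages among the processes that map them, summing Pss over all processes gives a realistic total for userspace memory use. A minimal sketch (run as root; kernel threads are skipped automatically because they have no address space):
# Sum Pss across all processes, in kB
total=0
for f in /proc/[0-9]*/smaps_rollup; do
    pss=$(awk '/^Pss:/{print $2}' "$f" 2>/dev/null)
    total=$((total + ${pss:-0}))
done
echo "Total userspace PSS: $total kB"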
Tracking Memory Over Time
Monitor a service for memory growth that might indicate a leak:
#!/bin/bash
# mem-track.sh <process-name> <interval-seconds> <count>
PROC=$1; INTERVAL=${2:-10}; COUNT=${3:-60}
echo "timestamp,pid,rss_kb,vsz_kb"
for i in $(seq 1 $COUNT); do
pid=$(pidof $PROC 2>/dev/null)
if [ -n "$pid" ]; then
rss=$(awk '/VmRSS/{print $2}' /proc/$pid/status)
vsz=$(awk '/VmSize/{print $2}' /proc/$pid/status)
echo "$(date +%s),$pid,$rss,$vsz"
fi
sleep $INTERVAL
done
A steadily increasing RSS over hours or days strongly suggests a memory leak. Occasional small increases are normal due to caching, but monotonic growth requires investigation.
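If the CSV from mem-track.sh was redirected to a file (mem-track.csv here is just an example name), a short awk pass turns it into an approximate growth rate:
# RSS growth between the first and last sample of mem-track.sh output
awk -F, 'NR==2 {rss0=$3; t0=$1}
         NR>2  {rss1=$3; t1=$1}
         END   {if (t1 > t0)
                    printf "Growth: %d kB over %d s (%.1f kB/min)\n",
                           rss1-rss0, t1-t0, (rss1-rss0)*60/(t1-t0)}' mem-track.csv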
Detecting Memory Leaks
For leak detection without recompiling, use Valgrind:
systemctl stop bmcweb
valgrind --tool=memcheck --leak-check=full \
--show-leak-kinds=definite,possible \
--log-file=/tmp/valgrind-bmcweb.log \
/usr/bin/bmcweb &
# Exercise the service, then stop and examine
kill %1
cat /tmp/valgrind-bmcweb.log | grep "definitely lost"
For compile-time leak detection, use AddressSanitizer. See the Linux Debug Tools Guide for details.
Optimization Strategies
Strategy 1: Lazy Initialization
Many services load all data at startup. Lazy initialization defers work until first use.
Before (eager):
SensorManager(sdbusplus::bus_t& bus) : bus_(bus) {
loadAllSensorConfigs(); // 200ms
initializeAllThresholds(); // 150ms
registerAllDbusObjects(); // 300ms
}
After (lazy):
SensorManager(sdbusplus::bus_t& bus) : bus_(bus) {
registerMinimalDbusInterface(); // 50ms
}
void ensureInitialized(const std::string& sensorPath) {
if (initialized_.count(sensorPath) == 0) {
loadSensorConfig(sensorPath);
initializeThreshold(sensorPath);
initialized_.insert(sensorPath);
}
}
Impact: Reduces service startup time. The trade-off is a small latency on first access to each resource.
Strategy 2: D-Bus Communication Patterns
D-Bus round trips are expensive on BMC hardware. Each call involves serialization, context switches, and deserialization.
Batch property reads:
// Bad: N individual GetObject calls
for (const auto& path : sensorPaths) {
bus.call(bus.new_method_call("xyz.openbmc_project.ObjectMapper", ...));
}
// Better: Single GetSubTree call
auto subtree = mapper::getSubTree(bus, "/xyz/openbmc_project/sensors",
interfaces, depth);
Use PropertiesChanged signals instead of polling:
// Bad: Polling every second
while (running) { value = readPropertyFromDbus(path, "Value"); sleep(1); }
// Better: React to changes
sdbusplus::bus::match_t match(bus,
sdbusplus::bus::match::rules::propertiesChanged(
sensorPath, "xyz.openbmc_project.Sensor.Value"),
[](sdbusplus::message_t& msg) { /* handle change */ });
Strategy 3: Python-to-C++ Migration
Python services consume significantly more resources than equivalent C++ implementations:
| Metric | Python Service | C++ Equivalent | Improvement |
|---|---|---|---|
| RSS at idle | 15–25 MB | 3–8 MB | 3–5x less RAM |
| Startup time | 1–3 s | 0.1–0.5 s | 3–10x faster |
| CPU per D-Bus call | ~2 ms | ~0.2 ms | ~10x faster |
Migration is a significant effort. Target services that run persistently and handle high-frequency events. One-time scripts are fine to leave in Python.
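To find out whether any Python daemons remain on a given image and what they cost, a quick check on the BMC (the bracketed pattern keeps grep from matching itself):
# Long-running Python processes and their RSS (kB)
ps aux --sort=-rss | grep -E '[p]ython[0-9]?' | awk '{print "PID="$2, "RSS="$6" kB", $11, $12}'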
Strategy 4: Image Size Reduction
# In local.conf or your distro configuration
IMAGE_FEATURES:remove = "debug-tweaks tools-debug tools-profile"
INHIBIT_PACKAGE_STRIP = "0"
DEBUG_BUILD = "0"
FULL_OPTIMIZATION = "-O2 -pipe"
# Disable optional features in large services
EXTRA_OEMESON:append:pn-bmcweb = " -Dredfish-dbus-log=disabled"
Strategy 5: Reduce Service Count
Each daemon consumes baseline RSS for its runtime, D-Bus connection, and event loop. Remove services you do not need:
| Service | When to Remove | Savings |
|---|---|---|
| phosphor-ipmi-host | No IPMI over KCS/BT required | ~10–15 MB |
| phosphor-ipmi-net | No IPMI-over-LAN required | ~8–12 MB |
| obmc-ikvm | No remote KVM feature | ~5–10 MB |
| phosphor-certificate-manager | No TLS certificate management | ~5–8 MB |
| phosphor-post-code-manager | No POST code logging | ~4–6 MB |
# Remove IPMI if using Redfish only
IMAGE_INSTALL:remove = "phosphor-ipmi-host phosphor-ipmi-net ipmitool"
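After rebuilding, confirm the packages are really gone by checking the image manifest. The path below follows the usual tmp/deploy layout; adjust the machine and image name to your build:
# Verify removed packages no longer appear in the image manifest
grep -E 'phosphor-ipmi-(host|net)|ipmitool' \
    tmp/deploy/images/*/obmc-phosphor-image-*.manifest \
    && echo "still present" || echo "removed"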
Kernel Configuration Optimization
The default AST2600 defconfig includes many features useful for development but unnecessary in production.
Disabling Debug Options
# production-strip-debug.cfg
# CONFIG_DEBUG_KERNEL is not set
# CONFIG_DEBUG_INFO is not set
# CONFIG_DEBUG_FS is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_KASAN is not set
# CONFIG_KMEMLEAK is not set
# CONFIG_SLUB_DEBUG is not set
Disabling Tracing
# production-strip-tracing.cfg
# CONFIG_FTRACE is not set
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_DYNAMIC_FTRACE is not set
# CONFIG_KPROBES is not set
# CONFIG_UPROBES is not set
# CONFIG_PROFILING is not set
# CONFIG_PERF_EVENTS is not set
Removing Unused Drivers
# production-minimal-drivers.cfg
# CONFIG_NET_VENDOR_INTEL is not set
# CONFIG_NET_VENDOR_REALTEK is not set
# Disable ext4 only if using squashfs + tmpfs
# CONFIG_EXT4_FS is not set
# CONFIG_BTRFS_FS is not set
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_USB_STORAGE is not set
# CONFIG_SOUND is not set
Be conservative when removing drivers. Test thoroughly on your actual hardware after each change. A missing driver for an on-board device produces no build error; the device simply never appears at runtime.
Applying Kernel Config Fragments
# meta-vendor/recipes-kernel/linux/linux-aspeed_%.bbappend
FILESEXTRAPATHS:prepend := "${THISDIR}/${PN}:"
SRC_URI += " \
file://production-strip-debug.cfg \
file://production-strip-tracing.cfg \
file://production-minimal-drivers.cfg \
"
# Validate all options took effect
bitbake linux-aspeed -c kernel_configcheck -f
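It is also worth grepping the final generated .config to confirm the fragments took effect; kernel_configcheck warns about mismatches, but a direct check is unambiguous. The work-directory path below is typical but build-dependent:
# Confirm the debug and tracing options ended up disabled in the final config
config=$(find tmp/work-shared -path '*kernel*' -name .config 2>/dev/null | head -1)
grep -E '^(# )?CONFIG_(DEBUG_KERNEL|DEBUG_INFO|FTRACE|KPROBES)[ =]' "$config"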
Optimization Impact Summary
| Category | Options Disabled | Image Savings | RAM Savings |
|---|---|---|---|
| Debug infrastructure | DEBUG_KERNEL, DEBUG_INFO, DEBUG_FS | 2–5 MB | 10–30 MB |
| Tracing and profiling | FTRACE, KPROBES, PERF_EVENTS | 1–3 MB | 5–15 MB |
| Unused drivers | Platform-specific | 0.5–2 MB | 1–5 MB |
| Unused filesystems | Platform-specific | 0.2–1 MB | 0.5–2 MB |
| Total potential | | 4–11 MB | 17–52 MB |
Optimization Workflow
Follow this systematic process to avoid wasted effort:
┌──────────────────────────────────────────────────────────┐
│ Optimization Workflow │
├──────────────────────────────────────────────────────────┤
│ │
│ 1. MEASURE │
│ systemd-analyze / free -m / df -h │
│ │
│ 2. IDENTIFY │
│ critical-chain / blame / RSS ranking │
│ │
│ 3. OPTIMIZE │
│ Lazy init, D-Bus batching, remove services, │
│ kernel config stripping, Python -> C++ migration │
│ │
│ 4. VERIFY │
│ Re-measure, regression test, document impact │
│ │
│ 5. REPEAT until targets are met │
│ │
└──────────────────────────────────────────────────────────┘
Keep a log of your optimization results. Record the build commit, the change made, and before/after measurements. This prevents regressions and helps justify changes during code review.
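The log can be as simple as a CSV appended after each verification run. The helper below is only a suggested shape; it assumes the QEMU port forwarding used earlier in this guide (port 2222) and takes the change description as its first argument:
#!/bin/bash
# append-result.sh "<change description>" - record one optimization data point
commit=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
boot=$(ssh -p 2222 root@localhost systemd-analyze | head -1 | awk '{print $NF}')
avail=$(ssh -p 2222 root@localhost free -m | awk '/^Mem:/{print $7}')
echo "$(date +%F),$commit,\"$1\",$boot,${avail}MB" >> optimization-log.csv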
Troubleshooting
Issue: Boot Time Regressed After Image Update
Symptom: systemd-analyze shows significantly longer boot time after updating packages.
Cause: A new or updated service on the critical boot path, or new Before=/Wants= dependencies.
Solution:
- Compare systemd-analyze blame output before and after (see the sketch below)
- Run systemd-analyze critical-chain to find the new bottleneck
- Inspect dependencies: systemctl show <service> -p After -p Wants -p Before
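One convenient way to do the comparison is to keep the blame output from a known-good boot and diff it against the current one:
# Capture on the known-good build, then again after the update
systemd-analyze blame --no-pager > /tmp/blame-good.txt
systemd-analyze blame --no-pager > /tmp/blame-new.txt
# Lines that appeared, disappeared, or changed timing
diff /tmp/blame-good.txt /tmp/blame-new.txt | grep '^[<>]'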
Issue: Memory Usage Grows Over Time
Symptom: free -m shows available memory steadily decreasing over days.
Cause: Memory leak in a userspace service, or unbounded caching.
Solution:
- Identify the growing process: ps aux --sort=-rss | head -10
- Track with the mem-track.sh script from the profiling section
- Check log entry counts: busctl tree xyz.openbmc_project.Logging | wc -l
- Run the suspect under Valgrind (see Linux Debug Tools Guide)
Issue: Flash Image Exceeds Partition Size
Symptom: Build fails with “image too large” or BMC fails to boot after flashing.
Cause: Added packages or debug symbols pushed rootfs beyond the flash partition limit.
Solution:
- Examine the rootfs: du -sh tmp/work/*/obmc-phosphor-image/*/rootfs/*/ | sort -rh | head -20 (see also the buildhistory sketch below)
- Remove unnecessary packages from IMAGE_INSTALL
- Ensure DEBUG_BUILD = "0" and symbols are stripped
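If buildhistory is enabled (INHERIT += "buildhistory" in local.conf), per-package sizes are recorded for every image build, which makes it easy to see what grew. The path below follows the standard buildhistory layout and may differ in your setup:
# Largest packages in the most recent image, as recorded by buildhistory
head -20 buildhistory/images/*/glibc/obmc-phosphor-image/installed-package-sizes.txt
# Summarize differences against the previous build
buildhistory-diff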
Issue: systemd-analyze Shows Inaccurate Times
Symptom: Reported boot times do not match wall-clock observations.
Cause: systemd-analyze measures from the moment the kernel starts executing, not from power-on. It does not account for U-Boot or other firmware initialization.
Solution: For total power-on-to-ready time, use serial log timestamps. Add kernel boot time (dmesg first timestamp to init) plus userspace time for the complete picture.
Quick Reference
Boot Time Commands
| Task | Command |
|---|---|
| Total boot time | systemd-analyze |
| Critical boot path | systemd-analyze critical-chain |
| Services by start time | systemd-analyze blame |
| Visual boot chart | systemd-analyze plot > /tmp/boot.svg |
| Kernel boot time | dmesg \| grep "Run /sbin/init" |
Memory Commands
| Task | Command |
|---|---|
| System memory overview | free -m |
| Per-process RSS | ps aux --sort=-rss \| head -20 |
| Detailed memory map | cat /proc/<pid>/smaps_rollup |
| Kernel slab usage | cat /proc/slabinfo \| sort -k3 -rn \| head |
Image Size Commands
| Task | Command |
|---|---|
| Flash partition usage | df -h (on running BMC) |
| Kernel image size | ls -lh tmp/deploy/images/*/zImage* |
| Package count | opkg list-installed \| wc -l (on running BMC) |
References
- systemd-analyze man page – Boot time analysis tool
- OpenBMC Development Environment – Development environment reference
- Yocto Image Size Optimization – Build quality and size optimization
- Linux Kernel Size Tuning – Embedded Linux kernel tuning
- sdbusplus Repository – C++ D-Bus bindings used in OpenBMC
- Systemd Boot Ordering – Prerequisite guide on systemd dependencies
- Linux Debug Tools – Sanitizers and Valgrind for leak detection
Tested on: OpenBMC master with QEMU ast2600-evb. Memory figures measured on a representative AST2600 build. Actual values vary by image configuration and platform.