Bare Metal Runtime

Understanding the journey from reset vector to your main function.

Prerequisites: This chapter builds on Bare Metal from Part 6. Complete that chapter first if you haven’t already.

The Boot Sequence

When the STM32F769 powers on, the Cortex-M7 core executes a precisely defined startup sequence. Understanding this sequence is essential for debugging early boot issues and customizing the runtime.

sequenceDiagram
    participant HW as Hardware
    participant VT as Vector Table
    participant RT as cortex-m-rt
    participant App as Your Code

    HW->>VT: Power-on reset
    VT->>HW: Load initial SP from 0x08000000
    VT->>RT: Jump to Reset_Handler at 0x08000004
    RT->>RT: Zero .bss section
    RT->>RT: Copy .data from flash to RAM
    RT->>RT: Call __pre_init() (optional)
    RT->>RT: Enable FPU (Cortex-M7)
    RT->>App: Call main()
    App->>App: loop { } (never returns)

What Happens at Reset

SP initialization — The processor loads the initial stack pointer from address 0x08000000 (first word of the vector table)
Reset handler — The processor loads the reset vector from 0x08000004 and begins execution
Runtime init — cortex-m-rt zeroes .bss, copies .data, optionally calls __pre_init()
FPU enable — On Cortex-M7, the FPU is enabled before entering user code
main() — Your #[entry] function is called

The #[entry] function must return ! (never return). There is no OS to return to — returning from main would execute whatever happens to be next in memory, causing undefined behavior.

The Vector Table

The Cortex-M vector table is an array of function pointers at the start of flash memory. The first 16 entries are defined by ARM:

Offset	Exception	Description
`0x00`	—	Initial Stack Pointer value
`0x04`	Reset	Entry point after reset
`0x08`	NMI	Non-Maskable Interrupt
`0x0C`	HardFault	All faults if no specific handler
`0x10`	MemManage	Memory protection fault
`0x14`	BusFault	Bus error
`0x18`	UsageFault	Undefined instruction, alignment
`0x2C`	SVCall	Supervisor call (SVC instruction)
`0x38`	PendSV	Pendable request for system service
`0x3C`	SysTick	System timer tick
`0x40+`	IRQ0…	Device-specific interrupts

Exception Handlers in Rust

cortex-m-rt provides the #[exception] attribute for registering exception handlers:

use cortex_m_rt::exception;

#[exception]
unsafe fn HardFault(ef: &cortex_m_rt::ExceptionFrame) -> ! {
    // Log the exception frame for debugging
    // ef contains: r0, r1, r2, r3, r12, lr, pc, xpsr
    panic!("HardFault at PC={:#010x}", ef.pc());
}

#[exception]
fn SysTick() {
    // Called every SysTick period
    // Used for timekeeping, RTOS tick, etc.
}

#[exception]
unsafe fn DefaultHandler(irqn: i16) {
    // Catches any unhandled interrupt
    panic!("Unhandled IRQ: {}", irqn);
}

During development, always implement a HardFault handler that logs the exception frame. Without it, faults silently loop and are nearly impossible to debug.

Linker Script Structure

Memory Regions

The memory.x file defines the physical memory layout. For the STM32F769, the full memory map is:

/* STM32F769NIH6 Full Memory Map */
MEMORY
{
    /* Primary regions (used by default) */
    FLASH  : ORIGIN = 0x08000000, LENGTH = 2M
    RAM    : ORIGIN = 0x20000000, LENGTH = 512K

    /* Special regions (for advanced use) */
    ITCM   : ORIGIN = 0x00000000, LENGTH = 16K   /* Instruction TCM */
    DTCM   : ORIGIN = 0x20000000, LENGTH = 16K   /* Data TCM, zero wait */
    SRAM1  : ORIGIN = 0x20020000, LENGTH = 368K   /* Main SRAM */
    SRAM2  : ORIGIN = 0x20078000, LENGTH = 16K    /* Additional SRAM */
}

Sections Layout

The cortex-m-rt linker script (link.x, included via -Tlink.x in .cargo/config.toml) defines how code and data are placed:

graph LR
    subgraph FLASH
        A[.vector_table] --> B[.text]
        B --> C[.rodata]
        C --> D[.data LMA]
    end
    subgraph RAM
        E[.data VMA] --> F[.bss]
        F --> G[Stack ↓]
    end
    D -.->|Copied at startup| E

Section	Location	Contents
`.vector_table`	Flash start	Exception vectors, initial SP
`.text`	Flash	Executable code
`.rodata`	Flash	Constants, string literals
`.data`	Flash (LMA), RAM (VMA)	Initialized static variables
`.bss`	RAM	Uninitialized static variables (zeroed)
Stack	RAM (top, grows down)	Call stack

Placing Code in ITCM/DTCM

For performance-critical code on the STM32F769, use custom sections:

// Place a function in ITCM (zero-wait-state instruction memory)
#[link_section = ".itcm"]
fn fast_isr_handler() {
    // This runs from ITCM — faster than flash
}

// Place data in DTCM (zero-wait-state data memory)
#[link_section = ".dtcm"]
static mut FAST_BUFFER: [u8; 256] = [0; 256];

This requires adding the corresponding sections to your linker script — see the STM32F769 reference manual for the complete memory map.

.bss and .data Initialization

Why Initialization Matters

Before Rust code can safely execute, the runtime must:

Zero .bss — Uninitialized statics (static mut X: u32 = 0) live in .bss. The C standard (and Rust) guarantees they start as zero. Flash contains no data for .bss — the startup code must write zeros to RAM.
Copy .data — Initialized statics (static X: u32 = 42) have their values stored in flash (LMA) but are accessed from RAM (VMA). The startup code copies the initial values from flash to RAM.

// Lives in .bss — zeroed by startup code
static mut COUNTER: u32 = 0;

// Lives in .data — copied from flash by startup code
static GREETING: &str = "Hello, embedded!";

// Lives in .rodata — stays in flash, read directly
const MAX_RETRIES: u32 = 5;

const values are inlined at each use site and live in .rodata (flash). static values have a fixed address in RAM. Prefer const for read-only values to save RAM.

The cortex-m-rt Startup Code

cortex-m-rt handles all of this automatically. The generated startup code (in assembly) does:

Reset_Handler:
    ldr r0, =_sbss       @ Start of .bss
    ldr r1, =_ebss       @ End of .bss
    movs r2, #0
bss_loop:
    cmp r0, r1
    bge bss_done
    str r2, [r0], #4     @ Write zero, advance pointer
    b bss_loop
bss_done:
    ldr r0, =_sdata      @ Start of .data (RAM VMA)
    ldr r1, =_edata      @ End of .data
    ldr r2, =_sidata     @ Start of .data (Flash LMA)
data_loop:
    cmp r0, r1
    bge data_done
    ldr r3, [r2], #4     @ Read from flash
    str r3, [r0], #4     @ Write to RAM
    b data_loop
data_done:
    bl main               @ Call user code

Interrupt Handlers

Registering Interrupt Handlers

Device-specific interrupts (beyond the core exceptions) use the #[interrupt] attribute from the PAC:

use stm32f7xx_hal::pac::interrupt;

#[interrupt]
fn EXTI0() {
    // Handles EXTI line 0 interrupt (e.g., PA0 button press)
    // Clear the pending bit to acknowledge
}

#[interrupt]
fn TIM2() {
    // Handles TIM2 update interrupt
}

NVIC Priority Configuration

The STM32F769 (Cortex-M7) supports 4 bits of priority (16 levels, 0 = highest):

use cortex_m::peripheral::NVIC;
use stm32f7xx_hal::pac::Interrupt;

unsafe {
    // Set priority (0 = highest, 15 = lowest)
    let mut nvic = cortex_m::Peripherals::take().unwrap().NVIC;
    nvic.set_priority(Interrupt::EXTI0, 1);
    nvic.set_priority(Interrupt::TIM2, 4);

    // Enable the interrupts
    NVIC::unmask(Interrupt::EXTI0);
    NVIC::unmask(Interrupt::TIM2);
}

The standard pattern uses Mutex<RefCell<Option<T>>> with critical sections:

use core::cell::RefCell;
use cortex_m::interrupt::{self, Mutex};
use stm32f7xx_hal::gpio::{Output, PushPull, PJ13};

// Shared peripheral — wrapped for safe access
static LED: Mutex<RefCell<Option<PJ13<Output<PushPull>>>>> =
    Mutex::new(RefCell::new(None));

#[entry]
fn main() -> ! {
    // ... setup code ...
    let led = gpioj.pj13.into_push_pull_output();

    // Move the LED into the shared static
    interrupt::free(|cs| {
        LED.borrow(cs).replace(Some(led));
    });

    // Enable EXTI0 interrupt for button
    // ...

    loop {
        cortex_m::asm::wfi(); // Wait for interrupt
    }
}

#[interrupt]
fn EXTI0() {
    interrupt::free(|cs| {
        if let Some(ref mut led) = LED.borrow(cs).borrow_mut().as_mut() {
            led.toggle();
        }
    });
}

Never use static mut directly for shared data — it’s unsound. The Mutex<RefCell<Option<T>>> pattern ensures exclusive access via critical sections. For a more ergonomic alternative, see the critical-section crate.

Cortex-M7 Cache

The STM32F769’s Cortex-M7 has separate instruction and data caches:

Cache	Size	Purpose
I-cache	16 KB	Speeds up instruction fetch from flash
D-cache	16 KB	Speeds up data access from SRAM

Enabling Caches

let mut cp = cortex_m::Peripherals::take().unwrap();

// Enable instruction cache (safe, always beneficial)
cp.SCB.enable_icache();

// Enable data cache (requires care with DMA!)
unsafe { cp.SCB.enable_dcache(&mut cp.CPUID) };

DMA and Cache Coherency

When using DMA, the DMA controller reads/writes directly to memory, bypassing the cache. This creates coherency issues:

flowchart TD
    A[CPU writes data] --> B[Data in D-cache]
    C[DMA reads memory] --> D[Stale data!]
    B -.->|Cache line NOT written back| D

    E[DMA writes memory] --> F[New data in RAM]
    G[CPU reads data] --> H[Stale cache!]
    F -.->|Cache NOT invalidated| H

Before DMA transmit (CPU → DMA): clean the cache to flush data to RAM:

// Clean cache lines for the DMA buffer
let buf_addr = buffer.as_ptr() as usize;
let buf_size = buffer.len();
cortex_m::asm::dsb();
unsafe {
    cp.SCB.clean_dcache_by_address(buf_addr, buf_size);
}

After DMA receive (DMA → CPU): invalidate the cache to discard stale data:

// Invalidate cache lines to read fresh DMA data
unsafe {
    cp.SCB.invalidate_dcache_by_address(buf_addr, buf_size);
}
cortex_m::asm::dsb();

For DMA buffers, the simplest approach is to place them in DTCM (which is not cached) or mark the memory region as non-cacheable in the MPU. This avoids cache maintenance entirely.

Best Practices

Always implement HardFault — log the exception frame for post-mortem debugging
Use wfi in idle loops — cortex_m::asm::wfi() saves power by sleeping until the next interrupt
Prefer const over static — saves RAM since const values stay in flash
Enable I-cache unconditionally — it’s always safe and improves performance
Be cautious with D-cache — only enable if you handle DMA coherency correctly
Use critical-section crate — newer, more ergonomic alternative to cortex_m::interrupt::free

Next Steps

Now that you understand the runtime, learn how memory is managed in Memory Management in no_std.

Example Code

Bare Metal Runtime

The Boot Sequence

What Happens at Reset

The Vector Table

Exception Handlers in Rust

Linker Script Structure

Memory Regions

Sections Layout

Placing Code in ITCM/DTCM

.bss and .data Initialization

Why Initialization Matters

The cortex-m-rt Startup Code

Interrupt Handlers

Registering Interrupt Handlers

NVIC Priority Configuration

Sharing Data Between Interrupt and Main

Cortex-M7 Cache

Enabling Caches

DMA and Cache Coherency

Best Practices

Next Steps