Async Concurrency with Embassy
Running multiple concurrent tasks on a single-core microcontroller without an OS.
Prerequisites: This chapter builds on RTOS and Concurrency from Part 6 and is the most advanced chapter in Part 8. Complete all prior Part 8 chapters before continuing.
Async/Await for Embedded
Rust’s async/await syntax works without the standard library. At the language level, an async fn returns a Future: a state machine that the compiler generates from your sequential-looking code. An executor polls each future, and polls it again whenever its waker fires, until it returns Poll::Ready.
Core Concepts
| Concept | Role |
|---|---|
| Future | A computation whose result becomes available later; poll() returns Poll::Pending or Poll::Ready |
| Polling | The executor calls Future::poll() to advance the state machine |
| Pinning | Guarantees the future will not move in memory (required for self-referential state) |
| Waker | A handle the executor passes to poll(); the future stores it so an interrupt handler can mark the task ready again |
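To make the table concrete, here is a minimal host-side sketch (plain std Rust, not Embassy code) of a hand-written future and a loop that polls it. The names Countdown, NoopWake, and poll_until_ready are hypothetical and exist only for this illustration. The loop re-polls immediately, whereas a real embedded executor would sleep until the waker fires.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical future: reports Pending `remaining` times, then Ready.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            Poll::Ready("done")
        } else {
            self.remaining -= 1;
            // Ask to be polled again; on hardware an interrupt would do this.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

/// Waker that does nothing; enough for a loop that re-polls unconditionally.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` until it is Ready; returns the output and the number of polls.
fn poll_until_ready<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // Pin the future on the stack so it cannot move between polls.
    let mut fut = std::pin::pin!(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

Calling poll_until_ready(Countdown { remaining: 3 }) polls four times: three Pending results followed by one Ready.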
Why Async Beats Threads on Embedded
On a Cortex-M7 with 512 KB of RAM and no OS, threads are expensive — each needs its own stack (typically 1-4 KB). Async tasks share a single stack and only store their suspended state, which is often just a few dozen bytes.
| Approach | Stack Cost | Context Switch | Requires OS |
|---|---|---|---|
| OS threads (RTOS) | 1-4 KB per thread | Hardware-assisted | Yes |
| Async tasks | Shared stack, tiny state | Compiler-generated yields | No |
| Busy-wait polling | Shared stack | None (wastes CPU) | No |
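The "tiny state" entry can be checked on a host machine. The sketch below measures the size of two futures with std::mem::size_of_val; the function names are hypothetical, and the exact byte counts depend on the compiler version and target, so only the relationship between the two sizes is meaningful.

```rust
use std::future::ready;
use std::mem::size_of_val;

/// Holds a single u32 across an await point.
async fn small_task() -> u32 {
    let counter: u32 = 41;
    ready(()).await; // suspension point: `counter` must survive it
    counter + 1
}

/// Holds a 64-byte buffer across an await point.
async fn buffered_task() -> usize {
    let buf = [1u8; 64];
    ready(()).await; // `buf` becomes part of the suspended state
    buf.iter().map(|b| *b as usize).sum()
}

/// Returns the sizes in bytes of the two futures (created but never polled).
fn future_sizes() -> (usize, usize) {
    (size_of_val(&small_task()), size_of_val(&buffered_task()))
}
```

The future that keeps a 64-byte buffer alive across an await is at least 64 bytes, while the other stays much smaller. Anything a task holds across an .await is paid for in its state, so large local buffers are the first thing to check when a task's memory footprint grows.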
Executor Loop
The executor is a simple loop that polls ready futures. Embassy makes this interrupt-driven: instead of busy-polling, the CPU sleeps until a hardware interrupt fires a waker.
flowchart TD
A[Start Executor] --> B[Poll all ready tasks]
B --> C{Any task<br>made progress?}
C -->|Yes| B
C -->|No| D[WFI — sleep until interrupt]
D --> E[Interrupt fires waker]
E --> F[Mark task as ready]
F --> B
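The loop in the flowchart can be sketched in plain std Rust. This is a simplified model, not Embassy's implementation: ReadyFlag, YieldNow, run, worker, and demo are hypothetical names, and where a real executor would execute WFI the sketch simply returns, because no interrupt will arrive on a host.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Per-task ready flag; on hardware an interrupt handler sets it via the waker.
struct ReadyFlag(AtomicBool);

impl Wake for ReadyFlag {
    fn wake(self: Arc<Self>) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Future that returns Pending once and immediately re-wakes its task.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Polls every ready task until all finish or nothing is ready.
/// Returns the total number of polls performed.
fn run(tasks: Vec<Task>) -> usize {
    let mut slots: Vec<(Option<Task>, Arc<ReadyFlag>)> = tasks
        .into_iter()
        .map(|t| (Some(t), Arc::new(ReadyFlag(AtomicBool::new(true)))))
        .collect();
    let mut polls = 0;
    loop {
        let mut progressed = false;
        for (slot, flag) in slots.iter_mut() {
            // Skip finished tasks and tasks whose waker has not fired.
            if slot.is_none() || !flag.0.swap(false, Ordering::SeqCst) {
                continue;
            }
            progressed = true;
            polls += 1;
            let waker = Waker::from(Arc::clone(flag));
            let mut cx = Context::from_waker(&waker);
            let done = slot.as_mut().unwrap().as_mut().poll(&mut cx).is_ready();
            if done {
                *slot = None;
            }
        }
        if !progressed {
            // A real executor would execute WFI here and resume on an interrupt.
            return polls;
        }
    }
}

/// Logs its name, then yields; repeated twice.
async fn worker(name: &'static str, log: Rc<RefCell<Vec<&'static str>>>) {
    for _ in 0..2 {
        log.borrow_mut().push(name);
        YieldNow(false).await;
    }
}

/// Runs two workers and returns the order in which they logged, plus the poll count.
fn demo() -> (Vec<&'static str>, usize) {
    let log = Rc::new(RefCell::new(Vec::new()));
    let a: Task = Box::pin(worker("a", Rc::clone(&log)));
    let b: Task = Box::pin(worker("b", Rc::clone(&log)));
    let polls = run(vec![a, b]);
    let order = log.borrow().clone();
    (order, polls)
}
```

The two workers interleave at their yield points, which is the same hand-off that happens between Embassy tasks at each .await that is not immediately ready.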
Busy-Wait vs Interrupt-Driven
// Busy-wait polling (wastes CPU cycles)
loop {
if uart_data_ready() {
process(uart_read());
}
if timer_expired() {
toggle_led();
}
// CPU runs at 100% even when idle
}
// Async with Embassy (CPU sleeps between events)
#[embassy_executor::task]
async fn uart_task(mut usart: UartRx<'static>) {
let mut buf = [0u8; 64];
loop {
let n = usart.read_until_idle(&mut buf).await.unwrap();
process(&buf[..n]);
}
}
#[embassy_executor::task]
async fn blink_task(mut led: Output<'static>) {
loop {
led.toggle();
Timer::after(Duration::from_millis(500)).await;
}
}
// CPU sleeps (WFI) whenever both tasks are awaiting
Embassy Executor
Embassy is a widely used async framework for embedded Rust. It provides an executor, HAL drivers for STM32/nRF/ESP, and timer/channel primitives, all no_std and zero-allocation.
Key Components
| Crate | Purpose |
|---|---|
| embassy-executor | The async task executor |
| embassy-time | Timer futures (Timer::after, Ticker) |
| embassy-stm32 | Async HAL drivers for STM32 peripherals |
| embassy-sync | Channels, mutexes, signals for task communication |
Minimal Embassy Application
#![no_std]
#![no_main]
use embassy_executor::Spawner;
use embassy_stm32::gpio::{Level, Output, Speed};
use embassy_time::{Duration, Timer};
use defmt_rtt as _;
use panic_probe as _;
#[embassy_executor::main]
async fn main(_spawner: Spawner) {
// Initialize the HAL — configures clocks, enables peripherals
let p = embassy_stm32::init(Default::default());
// Configure PJ13 as output (green LED on STM32F769I-DISCO)
let mut led = Output::new(p.PJ13, Level::Low, Speed::Low);
loop {
led.set_high();
Timer::after(Duration::from_millis(500)).await; // Non-blocking!
led.set_low();
Timer::after(Duration::from_millis(500)).await;
}
}
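The #[embassy_executor::main] macro generates the program entry point, creates the executor, and spawns async fn main as the root task. Clock configuration happens inside embassy_stm32::init, and the vector table is supplied by the runtime crate (cortex-m-rt on Cortex-M). The root task runs like a normal main() but can .await futures.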
Embassy’s executor is interrupt-driven. When every task is waiting at an .await on a timer or peripheral, the executor puts the CPU to sleep with WFI. A hardware interrupt (timer compare, UART RX, etc.) fires the waker, and the executor resumes the task. There is no busy-polling.
Task Spawning
Embassy supports spawning independent tasks that run concurrently on the single-core executor.
Spawning Tasks
use embassy_executor::Spawner;
use embassy_stm32::gpio::{Input, Level, Output, Pull, Speed};
use embassy_time::{Duration, Timer};
#[embassy_executor::main]
async fn main(spawner: Spawner) {
let p = embassy_stm32::init(Default::default());
let led = Output::new(p.PJ13, Level::Low, Speed::Low);
let button = Input::new(p.PA0, Pull::Down);
// Spawn independent tasks
spawner.spawn(blink_task(led)).unwrap();
spawner.spawn(button_task(button)).unwrap();
// Main task can do its own work or just idle
loop {
Timer::after(Duration::from_secs(60)).await;
}
}
#[embassy_executor::task]
async fn blink_task(mut led: Output<'static>) {
loop {
led.toggle();
Timer::after(Duration::from_millis(500)).await;
}
}
#[embassy_executor::task]
async fn button_task(mut button: Input<'static>) {
loop {
button.wait_for_rising_edge().await;
defmt::info!("Button pressed!");
}
}
Cooperative Scheduling
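Embassy uses cooperative multitasking: a task runs until it reaches an .await whose future is not yet ready, then hands control back to the executor. The executor never preempts a running task; only interrupt handlers can interrupt it.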
sequenceDiagram
participant Exec as Executor
participant Blink as blink_task
participant Btn as button_task
Exec->>Blink: poll()
Blink->>Blink: led.toggle()
Blink-->>Exec: Pending (Timer 500ms)
Exec->>Btn: poll()
Btn-->>Exec: Pending (wait_for_rising_edge)
Exec->>Exec: WFI (sleep)
Note over Exec: Timer interrupt fires
Exec->>Blink: poll()
Blink->>Blink: led.toggle()
Blink-->>Exec: Pending (Timer 500ms)
Note over Exec: EXTI interrupt fires
Exec->>Btn: poll()
Btn->>Btn: log "Button pressed!"
Btn-->>Exec: Pending (wait_for_rising_edge)
Each #[embassy_executor::task] function must have a 'static signature: it cannot borrow local variables from main. Pass owned values (peripherals, channels) as arguments.
Timer and Delay Futures
Embassy provides non-blocking timer primitives backed by hardware timer peripherals.
One-Shot Delays
use embassy_time::{Duration, Timer, Instant};
// Delay for a fixed duration (non-blocking — other tasks run)
Timer::after(Duration::from_millis(500)).await;
// Delay until a specific instant
let deadline = Instant::now() + Duration::from_secs(2);
Timer::at(deadline).await;
Periodic Ticker
For tasks that must run at a fixed rate (e.g., sensor sampling), Ticker compensates for execution time drift:
use embassy_time::{Duration, Ticker};
#[embassy_executor::task]
async fn sensor_task() {
let mut ticker = Ticker::every(Duration::from_millis(100));
loop {
// Runs every 100ms regardless of how long read_sensor() takes
let value = read_sensor();
process(value);
ticker.next().await; // Wait for next tick
}
}
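The drift that Ticker avoids is easy to see with a little arithmetic. The sketch below is a host-side model, not Embassy code: it assumes a relative delay that starts counting after the work finishes, and a ticker that schedules each deadline from the previous deadline. Both function names are hypothetical.

```rust
/// Iteration start times (ms) for a loop that does `work_ms` of work and then
/// sleeps for a full `period_ms` (a relative delay at the end of each pass).
fn delay_loop_starts(period_ms: u64, work_ms: u64, iterations: usize) -> Vec<u64> {
    let mut now = 0;
    let mut starts = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        starts.push(now);
        now += work_ms; // time spent working
        now += period_ms; // the delay only starts counting after the work
    }
    starts
}

/// Iteration start times (ms) when each deadline is derived from the previous
/// deadline rather than from the current time.
fn ticker_loop_starts(period_ms: u64, work_ms: u64, iterations: usize) -> Vec<u64> {
    let mut now = 0;
    let mut deadline = period_ms; // first tick is one period after creation
    let mut starts = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        starts.push(now);
        now += work_ms; // time spent working
        now = now.max(deadline); // wait for the tick (no wait if already late)
        deadline += period_ms; // next deadline ignores how long the work took
    }
    starts
}
```

With a 100 ms period and 7 ms of work, the delay-based loop starts at 0, 107, 214, 321 ms, while the deadline-based loop stays on 0, 100, 200, 300 ms.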
Non-Blocking vs Blocking Delay
| Method | Blocks Executor | Other Tasks Run | Use Case |
|---|---|---|---|
| Timer::after().await | No | Yes | Normal async delay |
| Ticker::every() | No | Yes | Periodic sampling |
| cortex_m::asm::delay() | Yes | No | Sub-microsecond spin |
| embassy_time::block_for() | Yes | No | Short critical timing |
Never use cortex_m::asm::delay() or block_for() for long delays in async tasks. They block the entire executor, starving all other tasks. Use Timer::after().await instead.
Interrupt-Driven Peripheral Futures
Embassy’s HAL drivers wrap peripheral interrupts as futures. Reading from UART, SPI, or I2C becomes a simple .await — the task sleeps until the hardware signals completion.
Async UART Echo
This example echoes received bytes back over USART1 (PA9 TX / PB7 RX, the ST-LINK virtual COM port on the STM32F769I-DISCO):
#![no_std]
#![no_main]
use embassy_executor::Spawner;
use embassy_stm32::usart::{Config, Uart};
use embassy_stm32::bind_interrupts;
use defmt_rtt as _;
use panic_probe as _;
// Bind the USART1 interrupt to Embassy's handler
bind_interrupts!(struct Irqs {
USART1 => embassy_stm32::usart::InterruptHandler<embassy_stm32::peripherals::USART1>;
});
#[embassy_executor::main]
async fn main(_spawner: Spawner) {
let p = embassy_stm32::init(Default::default());
let config = Config::default(); // 115200 8N1
let mut usart = Uart::new(
p.USART1, p.PB7, p.PA9, Irqs, p.DMA1_CH0, p.DMA1_CH1, config,
).unwrap();
defmt::info!("UART echo started");
let mut buf = [0u8; 64];
loop {
// Await incoming data — task sleeps, other tasks run
let n = usart.read_until_idle(&mut buf).await.unwrap();
// Echo back
usart.write(&buf[..n]).await.unwrap();
}
}
Async SPI Transfer
use embassy_stm32::spi::{Config, Spi};
let mut spi = Spi::new(
p.SPI2, p.PB10, p.PC3, p.PC2,
p.DMA1_CH2, p.DMA1_CH3,
Config::default(),
);
let tx_buf = [0x9F, 0x00, 0x00, 0x00]; // Read JEDEC ID
let mut rx_buf = [0u8; 4];
// Full-duplex transfer — task sleeps until DMA completes
spi.transfer(&mut rx_buf, &tx_buf).await.unwrap();
defmt::info!("JEDEC ID: {:02x}", rx_buf);
Async I2C Transaction
use embassy_stm32::i2c::{Config, I2c};
let mut i2c = I2c::new(
p.I2C1, p.PB8, p.PB9,
Irqs, p.DMA1_CH4, p.DMA1_CH5,
embassy_stm32::time::Hertz(100_000),
Config::default(),
);
let sensor_addr = 0x48u8;
let mut temp = [0u8; 2];
i2c.read(sensor_addr, &mut temp).await.unwrap();
let temperature = i16::from_be_bytes(temp) as f32 / 256.0;
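The conversion on the last line treats the two bytes as a big-endian signed value with 8 fractional bits. The sketch below isolates that arithmetic so it can be checked on a host. The function name is hypothetical, and the register format is an assumption inferred from the division by 256; check your sensor's datasheet for the actual encoding.

```rust
/// Converts a big-endian signed 8.8 fixed-point reading to degrees Celsius.
/// Assumption: the high byte holds whole degrees, the low byte holds 1/256 steps.
fn raw_to_celsius(raw: [u8; 2]) -> f32 {
    i16::from_be_bytes(raw) as f32 / 256.0
}
```

For example, the bytes 0x19 0x80 decode to 25.5, and 0xFF 0x80 decodes to -0.5 because the value is sign-extended before the division.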
Channel-Based Communication
When tasks need to exchange data, Embassy provides synchronization primitives in embassy-sync.
Bounded Channel (Backpressure)
use embassy_sync::channel::Channel;
use embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex;
use embassy_time::{Duration, Ticker};
// Channel with capacity 4: send().await suspends the sender while the channel is full
static SENSOR_CHANNEL: Channel<CriticalSectionRawMutex, u16, 4> = Channel::new();
#[embassy_executor::task]
async fn producer_task() {
let mut ticker = Ticker::every(Duration::from_millis(100));
loop {
let value = read_adc();
SENSOR_CHANNEL.send(value).await; // Suspends this task (not the executor) while the channel is full
ticker.next().await;
}
}
#[embassy_executor::task]
async fn consumer_task() {
loop {
let value = SENSOR_CHANNEL.receive().await; // Suspends this task while the channel is empty
if value > THRESHOLD {
defmt::warn!("Sensor value {} exceeds threshold!", value);
trigger_alarm();
}
}
}
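Backpressure is the key property of a bounded channel. The host-side sketch below models only the capacity rule with a plain queue; it is not how embassy-sync is implemented, and BoundedQueue with its methods is a hypothetical type for illustration. Where the model returns an error, the async send().await would suspend the producer until space is available.

```rust
use std::collections::VecDeque;

/// Fixed-capacity FIFO that refuses new items when full.
struct BoundedQueue<T, const N: usize> {
    items: VecDeque<T>,
}

impl<T, const N: usize> BoundedQueue<T, N> {
    fn new() -> Self {
        Self { items: VecDeque::with_capacity(N) }
    }

    /// Enqueues `value`, or hands it back as Err when the queue is full.
    fn try_send(&mut self, value: T) -> Result<(), T> {
        if self.items.len() == N {
            Err(value)
        } else {
            self.items.push_back(value);
            Ok(())
        }
    }

    /// Dequeues the oldest item, or returns None when the queue is empty.
    fn try_receive(&mut self) -> Option<T> {
        self.items.pop_front()
    }
}
```

With a capacity of 4, a fifth send is refused until the consumer takes one item out. A slow consumer therefore slows the producer down instead of letting memory use grow without bound.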
Signal (Latest Value)
Signal keeps only the most recent value — new sends overwrite the previous. Useful for configuration updates or status reporting where only the latest value matters.
use embassy_sync::signal::Signal;
use embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex;
static MODE_SIGNAL: Signal<CriticalSectionRawMutex, OperatingMode> = Signal::new();
#[embassy_executor::task]
async fn control_task() {
loop {
let mode = MODE_SIGNAL.wait().await;
match mode {
OperatingMode::Normal => { /* ... */ }
OperatingMode::LowPower => { /* ... */ }
}
}
}
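The overwrite behaviour is what distinguishes a signal from a channel. The sketch below is a host-side model of the latest-value semantics only, with no waking or synchronization. LatestValue and its methods are hypothetical names, and the OperatingMode enum is redefined here so that the block is self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum OperatingMode {
    Normal,
    LowPower,
}

/// Single-slot mailbox: a new value replaces any value not yet taken.
struct LatestValue<T> {
    slot: Option<T>,
}

impl<T> LatestValue<T> {
    fn new() -> Self {
        Self { slot: None }
    }

    /// Stores `value`, discarding any earlier value that was never read.
    fn signal(&mut self, value: T) {
        self.slot = Some(value);
    }

    /// Removes and returns the stored value, if any.
    fn try_take(&mut self) -> Option<T> {
        self.slot.take()
    }
}
```

If the sender signals Normal and then LowPower before the receiver runs, the receiver only ever sees LowPower. That is fine for mode or status updates but wrong for data where every sample matters; use a channel in that case.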
Async Mutex
For shared resources that require exclusive access:
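use embassy_sync::blocking_mutex::raw::CriticalSectionRawMutex;
use embassy_sync::mutex::Mutex;
use embassy_time::{Duration, Timer};
// The async Mutex hands out mutable access through its guard, so no RefCell is needed.
// Another task is expected to store Some(display) during initialization.
static DISPLAY: Mutex<CriticalSectionRawMutex, Option<Display>> = Mutex::new(None);
#[embassy_executor::task]
async fn status_task() {
    loop {
        {
            let mut display = DISPLAY.lock().await;
            // Exclusive access to the display; skip drawing until it is initialized
            if let Some(d) = display.as_mut() {
                d.draw_status();
            }
        } // Guard dropped here, so the lock is not held during the delay
        Timer::after(Duration::from_secs(1)).await;
    }
}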
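A lock guard lives until the end of its scope, so a lock taken at the top of a loop iteration stays held across any later .await in that iteration unless the guard is scoped or dropped explicitly. The sketch below demonstrates the scoping rule on a host with std::sync::Mutex as a stand-in; the function name is hypothetical, and the Embassy async mutex differs in that waiting for it suspends the task instead of blocking.

```rust
use std::sync::Mutex;

/// Returns (locked while the guard is alive, locked after the guard is dropped).
fn guard_scope_demo() -> (bool, bool) {
    let display = Mutex::new(0u32); // stand-in for a display driver
    let held_inside;
    {
        let mut guard = display.lock().unwrap();
        *guard += 1; // "draw" while holding exclusive access
        // A second lock attempt fails while `guard` is alive.
        held_inside = display.try_lock().is_err();
    } // `guard` is dropped here and the lock is released
    let held_after = display.try_lock().is_err();
    (held_inside, held_after)
}
```

Keeping the guard in the smallest possible scope means other tasks can use the shared resource while this task sleeps.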
Cooperative Multitasking Best Practices
The Starvation Problem
Because Embassy is cooperative, a task that never yields blocks the entire system:
// BAD: blocks the executor — no other task runs
#[embassy_executor::task]
async fn compute_task() {
loop {
// CPU-intensive work with no .await point
for i in 0..1_000_000 {
heavy_computation(i);
}
// Other tasks are starved for the entire loop!
}
}
// GOOD: yield periodically to let other tasks run
#[embassy_executor::task]
async fn compute_task(data: &'static [u8]) {
loop {
for chunk in data.chunks(64) {
process_chunk(chunk);
embassy_futures::yield_now().await; // Yield to executor
}
}
}
Breaking Up Long Computations
If your task does heavy processing (CRC calculation, signal filtering, image processing), break it into chunks and yield between them:
#[embassy_executor::task]
async fn crc_task(data: &'static [u8]) {
let mut crc = 0xFFFFu16;
for chunk in data.chunks(128) {
for byte in chunk {
crc = update_crc(crc, *byte);
}
// Yield every 128 bytes so other tasks can run
embassy_futures::yield_now().await;
}
defmt::info!("CRC: {:#06x}", crc);
}
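Chunking must not change the result. The host-side sketch below checks this with one possible update_crc: the original does not say which CRC it uses, so CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) is an assumption chosen because it matches the 0xFFFF seed. crc_chunked is a hypothetical helper that counts where the yields would go.

```rust
/// One byte of CRC-16/CCITT-FALSE (polynomial 0x1021, MSB first, no reflection).
fn update_crc(crc: u16, byte: u8) -> u16 {
    let mut crc = crc ^ ((byte as u16) << 8);
    for _ in 0..8 {
        crc = if crc & 0x8000 != 0 {
            (crc << 1) ^ 0x1021
        } else {
            crc << 1
        };
    }
    crc
}

/// Computes the CRC chunk by chunk; returns the CRC and the number of chunk
/// boundaries where an async task would call yield_now().await.
fn crc_chunked(data: &[u8], chunk_len: usize) -> (u16, usize) {
    let mut crc = 0xFFFFu16;
    let mut yield_points = 0;
    for chunk in data.chunks(chunk_len) {
        for byte in chunk {
            crc = update_crc(crc, *byte);
        }
        yield_points += 1; // the async version yields to the executor here
    }
    (crc, yield_points)
}
```

The standard check input "123456789" gives 0x29B1 for this CRC variant whether it is processed in one chunk or in three. Chunk size therefore only affects how often other tasks get to run.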
Diagnosing Task Starvation
Symptoms of a blocked executor:
| Symptom | Likely Cause |
|---|---|
| LED blink freezes periodically | Another task has a long computation without yields |
| UART drops characters | RX task not polled fast enough — another task blocks |
| Timer callbacks fire late | Executor cannot service timer wakers promptly |
| defmt output stops | The logging task is starved |
Diagnosis approach:
- Add defmt::info! timestamps at .await points to measure gaps
- Use a hardware timer + GPIO toggle to measure per-task execution time on an oscilloscope
- Temporarily disable suspect tasks to isolate the blocker
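Once timestamps have been logged, finding the worst gap is a small post-processing step. The sketch below is a hypothetical host-side helper, assuming the timestamps were captured in microseconds and in order; it is not part of defmt or Embassy.

```rust
/// Returns the index and length of the largest gap between consecutive
/// timestamps, or None when fewer than two timestamps are given.
fn max_gap_us(timestamps_us: &[u64]) -> Option<(usize, u64)> {
    timestamps_us
        .windows(2)
        .enumerate()
        .map(|(i, pair)| (i, pair[1].saturating_sub(pair[0])))
        .max_by_key(|&(_, gap)| gap)
}
```

For the log 0, 500, 1000, 9000, 9500 the helper reports an 8000 µs gap starting at index 2, which points at the code that ran between the third and fourth timestamps.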
Other Platforms
Using a different board? Embassy supports multiple chip families:
| Platform | HAL Crate | Notes |
|---|---|---|
| nRF52840 | embassy-nrf | BLE SoftDevice integration, excellent power management |
| ESP32-C3/S3 | embassy-esp (esp-hal-embassy) | Wi-Fi/BLE async drivers via esp-wifi |
| RP2040 | embassy-rp | Dual-core support, PIO async drivers |
| STM32 (all) | embassy-stm32 | Broadest peripheral coverage |
The embassy-executor and embassy-sync crates are platform-independent. Task code using channels, timers, and signals ports across chips unchanged; only HAL initialization differs.
Best Practices
- Never block in async tasks: use .await for all waiting; never loop {} or cortex_m::asm::delay() for long durations
- Yield in CPU-intensive work: call embassy_futures::yield_now().await every few hundred microseconds
- Use channels over shared state: Channel and Signal are safer and easier to reason about than Mutex<RefCell<...>>
- Prefer Ticker over Timer loops: Ticker::every() compensates for execution time drift
- Keep tasks small: each task should do one thing; compose behavior through channels
- Use defmt for logging: it is fast, zero-allocation, and works with Embassy’s defmt-rtt probe logging
- Bind interrupts explicitly: the bind_interrupts! macro makes interrupt routing visible and auditable
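The "every few hundred microseconds" guideline can be turned into a chunk size once you have measured how long one item takes. The sketch below is a hypothetical helper showing the arithmetic; the per-item cost is an input you would measure on your own hardware, for example with the GPIO toggle method.

```rust
/// Largest number of items that can be processed between yields without
/// exceeding `budget_us`, given a measured cost of `per_item_ns` per item.
/// Always returns at least 1 so the task still makes progress.
fn chunk_len_for_budget(budget_us: u32, per_item_ns: u32) -> usize {
    if per_item_ns == 0 {
        return usize::MAX; // no measurable cost: chunking is unnecessary
    }
    let budget_ns = budget_us as u64 * 1_000;
    (budget_ns / per_item_ns as u64).max(1) as usize
}
```

With a 200 µs budget and 1.5 µs per item, the chunk length comes out at 133 items. If a single item already exceeds the budget, the helper returns 1 and the item itself needs to be split.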
What’s Next
This is the final chapter of Part 8. You have covered the complete embedded Rust development workflow:
- Toolchain Setup — cross-compilation, probe-rs, cargo-embed
- Embedded Software — HAL drivers, type-state GPIO, UART, timers
- Debugging — GDB, RTT logging, defmt, fault analysis
- Bare Metal Runtime — boot sequence, vector table, linker scripts
- Memory Management — stack, heap, static allocation in constrained systems
- C Interoperability — FFI, bindgen, mixing Rust and C firmware
- Binary Optimization — size reduction, LTO, panic handlers
- Async Concurrency — Embassy executor, tasks, channels (this chapter)
Suggested next steps:
- Browse the Embassy examples repository for more advanced patterns (USB, Ethernet, BLE)
- Experiment with embassy-net for async TCP/IP networking on STM32
- Try porting your blocking HAL code from earlier chapters to async Embassy equivalents
- Explore the embassy-boot bootloader for OTA firmware updates
- Contribute to the Rust embedded ecosystem: file issues, improve docs, write drivers