RTOS-based STM32 projects ¶

Real-time operating systems (RTOS) are common in STM32 applications that must handle multiple activities with predictable timing. This chapter explains how to design, implement, debug, and validate RTOS-based projects in STM32CubeIDE for Visual Studio Code.

The chapter focuses on practical engineering decisions:

When an RTOS is the right choice
How to structure tasks and shared resources
How to measure timing behavior and stability
How to avoid common design and debugging issues

RTOS fundamentals on STM32 ¶

An RTOS introduces a scheduler that switches execution between tasks. Each task has its own stack and priority. The scheduler decides which task runs, based on task state and priority rules.

Typical task states are:

Ready
Running
Blocked
Suspended

In STM32 systems, interrupts and RTOS scheduling must cooperate correctly:

Interrupt service routines must remain short
Time-consuming work should be deferred to tasks
Shared data between interrupt context and task context must be protected

Use an RTOS when the application must coordinate several concurrent functions, for example:

Communication stacks
Sensor acquisition pipelines
Control loops and supervision tasks
User interface and maintenance services

Choosing an RTOS model ¶

STM32 projects often use FreeRTOS or ThreadX. Both can support production systems when the architecture is clean and timing is validated.

Selection criteria include:

Existing team experience
Available middleware and ecosystem components
Trace and debug tooling requirements
Memory overhead and feature set

For many projects, the most important requirement is consistency. Select one RTOS model per product line when possible, then reuse patterns, templates, and validation methods across projects.

Project creation and baseline configuration ¶

Create and configure an RTOS project before adding application logic.

Typical baseline setup:

Create a project for the selected STM32 target.
Enable the RTOS middleware in the STM32 configuration.
Configure system clock and tick source.
Generate code.
Build and run a smoke test on hardware.

Baseline verification checklist:

Project builds without warnings that indicate configuration conflicts
Scheduler starts correctly
Idle task and system tick behave as expected
Debug session can inspect RTOS objects

Note

Pin the tool and bundle versions used by the project so that developer and CI environments stay reproducible.
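As one illustration of baseline settings worth reviewing in a FreeRTOS-based project, the excerpt below lists a few `FreeRTOSConfig.h` options that relate to the checklist above. The option names are standard FreeRTOS configuration macros. The values are assumptions for illustration only, and generated projects often wrap some of them in type casts.

```c
/* FreeRTOSConfig.h (excerpt): illustrative values only, adjust per project. */
#define configUSE_PREEMPTION              1      /* preemptive scheduling */
#define configTICK_RATE_HZ                1000U  /* 1 ms tick (assumed) */
#define configSUPPORT_STATIC_ALLOCATION   1      /* allow statically allocated objects */
#define configCHECK_FOR_STACK_OVERFLOW    2      /* pattern-based overflow check */
#define configUSE_MUTEXES                 1      /* mutexes with priority inheritance */
#define configUSE_TRACE_FACILITY          1      /* extra data for RTOS-aware debug views */
#define configQUEUE_REGISTRY_SIZE         8      /* named queues visible to the debugger */
```

Keeping these values under version control alongside the pinned tool versions makes configuration drift visible in review.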

FreeRTOS reference workflow ¶

This section provides one concrete reference flow for FreeRTOS-based STM32 projects. The same architecture principles also apply to other RTOS options, but object names and API details can differ.

Reference task graph ¶

Example task model for a sensing and communication node:

acq_task (high priority): acquires sensor data and publishes samples
ctrl_task (high priority): applies control logic to latest sample window
comms_task (medium priority): serializes and transmits telemetry
diag_task (low priority): emits periodic health and watermark data

Interrupt responsibilities:

DMA or peripheral interrupt pushes a lightweight event token
Tasks consume events and perform non-interrupt processing

Reference queue and sync schema ¶

Example communication pattern:

isr_event_q: interrupt-to-task event queue
sample_q: acquisition to control queue
telemetry_q: control to communication queue
diag_mutex: protects shared diagnostics buffer

Data ownership rules:

The producer owns the data until enqueue succeeds.
The consumer owns the data immediately after dequeue.
Each piece of shared mutable state must be protected by exactly one designated lock.

Timeout policy guidance:

Use bounded waits in control and communication tasks.
Count timeout events for runtime diagnostics.
Escalate repeated timeout bursts to a recoverable fault state.
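The timeout policy above can be sketched as a small counter that a task updates after every bounded wait. This is a minimal portable C model, not an RTOS API: the type `timeout_monitor_t`, both function names, and the idea of treating a run of consecutive timeouts as a "burst" are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: counts timeouts and flags bursts of consecutive ones. */
typedef struct {
    uint32_t total_timeouts;   /* lifetime counter for diagnostics */
    uint32_t consecutive;      /* current run of back-to-back timeouts */
    uint32_t burst_limit;      /* run length that triggers escalation */
} timeout_monitor_t;

static void timeout_monitor_init(timeout_monitor_t *m, uint32_t burst_limit)
{
    m->total_timeouts = 0;
    m->consecutive = 0;
    m->burst_limit = burst_limit;
}

/* Record the outcome of one bounded wait.
 * Returns true when the caller should enter its recoverable fault state. */
static bool timeout_monitor_record(timeout_monitor_t *m, bool timed_out)
{
    if (!timed_out) {
        m->consecutive = 0;        /* a successful wait ends the burst */
        return false;
    }
    m->total_timeouts++;
    m->consecutive++;
    return m->consecutive >= m->burst_limit;
}
```

A task would call `timeout_monitor_record()` with the result of each queue receive or semaphore take, and report `total_timeouts` through its diagnostics channel.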

Example startup sequence ¶

Initialize clocks, GPIO, DMA, and communication peripherals.
Create queues, mutexes, and event groups.
Create tasks with initial stack sizes.
Start scheduler.
Run a startup self-check from a task, because code placed after the scheduler start call does not normally execute, and publish the result in diagnostics.

Note

Keep startup deterministic. Avoid peripheral probes with unbounded wait loops before scheduler start.
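One way to keep a peripheral probe bounded is to give it an explicit attempt budget. The sketch below is plain C with hypothetical names (`probe_fn_t`, `probe_bounded`, and the stand-in `fake_ready_after`). On a target, the callback would read a device status register, and the budget would typically be expressed as a time limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Callback that reports whether the device is ready (hypothetical signature). */
typedef bool (*probe_fn_t)(void *ctx);

/* Poll a device at most max_attempts times.
 * Returns true if the device became ready within the budget. */
static bool probe_bounded(probe_fn_t is_ready, void *ctx, uint32_t max_attempts)
{
    for (uint32_t i = 0; i < max_attempts; i++) {
        if (is_ready(ctx)) {
            return true;
        }
    }
    return false;   /* budget exhausted: report and continue startup */
}

/* Example stand-in for a real status read: ready after *ctx polls. */
static bool fake_ready_after(void *ctx)
{
    uint32_t *remaining = (uint32_t *)ctx;
    if (*remaining == 0) {
        return true;
    }
    (*remaining)--;
    return false;
}
```

When the probe fails, startup can record the result and continue in a degraded mode instead of stalling before the scheduler starts.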

Debug checklist for FreeRTOS ¶

Use this checklist when the system is unstable or intermittently failing:

Verify that the scheduler is running and that the RTOS tick source (SysTick by default on Cortex-M ports) is active.
Inspect all task states and identify unexpectedly blocked tasks.
Check queue depths and overflow counters under active load.
Confirm mutex ownership for shared resources.
Capture high-water marks for all task stacks.
Correlate interrupt frequency with queue consumer throughput.
Re-test with logging reduced in critical paths.

Expected debug evidence for closure:

Reproducible failing scenario
Confirmed root cause with one primary fix
Regression test showing stable behavior across repeated runs

Task design and partitioning ¶

Task design has a stronger impact on stability than any specific API choice.

Recommended partitioning approach:

Define software responsibilities at system level.
Group work into coherent task roles.
Assign priorities from timing criticality, not convenience.
Define task communication contracts.

Common task roles in STM32 products:

Acquisition task for sensor sampling
Control task for state-machine logic
Communication task for protocol handling
Logging and diagnostics task
Maintenance task for non-time-critical operations

Priority planning guidance:

Highest priorities for hard timing constraints
Medium priorities for communication and data processing
Lower priorities for maintenance and reporting

Avoid assigning very high priority to tasks that poll or busy-wait on I/O. A high-priority task that blocks is acceptable because it releases the CPU while it waits, but a busy loop at high priority starves lower-priority tasks and hides design errors.

Synchronization and communication ¶

Use explicit synchronization mechanisms. Avoid ad hoc shared-state patterns.

Queues ¶

Queues are suitable for message passing between tasks and for interrupt-to-task handoff.

Use queues when:

The producer and consumer run at different rates
Data order must be preserved
Backpressure handling is required
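The properties listed above (rate decoupling, preserved order, and a visible backpressure signal) can be illustrated with a small fixed-depth FIFO. This is a single-threaded model in portable C with hypothetical names. It has no blocking or thread safety, so a real project would use the queue API of the selected RTOS.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVT_Q_CAPACITY 8U   /* assumed depth; must be a power of two */

typedef struct {
    uint32_t items[EVT_Q_CAPACITY];
    uint32_t head;      /* next write index (producer side) */
    uint32_t tail;      /* next read index (consumer side) */
    uint32_t dropped;   /* messages rejected because the queue was full */
} evt_queue_t;

static bool evt_queue_put(evt_queue_t *q, uint32_t item)
{
    if ((q->head - q->tail) == EVT_Q_CAPACITY) {
        q->dropped++;               /* backpressure signal for diagnostics */
        return false;
    }
    q->items[q->head & (EVT_Q_CAPACITY - 1U)] = item;
    q->head++;
    return true;
}

static bool evt_queue_get(evt_queue_t *q, uint32_t *out)
{
    if (q->head == q->tail) {
        return false;               /* empty */
    }
    *out = q->items[q->tail & (EVT_Q_CAPACITY - 1U)];
    q->tail++;
    return true;
}
```

The `dropped` counter is the point of the sketch: a full queue is reported to the producer and counted, not silently ignored.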

Semaphores and mutexes ¶

Use binary or counting semaphores for event signaling. Use mutexes for ownership of shared resources.

Key rules:

Keep mutex hold time short
Do not block while holding a mutex unless strictly necessary
Use priority inheritance mechanisms when available

Event flags or event groups ¶

Event flags are useful when one task must wait for multiple asynchronous conditions.

Typical examples:

Wait for both communication ready and sensor ready signals
Coordinate startup dependencies between tasks
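The wait conditions behind event flags reduce to two bit-mask tests. The sketch below shows them in plain C; the flag names and bit assignments are hypothetical. In FreeRTOS, event groups evaluate an equivalent condition inside the kernel, selected by the wait-for-all option of `xEventGroupWaitBits()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVT_COMMS_READY   (1U << 0)   /* hypothetical flag assignments */
#define EVT_SENSOR_READY  (1U << 1)

/* True when every bit in 'required' is set in 'flags' (wait-for-all). */
static bool events_all_set(uint32_t flags, uint32_t required)
{
    return (flags & required) == required;
}

/* True when at least one bit in 'mask' is set in 'flags' (wait-for-any). */
static bool events_any_set(uint32_t flags, uint32_t mask)
{
    return (flags & mask) != 0U;
}
```

For the first example above, a task would proceed only when `events_all_set(flags, EVT_COMMS_READY | EVT_SENSOR_READY)` holds.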

Interrupt to task handoff patterns ¶

Good interrupt design improves determinism and lowers risk.

Pattern:

Interrupt captures minimum required data.
Interrupt signals a task using an interrupt-safe RTOS API (in FreeRTOS, the functions with the FromISR suffix).
Task performs processing in thread context.

Benefits:

Lower interrupt latency
Better debug visibility
Reduced risk of nested interrupt overload

Validation points:

Interrupt frequency at worst case load
Queue or buffer depth under burst traffic
Recovery behavior when consumer task is delayed
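A first-order estimate of the queue depth needed under burst traffic follows from the rate difference between producer and consumer: backlog = (producer rate - consumer rate) x burst duration. The helper below is a sketch under that simplifying assumption, with a hypothetical name. It ignores periods in which the consumer task is preempted, so measured peaks on target should still drive the final sizing.

```c
#include <stdint.h>

/* Minimum queue depth needed to absorb one burst without loss.
 * producer_hz: event rate during the burst
 * consumer_hz: sustained rate at which the task drains the queue
 * burst_ms:    burst duration in milliseconds */
static uint32_t queue_depth_for_burst(uint32_t producer_hz,
                                      uint32_t consumer_hz,
                                      uint32_t burst_ms)
{
    if (producer_hz <= consumer_hz) {
        return 1U;   /* consumer keeps up; one slot covers the handoff */
    }
    uint64_t backlog = (uint64_t)(producer_hz - consumer_hz) * burst_ms;
    return (uint32_t)((backlog + 999U) / 1000U);   /* round up */
}
```

For example, a 5 ms burst at 10 kHz drained at 4 kHz accumulates (10000 - 4000) x 0.005 = 30 entries, so the queue needs at least 30 slots plus margin.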

Memory and stack strategy ¶

Memory planning is critical for long-term reliability.

Static versus dynamic allocation ¶

Static allocation is recommended for safety-oriented and predictable systems. Dynamic allocation can be used with strict controls.

If dynamic allocation is used:

Define allocation ownership clearly
Detect allocation failures explicitly
Measure fragmentation risk with long-duration tests

Task stack sizing ¶

Define per-task stack based on measured usage, not guesswork.

Process:

Start with conservative stack sizes.
Run stress scenarios with full features enabled.
Measure high-water marks for every task.
Reduce margins only after repeated verification.

Recommended monitoring:

Stack watermark checks in diagnostics
Periodic health report for free heap and stack usage
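High-water marks are commonly measured by filling the stack with a known pattern and later counting how much of it is still intact. The sketch below models that technique in portable C on an ordinary byte array. The fill value and function names are assumptions for illustration. In a FreeRTOS project, `uxTaskGetStackHighWaterMark()` reports the equivalent figure, so a hand-written scan is not normally needed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5U   /* fill value assumed for this sketch */

/* Paint the whole stack region before the task first runs. */
static void stack_paint(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count bytes that still hold the fill pattern, scanning from the lowest
 * address. On a descending stack (as on Cortex-M) this is the minimum free
 * space ever observed, that is, the high-water mark. */
static size_t stack_unused_bytes(const uint8_t *stack, size_t size)
{
    size_t unused = 0;
    while (unused < size && stack[unused] == STACK_FILL_BYTE) {
        unused++;
    }
    return unused;
}
```

A diagnostics task can report this value periodically for every task and raise a warning when it falls below an agreed margin.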

Time base and scheduling behavior ¶

System timing behavior depends on tick configuration and scheduling policy.

Tick configuration ¶

Consider the tradeoff between:

Tick resolution
CPU overhead from periodic tick interrupts
Wake frequency in low-power use cases

A shorter tick period (a higher tick rate) improves timing granularity but increases tick interrupt overhead and scheduler activity.
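The tradeoff can be put into numbers with two small calculations: how a delay in milliseconds maps to ticks, and what share of CPU time the tick interrupt consumes. Both helpers below are sketches with hypothetical names. The per-tick cycle cost is an input that has to be measured on the target.

```c
#include <stdint.h>

/* Convert milliseconds to ticks, rounding up so a delay is never shortened
 * by truncation. */
static uint32_t ms_to_ticks_ceil(uint32_t ms, uint32_t tick_rate_hz)
{
    return (uint32_t)(((uint64_t)ms * tick_rate_hz + 999U) / 1000U);
}

/* CPU share spent in the tick interrupt, in parts per million.
 * cycles_per_tick is the measured cost of one tick ISR (assumed input). */
static uint32_t tick_overhead_ppm(uint32_t tick_rate_hz,
                                  uint32_t cycles_per_tick,
                                  uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)tick_rate_hz * cycles_per_tick * 1000000U)
                      / cpu_hz);
}
```

With an assumed cost of 200 cycles per tick on an 80 MHz core, a 1 kHz tick uses 2500 ppm (0.25 %) of the CPU, and a 10 kHz tick uses 25000 ppm (2.5 %).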

Software timers ¶

Use software timers for deferred events and periodic activities that do not require dedicated tasks.

Guidelines:

Keep timer callbacks short
Avoid blocking operations in callback context
Transfer long operations to worker tasks

Latency budgeting ¶

Define latency budgets for critical paths:

Interrupt arrival to task wakeup
Task wakeup to action completion
End-to-end response for control and communication flows

Measure actual values on target hardware and compare against requirements.
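A latency budget becomes checkable once each critical path records its worst case and counts violations. The sketch below assumes timestamps from a free-running 32-bit cycle counter (for example, the DWT cycle counter on cores that provide one). The type and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t budget_cycles;   /* allowed latency for this path */
    uint32_t max_cycles;      /* worst case observed so far */
    uint32_t violations;      /* samples that exceeded the budget */
    uint32_t samples;
} latency_stat_t;

/* Record one measurement from two readings of a free-running 32-bit cycle
 * counter. Unsigned subtraction stays correct across one counter wrap. */
static void latency_record(latency_stat_t *s, uint32_t start, uint32_t end)
{
    uint32_t elapsed = end - start;
    s->samples++;
    if (elapsed > s->max_cycles) {
        s->max_cycles = elapsed;
    }
    if (elapsed > s->budget_cycles) {
        s->violations++;
    }
}

/* Convert cycles to microseconds for reports (rounds down). */
static uint32_t cycles_to_us(uint32_t cycles, uint32_t cpu_hz)
{
    return (uint32_t)(((uint64_t)cycles * 1000000U) / cpu_hz);
}
```

One `latency_stat_t` per budgeted path (interrupt to wakeup, wakeup to completion, end to end) keeps the comparison against requirements explicit.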

Low-power integration with RTOS ¶

RTOS and low-power policy must be designed together.

Integration principles:

Idle path should support low-power entry when system conditions permit
Wakeup sources must be explicit and tested
Clock restoration must be deterministic after wakeup

Typical sequence:

Scheduler reaches idle condition.
Platform code selects low-power mode.
Wake event occurs from configured source.
Clock and peripheral state are restored.
Scheduler resumes normal operation.

Test across realistic use profiles, not only synthetic idle scenarios.

Debugging RTOS applications ¶

RTOS debugging requires both kernel-level and application-level inspection. For debug adapter settings and launch.json RTOS options, see RTOS.

Core debug activities:

Inspect task list and task states
Verify blocked task reasons
Inspect queues, semaphores, and timers
Correlate interrupt activity with task execution

When investigating a deadlock or stall:

Capture call stacks for all tasks.
Identify shared resource ownership.
Check wait conditions and timeout paths.
Confirm that interrupt signaling still occurs.

If timing anomalies appear only when the system runs at full speed (without breakpoints or single-stepping), repeat the tests with instrumentation enabled and disabled to detect measurement side effects.

Runtime statistics and profiling ¶

Runtime metrics support objective optimization.

Useful metrics:

CPU load by task class
Context-switch frequency
Worst-case task execution time
Queue depth peaks and dropped messages
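CPU load per task is usually derived from two snapshots of run-time counters: the time accumulated by the task and the total elapsed time. The helper below is a sketch of that calculation with hypothetical parameter names. How the counters are obtained depends on the RTOS and its run-time statistics configuration.

```c
#include <stdint.h>

/* CPU load of one task over a sampling window, in tenths of a percent.
 * Inputs are snapshots of free-running 32-bit run-time counters taken at the
 * start and end of the window (hypothetical names). */
static uint32_t cpu_load_permille(uint32_t task_start, uint32_t task_end,
                                  uint32_t total_start, uint32_t total_end)
{
    uint32_t task_delta = task_end - task_start;      /* wrap-safe */
    uint32_t total_delta = total_end - total_start;   /* wrap-safe */
    if (total_delta == 0U) {
        return 0U;   /* empty window: nothing to report */
    }
    return (uint32_t)(((uint64_t)task_delta * 1000U) / total_delta);
}
```

Reporting deltas over a fixed window, not lifetime totals, makes load changes visible soon after they occur.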

Profiling goals:

Confirm headroom under peak load
Detect starvation of low-priority maintenance tasks
Identify unstable timing behavior before field deployment

Record the measurement setup with each dataset:

Target board and clock profile
Build type and optimization level
Probe and trace configuration
Test scenario and duration

Fault handling in RTOS systems ¶

Fault strategy must be deterministic and observable.

Recommended fault handling flow:

Capture minimal fault context.
Store fault records in a persistent or retained area.
Transition to safe state or controlled reset.
Report fault signature after restart.

Include RTOS-aware diagnostic data when possible:

Current task identity
Stack pointers and high-water marks
Scheduler state flags
Recent event log identifiers

A fault handler must avoid unsafe dependencies. Keep it independent from components that may already be corrupted.
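A fault record that survives a reset can be as simple as a fixed structure with a marker and a checksum, written with plain stores and validated after restart. The layout, the marker value, and the function names below are assumptions for illustration. Placing the structure in a retained RAM section is target-specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAULT_MAGIC 0xFA17C0DEU   /* arbitrary marker chosen for this sketch */

typedef struct {
    uint32_t magic;        /* marks the record as written */
    uint32_t pc;           /* program counter at fault */
    uint32_t lr;           /* link register at fault */
    uint32_t task_id;      /* application-defined task identifier */
    uint32_t checksum;     /* XOR of the fields above */
} fault_record_t;

static uint32_t fault_checksum(const fault_record_t *r)
{
    return r->magic ^ r->pc ^ r->lr ^ r->task_id;
}

/* Called from the fault path: plain stores only, no RTOS or library calls. */
static void fault_record_store(fault_record_t *r, uint32_t pc, uint32_t lr,
                               uint32_t task_id)
{
    r->magic = FAULT_MAGIC;
    r->pc = pc;
    r->lr = lr;
    r->task_id = task_id;
    r->checksum = fault_checksum(r);
}

/* Called after restart: accept the record only if it is intact. */
static bool fault_record_valid(const fault_record_t *r)
{
    return r->magic == FAULT_MAGIC && r->checksum == fault_checksum(r);
}
```

Because the store path uses no locks, heap, or kernel calls, it stays usable even when RTOS state is already corrupted.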

Testing and validation strategy ¶

Validation for RTOS systems should combine host and target testing.

Test levels:

Unit tests for algorithmic modules
Integration tests for task interaction
System tests on real hardware
Long-duration stability tests

Essential scenarios:

High message-rate bursts
Peripheral error and timeout events
Resource exhaustion conditions
Repeated sleep and wake cycles
Communication loss and reconnection

For each release candidate, run the regression tests on both an instrumented build and a minimal-overhead build, then compare the results.

Use this chapter together with the general testing material in Testing and validation.

Common pitfalls and troubleshooting ¶

Task starvation ¶

Symptoms:

One or more tasks rarely run
Background services lag or stop

Checks:

Priority assignments
Long critical sections
Busy loops without blocking calls

Queue overflow and data loss ¶

Symptoms:

Missing events
Sporadic protocol failures

Checks:

Producer and consumer rate balance
Queue depth sizing at peak load
Timeout and retry policy

Deadlock between tasks ¶

Symptoms:

System appears alive but no progress is made
Multiple tasks remain blocked indefinitely

Checks:

Lock acquisition order
Nested mutex usage patterns
Missing timeout and recovery paths
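Lock acquisition order can be checked mechanically in debug builds by assigning each mutex a rank and requiring ranks to increase as locks are nested. The checker below is a sketch of that idea in portable C. The rank scheme, the depth limit, and all names are hypothetical, and each task would keep its own `lock_order_t`.

```c
#include <stdbool.h>
#include <stdint.h>

#define LOCK_DEPTH_MAX 4U   /* assumed maximum nesting depth */

/* Per-task record of the lock ranks currently held, innermost last. */
typedef struct {
    uint8_t ranks[LOCK_DEPTH_MAX];
    uint8_t depth;
} lock_order_t;

/* Check before taking a lock. Returns false if taking a lock of this rank
 * would violate the "strictly increasing rank" rule or exceed the depth. */
static bool lock_order_enter(lock_order_t *t, uint8_t rank)
{
    if (t->depth >= LOCK_DEPTH_MAX) {
        return false;
    }
    if (t->depth > 0U && rank <= t->ranks[t->depth - 1U]) {
        return false;   /* out-of-order acquisition: potential deadlock */
    }
    t->ranks[t->depth++] = rank;
    return true;
}

/* Call after releasing the most recently taken lock. */
static void lock_order_leave(lock_order_t *t)
{
    if (t->depth > 0U) {
        t->depth--;
    }
}
```

If every task acquires locks in increasing rank, a circular wait between tasks cannot form, which removes one common cause of deadlock.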

Timing regressions after feature growth ¶

Symptoms:

Occasional missed deadlines
Increased jitter after adding features

Checks:

New high-priority tasks or interrupts
Increased logging overhead in critical paths
Memory pressure and cache effects

Recommended project practices ¶

To keep RTOS projects maintainable across product life cycles:

Define an architecture note that documents task model and priorities
Keep communication contracts in shared headers with version fields
Enforce coding rules for interrupt-safe and task-safe APIs
Add automated checks for stack and heap health in nightly tests
Keep debug and release configuration differences documented
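A communication contract with version fields might look like the header below. It is a sketch only: the message name, the field layout, and the major/minor compatibility rule are assumptions, not a format defined by any STM32 or RTOS component.

```c
#include <stdbool.h>
#include <stdint.h>

#define TELEMETRY_MSG_VERSION_MAJOR 2U   /* hypothetical contract version */
#define TELEMETRY_MSG_VERSION_MINOR 1U

typedef struct {
    uint8_t  version_major;   /* bumped on incompatible layout changes */
    uint8_t  version_minor;   /* bumped on backward-compatible additions */
    uint16_t payload_len;     /* bytes that follow the header */
} telemetry_hdr_t;

/* A consumer accepts messages with the same major version and a minor
 * version no newer than the one it was built against. */
static bool telemetry_hdr_compatible(const telemetry_hdr_t *h)
{
    return h->version_major == TELEMETRY_MSG_VERSION_MAJOR &&
           h->version_minor <= TELEMETRY_MSG_VERSION_MINOR;
}
```

Keeping the structure and the version constants in one shared header means a layout change forces a visible version bump for both producer and consumer.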

A stable RTOS project depends on clear ownership, measured timing, and repeatable validation. Treat these as continuous engineering activities, not one-time setup actions.