Embedding Timing Analysis in Containerized CI: Reproducible Real-Time Tests

2026-02-14

A practical 2026 guide to containerizing WCET tools and running reproducible timing tests in CI so you catch regressions early.

Embed timing analysis into CI so timing regressions never surprise production

Pain point: Your functional tests pass, but release day reveals a timing regression that breaks a real‑time workflow. You need reproducible WCET tests in CI that catch regressions early without slowing developers.

Executive summary — what you'll get

This guide (2026 edition) shows how to containerize worst‑case execution time (WCET) and timing analysis tools, run them reliably inside CI, and prevent timing regressions. You’ll get pragmatic steps for hermetic images, host tuning, accurate measurement, artifact management, and automated gating. It includes concrete Dockerfile and CI examples, strategies for deterministic runs, and patterns to integrate modern commercial tools like VectorCAST+RocqStat (following Vector’s 2026 acquisition) or equivalent analysis tools.

Why 2026 is the moment for WCET in CI

In late 2025 and early 2026, industry consolidation and tooling integrations accelerated timing analysis adoption. Vector’s acquisition of RocqStat and the announced integration into VectorCAST signal vendor alignment: timing analysis is becoming a first‑class CI artifact in safety‑critical development. At the same time, advances in infrastructure automation and reproducible builds make it feasible to run meaningful WCET checks as part of every merge.

That means teams can and should treat timing analysis like unit tests: automated, reproducible, and fast enough to be part of pull request validation. The rest of this article explains how to make that real.

High‑level architecture

At a glance, a reproducible WCET pipeline includes four layers:

  1. Hermetic build image — pinned compiler toolchain and deterministic build flags.
  2. Timing analysis container — the WCET tool and runtime harness packaged as a container image.
  3. Controlled execution host — CI runner tuned for determinism (RT kernel, CPU isolation, disabled turbo).
  4. Result management — baselines, artifacts, and automated regression detection in CI.

Key principles for reproducible timing tests

  • Pin everything: compiler, libc, toolchain, analysis tool versions.
  • Isolate hardware effects: control frequency scaling, turbo boost, and system daemons.
  • Run identical binaries: reproducible builds or artifact store for the instrumented binary.
  • Capture baselines: store WCET results as CI artifacts and use tolerances for drift.
  • Make failures actionable: diffable, traceable outputs and links to source commit + tool version.

Practical steps: containerizing a WCET tool

We’ll walk through building a container image for a timing analysis toolchain. The example is generic, so you can adapt it to VectorCAST+RocqStat, aiT, OTAWA‑style analyzers, or in‑house tools.

1) Start from a minimal base and pin OS/toolchain

Use a small deterministic base (e.g., Debian Bookworm minimal) and pin the exact versions. In 2026, many teams adopt reproducible base images built from buildpacks or distroless patterns; prefer immutable image registries in your org.

FROM debian:bookworm-2026-01-01

# Install pinned toolchain versions
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential=12.3-1 \
    gcc-12=12.3.0-1 \
    python3=3.11.5-1 \
    ... \
  && rm -rf /var/lib/apt/lists/*

# Copy the WCET tool installer / binaries
COPY rocqstat-3.2.1.tar.gz /opt/rocqstat/
RUN tar xzf /opt/rocqstat/rocqstat-3.2.1.tar.gz -C /opt && rm /opt/rocqstat/*.tar.gz

ENV PATH=/opt/rocqstat/bin:$PATH

# Entrypoint runs the analysis harness
ENTRYPOINT ["/opt/rocqstat/bin/run-wcet-analysis.sh"]

Notes: use strict version pins and cryptographic verification for downloads (GPG/sha256). For commercial tools like VectorCAST/RocqStat, follow vendor packaging guidance and keep licenses in an external secret store rather than baking them into images.

2) Provide a deterministic build artifact

WCET analysis must target the exact binary the CI is validating. Produce a reproducible build or upload the built binary to an artifact repository (e.g., Artifactory, OCI registry). Reproducible builds are preferred because they remove drift.

# Example: multi-stage Dockerfile to produce the instrumented artifact
FROM debian:bookworm-2026-01-01 as builder
# install pinned compilers
COPY . /src
WORKDIR /src
RUN make clean && make CC=gcc-12 CFLAGS='-O2 -g -fdata-sections' -j$(nproc)

FROM wcet-tool:3.2.1
COPY --from=builder /src/build/my_rt_binary /artifacts/my_rt_binary

3) Ensure controlled execution environment on CI hosts

Containers help with tooling but cannot fully control host hardware. For tight WCET work, use runners with predictable behavior:

  • Real‑time kernel: PREEMPT_RT or other RT patches on the host for lower jitter.
  • CPU isolation: boot with isolcpus= and pin the container to dedicated cores via cgroups or docker --cpuset‑cpus.
  • Disable frequency scaling: set governor=performance and disable turbo.
  • Minimal services: run a streamlined runner image that removes cron, logging agents and other daemon activity.

Example host tunables (provision in your runner image or boot scripts):

# Set the performance governor on all CPUs
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Disable turbo boost (intel_pstate driver; other drivers expose different knobs)
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
# Take the sibling thread offline to disable hyperthreading (if needed)
echo 0 | sudo tee /sys/devices/system/cpu/cpuX/online
# Isolate CPUs at boot (GRUB) with isolcpus=2,3

4) Launch container with deterministic affinity and privileges

Give the container pinned CPUs and, if necessary, access to performance counters. Example run:

# MSR device nodes are per-CPU; pass the ones for the pinned cores
docker run --rm --cpuset-cpus="2-3" --cpu-shares=2048 \
  --cap-add=SYS_ADMIN \
  --device /dev/cpu/2/msr:/dev/cpu/2/msr:rw --device /dev/cpu/3/msr:/dev/cpu/3/msr:rw \
  -v /artifacts:/out myorg/wcet-tool:3.2.1 \
  --input /out/my_rt_binary --iterations 10 --output /out/wcet-results.json

Security note: granting device access or SYS_ADMIN should be reviewed. Many teams use privileged, short‑lived runners only for timing pipelines and keep them out of general CI pools. Consider integrating image hardening and virtual patching into the same pipeline.

Measuring and verifying timing deterministically

Two measurement patterns work well in CI:

  1. Static WCET estimation — purely analytic methods (the analyzer computes a bound based on code/path analysis). These are deterministic given the same binary and tool config.
  2. Measured worst‑case via controlled execution — run with stimulus harness, collect wall clock and hardware counters, and apply outlier filtering.

For static tools, the critical reproducibility elements are tool version, analysis model, and binary. For measured testing, add host control and repeatable harness inputs.
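
For measured testing, a simple way to keep harness inputs repeatable is to generate the stimulus from a fixed, recorded seed rather than ad hoc test data. A minimal sketch, assuming a hypothetical JSON stimulus format and file names:

#!/usr/bin/env python3
# Generate a deterministic stimulus file for the timing harness (hypothetical format).
import json
import random

SEED = 1234       # fixed seed, checked into the repo or CI config
N_FRAMES = 256    # number of stimulus records to generate

random.seed(SEED)
frames = [{"id": i, "payload": random.getrandbits(32)} for i in range(N_FRAMES)]

with open("stimulus.json", "w") as f:
    json.dump({"seed": SEED, "frames": frames}, f, indent=2)

print(f"wrote {N_FRAMES} frames with seed {SEED}")

Record the seed alongside the results so any worst‑case observation can be reproduced exactly.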

Tuning measurement parameters

  • Run multiple iterations (e.g., 50–100) to estimate distribution and worst observed latencies.
  • Discard warmup iterations.
  • Use high resolution timers (clock_gettime(CLOCK_MONOTONIC_RAW)) or hardware counters (PMU/rdtsc) depending on platform.
  • Record environment metadata: kernel version, CPU model, governor, container image digest, commit hash.
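
Capturing that metadata is easy to automate. Below is a minimal sketch that gathers host and build information into a JSON file you can attach to every result; the proc/sysfs paths assume a Linux x86 host and the git call assumes the run happens inside the checked‑out repository:

#!/usr/bin/env python3
# Collect environment metadata for a timing run (sketch; adjust paths per platform).
import json
import platform
import subprocess

def sh(cmd):
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()

meta = {
    "kernel": platform.release(),
    "cpu_model": next((line.split(":", 1)[1].strip()
                       for line in open("/proc/cpuinfo")
                       if line.startswith("model name")), "unknown"),
    "governor": open("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor").read().strip(),
    "commit": sh(["git", "rev-parse", "HEAD"]),
}

with open("run-metadata.json", "w") as f:
    json.dump(meta, f, indent=2)

print(json.dumps(meta, indent=2))

Add the container image digest and tool version from your CI variables before uploading the file alongside the WCET results.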

Integrate with CI: GitHub Actions / GitLab / Jenkins examples

Below is a GitHub Actions example that shows building artifacts, pushing them to an internal registry, running timing analysis in a dedicated runner, and saving the reports.

name: wcet-ci

on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build binary
        run: make CC=gcc-12 CFLAGS='-O2 -g -fdata-sections'
      - name: Upload artifact
        uses: actions/upload-artifact@v4
        with:
          name: built-binary
          path: build/my_rt_binary

  wcet-analysis:
    runs-on: self-hosted-rt-runner
    needs: build
    steps:
      - name: Download binary
        uses: actions/download-artifact@v4
        with:
          name: built-binary
          path: build
      - name: Run WCET container
        run: |
          docker run --rm --cpuset-cpus="2" -v ${{ github.workspace }}/build:/in myorg/wcet-tool:3.2.1 \
            --input /in/my_rt_binary --iterations 100 --output /in/wcet.json
      - name: Upload WCET result
        uses: actions/upload-artifact@v4
        with:
          name: wcet-results
          path: build/wcet.json

In this workflow, self-hosted-rt-runner is a dedicated runner with RT kernel, CPU isolation, and locked frequency settings. Keep such runners in a separate pool to avoid noisy neighbors.

Detecting timing regressions and gating merges

Detecting regressions is a combination of numeric thresholds and intelligent analysis:

  • Absolute thresholds: fail if WCET > certified limit.
  • Delta checks: compare current WCET against baseline; fail if > X% worse.
  • Statistical checks: use percentile comparisons (95th/99th) and bootstrap tests to avoid flaky fails.
  • Automated triage: attach diffs, tool versions, and call stacks to failure reports.

Store baselines as artifacts keyed by branch and target platform. Use a small time window for baseline updates and require manual approval to shift baselines to prevent silent drift.
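
A minimal delta‑check sketch along those lines is shown below; the file paths, the "max" field, and the 10% tolerance are assumptions you should adapt to your analyzer’s output format and certified limits:

#!/usr/bin/env python3
# Fail the CI job if the measured WCET regresses beyond a tolerance (sketch).
import json
import sys

TOLERANCE = 0.10  # fail if WCET is more than 10% worse than the baseline

baseline = json.load(open("baseline/wcet.json"))   # stored baseline artifact
current = json.load(open("build/wcet.json"))       # result from this run

base_wcet = baseline["max"]
cur_wcet = current["max"]
delta = (cur_wcet - base_wcet) / base_wcet

print(f"baseline: {base_wcet:.6f}s  current: {cur_wcet:.6f}s  delta: {delta:+.1%}")

if delta > TOLERANCE:
    print(f"FAIL: WCET regressed by more than {TOLERANCE:.0%}")
    sys.exit(1)

print("OK: within tolerance")

Pair it with a second, non‑negotiable check against the absolute certified limit.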

Handling cross‑compilation and hardware differences

If your target is an embedded or heterogeneous CPU (ARM microcontroller, automotive ECUs), consider two strategies:

  1. Hardware‑in‑the‑loop (HIL) runners: connect CI to a lab with instrumented target boards. The container still packages the analysis tool, but the physical runner executes the binary on the real device and returns results.
  2. Simulation: encapsulate QEMU (fast, but not cycle‑accurate) or a cycle‑accurate simulator in a container. Be explicit about the simulator version and timing model because simulator differences affect results.

For many safety systems, HIL is required for final verification — but running fast preliminary checks in CI with simulated or controlled host runs catches most regressions early.

Tooling and vendor integrations — 2026 landscape

Vector’s acquisition of RocqStat and the roadmap to integrate it into VectorCAST illustrate the move toward unified test and timing toolchains. In 2026 expect:

  • Vendor-provided, containerized analyzer images (secure, signed) for CI.
  • Standardized artifact formats for WCET outputs so CI dashboards can parse and visualize timing trends.
  • Better orchestration between static analysis and dynamic measurement — e.g., static analysis to identify worst paths and targeted dynamic testing to validate those paths.

Practical implication: plan for images distributed by vendors but still verify reproducibility via pinned images and signature checks in CI.
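
One cheap verification step is to confirm that the image you are about to run still matches the digest you pinned. A sketch, with an obviously placeholder digest and the image tag used elsewhere in this article:

#!/usr/bin/env python3
# Verify a pulled analyzer image against a pinned digest before running it (sketch).
import subprocess
import sys

IMAGE = "myorg/wcet-tool:3.2.1"
PINNED = "myorg/wcet-tool@sha256:<pinned-digest-here>"   # placeholder; record the real digest

digest = subprocess.run(
    ["docker", "inspect", "--format", "{{index .RepoDigests 0}}", IMAGE],
    capture_output=True, text=True, check=True,
).stdout.strip()

if digest != PINNED:
    print(f"FAIL: image digest {digest} does not match pinned {PINNED}")
    sys.exit(1)

print("OK: image digest matches the pinned value")

Signature verification for vendor‑signed images belongs in the same pre‑run step, using whatever signing tool your vendor supports.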

Common pitfalls and how to avoid them

  • Pitfall: Running timing tests on shared runners. Fix: Use isolated, dedicated hardware for benchmark jobs.
  • Pitfall: Inconsistent compiler flags between functional builds and WCET targets. Fix: Enforce single build pipeline that produces both runtime artifacts and analysis inputs.
  • Pitfall: Blindly using container timestamps or nondeterministic file ordering. Fix: Use reproducible build practices and record the container image digest in results.
  • Pitfall: Overly strict gating that produces false positives. Fix: Build tolerant statistical checks and provide easy overrides with audit trails.

Case study (concise)

At a mid‑sized automotive supplier in 2025–26, the team introduced a WCET CI job using a vendor container image for RocqStat. They provisioned two self‑hosted RT runners in the lab behind a VPN. The pipeline pinned tool and kernel versions and required a signed artifact as input. Within three months they reduced late timing regressions by 85%: most regressions were small compiler flag changes that previously slipped through unit tests. The lessons were clear: standardize builds, lock hosts, and automate baselines.

Advanced strategies and future predictions

Look ahead to 2026–2028 trends to plan your roadmap:

  • Artifact provenance and SLSA enforcement: CI will not only run WCET but enforce provenance for all binaries fed to timing tools.
  • OCI artifacts for baselines: expect baselines and WCET reports to be stored as OCI artifacts with semantic versioning and signatures.
  • Hybrid static/dynamic CI flows: static analyzers will flag paths and CI will dynamically exercise only those paths to reduce runtime cost.
  • Rising use of secure, vendor‑signed images: vendors will ship certified container images for timing tools — still pin digests and verify signatures in CI.

Checklist: Make your WCET CI reliable

  1. Pin tool, compiler, and base image versions; verify signatures.
  2. Produce reproducible binaries or store artifacts in an immutable repo.
  3. Provision isolated RT runners with CPU isolation and controlled governors.
  4. Run repeated iterations, collect percentiles, and record environment metadata.
  5. Store baselines and run delta and statistical checks to gate merges.
  6. Integrate results into PR feedback with clear, actionable messages and links to artifacts.

Quick reference scripts and snippets

Use this minimal harness pattern as a starting point. It executes a target binary N times, measures monotonic time, and outputs JSON with metadata.

#!/bin/bash
set -euo pipefail
BIN=$1
ITER=${2:-100}
OUT=${3:-wcet.json}

# Pass the shell variables to the embedded Python as argv
python3 - "$BIN" "$ITER" "$OUT" <<'PY'
import json, subprocess, sys, time

binary, iterations, out_path = sys.argv[1], int(sys.argv[2]), sys.argv[3]

res = []
for i in range(iterations):
    start = time.monotonic()              # monotonic clock, immune to wall-clock jumps
    subprocess.run([binary], check=True)
    res.append(time.monotonic() - start)
# Optionally drop the first few warmup iterations before computing statistics

meta = {
    'iterations': iterations,
    'max': max(res),
    'p95': sorted(res)[int(len(res) * 0.95)],
    'values': res,
}
with open(out_path, 'w') as f:
    json.dump(meta, f)
print(json.dumps(meta))
PY

Related Topics

#ci/cd #embedded #automation