TL;DR

A proposed patch series would let user space query and select the TSO memory model on Arm CPUs that support it, through new prctl() operations, with the aim of helping x86 emulation on Arm. Kernel maintainers have objected that the feature could fragment user space. Alternatives include leaving it to out-of-tree distribution patches or limiting TSO to virtual machines.

What happened

Jonathan Corbet reported on a set of kernel patches from Hector Martin that would let user space query and request a CPU memory model on Arm platforms. The series adds two prctl() operations. PR_GET_MEM_MODEL reports the processor's current model. PR_SET_MEM_MODEL requests either the default Arm model or TSO (total store ordering). Some Arm processors already implement TSO: certain NVIDIA and Fujitsu parts always run with TSO, while Apple silicon exposes it as an optional runtime feature. Martin's work was developed in the Asahi Linux downstream tree. It aims to make that capability available to applications such as x86 emulators, which depend on x86's stricter ordering to run x86 code correctly. The proposal met resistance from Arm architecture maintainers. They warned that a selectable memory model could tempt developers to enable TSO as a shortcut, producing binaries that break on other Arm CPUs. Alternatives discussed include keeping the feature out of tree, enabling TSO unconditionally on specific hardware at a measurable performance cost, or exposing TSO only within virtual machines.

Why it matters

  • x86 emulation on Arm can require TSO semantics for correctness. Letting an emulator switch on hardware TSO could spare it the cost of enforcing that ordering in software.
  • Making a stricter memory model selectable risks creating software that works only on a subset of Arm hardware, complicating portability.
  • Enabling TSO globally on a platform carries measurable performance costs, creating trade-offs between correctness and system throughput.
  • How this is handled will influence whether downstream distributions continue carrying their own patches or whether the feature is accepted upstream.

Key facts

  • Memory models define how freely a CPU may reorder memory operations; Arm implements a weaker model than x86.
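  • x86 implements Total Store Ordering (TSO), under which every CPU observes each CPU's stores in program order. The main reordering TSO still permits is a later load completing before an earlier store to a different address becomes visible to other CPUs.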
  • Arm's weaker model allows more reordering, for implementation simplicity and potential performance gains. The cost is that concurrent code must request ordering explicitly, with barriers or acquire/release instructions, wherever it matters.
  • Some Arm CPUs implement TSO: certain NVIDIA and Fujitsu parts always run TSO; Apple's CPUs expose TSO as an optional runtime feature.
  • Hector Martin's patch series adds PR_GET_MEM_MODEL and PR_SET_MEM_MODEL prctl() operations with values PR_SET_MEM_MODEL_DEFAULT and PR_SET_MEM_MODEL_TSO.
  • Martin's work originates in the Asahi Linux downstream tree and has been distributed to users there.
  • A similar patch by Zayd Qumsieh implemented TSO only for Linux virtual machines on Apple CPUs.
  • Arm architecture maintainers, including Will Deacon and Catalin Marinas, objected to making this implementation-defined feature available to user space due to fragmentation risks.
  • Alternatives discussed include shipping the feature out of tree, enabling TSO unconditionally on Apple CPUs (which Martin estimated would cost about 9% in performance), or enabling TSO only within VMs.

What to watch next

  • Whether the patch series is accepted into the upstream Arm architecture code or blocked by maintainers (not confirmed in the source).
  • How Asahi Linux and other distributions that support Apple silicon proceed: whether they keep shipping the feature out of tree or seek another route (not confirmed in the source).
  • Whether TSO support ends up used only by emulators or is adopted more widely by other user-space software (not confirmed in the source).

Quick glossary

  • Memory model: A set of rules that describe the allowed ordering of memory operations (loads and stores) as observed by different threads or processors.
  • TSO (Total Store Ordering): A memory model in which stores from a single CPU are observed by all other CPUs in program order; x86's memory model is essentially TSO.
  • Memory barrier: An instruction or mechanism that enforces ordering constraints between memory operations to ensure visibility guarantees across processors.
  • prctl(): A Linux system call that allows a process to control specific kernel behaviors or query kernel-provided attributes at runtime.
  • Emulator: Software that mimics the behavior of one architecture or platform on another, often requiring faithful replication of hardware semantics.

Reader FAQ

Why would user space want to request TSO on Arm CPUs?
Some user-space software, particularly x86 emulators, assumes TSO semantics. Without hardware TSO, such software may fail or need expensive software barriers.

Which Arm CPUs already provide TSO?
According to the source, some NVIDIA and Fujitsu CPUs run with TSO always, and Apple's CPUs expose TSO as an optional runtime feature.

Does enabling TSO affect performance?
According to patch author Hector Martin, enabling TSO unconditionally on Apple CPUs would cost around 9% in performance.

Will the kernel maintainers accept the patch soon?
Not confirmed in the source.

Can virtual machines use TSO without changing the kernel globally?
One suggestion was to start virtual machines with TSO enabled, making it available to applications running inside them. The source describes this only as a proposal, not as an accepted approach.

Sources

  • Support for the TSO memory model on Arm CPUs (LWN.net)
