TL;DR
Patch sets would expose an x86-style TSO memory model on some Arm processors via new prctl() controls, allowing user space to request TSO where hardware supports it. Kernel maintainers have raised strong objections over fragmentation and long-term maintenance, and several alternatives have been proposed, including leaving the work out-of-tree or enabling TSO only for virtual machines.
What happened
Linux developers are debating a pair of patch series that would let user space query and request a TSO (total store ordering) memory model on Arm hardware that can provide it. The proposed interface adds PR_GET_MEM_MODEL and PR_SET_MEM_MODEL prctl() calls. PR_GET_MEM_MODEL would report the memory model currently in effect, and PR_SET_MEM_MODEL would try to select either the default Arm model or TSO, succeeding only if the CPU supports the requested model or a stricter one. One series comes from Hector Martin and is already shipping in the Asahi Linux downstream kernel; the other is a similar, earlier submission from Zayd Qumsieh that targeted virtual machines on Apple silicon. Some Arm CPUs from NVIDIA and Fujitsu always run in TSO mode, while Apple's silicon can switch TSO on at runtime. Kernel maintainers including Will Deacon and Catalin Marinas object to adding the feature upstream, citing fragmentation and long-term support costs. Proposals to address the issue include keeping the code out-of-tree, enabling TSO unconditionally on some hardware (with reported performance costs), or restricting TSO exposure to VMs.
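The "requested model or a stricter one" rule is easy to misread, so here is a minimal sketch of it as a pure function. This is a model of the behavior described above, not the kernel code: the names MODEL_DEFAULT, MODEL_TSO, and set_mem_model are hypothetical, and the real interface would report failure through prctl()'s return value rather than None.

```python
# Hypothetical constants: a larger value means stronger ordering guarantees.
MODEL_DEFAULT = 0  # the normal, weaker Arm memory model
MODEL_TSO = 1      # x86-style total store ordering


def set_mem_model(requested, supported):
    """Model of the selection rule: return the memory model that would be
    in effect, or None if the request fails.

    `supported` is the set of models the (imaginary) CPU can run.
    """
    # Any supported model at least as strict as the request is acceptable.
    candidates = [model for model in supported if model >= requested]
    if not candidates:
        return None  # the CPU cannot satisfy the request
    # Prefer the weakest acceptable model, i.e. the exact match when it exists.
    return min(candidates)


# Three imaginary CPUs, mirroring the hardware classes mentioned in the text.
SWITCHABLE_CPU = {MODEL_DEFAULT, MODEL_TSO}  # TSO can be turned on at runtime
ALWAYS_TSO_CPU = {MODEL_TSO}                 # TSO is the only mode
DEFAULT_ONLY_CPU = {MODEL_DEFAULT}           # no TSO support at all
```

Under this model, asking for the default model on an always-TSO CPU succeeds and yields TSO, while asking for TSO on a default-only CPU fails.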
Why it matters
- Memory models determine whether CPUs can reorder reads and writes; incompatibilities can break concurrent programs or emulators.
- x86 software and emulators written to assume TSO can malfunction on weaker Arm memory models unless TSO is emulated or provided.
- Exposing TSO in the kernel could improve correctness and performance for some workloads but risks fragmenting user-space behavior across Arm platforms.
- Choosing whether to accept the feature involves trade-offs between hardware capability exposure, performance impact, and long-term maintenance burden.
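The breakage described in the points above can be illustrated with the classic "message passing" test: one thread writes data and then sets a flag, and another reads the flag and then the data. The sketch below is a toy enumeration, not a faithful model of either architecture. It assumes that the weaker model may reorder the two independent accesses within each thread, and that TSO keeps them in program order (which holds for this test because the writer only stores and the reader only loads). All names are illustrative.

```python
from itertools import permutations

# Writer thread: data = 1; flag = 1
WRITER = [("store", "data", 1), ("store", "flag", 1)]
# Reader thread: r_flag = flag; r_data = data
READER = [("load", "flag", "r_flag"), ("load", "data", "r_data")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against a single shared memory."""
    memory = {"data": 0, "flag": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "store":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs["r_flag"], regs["r_data"]


def outcomes(allow_reordering):
    """Return the set of (r_flag, r_data) results the toy model permits."""
    if allow_reordering:
        # Weak model: each thread's two independent accesses may swap.
        writer_orders = list(permutations(WRITER))
        reader_orders = list(permutations(READER))
    else:
        # TSO-like model: store-store and load-load order is preserved.
        writer_orders = [tuple(WRITER)]
        reader_orders = [tuple(READER)]
    results = set()
    for w in writer_orders:
        for r in reader_orders:
            for schedule in interleavings(list(w), list(r)):
                results.add(run(schedule))
    return results
```

In this toy model, the result (1, 0), meaning "flag seen as set but data still stale", appears only when reordering is allowed. Code that relies on never seeing it is the kind that works on x86 and fails on a weaker model without barriers.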
Key facts
- x86 uses the TSO model, which guarantees stores are seen by all CPUs in program order; Arm implements a weaker model that permits more reordering.
- Some Arm CPUs from NVIDIA and Fujitsu implement TSO at all times; Apple's chips make TSO available as an optional runtime feature.
- Hector Martin submitted a patch series to add PR_GET_MEM_MODEL and PR_SET_MEM_MODEL prctl() operations to let user space query and request TSO.
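- PR_SET_MEM_MODEL may select a stricter memory model than requested (for example, a request for the default model can succeed on a CPU that always runs TSO); it fails only if the CPU can provide neither the requested model nor a stricter one.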
- A similar patch by Zayd Qumsieh targeted enabling TSO for Linux running inside virtual machines on Apple silicon.
- Martin reports the Asahi Linux downstream series has been used by thousands of users.
- Kernel maintainers including Will Deacon expressed strong objections, warning that the feature would fragment user space and impose maintenance costs; Catalin Marinas indicated he would block such patches.
- Alternatives discussed include keeping the patch out-of-tree/distribution-specific, enabling TSO unconditionally on Apple CPUs (Martin cited roughly a 9% performance penalty), or enabling TSO only for VMs as suggested by Marc Zyngier.
- Emulating TSO by inserting memory barriers incurs a performance penalty, which motivates hardware support where available.
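To see where the cost of barrier-based emulation comes from, consider a deliberately naive translation scheme. TSO lets a later load pass an earlier store but keeps every other pair of memory accesses in order, so a translator targeting a weaker model can put a full barrier between any two consecutive memory accesses unless the pair is a store followed by a load. The sketch below is a toy under that assumption; insert_barriers and the op names are hypothetical, and real emulators use more refined techniques.

```python
def insert_barriers(ops):
    """Return a copy of `ops` with "barrier" entries added so that a weakly
    ordered CPU preserves TSO ordering.

    `ops` is a list of strings: "load", "store", or anything else for
    instructions that do not touch memory (for example "alu").
    """
    out = []
    prev_mem = None  # the most recent memory access seen so far
    for op in ops:
        if op in ("load", "store"):
            # Only store -> load may be reordered under TSO; every other
            # consecutive pair of memory accesses must stay in order.
            relaxed_pair = prev_mem == "store" and op == "load"
            if prev_mem is not None and not relaxed_pair:
                out.append("barrier")
            prev_mem = op
        out.append(op)
    return out


def barrier_count(ops):
    """Count how many barriers the naive scheme adds to a guest trace."""
    return insert_barriers(ops).count("barrier")
```

For a guest trace such as load, alu, load, store, store, this scheme adds three barriers to four memory accesses. Overhead of that kind is what hardware TSO support avoids.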
What to watch next
- Whether the proposed prctl() interface will be accepted into the upstream kernel (not confirmed in the source).
- Whether distributions running on Apple silicon will keep shipping the downstream Asahi patches or attempt to upstream them (not confirmed in the source).
- Whether TSO exposure will be restricted to virtual machines or enabled unconditionally on specific hardware, and what the measured performance and power impacts would be (not confirmed in the source).
Quick glossary
- Memory model: A specification describing the ordering guarantees that hardware provides for memory operations across multiple processors.
- TSO (Total Store Ordering): A memory model used by x86 in which stores from a CPU are observed by other CPUs in the order they were issued; the only reordering it permits is a later load completing before an earlier store to a different address.
- prctl(): A Linux system call that lets a process control various kernel-level attributes for its own execution environment.
- Memory barrier: A CPU instruction that enforces ordering constraints on memory operations to ensure visibility or ordering across cores.
Reader FAQ
What is being proposed to add to the kernel?
A pair of prctl() operations, PR_GET_MEM_MODEL and PR_SET_MEM_MODEL, to query and request the TSO memory model where hardware supports it.
Which Arm CPUs can provide TSO?
According to the article, some NVIDIA and Fujitsu chips always run with TSO, and Apple chips can expose TSO as an optional runtime feature.
Will the patches be merged into mainline Linux?
Not confirmed in the source.
What are the main objections?
Maintainers worry this would fragment user-space expectations and create long-term maintenance burdens if software begins to depend on an implementation-defined feature.
Are there alternatives?
Options discussed include keeping the support out-of-tree, enabling TSO unconditionally on Apple hardware (with a cited ~9% performance cost), or limiting TSO to virtual machines.
Support for the TSO memory model on Arm CPUs, by Jonathan Corbet, April 26, 2024. At the CPU level, a memory model describes, among other things, the amount of freedom…
Sources
- Support for the TSO memory model on Arm CPUs
- AtoMig: Automatically Migrating Millions Lines of Code from …
- This raises questions. For example, modern x86 …