TL;DR
Engineers who build at extreme scale report ELF binaries in the tens of gigabytes (the author cites examples beyond 25 GiB with debug symbols), driven in part by broad use of static linking. On x86_64 the common CALL instruction encodes a signed 32-bit relative offset, giving it a reach of roughly +/-2 GiB; once a call target falls outside that range, the linker reports relocation overflows and the build must switch code models or emit less compact call sequences.
What happened
An engineer recounts encountering extremely large ELF binaries in industry and during academic research, including examples reported beyond 25 GiB when debug symbols were included. These sizes arise in part because some organizations prefer static builds that embed all required code in each binary. On x86_64 the typical CALL opcode (e8) uses a signed 32-bit relative offset, so a single callsite can only reach roughly +/-2 GiB. The author demonstrates this with a toy C example and a synthetic linker script that places a target function more than 2 GiB away from its caller; the linker then emits relocation overflow errors. One common mitigation is to switch to the 'large' code model (-mcmodel=large), under which the compiler loads the callee's absolute 64-bit address into a register and calls through it indirectly. That workaround resolves the relocation overflow but lengthens each callsite and consumes a general-purpose register, with other trade-offs noted.
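The reach arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the author's toy example: the callsite address 0x401000 and the nearby callee are assumed values, the helper names are hypothetical, and only the 0x120000000 target comes from the demonstration.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1
CALL_REL32_SIZE = 5  # 1 opcode byte (e8) + 4 displacement bytes


def call_displacement(callsite: int, target: int) -> int:
    """Displacement a CALL rel32 at `callsite` would need to reach `target`.

    The CPU adds the displacement to the address of the *next* instruction,
    so the offset is measured from callsite + 5.
    """
    return target - (callsite + CALL_REL32_SIZE)


def fits_rel32(callsite: int, target: int) -> bool:
    """True if the displacement is representable as a signed 32-bit value."""
    return INT32_MIN <= call_displacement(callsite, target) <= INT32_MAX


callsite = 0x401000       # assumed address of the call instruction
near_target = 0x402000    # assumed nearby callee
far_target = 0x120000000  # far-away placement used in the demonstration

print(fits_rel32(callsite, near_target))  # reachable with a 5-byte call
print(fits_rel32(callsite, far_target))   # out of range: the linker would report an overflow
```

The second check fails because the required displacement is well over 4 GiB, which cannot be stored in the four displacement bytes that follow e8.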
Why it matters
- At very large codebase scale, conventional assumptions about relative branch reach break and cause link-time failures.
- Static linking practices that produce single, self-contained binaries make it more likely to hit the 2 GiB relocation limit.
- Switching to a large code model avoids overflow but increases binary code size and register usage, potentially affecting instruction density.
- Linker errors from relocation overflow are build-time interruptions that require changes to toolchains or build layout.
- Tool and build-system choices at mega-scale have architectural consequences that rarely appear in smaller projects.
Key facts
- The author observed ELF binaries larger than 25 GiB (including debug symbols).
- On x86_64 the CALL opcode (e8) encodes a signed 32-bit relative offset, limiting reach to roughly +/-2 GiB.
- Relocation entries in object files mark places the linker must fix up; these can trigger 'relocation overflow' when targets are too far.
- A synthetic linker script was used to place a function at 0x120000000 to reproduce the overflow and provoke linker errors.
- Linkers (the author used lld) report out-of-range relocations when a callsite references a symbol farther than the 32-bit signed offset permits.
- Using -mcmodel=large changes callsites into a load of an absolute 64-bit address followed by an indirect call, avoiding the relative offset limit.
- Changing to the large model increased a call sequence from 5 bytes to 12 bytes in the demonstration and consumed the %rdx register.
- The author disabled asynchronous unwind tables (-fno-asynchronous-unwind-tables) for the demonstration to avoid additional overflows.
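The 5-byte versus 12-byte difference can be reproduced by assembling both call sequences by hand. The sketch below is an assumption-laden illustration: the opcode bytes are the standard x86-64 encodings for `call rel32`, `movabs $imm64, %rdx` and `call *%rdx`, not bytes copied from the author's disassembly, and the function names and the callsite address are hypothetical.

```python
import struct


def encode_call_rel32(callsite: int, target: int) -> bytes:
    """Small-model call: e8 followed by a signed 32-bit displacement."""
    disp = target - (callsite + 5)  # measured from the end of the instruction
    # struct.pack raises struct.error if disp does not fit in 32 signed bits,
    # which plays the role of the linker's relocation overflow here.
    return b"\xe8" + struct.pack("<i", disp)


def encode_call_abs64_via_rdx(target: int) -> bytes:
    """Large-model style call: movabs $target, %rdx ; call *%rdx."""
    movabs_rdx = b"\x48\xba" + struct.pack("<Q", target)  # REX.W, B8+rdx, imm64
    call_rdx = b"\xff\xd2"                                # FF /2 with ModRM selecting %rdx
    return movabs_rdx + call_rdx


near = encode_call_rel32(0x401000, 0x402000)  # assumed addresses
far = encode_call_abs64_via_rdx(0x120000000)  # target address from the demonstration

print(len(near), near.hex())
print(len(far), far.hex())
```

The absolute form costs 10 bytes for the address load plus 2 bytes for the indirect call, and it overwrites %rdx at every callsite, which matches the size and register cost described above.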
What to watch next
- Follow-up writings from the author on alternative strategies and code-model trade-offs (author indicated 'More to come in subsequent writings').
- How toolchains and large codebases adopt layout or linking changes to avoid relocation overflows without excessive instruction bloat.
- Whether switching code models measurably alters runtime performance in real workloads: not confirmed in the source.
Quick glossary
- ELF: Executable and Linkable Format, a common binary format for Unix-like systems used to store programs, object code, and shared libraries.
- Static linking: Building a binary that includes all required libraries and code so the program can run without relying on external shared libraries at runtime.
- Relocation: Metadata in object files that records addresses or offsets the linker must adjust when producing the final executable.
- Code model (-mcmodel): Compiler/linker setting that controls assumptions about code and data addresses, influencing how references (relative vs absolute) are emitted.
- Relative offset (CALL e8 on x86_64): A signed 32-bit value encoded in some branch/call instructions that specifies the target relative to the address of the next instruction.
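To make the relocation entry concrete, here is a minimal sketch of how a linker might patch a PC-relative 32-bit field. It assumes the usual S + A - P computation for PC-relative relocations in the x86-64 psABI; the class, function, and address values are hypothetical and do not reflect lld's internals.

```python
import struct
from dataclasses import dataclass


@dataclass
class PcRel32Reloc:
    offset: int  # position of the 4-byte field inside the section
    addend: int  # typically -4 for a call, since the field ends where the next instruction starts


def apply_reloc(section: bytearray, section_addr: int,
                reloc: PcRel32Reloc, symbol_addr: int) -> None:
    """Write S + A - P into the field, or fail if it does not fit in 32 bits."""
    place = section_addr + reloc.offset         # P: address of the field being patched
    value = symbol_addr + reloc.addend - place  # S + A - P
    if not -(2**31) <= value <= 2**31 - 1:
        raise OverflowError(f"relocation out of range: {value:#x}")
    section[reloc.offset:reloc.offset + 4] = struct.pack("<i", value)


text = bytearray(b"\xe8\x00\x00\x00\x00")  # call with a zeroed placeholder displacement
reloc = PcRel32Reloc(offset=1, addend=-4)

apply_reloc(text, 0x401000, reloc, 0x402000)  # nearby symbol: patched in place
print(text.hex())

try:
    apply_reloc(text, 0x401000, reloc, 0x120000000)  # far symbol from the demonstration
except OverflowError as err:
    print(err)
```

The object file only records where the field is and which symbol it refers to; the range check can happen only once final addresses are known, which is why the failure surfaces at link time.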
Reader FAQ
Why is there a 2 GiB barrier on x86_64?
Because the common CALL instruction uses a signed 32-bit relative offset, which limits its reach to roughly +/-2 GiB from the callsite.
How did the author reproduce the problem?
They used a small C example and a linker script that placed the callee in a far-away address range, provoking relocation overflow errors reported by the linker.
What is the simple workaround?
Compiling and linking with a large code model (e.g., -mcmodel=large) makes the toolchain emit absolute-address sequences for calls, avoiding the 32-bit relative limit.
Does using the large code model have downsides?
Yes. In the demonstration it increased callsite instruction size and consumed a general-purpose register; broader performance impacts were not demonstrated in the source.
Are performance degradations from the large model proven in the article?
No. Runtime performance effects are not confirmed in the source.
Sources
- Huge Binaries
- Binaries
- Compiled Elf binary is too big
- Hardening ELF binaries using Relocation Read-Only …