TL;DR

A code example summing integers up to a value triggered markedly different optimizations in GCC and clang. GCC kept a loop and applied pairwise and vectorized additions, while clang removed the loop entirely and emitted arithmetic that corresponds to the closed-form v(v-1)/2.

What happened

Matt Godbolt examined how different compilers optimize a function that sums integers up to a given value. With GCC the compiled output retained a loop; the optimizer combined two iterations at once, producing an update equivalent to adding x and x+1 as x*2+1 and, at higher optimization levels, could vectorize the loop. Clang, by contrast, eliminated the loop and emitted a short sequence of arithmetic instructions (lea, imul, shr, lea, dec) that compute the same result directly. Backing out the algebra shows clang’s sequence implements the closed-form formula v(v-1)/2, converting what was written as an O(n) loop into an O(1) computation at runtime. Godbolt notes his surprise and appreciation for compiler work; the post appears as day 24 of the Advent of Compiler Optimisations 2025 series and was written by him and proof-read by LLMs and humans.

Why it matters

  • Compilers can transform apparently straightforward code into very different runtime algorithms, affecting performance characteristics.
  • Loop-level optimizations can include pairing iterations and vectorization, improving throughput without changing source code.
  • Some compilers can recognise mathematical identities and replace iterative code with constant-time formulas.
  • Generated sequences may reflect concerns such as overflow handling or the compiler’s internal analysis, which can make emitted code non-obvious.

Key facts

  • The example is a function that sums integers up to a provided value.
  • GCC’s generated code kept a loop and combined two iterations at a time using an LEA instruction to perform result += 1 + x*2.
  • At -O3 GCC can further vectorize the loop using parallel adds.
  • Clang produced no loop; it emitted instructions (lea, imul, shr, lea, dec) that implement an arithmetic expression.
  • The arithmetic emitted by clang equates to the closed-form v(v – 1) / 2 for the sum of integers.
  • That transformation turns the apparent O(n) source algorithm into an O(1) sequence in the compiled output.
  • The post is written by Matt Godbolt and is day 24 of the Advent of Compiler Optimisations 2025 series.
  • The post was reviewed and proof-read by LLMs and humans, per the author’s note.

What to watch next

  • Watch the final posts in the Advent of Compiler Optimisations 2025 series for more examples and explanations.
  • A video accompanies this post; consult it for the author’s walkthrough of the example.
  • Whether clang’s exact instruction sequence is chosen to avoid overflow or is a side effect of induction-variable inference is not confirmed in the source.

Quick glossary

  • Compiler optimization: Transformations applied by a compiler to improve performance or reduce resource use while preserving program semantics.
  • Vectorization: An optimization that converts scalar operations performed in a loop into parallel operations that run on SIMD hardware units.
  • Closed-form solution: A direct mathematical expression that computes a result without iteration, e.g., a formula in terms of the input.
  • Induction variable: A variable in a loop whose value changes systematically each iteration, often analysed by compilers to enable optimizations.
  • Overflow: When a computation produces a value outside the representable range of the data type, potentially altering results; compilers may avoid transformations that could change overflow behavior.

Reader FAQ

Did clang actually remove the loop?
Yes. In the example clang emitted a short arithmetic sequence instead of an explicit loop, implementing the closed-form sum.

Did GCC also do the closed-form transformation?
No. GCC kept a loop and optimised it by combining two iterations and, at higher optimization, vectorizing; it did not replace the loop with the closed-form formula in this example.

Why did clang emit that particular sequence of instructions?
The author speculates it may be partly to avoid overflow and due to how clang tracks induction variables, but the exact reason is not confirmed in the source.

Does this mean compilers always turn loops into closed-form formulas?
Not confirmed in the source.

When compilers surprise you Written by me, proof-read by an LLM. Details at end. Every now and then a compiler will surprise me with a really smart trick. When I…

Sources

Related posts

By

Leave a Reply

Your email address will not be published. Required fields are marked *