TL;DR

An OpenJDK change replaced the /proc-based implementation of per-thread user CPU time with one built on clock_gettime(), removing file I/O and text parsing. In the author's JMH run, average latency fell from ~11.2 µs to ~0.279 µs per call (about 40×), and the syscall profile became much cleaner.

What happened

OpenJDK maintainers replaced the code that computed a thread's user CPU time by reading /proc/self/task/<tid>/stat and parsing its text. The replacement is a small change that uses a thread-specific clockid with clock_gettime(). The removed implementation performed file I/O, string parsing and multiple syscalls. The new version obtains a clockid from pthread_getcpuclockid(), changes its low bits to select the user-time-only clock on Linux, and calls clock_gettime() directly. The patch removed the complex parsing code and added a JMH benchmark. In a local JMH run with 16 threads, the author measured average latency dropping from about 11.186 µs per call to 0.279 µs, with a lower observed median and far fewer kernel interactions in the syscall profile. The change relies on a Linux kernel clockid encoding that is documented in kernel sources rather than in POSIX man pages.
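The new path can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the OpenJDK patch itself: the function name current_thread_user_time_ns and the two constants are invented for the example, the bit values follow the Linux encoding in which 01 selects the VIRT (user-only) clock, and the code is Linux-only.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Assumed constants for this sketch: on Linux the two low bits of a CPU
   clockid select the clock type, and 01 means VIRT (user time only).
   This comes from kernel sources, not from POSIX. */
#define CLOCK_TYPE_MASK 3
#define CPUCLOCK_VIRT   1

/* Returns the calling thread's user CPU time in nanoseconds, or -1 on error. */
static int64_t current_thread_user_time_ns(void) {
    clockid_t clockid;
    if (pthread_getcpuclockid(pthread_self(), &clockid) != 0)
        return -1;

    /* The clockid returned above measures total CPU time for the thread.
       Rewrite the type bits so the kernel reports user time only. */
    clockid = (clockid & ~CLOCK_TYPE_MASK) | CPUCLOCK_VIRT;

    struct timespec ts;
    if (clock_gettime(clockid, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

Calling current_thread_user_time_ns() before and after a CPU-bound loop and subtracting gives the user time spent in that loop. On older glibc versions the program may need to be linked with -pthread.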

Why it matters

  • Calls to ThreadMXBean.getCurrentThreadUserTime() become far cheaper, reducing overhead in tooling and monitoring that sample thread CPU time frequently.
  • Replacing /proc reads with a single clock_gettime() syscall lowers kernel and VFS activity and reduces lock contention under concurrency.
  • A small, targeted change in native code produced a substantial run-time improvement, showing the value of platform-specific APIs where they are safe to use.
  • Because the fix depends on Linux-specific clockid encoding, portability and behavior on non-Linux systems need consideration.

Key facts

  • The OpenJDK change replaced a /proc-based implementation with a clock_gettime-based approach using a modified clockid.
  • Removed code read /proc/self/task/<tid>/stat, used sscanf to parse fields and converted clock ticks to nanoseconds.
  • New code obtains a clockid via pthread_getcpuclockid() and flips low bits to select the VIRT clock (user time only) before calling clock_gettime().
  • Linux encodes the clock type into clockid_t: one bit distinguishes per-thread from per-process clocks, and the two lowest bits select the clock type (00 PROF, 01 VIRT, 10 SCHED, 11 FD).
  • Linux kernels have used this clockid encoding since 2.6.12 (2005); documentation is sparse and primarily in kernel sources.
  • The author ran a JMH benchmark (16 threads) included with the patch; average latency fell from ~11.186 µs/op to ~0.279 µs/op.
  • Measured improvement in that run is about 40× on average; the original bug report cited a 30×–400× gap depending on setup.
  • Both before and after runs still show rare high-tail outliers (~1.2 ms), but the fixed version shows a much cleaner syscall profile.
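For contrast, the removed /proc-based path listed above can be approximated in C like this. It is a sketch of the technique (open the per-thread stat file, parse it with sscanf, convert clock ticks to nanoseconds), not a copy of the deleted HotSpot code. The function name proc_thread_user_time_ns is hypothetical, and the field positions follow the proc(5) description of the stat file, where utime is field 14.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Returns the calling thread's user CPU time in nanoseconds, or -1 on error. */
static int64_t proc_thread_user_time_ns(void) {
    char path[64];
    char buf[2048];
    long tid = syscall(SYS_gettid);
    snprintf(path, sizeof path, "/proc/self/task/%ld/stat", tid);

    /* File I/O: open, read and close on every call. */
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, fp);
    fclose(fp);
    buf[n] = '\0';

    /* Field 2 (the command name) may contain spaces or ')',
       so parsing starts after the last ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL)
        return -1;

    /* Skip state (field 3) and fields 4..13, then read utime (field 14). */
    unsigned long utime_ticks;
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu",
               &utime_ticks) != 1)
        return -1;

    /* utime is reported in clock ticks; convert to nanoseconds. */
    long ticks_per_sec = sysconf(_SC_CLK_TCK);
    if (ticks_per_sec <= 0)
        return -1;
    return (int64_t)utime_ticks * (1000000000LL / ticks_per_sec);
}
```

Even in this reduced form, each call performs several syscalls plus text formatting and parsing, which is the overhead the clock_gettime() version avoids.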

What to watch next

  • Whether this change is backported or propagated across OpenJDK release branches (not confirmed in the source).
  • Impact on high-concurrency workloads and real-world monitoring tools that frequently sample per-thread user time.
  • Potential kernel or libc changes that could alter clockid encoding or its stability (not confirmed in the source).

Quick glossary

  • clock_gettime: A POSIX function that returns the current value of a specified clock, typically used to read elapsed, process, or thread CPU time.
  • /proc filesystem: A virtual filesystem on Linux that exposes kernel and process information as text files; reading it can involve kernel string synthesis and VFS operations.
  • pthread_getcpuclockid: A POSIX function that returns a clockid associated with a specific thread; on Linux the returned clockid encodes additional type and target information.
  • JMH: Java Microbenchmark Harness, a toolkit for building and running microbenchmarks on the JVM.
  • clockid_t: An integer type representing a clock identifier used by clock_gettime and related APIs; on Linux parts of its bits encode clock type and target thread/process.
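To make the clockid_t entry concrete, here is a small C sketch of the Linux bit layout. The clock-type values (PROF, VIRT, SCHED, FD) are the ones given in this post. The position of the per-thread bit and the way the thread ID is packed into the upper bits are assumptions based on Linux kernel sources, and all helper names are hypothetical.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Clock types held in the two lowest bits of a Linux CPU clockid. */
enum {
    CPUCLOCK_PROF  = 0,  /* 00 */
    CPUCLOCK_VIRT  = 1,  /* 01: user time only */
    CPUCLOCK_SCHED = 2,  /* 10 */
    CLOCKFD        = 3   /* 11 */
};

#define CLOCK_TYPE_MASK 3  /* low two bits: clock type */
#define PERTHREAD_BIT   4  /* assumed: set for per-thread clocks */

/* Extracts the clock type from a clockid. */
static int clockid_type(clockid_t id) {
    return id & CLOCK_TYPE_MASK;
}

/* True if the clockid targets a thread rather than a whole process. */
static bool clockid_is_per_thread(clockid_t id) {
    return (id & PERTHREAD_BIT) != 0;
}

/* Returns the same clockid with its type bits replaced. */
static clockid_t clockid_with_type(clockid_t id, int type) {
    return (id & ~CLOCK_TYPE_MASK) | type;
}

/* Builds a per-thread clockid; assumes the inverted thread ID sits above bit 2. */
static clockid_t make_thread_clockid(pid_t tid, int type) {
    return (clockid_t)(~(unsigned)tid << 3) | PERTHREAD_BIT | type;
}
```

For example, clockid_with_type(make_thread_clockid(1234, CPUCLOCK_SCHED), CPUCLOCK_VIRT) keeps the target thread and the per-thread bit but switches the clock type from SCHED to VIRT, which is the transformation the patch relies on.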

Reader FAQ

What did the patch change in OpenJDK?
It replaced a /proc-based user-time reader with a clock_gettime-based implementation that uses a thread-specific clockid.

How big was the performance improvement?
In the author's JMH run the average latency dropped from ~11.186 µs to ~0.279 µs per call, roughly a 40× reduction; the original bug reported a 30×–400× range.

Why wasn't clock_gettime used originally?
POSIX clock types normally expose total CPU time (user + system); obtaining user-time-only required a Linux-specific tweak to the clockid.
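As a point of comparison, the portable POSIX call looks like the sketch below. It reads CLOCK_THREAD_CPUTIME_ID, the standard per-thread CPU-time clock, which reports user and system time combined. The function name current_thread_total_cpu_time_ns is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Returns the calling thread's total CPU time (user + system)
   in nanoseconds, or -1 on error. */
static int64_t current_thread_total_cpu_time_ns(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}
```

This is enough when total CPU time is wanted, but it cannot separate user time from kernel time, which is why a user-time-only query needs something beyond the standard clock IDs.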

Is this change portable to other operating systems?
The approach depends on Linux-specific clockid encoding; portability to non-Linux systems is not confirmed in the source.

Source: How a 40-Line Fix Eliminated a 400x Performance Gap
Jaromir Hamala, QuestDB Team, January 13, 2026
Tags: jvm linux performance engineering
"I have a habit of skimming the OpenJDK commit…"
