TL;DR

A developer building clickable links to GitHub PR comments found that the IDs they had stored did not produce working URLs, and discovered that GitHub exposes two ID systems: GraphQL node IDs and the numeric database IDs used in URLs. The database ID can be recovered from a newer-format node ID in two ways: base64-decode the payload and unpack the resulting MessagePack array, or read the decoded bytes as one large integer and keep only the lower 32 bits.

What happened

While adding direct links from a code-review product to GitHub PR comments, the author found that the stored GraphQL node IDs didn’t map to the integer IDs used in GitHub URLs. Investigation revealed that GitHub uses two node ID formats. Some objects use a legacy format: a base64 string that decodes to text containing an enum value, the object type name, and the database ID. Newer objects use a prefix followed by a base64 payload that decodes to a MessagePack-encoded array whose last element is the object’s database ID. Reading the same decoded bytes as one large integer showed that its lower 32 bits match that database ID. Either unpacking the MessagePack array or applying a bitmask to the integer therefore yields the numeric ID used in web URLs, which let the author generate working links without mass-migrating stored records.
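The MessagePack route described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the helper names (`b64_payload`, `unpack_uint_array`, `database_id`) are hypothetical, the decoder handles only the small subset of MessagePack needed for an array of unsigned integers, and it assumes the payload is unpadded base64 following the first underscore.

```python
import base64
import struct


def b64_payload(node_id: str) -> bytes:
    """Strip the type prefix (e.g. 'PRRC_') and base64-decode the rest."""
    _, _, payload = node_id.partition("_")
    # Assumption: the payload omits '=' padding, so restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return base64.urlsafe_b64decode(payload)


def unpack_uint_array(data: bytes) -> list[int]:
    """Decode a MessagePack fixarray of unsigned integers (subset of the format)."""
    if not data or not 0x90 <= data[0] <= 0x9F:
        raise ValueError("expected a MessagePack fixarray")
    count = data[0] & 0x0F  # fixarray stores its length in the low 4 bits
    # Marker byte -> (struct format, size in bytes) for uint8/16/32/64.
    wide = {0xCC: (">B", 1), 0xCD: (">H", 2), 0xCE: (">I", 4), 0xCF: (">Q", 8)}
    values, pos = [], 1
    for _ in range(count):
        marker = data[pos]
        pos += 1
        if marker <= 0x7F:  # positive fixint: the marker byte is the value
            values.append(marker)
        elif marker in wide:
            fmt, size = wide[marker]
            values.append(struct.unpack_from(fmt, data, pos)[0])
            pos += size
        else:
            raise ValueError(f"unsupported MessagePack type 0x{marker:02x}")
    return values


def database_id(node_id: str) -> int:
    """Return the last array element, which the source identifies as the database ID."""
    return unpack_uint_array(b64_payload(node_id))[-1]
```

In real code a maintained MessagePack library would be a safer choice than a hand-rolled decoder; the point here is only to show how little structure sits between the node ID and the integer.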

Why it matters

  • Storing only GraphQL node IDs may be insufficient when you need numeric database IDs for web URLs or REST endpoints.
  • Developers can recover database IDs from node IDs without migrating or backfilling large data sets by decoding the node ID payload.
  • GitHub exposes two ID formats (legacy and newer MessagePack-based), so client code must handle both when mapping between APIs and web links.
  • Although GitHub’s docs advise treating node IDs as opaque, they do contain a reversible structure that practical tools can use.

Key facts

  • GitHub returns opaque GraphQL node IDs with prefixes like PRRC_ followed by a base64 payload.
  • Web URLs and some REST responses use integer database IDs (e.g., 2475899260) rather than GraphQL node IDs.
  • Decoding the base64 payload of newer node IDs yields MessagePack data that is an array; the last array element is the object’s database ID.
  • Interpreting the base64-decoded bytes as a single large integer shows that its lower 32 bits equal the numeric database ID, allowing extraction via a bitmask.
  • Older or legacy objects use a different base64-encoded textual format that includes an enum, object type name, and the database ID (for example, '010:Repository2325298').
  • The first element of the MessagePack arrays observed is consistently 0; other elements provide repository context and the object identifier.
  • The split between legacy and newer ID formats is inconsistent: some old repositories use legacy IDs, new objects in old repositories may use new IDs, and some object types (such as Users) can still use the legacy format.
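Two of the facts above, the lower-32-bit relationship and the legacy textual format, can be illustrated with a short Python sketch. The function names and the regular expression are assumptions made for illustration; the legacy pattern is inferred only from the single example '010:Repository2325298'.

```python
import base64
import re

# Assumed legacy layout: digits, a colon, a type name, then the database ID.
LEGACY_RE = re.compile(r"^(\d+):([A-Za-z]+)(\d+)$")


def decode_payload(encoded: str) -> bytes:
    """Base64-decode a string, restoring any stripped '=' padding first."""
    encoded += "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded)


def database_id_by_mask(node_id: str) -> int:
    """Newer format: read the payload as one big integer, keep the low 32 bits."""
    payload = node_id.partition("_")[2]
    as_int = int.from_bytes(decode_payload(payload), "big")
    return as_int & 0xFFFFFFFF


def parse_legacy(node_id: str) -> tuple[str, str, int]:
    """Legacy format: return (enum, type name, database ID) from the decoded text."""
    text = decode_payload(node_id).decode("ascii")
    match = LEGACY_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognised legacy node ID: {text!r}")
    enum, type_name, db_id = match.groups()
    return enum, type_name, int(db_id)
```

One caveat on the bitmask: it only works when the last array element happens to be stored as a 4-byte unsigned integer. MessagePack uses shorter encodings for small values and an 8-byte encoding for values above 2^32 - 1, and in those cases the low 32 bits would not equal the ID, so unpacking the array is the more robust of the two routes.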

What to watch next

  • Whether GitHub will change or officially document the internal MessagePack structure or the lower-32-bit relationship: not confirmed in the source.
  • Which object types or repository histories continue to return legacy-format IDs versus the newer MessagePack-encoded IDs; developers should validate both formats when building integrations.
  • If GitHub updates guidance on treating node IDs as opaque versus providing official conversion utilities: not confirmed in the source.

Quick glossary

  • GraphQL node ID: An opaque identifier returned by GitHub’s GraphQL API, typically a prefix plus a base64-encoded payload intended to uniquely identify objects across the platform.
  • Database ID: An integer identifier used internally and visible in GitHub web URLs and many REST API responses; often required to construct direct links.
  • Base64: A binary-to-text encoding scheme that represents binary data as ASCII characters; commonly used to transport compact binary payloads as text.
  • MessagePack: A compact binary serialization format that encodes structured data (arrays, maps, numbers, binary) into a binary representation.
  • Legacy ID format: An older base64-encoded identifier that decodes to a textual pattern containing an enum, object type name, and database ID.

Reader FAQ

Can I convert a GraphQL node ID to a numeric database ID?
Yes. For newer node IDs, decode the base64 payload and unpack the MessagePack array to take its last element, or read the decoded bytes as an integer and keep the lower 32 bits. Legacy node IDs decode to text that ends with the database ID.

Do all GitHub objects use the same node ID format?
No. GitHub uses both a legacy textual base64 format and a newer MessagePack-based format; which format appears depends on object type and creation history.
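Because both formats can appear in the same data set, client code needs some way to tell them apart before decoding. The sketch below is one heuristic, not a documented rule: it assumes newer IDs carry an alphabetic prefix and an underscore before a payload that starts with a MessagePack array marker, and that legacy IDs are plain base64 text matching the pattern seen in '010:Repository2325298'. The function name `classify_node_id` is hypothetical.

```python
import base64
import re

LEGACY_RE = re.compile(r"\d+:[A-Za-z]+\d+")


def _decode(encoded: str) -> bytes:
    """Base64-decode a string, restoring any stripped '=' padding first."""
    encoded += "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded)


def classify_node_id(node_id: str) -> str:
    """Return 'new', 'legacy', or 'unknown' for a GitHub node ID (heuristic)."""
    prefix, sep, payload = node_id.partition("_")
    try:
        if sep and prefix.isalpha():
            raw = _decode(payload)
            # 0x90-0x9F is the MessagePack fixarray marker range.
            if raw and 0x90 <= raw[0] <= 0x9F:
                return "new"
            return "unknown"
        text = _decode(node_id).decode("ascii")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return "unknown"
    return "legacy" if LEGACY_RE.fullmatch(text) else "unknown"
```

Returning "unknown" rather than raising lets a caller log and skip IDs that match neither shape, which seems prudent given that the split between formats is described as inconsistent.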

Should I treat node IDs as opaque?
GitHub’s migration guidance recommends treating new node IDs as opaque references, even though practical decoding approaches exist.

Will these extraction methods always work in the future?
Not confirmed in the source.

I was recently building a feature for Greptile (an AI-powered code review tool), when I hit a weird snag with GitHub's API. The feature should have been simple: I wanted…
