Bridge Anonymization: Local, reversible PII scrubber for translation pipelines

TL;DR

A TypeScript library called bridge-anonymization masks and later restores PII to enable safe use of external translation and LLM services while keeping sensitive data local. It combines regex rules and an ONNX-backed NER model, preserves context for translation, and encrypts the mapping used to rehydrate text.

What happened

Developer Tom Jordi Ruesch open-sourced bridge-anonymization, a local-first TypeScript library that masks personally identifiable information (PII) for translation workflows and then restores it after external machine translation or LLM processing. The tool applies a lifecycle of Detect -> Mask -> Translate -> Rehydrate and runs the masking and rehydration steps entirely on-device in Node.js or Bun. Detection is hybrid: deterministic regexes handle structured items like IBANs and credit cards (with checksum validation), while a quantized ONNX NER model identifies names, organizations and locations. To preserve translation quality, the project adds lightweight semantic enrichment via lookup tables (gender and GeoNames) and uses a fuzzy tag matcher to map mangled tags back to original values. The PII mapping table is encrypted using AES-256-GCM, and the project is available under an MIT license on npm and GitHub.
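The Detect -> Mask -> Translate -> Rehydrate lifecycle can be illustrated with a minimal sketch. This is not the library's API: the function names, the single email regex, and the placeholder format `<PII type="EMAIL" id="1"/>` are all assumptions made for illustration, and the external translation call is replaced by a local stand-in.

```typescript
// Hypothetical sketch: names and tag format are illustrative, not the library's API.
type PiiMap = Map<string, string>;

const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Detect + Mask: swap each email for a numbered placeholder and record the original.
function mask(text: string): { masked: string; map: PiiMap } {
  const map: PiiMap = new Map();
  let counter = 0;
  const masked = text.replace(EMAIL_RE, (match) => {
    counter += 1;
    const tag = `<PII type="EMAIL" id="${counter}"/>`;
    map.set(tag, match);
    return tag;
  });
  return { masked, map };
}

// Rehydrate: put the original value back wherever a known placeholder appears.
function rehydrate(text: string, map: PiiMap): string {
  let out = text;
  map.forEach((original, tag) => {
    out = out.split(tag).join(original);
  });
  return out;
}

const { masked, map } = mask("Contact anna@example.com or bob@example.org today.");

// Stand-in for the external translation step: only `masked` would leave the machine.
const translated = masked
  .replace("Contact", "Kontaktieren Sie")
  .replace(" or ", " oder ")
  .replace("today", "heute");

const restored = rehydrate(translated, map);
```

The mapping never leaves the process in this sketch. Exact string matching on the placeholder is the weak point, which is the problem the fuzzy tag matcher described in the source is meant to address.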

Why it matters

Helps teams send redacted text to external translation or LLM services without irreversibly losing grammatical context required for accurate translation.
Runs detection and rehydration locally, reducing the risk of exposing raw PII to third-party APIs.
Encrypting the PII map reduces the risk from persisted state or local storage leaks.
Offers a pragmatic trade-off between runtime cost and accuracy by combining fast regexes with a quantized NER model and compact lookup tables.

Key facts

Library name: bridge-anonymization; implementation: TypeScript.
Runs on-device for masking and rehydration in Node.js or Bun (supports onnxruntime-node and onnxruntime-web).
Hybrid detection: regex for structured PII (IBAN, credit cards with Luhn, emails) and ONNX NER for soft PII (names, orgs, locations).
Provides anonymizeRegexOnly() for low-latency streams and a full anonymize() pipeline for higher-precision scrubbing.
Uses a quantized (INT8) XLM-RoBERTa ONNX model (~280MB) by default, which the author claims retains roughly 95% or more of the full model's accuracy.
Semantic enrichment in V1 relies on lookup tables: gender-guesser (~40k Western names) and GeoNames (cities >15k population) to add attributes like gender or location type.
Fuzzy Tag Matcher tolerates changes introduced by external APIs (spacing, quotes, attribute order) to reliably rehydrate masked tokens.
PII mapping table is encrypted with AES-256-GCM; raw PII stays in local memory, and any persisted state is encrypted at rest.
Project is MIT licensed and available on GitHub and npm (package @elanlanguages/bridge-anonymization).
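The regex-plus-checksum approach for credit cards can be sketched as follows. The Luhn algorithm is standard, but the candidate regex, the 13 to 19 digit length range, and the function names are assumptions for illustration rather than the library's actual rules.

```typescript
// Hypothetical candidate pattern: 13-19 digits, optionally separated by a space or hyphen.
const CARD_CANDIDATE_RE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Luhn checksum: walking from the right, double every second digit,
// subtract 9 from any result above 9, and require the total to be divisible by 10.
function passesLuhn(candidate: string): boolean {
  const digits = candidate.replace(/\D/g, "");
  if (digits.length < 13 || digits.length > 19) return false;
  let sum = 0;
  let doubleNext = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // ASCII digit to number
    if (doubleNext) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleNext = !doubleNext;
  }
  return sum % 10 === 0;
}

// Keep only regex candidates that also pass the checksum.
function findCardNumbers(text: string): string[] {
  return (text.match(CARD_CANDIDATE_RE) ?? []).filter(passesLuhn);
}

const found = findCardNumbers("Pay with 4111 1111 1111 1111, not 4111 1111 1111 1112.");
```

The checksum step is what keeps a digit-heavy order number or phone string from being masked as a card: the regex alone accepts both numbers above, while the Luhn filter keeps only the first.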

What to watch next

Planned research into ML-based semantic enrichment to replace or augment lookup tables (described as a future step).
Coverage and accuracy beyond Western names and major cities: the V1 lookup tables focus on common Western names and cities above 15k population, and broader coverage is flagged as an area for improvement.

Quick glossary

PII: Personally Identifiable Information — data that can be used to identify a specific individual, such as names, emails, or identification numbers.
ONNX: Open Neural Network Exchange — a format and runtime ecosystem that allows machine learning models to run across different frameworks and platforms.
NER: Named Entity Recognition — an NLP technique that identifies and classifies proper names and entities in text (people, organizations, locations, etc.).
Quantization: A model compression technique that reduces the precision of neural network weights (e.g., to INT8) to lower size and speed up inference with modest accuracy trade-offs.
AES-256-GCM: AES with a 256-bit key in Galois/Counter Mode, an authenticated symmetric encryption scheme that provides both confidentiality and integrity for stored or transmitted data.

Reader FAQ

Is the mapping between placeholders and original PII stored securely?
Yes. The library encrypts the PII map using AES-256-GCM and keeps raw PII in local memory, with the persisted state encrypted at rest.
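As a rough illustration of what encrypting a PII map with AES-256-GCM involves, here is a minimal sketch using Node's built-in `node:crypto` module. The `SealedMap` shape, the function names, and the JSON serialization are assumptions, and the source does not describe how the library derives or stores its key, so the random key below is a placeholder.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical container for the encrypted map; not the library's storage format.
interface SealedMap {
  iv: Buffer;
  tag: Buffer;
  ciphertext: Buffer;
}

function sealMap(map: Record<string, string>, key: Buffer): SealedMap {
  const iv = randomBytes(12); // 96-bit nonce; must never repeat for the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(JSON.stringify(map), "utf8"),
    cipher.final(),
  ]);
  return { iv, tag: cipher.getAuthTag(), ciphertext };
}

function openMap(sealed: SealedMap, key: Buffer): Record<string, string> {
  const decipher = createDecipheriv("aes-256-gcm", key, sealed.iv);
  decipher.setAuthTag(sealed.tag);
  // final() throws if the ciphertext or tag was modified.
  const plaintext = Buffer.concat([
    decipher.update(sealed.ciphertext),
    decipher.final(),
  ]);
  return JSON.parse(plaintext.toString("utf8"));
}

const key = randomBytes(32); // placeholder: a real key would come from a secret store
const sealed = sealMap({ '<PII type="EMAIL" id="1"/>': "anna@example.com" }, key);
const reopened = openMap(sealed, key);
```

Because GCM is an authenticated mode, a tampered ciphertext fails at decryption instead of silently yielding a corrupted map.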

Does bridge-anonymization send raw PII to external translation or LLM APIs?
No. The workflow masks PII locally and sends only the anonymized text to external services; the mapping used to restore values remains local and encrypted.
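Since only placeholders reach the external service, restoring values depends on recognizing those placeholders after the service may have altered their spacing, quotes, or attribute order. The sketch below shows one way such tolerant matching could work. The tag shape `<PII type="EMAIL" id="1"/>`, the `TYPE:id` lookup key, and all names are assumptions, not the library's actual Fuzzy Tag Matcher.

```typescript
// Hypothetical tag shape: <PII type="EMAIL" id="1"/>. Not the library's documented format.
// Accepts extra whitespace, any letter case for the tag name, and a missing "/".
const LOOSE_TAG_RE = /<\s*PII\b([^<>]*?)\/?\s*>/gi;
// Accepts straight, curly, or missing quotes around attribute values.
const ATTR_RE = /(\w+)\s*=\s*["'“”‘’«»]?\s*([\w-]+)\s*["'“”‘’«»]?/g;

function parseAttrs(raw: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  ATTR_RE.lastIndex = 0; // global regexes keep state between exec() calls
  let m: RegExpExecArray | null;
  while ((m = ATTR_RE.exec(raw)) !== null) {
    attrs[m[1].toLowerCase()] = m[2];
  }
  return attrs;
}

// `values` is keyed by "TYPE:id", so attribute order in the tag does not matter.
function rehydrateFuzzy(text: string, values: Map<string, string>): string {
  return text.replace(LOOSE_TAG_RE, (whole: string, rawAttrs: string) => {
    const { type, id } = parseAttrs(rawAttrs);
    const original = type && id ? values.get(`${type.toUpperCase()}:${id}`) : undefined;
    return original ?? whole; // leave unknown tags untouched
  });
}

const values = new Map<string, string>([
  ["EMAIL:1", "anna@example.com"],
  ["PERSON:2", "Anna Schmidt"],
]);
const mangled = "Schreiben Sie an < PII id = '1' type = “EMAIL” / > oder <pii type=PERSON id=\"2\">.";
const repaired = rehydrateFuzzy(mangled, values);
```

Parsing attributes into a key, instead of comparing whole tag strings, is the design choice that makes the match indifferent to attribute order and quoting.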

Is the project open-source and where can I get it?
Yes. It is MIT licensed and distributed via GitHub and npm, where the package is published as @elanlanguages/bridge-anonymization.

Does it support disambiguation of names and locations for non-Western contexts?
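Not confirmed in the source. The V1 lookup tables are described as covering roughly 40k Western names and cities above 15k population, with broader coverage noted as an area for improvement.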

Does the library auto-download the NER model?
According to the source, the quantized model (~280MB) is auto-downloaded on first run when using the default quantized mode.
