TL;DR
Fabrice Bellard released ts_zip, an experimental utility that compresses text files using a language model (RWKV 169M v4). It achieves substantially lower bits per byte than xz on benchmark files, but it needs a GPU for reasonable speed, runs much slower than conventional compressors, and handles only text.
What happened
ts_zip is a new experimental compressor that encodes text by having a neural language model predict next-token probabilities, which an arithmetic coder then turns into compressed output. The implementation uses the RWKV 169M v4 model, quantized to 8 bits per parameter and evaluated with BF16 arithmetic. Bellard's page includes benchmark comparisons versus xz on standard corpus files: for example, enwik8 shrinks from 24,865,244 bytes (1.989 bpb with xz) to 13,825,741 bytes (1.106 bpb) with ts_zip, and enwik9 drops from 213,370,900 bytes (1.707 bpb) to 135,443,237 bytes (1.084 bpb). The tool currently supports only text input, runs deterministically across hardware configurations, and is described as experimental with no guaranteed backward compatibility between versions. Downloads for Linux and Windows builds are provided on the project page.
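The source does not publish ts_zip's internals beyond this description, but the principle is standard: a predictive model assigns a probability p to each next symbol, and an arithmetic coder spends close to -log2(p) bits encoding it, so the compressed size approaches the model's cross-entropy on the input. Below is a minimal, hypothetical Python sketch of that principle, not ts_zip's code: a toy adaptive byte-frequency model stands in for RWKV, and instead of emitting a real bitstream it computes the ideal code length an arithmetic coder would approach.

```python
import math
from collections import defaultdict

def ideal_compressed_bits(data: bytes) -> float:
    """Ideal code length for an adaptive order-0 byte model."""
    counts = defaultdict(lambda: 1)  # Laplace smoothing: every byte value starts at 1
    total = 256
    bits = 0.0
    for b in data:
        p = counts[b] / total        # model's predicted probability for this byte
        bits += -math.log2(p)        # bits an arithmetic coder would spend on it
        counts[b] += 1               # update after coding; a decoder makes the
        total += 1                   # same update, so the two stay in sync
    return bits

sample = b"the quick brown fox jumps over the lazy dog " * 100
print(f"{ideal_compressed_bits(sample) / len(sample):.3f} bpb (weak order-0 model)")
```

ts_zip's gain over xz comes from swapping this weak byte-frequency model for a language model whose next-token predictions are far sharper; the entropy-coding step stays the same in spirit.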
Why it matters
- Demonstrates that a compact LLM can improve compression ratios for large text corpora compared with traditional compressors on benchmark data.
- Deterministic model evaluation means compressed files can be decompressed on different hardware or thread counts without mismatch.
- Highlights trade-offs: higher compression at the cost of slower throughput and a requirement for GPU resources.
- Points toward specialized LLM-based compressors for text and small-message variants (ts_sms) rather than general-purpose binary compression.
Key facts
- ts_zip uses the RWKV 169M v4 language model quantized to 8 bits per parameter.
- Model inference runs in BF16 floating-point arithmetic.
- Compression combines next-token probabilities from the LLM with an arithmetic coder.
- Benchmarks versus xz (compressed size and bits per byte; a quick check of the math follows this list):
  - alice29.txt: xz 48,492 bytes (2.551 bpb); ts_zip 21,713 bytes (1.142 bpb)
  - book1: xz 261,116 bytes (2.717 bpb); ts_zip 137,477 bytes (1.431 bpb)
  - enwik8: xz 24,865,244 bytes (1.989 bpb); ts_zip 13,825,741 bytes (1.106 bpb)
  - enwik9: xz 213,370,900 bytes (1.707 bpb); ts_zip 135,443,237 bytes (1.084 bpb)
  - linux-1.2.13.tar: xz 1,689,468 bytes (1.441 bpb); ts_zip 1,196,859 bytes (1.021 bpb)
- A GPU is necessary for reasonable speed and at least 4 GB of RAM is required; compression/decompression speeds reach up to ~1 MB/s on an RTX 4090.
- Only text files are supported; binary files are unlikely to compress well with this tool.
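The bpb figures in the list above follow directly from the byte counts: bpb = compressed bytes × 8 / original bytes. A quick check of the enwik8 row (enwik8 is the first 100,000,000 bytes of an English Wikipedia dump):

```python
# Recomputing the enwik8 row: bpb = compressed_bytes * 8 / original_bytes.
ORIGINAL = 100_000_000  # enwik8 size in bytes
for name, compressed in [("xz", 24_865_244), ("ts_zip", 13_825_741)]:
    print(f"{name}: {compressed * 8 / ORIGINAL:.3f} bpb")
# prints: xz: 1.989 bpb and ts_zip: 1.106 bpb, matching the list
```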
What to watch next
- ts_sms, mentioned as a related tool optimized for compressing small messages (see project page).
- Whether future ts_zip releases change model size, quantization, or add backward compatibility — not confirmed in the source.
- Potential support for binary files or faster, CPU-only modes — not confirmed in the source.
Quick glossary
- Large Language Model (LLM): A neural network trained on large amounts of text to predict or generate language-like token sequences.
- Quantization: Reducing the numeric precision of model parameters to shrink memory use and speed up inference, often at some trade-off in accuracy (a generic sketch follows this list).
- Arithmetic coder: A lossless entropy coding method that encodes data based on predicted symbol probabilities into a fractional number of bits.
- BF16 (BFloat16): A 16-bit floating-point format often used for neural network inference that preserves a wide dynamic range while reducing memory and compute.
- Bits per byte (bpb): A compression metric expressing average bits used to represent each original byte; lower is better.
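The source does not describe how ts_zip quantizes RWKV's weights beyond "8 bits per parameter," so the following is a generic, hypothetical NumPy sketch of symmetric per-tensor 8-bit quantization, the textbook technique the glossary entry refers to, not ts_zip's actual scheme.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    # Symmetric per-tensor quantization: map the largest-magnitude weight
    # to +/-127 and round everything else onto that integer grid.
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float weights for use in BF16/FP32 inference.
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
print("max reconstruction error:", np.abs(w - dequantize(q, scale)).max())
```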
Reader FAQ
Does ts_zip support binary files?
Only text files are supported; the source notes that binary files will not compress well.
Is a GPU required to run ts_zip?
Yes. The source says a GPU is necessary for reasonable speed and at least 4 GB of RAM is required.
Is ts_zip backward compatible across versions?
No backward compatibility should be expected between versions; the project describes the tool as experimental.
Is the tool open source, and where can it be downloaded?
Download links for Linux and Windows builds are provided on the project page, but whether the source code is available is not confirmed in the source.
How fast is compression and decompression?
The source reports speeds up to about 1 MB/s on an RTX 4090. At that rate, compressing enwik9 (10^9 bytes) would take roughly 1,000 seconds, about 17 minutes.
Sources
- Fabrice Bellard's ts_zip (2024)
- ts_zip: Text Compression Using Large Language Models
- Text Compression Gets Weirdly Efficient With LLMs
- Fabrice Bellard's ts_sms: Short Message Compression …