TL;DR

A developer posted a reinforcement-learning TSP solver that trains from scratch on a single instance and reports a 1.66% optimality gap on TSPLIB d1291. The agent uses PPO plus geometric inductive biases for 'exception edges'; code and a Colab notebook are available for verification.

What happened

The project author built a Travelling Salesman Problem (TSP) solver that trains per instance, with no pre-training on other problems. The implementation uses Proximal Policy Optimization (PPO) and hand-crafted inductive biases that highlight likely promising edges, informed by the author's hypothesis that optimal tours consist mostly of nearest-neighbor 'minimum edges' plus a small set of 'exception edges' lying outside that local structure. According to the author, the solver reached a 1.66% gap on the TSPLIB instance d1291 after roughly 5.6 hours of training on a single NVIDIA A100 GPU. The codebase and an executable Colab notebook have been published on GitHub for others to inspect and reproduce the reported results, and the author has invited questions about the geometric priors and the PPO implementation.
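
The hypothesis lends itself to a simple sanity check. The snippet below is a minimal sketch, not the repository's code: it classifies each edge of a candidate tour as a k-nearest-neighbor 'minimum edge' or an 'exception edge'. The function name, the value of k, and the symmetric membership rule are all assumptions made for illustration.

```python
# Illustrative sketch only: one plausible way to operationalize the
# 'minimum edge' vs 'exception edge' split. The repository's actual
# encoding may differ; k and the rule below are assumptions.
import numpy as np

def classify_tour_edges(coords, tour, k=5):
    """Label each tour edge as a k-nearest-neighbor 'minimum edge'
    or an 'exception edge' outside that local structure."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    knn = np.argsort(dists, axis=1)[:, :k]  # k nearest neighbors per city

    labels = []
    for a, b in zip(tour, np.roll(tour, -1)):  # consecutive pairs, wrapping
        is_min = b in knn[a] or a in knn[b]
        labels.append((a, b, "minimum" if is_min else "exception"))
    return labels

# Toy usage on random points with an arbitrary tour.
rng = np.random.default_rng(0)
pts = rng.random((10, 2))
for a, b, kind in classify_tour_edges(pts, np.arange(10), k=3):
    print(f"edge ({a} -> {b}): {kind}")
```

Under the author's hypothesis, a near-optimal tour should produce mostly "minimum" labels, with only a handful of "exception" edges for the agent to discover.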

Why it matters

  • Demonstrates a per-instance training approach that avoids large-scale pre-training datasets.
  • Suggests inductive biases based on geometry/topology can focus learning on a small set of difficult edges.
  • Offers a reproducible experiment (code and Colab) for researchers to evaluate instance-specific RL for combinatorial problems.
  • If generalizable, could reduce data requirements for applied TSP solvers that must adapt to individual problem instances.

Key facts

  • Solver is trained from scratch for each TSP instance (no pre-training).
  • Learning algorithm used: Proximal Policy Optimization (PPO).
  • Reported performance: 1.66% gap on TSPLIB instance d1291 (the gap arithmetic is sketched after this list).
  • Training time reported: about 5.6 hours on a single NVIDIA A100 GPU.
  • Core hypothesis: optimal tours are mostly composed of nearest-neighbor 'minimum edges' plus a small number of 'exception edges'.
  • Author encoded geometric/topological inductive biases to guide the agent toward promising edges.
  • Repository and Colab notebook published: https://github.com/jivaprime/TSP_exception-edge.
  • Author posted the result on Hacker News and offered to answer questions about the approach.
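
For reference, the optimality gap above is the standard percentage excess over the best-known tour length. The sketch below shows only that arithmetic; it takes TSPLIB's published optimum for d1291 (50,801) as the baseline and back-solves the implied tour length from the reported gap, so the numbers are illustrative rather than taken from the repository.

```python
# Gap arithmetic only; the d1291 numbers are illustrative.
BEST_KNOWN = 50801    # TSPLIB's published optimum for d1291
REPORTED_GAP = 1.66   # percent, as reported by the author

def optimality_gap(tour_length, best_known):
    """Percentage by which a tour exceeds the best-known length."""
    return 100.0 * (tour_length - best_known) / best_known

# Back-solve the implied tour length from the reported gap.
implied_length = BEST_KNOWN * (1 + REPORTED_GAP / 100)
print(f"implied tour length ~ {implied_length:.0f}")                 # ~51644
print(f"gap check: {optimality_gap(implied_length, BEST_KNOWN):.2f}%")  # 1.66%
```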

What to watch next

  • Reproduction attempts using the published GitHub repo and Colab notebook (availability confirmed in the source).
  • Whether the exception-edge inductive bias yields similar gains on other TSPLIB instances or random instances (not confirmed in the source).
  • How this per-instance approach compares, in solution quality and compute cost, to pre-trained learning-based solvers and classical TSP algorithms (not confirmed in the source).

Quick glossary

  • Travelling Salesman Problem (TSP): A combinatorial optimization problem that seeks the shortest possible route visiting a set of cities exactly once and returning to the start.
  • Proximal Policy Optimization (PPO): A reinforcement learning policy-gradient method that updates policies with controlled step sizes to improve stability (see the sketch after this glossary).
  • TSPLIB: A library of sample instances for benchmarking TSP and related routing algorithms.
  • Pre-training: Training a model on a broad dataset before fine-tuning or applying it to specific tasks or instances.
  • Inductive bias: Prior assumptions or structural choices embedded in a model or algorithm that guide learning toward particular solutions.
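
To make the PPO entry concrete, here is the textbook clipped surrogate objective that gives PPO its controlled step sizes. This is the standard form of the method, not code from the repository; the function and variable names are illustrative.

```python
# Textbook PPO clipped surrogate, illustrating the 'controlled step
# sizes' mentioned in the glossary; not taken from the repository.
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate: discourage updates that push the probability
    ratio pi_new / pi_old outside [1 - eps, 1 + eps]."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))  # objective to maximize

# Toy usage: a large ratio on a positive-advantage action gets clipped.
logp_old = np.log(np.array([0.2, 0.5]))
logp_new = np.log(np.array([0.6, 0.4]))
adv = np.array([1.0, -0.5])
print(ppo_clip_objective(logp_new, logp_old, adv))  # 0.4
```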

Reader FAQ

Did the solver use any pre-trained model or dataset?
No — the author states the solver is trained from scratch per instance with no pre-training.

What hardware and time were required for the reported result?
The reported run took about 5.6 hours on a single NVIDIA A100 GPU.

Is the code available to reproduce the experiment?
Yes — the author published the code and a Colab notebook at the linked GitHub repository.

Does this solver beat state-of-the-art TSP methods broadly?
Not confirmed in the source.

Does the method generalize to other TSP instances?
Not confirmed in the source.

From the author

"OP here. Most Deep Learning approaches for TSP rely on pre-training with large-scale datasets. I wanted to see if a solver could learn 'on the fly' for a specific instance…"
