TL;DR

A developer posted a reinforcement-learning TSP solver that trains from scratch on a single instance and reports a 1.66% optimality gap on TSPLIB d1291. The agent uses PPO plus geometric inductive biases for 'exception edges'; code and a Colab notebook are available for verification.

What happened

The project author built a Travelling Salesman Problem (TSP) solver that trains per instance, with no pre-training on other problems. The implementation uses Proximal Policy Optimization (PPO) and hand-crafted inductive biases that highlight likely promising edges, informed by the author's hypothesis that optimal tours consist mostly of nearest-neighbor 'minimum edges' plus a small set of 'exception edges' lying outside that local structure. According to the author, the solver reached a 1.66% gap on the TSPLIB instance d1291 after roughly 5.6 hours of training on a single NVIDIA A100 GPU. The codebase and an executable Colab notebook have been published on GitHub for others to inspect and reproduce the reported results, and the author has invited questions about the geometric priors and the PPO implementation.
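
The hypothesis lends itself to a simple sanity check. The snippet below is a minimal sketch, not the repository's code: it classifies each edge of a candidate tour as a k-nearest-neighbor 'minimum edge' or an 'exception edge'. The function name, the value of k, and the symmetric membership rule are all assumptions made for illustration.

```python
# Illustrative sketch only: one plausible way to operationalize the
# 'minimum edge' vs 'exception edge' split. The repository's actual
# encoding may differ; k and the rule below are assumptions.
import numpy as np

def classify_tour_edges(coords, tour, k=5):
    """Label each tour edge as a k-nearest-neighbor 'minimum edge'
    or an 'exception edge' outside that local structure."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    knn = np.argsort(dists, axis=1)[:, :k]  # k nearest neighbors per city

    labels = []
    for a, b in zip(tour, np.roll(tour, -1)):  # consecutive pairs, wrapping
        is_min = b in knn[a] or a in knn[b]
        labels.append((a, b, "minimum" if is_min else "exception"))
    return labels

# Toy usage on random points with an arbitrary tour.
rng = np.random.default_rng(0)
pts = rng.random((10, 2))
for a, b, kind in classify_tour_edges(pts, np.arange(10), k=3):
    print(f"edge ({a} -> {b}): {kind}")
```

Under the author's hypothesis, a near-optimal tour should produce mostly "minimum" labels, with only a handful of "exception" edges for the agent to discover.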

Why it matters

  • Demonstrates a per-instance training approach that avoids large-scale pre-training datasets.
  • Suggests inductive biases based on geometry/topology can focus learning on a small set of difficult edges.
  • Offers a reproducible experiment (code and Colab) for researchers to evaluate instance-specific RL for combinatorial problems.
  • If generalizable, could reduce data requirements for applied TSP solvers that must adapt to individual problem instances.

Key facts

  • Solver is trained from scratch for each TSP instance (no pre-training).
  • Learning algorithm used: Proximal Policy Optimization (PPO).
  • Reported performance: 1.66% gap on TSPLIB instance d1291 (the gap arithmetic is sketched after this list).
  • Training time reported: about 5.6 hours on a single NVIDIA A100 GPU.
  • Core hypothesis: optimal tours are mostly composed of nearest-neighbor 'minimum edges' plus a small number of 'exception edges'.
  • Author encoded geometric/topological inductive biases to guide the agent toward promising edges.
  • Repository and Colab notebook published: https://github.com/jivaprime/TSP_exception-edge.
  • Author posted the result on Hacker News and offered to answer questions about the approach.
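
For reference, the optimality gap above is the standard percentage excess over the best-known tour length. The sketch below shows only that arithmetic; it takes TSPLIB's published optimum for d1291 (50,801) as the baseline and back-solves the implied tour length from the reported gap, so the numbers are illustrative rather than taken from the repository.

```python
# Gap arithmetic only; the d1291 numbers are illustrative.
BEST_KNOWN = 50801    # TSPLIB's published optimum for d1291
REPORTED_GAP = 1.66   # percent, as reported by the author

def optimality_gap(tour_length, best_known):
    """Percentage by which a tour exceeds the best-known length."""
    return 100.0 * (tour_length - best_known) / best_known

# Back-solve the implied tour length from the reported gap.
implied_length = BEST_KNOWN * (1 + REPORTED_GAP / 100)
print(f"implied tour length ~ {implied_length:.0f}")                 # ~51644
print(f"gap check: {optimality_gap(implied_length, BEST_KNOWN):.2f}%")  # 1.66%
```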

What to watch next

  • Reproduction attempts using the published GitHub repo and Colab notebook (availability confirmed in the source).
  • Whether the exception-edge inductive bias yields similar gains on other TSPLIB instances or random instances (not confirmed in the source).
  • How this per-instance approach compares, in solution quality and compute cost, to pre-trained learning-based solvers and classical TSP algorithms (not confirmed in the source).

Quick glossary

  • Travelling Salesman Problem (TSP): A combinatorial optimization problem that seeks the shortest possible route visiting a set of cities exactly once and returning to the start.
  • Proximal Policy Optimization (PPO): A reinforcement learning policy-gradient method that updates policies with controlled step sizes to improve stability (see the sketch after this glossary).
  • TSPLIB: A library of sample instances for benchmarking TSP and related routing algorithms.
  • Pre-training: Training a model on a broad dataset before fine-tuning or applying it to specific tasks or instances.
  • Inductive bias: Prior assumptions or structural choices embedded in a model or algorithm that guide learning toward particular solutions.
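
To make the PPO entry concrete, here is the textbook clipped surrogate objective that gives PPO its controlled step sizes. This is the standard form of the method, not code from the repository; the function and variable names are illustrative.

```python
# Textbook PPO clipped surrogate, illustrating the 'controlled step
# sizes' mentioned in the glossary; not taken from the repository.
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate: discourage updates that push the probability
    ratio pi_new / pi_old outside [1 - eps, 1 + eps]."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))  # objective to maximize

# Toy usage: a large ratio on a positive-advantage action gets clipped.
logp_old = np.log(np.array([0.2, 0.5]))
logp_new = np.log(np.array([0.6, 0.4]))
adv = np.array([1.0, -0.5])
print(ppo_clip_objective(logp_new, logp_old, adv))  # 0.4
```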

Reader FAQ

Did the solver use any pre-trained model or dataset?
No — the author states the solver is trained from scratch per instance with no pre-training.

What hardware and time were required for the reported result?
The reported run took about 5.6 hours on a single NVIDIA A100 GPU.

Is the code available to reproduce the experiment?
Yes — the author published the code and a Colab notebook at the linked GitHub repository.

Does this solver beat state-of-the-art TSP methods broadly?
Not confirmed in the source.

Does the method generalize to other TSP instances?
Not confirmed in the source.

From the author

"OP here. Most Deep Learning approaches for TSP rely on pre-training with large-scale datasets. I wanted to see if a solver could learn 'on the fly' for a specific instance…"
