# The Traveling Salesman Problem (TSP)

This blog is the 1st part of a 3-part blog series on Operations Research (OR) in Transportation Engineering. In this blog, we will begin with one of the most fundamental problems in logistics – the Traveling Salesman Problem (TSP).

Ever wondered how newspaper, posts, mails etc., get delivered to your house? Ever wondered how groceries get delivered to the supermarkets, and how many such brick-and-mortar stores get re-stocked? This is essentially what the Traveling Salesman Problem, one of the most fundamental problems in logistics and operations research, tries to answer. To be more precise, given a depot node and a set of customer nodes, the objective of a Traveling Salesman Problem is to determine the optimal order of visits to the customer nodes that results in least cost. The figure below is one such example of the Traveling Salesman Problem with the depot located near the geographic center and 100 customers spread across the region. I encourage the reader to think of a rule of thumb to decide the order of visits for this problem before we delve deep into the mathematical formulation and the solution method employed to solve the Traveling Salesman Problem. By the end of this blog, compare your rule of thumb with the solution developed in this blog.

Formulating the problem

Given a graph $$\small G = (N,A)$$ with set of nodes $$\small N$$ – with depot node $$\small d$$ and set of customer nodes $$\small C$$, and set of arcs $$\small A = \{(i,j); i,j \in N\}$$ with arc traversal cost $$\small c_{ij}\ \forall\ (i,j) \in A$$, the objective of the Traveling Salesman Problem is to develop a least cost route starting and ending at the depot node, such that every customer node is visited exactly once.

Objective function: minimize total cost of routing

$$\small\quad \min{z=\sum_{\left(i,j\right)\in A}{c_{ij}x_{ij}}}$$

Subject to,
Service constraint: every customer node must be visited exactly once

$$\small\quad\quad\quad \sum_{i\in T_j} x_{ij}=1\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \forall\ j\in N$$

Flow constraint: at every node, the outgoing vehicle flow should balance the incoming vehicle flow.

$$\small\quad\quad\quad \sum_{i\in T_j} x_{ij}=\sum_{k\in H_j} x_{jk}\ \ \ \ \ \ \ \ \forall\ j\in N$$

Sub-tour elimination constraint: eliminating tours that do not start and end at the depot node.

$$\small\quad\quad\quad \sum_{\left(i,j\right)\in S} x_{ij}\le\left|S\right|-1\ \ \ \ \ \ \ \ \ \ \forall\ S\subset N,\ \left|S\right|\geq2$$

Binary constraints,
Arc-use:

$$\small\quad\quad\quad x_{ij\ }\in\left\{0,1\right\}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \forall\ \left(i,j\right)\in A$$

Where,
Tail node function: return all tail (predecessor) nodes for a given node

$$\small\quad\quad\quad T_j\ =\left\{i;\left(i,j\right)\in A\right\}$$

Head node function: return all head (successor) nodes for a given node

$$\small\quad\quad\quad H_j=\left\{k;\left(j,k\right)\in A\right\}$$

Solution methodology

The Traveling Salesman Problem falls in the category of combinatorial optimization problem, wherein the aim is to find “best combination of objects from a given set of objects to achieve a certain objective given certain constraints”. In TSP, this pertains to finding the best combination of node visits (least cost route) such that every node is visited exactly once. And while this problem may seem simple at first, a mere 10 node problem has 10! = 3628800 possible solutions with 90 decision variables in the above developed formulation. Thus, instead of looking for the best solution (global optimal), we often limit our search to a good solution (local optimal). To this end, we employ heuristics which are essentially shortcuts to develop a solution. These heuristics can be classical heuristics that develop solution based on certain principles and rules, or meta-heuristics that guide classical heuristics to develop an even better solution based on certain high-level guiding principle. In the context of TSP, one such classical heuristic is the nearest neighborhood heuristic which builds solution by iteratively visiting the next nearest customer node until all customer nodes are visited. However, classical heuristics are incapable of searching the solution space, and tend to give poor quality solutions. And that’s where meta-heuristics come in. As described above, meta-heuristics guide classical heuristics to develop an even better solution based on certain high-level guiding principle. The guiding principles are often inspired by the natural environment, such as physics/chemistry-based (Simulated Annealing, Spiral Optimization, etc.), biology-based (Ant Colony Optimization, Firefly Algorithm, etc.), evolution-based (Genetic Algorithm, Evolutionary Programming, etc.) while other algorithms include Tabu Search, Guided Local Search etc.

For my work, I employed the Adaptive Large Neighborhood Search (ALNS) meta-heuristic which works on destroy and repair principle and thus iteratively removes and re-inserts nodes into the solution using certain removal and insertion heuristics. In every iteration of the ALNS meta-heuristic, one removal heuristic and then one insertion heuristic is chosen “adaptively”, i.e., based on their performance in improving the quality of the solution in previous iterations. The removal heuristic first removes a “large” chunk of nodes from the solution (10% to 40%) and the insertion heuristic then re-inserts these open nodes back into the solution thus “searching” through the “neighborhood” (solution space). Hence, the name – Adaptive Large Neighborhood Search.

The removal heuristics include,
– random node removal: removes customer nodes randomly from the solution
– related node removal: for a randomly selected customer node, removes most related* customer nodes
– worst node removal: removes customer nodes that render highest reduction in cost on removal
until n (10% to 40%) customer nodes have been removed from the solution
* In the context of TSP, relatedness of two customer nodes is defined by their proximity.

The insertion heuristics on the other hand compute cost of inserting an open customer node at every possible position in the solution, thereby identifying least insertion cost and corresponding best insertion position for every open customer node*, and then,
– best insertion: inserts randomly selected open customer node
– greedy insertion: inserts customer nodes in increasing order of least insertion cost
– regret insertion: iteratively inserts customer nodes with highest regret cost**
at their best positions until all open customer nodes have been inserted into the solution.
* Note, after every iteration of the insertion heuristic, the least insertion cost and corresponding best insertion position for the remaining open customer nodes needs to be re-computed.
** Notice how best and greedy insertion heuristics are myopic in nature. The last few open customer nodes inserted using these methods tend to have significantly high insertion cost due to the solution being nearly full (almost complete). Regret insertion on the other hand accounts for inserting an open customer node at kth (typically k = 2, 3) best position instead of its best position, and in doing so, it inserts the open customer node with the highest regret cost.

Thus, using the above mentioned removal and insertion heuristics, in every iteration of Adaptive Large Neighborhood, the current solution is destroyed and repaired to create a new solution. If the newly created solution is better than the current solution, then this new solution is set as the current solution. If the newly created solution is also better than the best solution, then this new solution is set as the best solution. However, if the newly created solution is worse than the current solution, then only with certain (low) probability the new solution is set as the current solution. Performance scores for the removal and insertion heuristic used in this iteration are updated based on the quality of the newly created solution in comparison to the quality of current solution and the best solution.

In addition to the removal and insertion heuristic, the ALNS meta-heuristic employs local search heuristics after every few iterations. These local search heuristics make “small” adjustments to the solution, unlike the removal and insertion heuristics which make “large” changes to the solution. These local search heuristics include,
– move: randomly select a customer node and move it to its best position in the solution
– 2-opt: randomly select 2 arcs and reconfigure them
(for instance, arcs A—>B and I—>J in A—>B—>C…H—>I—>J could be reconfigured as A—>I—>H…C—>B—>J)
– swap: randomly select 2 customer nodes and swap their positions.

Developing the solution to the Traveling Salesman Problem

To develop the Adaptive Large Neighborhood Search meta-heuristic with removal, insertion, and local search heuristics for the Traveling Salesman Problem, I used Julia Programing Language, a relatively new programing language (developed in 2012) perfectly suited for computational science. Interested readers can checkout the GitHub Repo – Traveling Salesman Problem.

Now lets solve the 100 customer node Traveling Salesman Problem introduced at the start of this blog. Recall, given a depot node and a set of customer nodes, the objective of a Traveling Salesman Problem is to determine the optimal order of visits to the customer nodes that results in least cost. The Adaptive Large Neighborhood Search meta-heuristic begins with a completely randomized initial solution amounting to a total cost of 3518.0 monetary units. In every iteration then, the ALNS meta-heuristic destroys a large chunk of the solution and consequently repairs to create a new solution. This destroy and repair principle can be seen in action below. Finally, after only a few iterations of ALNS meta-heuristic, a good solution to the problem (locally optimal) can be achieved with a total cost of 659.4 monetary units.

Going back to our discussion in the early part of this blog, how does your rule of thumb compare to the solution developed here? In most cases, I believe we tend to create myopic and greedy heuristics to begin with. Like with nearest neighborhood heuristic, if you chose to build solution by iteratively visiting the next nearest customer node, then the final few open customer nodes tend to be located quite further away resulting in a high cost solution. As we discussed earlier, this is precisely why we employ meta-heuristics – high-level principles to guide heuristics to build an even better solution.

In the next blog, we will take the problem one level up and explore another commonly studied problem in logistics, particularly urban logistics – the Vehicle Routing Problem (VRP).