Tabu search – theory and implementation
In this post, I discuss the multi-objective Tabu search for gradient-free solution of typical turbomachinery optimisation problems. There are already many good references outlining this algorithm, but I could not find a reusable Python implementation readily available. After a motivation of why Tabu is a good choice of algorithm, I describe the how the search works and present highlights from my implementation. The article concludes with results from a test problem. The full code is on my sourcehut.
Motivation
Any engineering design process can be formulated as an optimisation problem. We seek a point in the design space that maximises, say, performance or reliability while minimising cost and complexity; while we use the laws of physics to predict each of these factors, external human judgement determines their relative importance in a given situation. There are a large number of possible degrees of freedom available to the designer of a machine, but in practice they will select a much smaller parametrisation that remains flexible enough to navigate the design space and find the optimum.
Moving to the domain of computational fluid dynamics based turbomachinery design, we have more specific considerations. Each evaluation of the objectives, by solving the fluid flow equations, is rather expensive. Aerodynamic performance and mechanical constraints are in direct conflict, leading to a constrained design space.
The key to a successful outcome in computational engineering is not to reinvent the wheel, but to know and deploy the most suitable state-of-the-art algorithms for the problem at hand, developed by other people. Here, a multi-objective Tabu search fits our needs because it,
- yields the entire Pareto front, enabling flexible judgement of trade-offs;
- does not require gradients, which are expensive and noisy;
- naturally handles constraints;
- scales well to multi-dimensional design spaces.
As an aside, I’m not convinced about adjoint methods or the present fashion for neural network surrogate models. One needs extremely well-converged solutions to get consistent results from the adjoint equations, and I worry about local minima (although the prevalence of local minima in a typical turbine blade design is an open question). Rather than spend PhD-student hours implementing and debugging the additional complexity of a surrogate model, I would prefer to spend core hours brute-forcing a more exhaustive search.
Algorithm
We parametrise the design space using an \(n\)-dimensional vector \(\newcommand{\bv}[1]{\mathbf{#1}}\bv{x}\), and seek to optimise for the minimum of an \(m\)-dimensional objective \(\bv{y} = f(\bv{x})\). The design vector \(\bv{x}\) may be subject to any relevant constraints. In the scalar case of \(m=0\), ranking of candidate optimal \(y\) values is trivial. In the multiobjective case, we are interested in Pareto fronts. When comparing two candidate objectives, \(\bv{y}_0\) and \(\bv{y}_1\), there are three possible cases:
- If every element of \(\bv{y}_0\) is less than \(\bv{y}_1\), then \(\bv{y}_0\) dominates \(\bv{y}_1\);
- If every element of \(\bv{y}_0\) is greater than \(\bv{y}_1\), then \(\bv{y}_0\) is dominated by \(\bv{y}_1\);
- Otherwise, \(\bv{y}_0\) and \(\bv{y}_1\) are Pareto-equivalent and form two points on a Pareto front.
Before starting the search, we initialise the working variables:
- Local search counter \(i\) holding the number of iterations since the last improvement in the value of the objective;
- Current step sizes \(\Delta\bv{x}\), a vector with \(n\) elements, one for each design parameter;
- Short-term memory \(\newcommand{\Mem}[1]{\mathcal{M}_\mathrm{#1}}\Mem{sht}\), to store a fixed number of the most recently-visited \((\bv{x}, \bv{y})\) points which are Tabu;
- Medium-term memory, \(\Mem{med}\), to store a set of current near-optimal \((\bv{x}, \bv{y})\) points. For \(m=1\), a scalar objective, \(\Mem{med}\) is is a ranked list containing a fixed number of the best points seen so far. For \(m>1\), multidimensional objectives, \(\Mem{med}\) is a Pareto front;
- Long-term memory \(\Mem{lng}\), to record all \((\bv{x}, \bv{y})\) points visited by the optimiser.
Starting from an initial point \((\bv{x}_0,\bv{y}_0)\), at each iteration:
- Generate a set of \(2n\) candidate moves by perturbing the \(n\)th element of \(\bv{x}_0\) by \(\pm\Delta x_n\);
- Remove Tabu points, either which have been recently visited and exist in \(\Mem{sht}\), or which violate a constraint;
- Evaluate the objective function over non-Tabu candidate moves, reusing previous results from \(\Mem{lng}\) for old points that have been seen before;
- Store all new evaluations in \(\Mem{lng}\). If we have found any new near-optimal points, add them to \(\Mem{med}\) and reset \(i=0\), otherwise increment \(i\);
- What happens now depends on the value of \(i\):
- Until \(i\) reaches a certain threshold, the algorithm performs a local search, choosing the next point \(\bv{x_1}\) as the best of the current candidate moves;
- If \(i = i_\mathrm{diversify} \approx 10\), that is, the optimiser has not found any improvements for 10 moves, we conclude we are stuck in a local minima. To diversify the search, we generate a random \(\bv{x}_1\) in a sparse region of \(\Mem{lng}\);
- If \(i = i_\mathrm{intensify} \approx 20\), and the optimiser has still found no improvements, select a near-optimal point from \(\Mem{med}\) as \(\bv{x}_1\). This intensification step biases the search towards known-good neighbourhoods, while giving the optimiser flexibility to explore the design space;
- If \(i = i_\mathrm{restart} \approx 40\), we conclude that the design space has been sufficiently mapped, and no improvement is possible at our current resolution. So reduce the step \(\Delta\bv{x}\) by a factor of, say, 2. Restart the search taking a near-optimal next point \(\bv{x}_1\) from \(\Mem{med}\);
- Add the next point, \(\bv{x}_1\) to the Tabu list \(\Mem{sht}\);
- Halt if the step size \(\Delta\bv{x}\) reduces below the desired resolution, otherwise replace \(\bv{x}_0 \leftarrow \bv{x}_1\) and return to step 1.
Implementation
Memories
First of all, we need a memory data structure. This is a prime example of
something that should be implemented as a class: a tight collection of related
data and a few operations that we want to apply to that data. Classes seem most
natural to me when they act as more specialised versions of built-in data
types, such as the list
in Python.
My instinct is to preallocate matrices (not very Pythonic, I know) for the
design vectors and objective functions, \(\bv{X}\) and \(\bv{Y}\), where
each row represents a new evaluation. Before the memory fills up, we do not
want to expose empty rows outside of the class. So we keep track of the number
of occupied rows using npts
and apply the @property
decorator to only
return that many rows when accessing the X
and Y
attributes.
import numpy as np
class Memory:
def __init__(self, nx, ny, max_points, tol=None):
"""Store a set of design vectors and their objective functions."""
# Record inputs
self.nx = nx
self.ny = ny
self.max_points = max_points
self.tol = np.array(tol)
# Initialise points counter
self.npts = 0
# Preallocate matrices for design vectors and objectives
# Private because the user should not have to deal with empty slots
self._X = np.empty((max_points, nx))
self._Y = np.empty((max_points, ny))
# Public read-only properties for X and Y
@property
def X(self):
"""The current set of design vectors."""
return self._X[: self.npts, :]
@property
def Y(self):
"""The current set of objective functions."""
return self._Y[: self.npts, :]
Elemental operations like Memory.add()
, Memory.get()
, Memory.delete()
are
relatively straightforward to implement.
One of the more frequent operations is to check is a given design point is
already in the memory, i.e. we have evaluated it before or it is on the Tabu
list in short term memory. This amounts to looking for matches of a test vector
against each row of a matrix, something like the Matlab ismember
function
with the rows
flag set. There isn’t a Numpy builtin for this, so we roll our
own,
def find_rows(A, B, atol=None):
"""Get matching rows in matrices A and B.
Return:
logical same shape as A, True where A is in B
indices same shape as A, of the first found row in B for each A row."""
# Arrange the A points along a new dimension
A1 = np.expand_dims(A, 1)
# nA by nB logical, True where all columns elements match
if atol is None:
# Test for equality
b = (A1 == B).all(axis=-1)
else:
# Test for equality within tolerance
b = np.isclose(A1, B, atol=atol).all(axis=-1)
# A index is True where it matches any of the B points
ind_A = b.any(axis=1)
# Use argmax to find first True along each row
loc_B = np.argmax(b, axis=1)
# Where there are no matches, override to a sentinel value -1
loc_B[~ind_A] = -1
return ind_A, loc_B
There are a couple of useful tricks here. expand_dims
, being the opposite of
squeeze, is a useful function to know. Numpy does not really have an equivalent
of the find
function in Matlab, so we mildly abuse the fact that, where
elements are equal, argmax
returns the lowest index. We can now implement
Memory.contains()
, returning true if a design vector is in memory; and
Memory.lookup()
to return the objective function for a design vector that has
already been evaluated.
def contains(self, Xtest):
"""Boolean index for each row in Xtest, True if x already in memory."""
if self.npts:
return find_rows(Xtest, self.X, self.tol)[0]
else:
return np.zeros((Xtest.shape[0],), dtype=bool)
def lookup(self, Xtest):
"""Return objective function for design vector already in mem."""
# Check that the requested points really are available
if np.any(~self.contains(Xtest)):
raise ValueError("The requested points have not been previously evaluated")
return self.Y[find_rows(Xtest, self.X)[1]]
During the search, we want to keep the medium-term memory as a set of
near-optimal points: a fixed number of the best seen scalar objectives, or a
Pareto front for vector objectives. Building off our elemental operations,
methods for these are not too difficult. Memory.update_front()
is a marvel
of logical indexing!
def update_best(self, X, Y):
"""Add or remove points to keep the best N in memory."""
X, Y = np.atleast_2d(X), np.atleast_2d(Y)
# Join memory and test points into one matrix
Yall = np.concatenate((self.Y,Y),axis=0)
Xall = np.concatenate((self.X,X),axis=0)
# Sort by objective, truncate to maximum number of points
isort = np.argsort(Yall[:,0],axis=0)[:self.max_points]
Xall, Yall = Xall[isort], Yall[isort]
# Reassign to the memory
self.npts = len(isort)
self._X[: self.npts] = Xall
self._Y[: self.npts] = Yall
return np.any(self.contains(X))
def update_front(self, X, Y):
"""Add or remove points to maintain a Pareto front."""
Yopt = self._Y[: self.npts]
# Arrange the test points along a new dimension
Y1 = np.expand_dims(Y, 1)
# False where an old point is dominated by a new point
b_old = ~(Y1 < Yopt).all(axis=-1).any(axis=0)
# False where a new point is dominated by an old point
b_new = ~(Y1 >= Yopt).all(axis=-1).any(axis=1)
# False where a new point is dominated by a new point
b_self = ~(Y1 > Y).all(axis=-1).any(axis=1)
# We only want new points that are non-dominated
b_new_self = np.logical_and(b_new, b_self)
# Delete old points that are now dominated by new points
self.delete(~b_old)
# Add new points
self.add(X[b_new_self], Y[b_new_self])
# Return true if we added at least one point
return (np.sum(b_new_self) > 0)
Finally, we need a few methods to sample points from the memory. We can either
pick a random point, Memory.sample_random()
, or bin each component of the
design vector into a histogram and pick the sparsest bin,
Memory.sample_sparse()
. To really diversify the search, we can generate an
entirely new point in a sparse region of the design space by sampling from a
uniform distribution of \(x\) in the sparsest bin,
Memory.generate_sparse()
.
Main loop
Once the memory structures are in place, it seems natural to make a
TabuSearch
class to hold all three memories, the objective function callable,
algorithm parameters, etc.
class TabuSearch:
def __init__(self, objective, constraint, nx, ny, tol):
"""Maximise an objective function using Tabu search."""
# Store objective and constraint functions
self.objective = objective
self.constraint = constraint
# Store tolerance on x
self.tol = tol
# Default memory sizes
self.n_short = 20
self.n_long = 20000
self.nx = nx
self.ny = ny
self.n_med = 2000 if ny>1 else 10
# Default iteration counters
self.i_diversify = 10
self.i_intensify = 20
self.i_restart = 40
self.i_pattern = 2
# Misc algorithm parameters
self.x_regions = 3
self.max_fevals = 20000
self.fac_restart = 0.5
self.fac_pattern = 2.0
# Initialise counters
self.fevals = 0
# Initialise memories
self.mem_short = Memory(nx, ny, self.n_short, self.tol)
self.mem_med = Memory(nx, ny, self.n_med, self.tol)
self.mem_long = Memory(nx, ny, self.n_long, self.tol)
self.mem_all = (self.mem_short, self.mem_med, self.mem_long)
I have skipped over most of the TabuSearch
methods, as they are not all that
interesting, but the main loop in TabuSearch.seach()
is the code embodiment
of the algorithm detailed above.
def search(self, x0, dx):
"""Perform a search with given intial point and step size."""
# Evaluate the objective at given initial guess point, update memories
y0 = ts.initial_guess(x0)
# Main loop, until max evaluations reached or step size below tolerance
i = 0
while self.fevals < self.max_fevals and np.any(dx > self.tol):
# Evaluate objective for all permissible candidate moves
X, Y = ts.evaluate_moves(x0, dx)
# Put new results into long-term memory
self.mem_long.add(X, Y)
# Put Pareto-equivalent results into medium-term memory
# Flag true if we successfully added a point
if self.ny==1:
flag = self.mem_med.update_best(X, Y)
else:
flag = self.mem_med.update_front(X, Y)
# Reset counter if we added to medium memory, otherwise increment
i = 0 if flag else i+1
# Choose next point based on local search counter
if i == self.i_restart:
# RESTART: reduce step sizes and randomly select from
# medium-term
dx = dx * self.fac_restart
if self.ny==1:
# Pick the current optimum if scalar objective
x1, y1 = self.mem_med.get(0)
else:
# Pick from sparse region of Pareto from if multi-objective
x1, y1 = self.mem_med.sample_sparse(self.x_regions)
i = 0
elif i == self.i_intensify or X.shape[0] == 0:
# INTENSIFY: Select a near-optimal point
x1, y1 = self.mem_med.sample_sparse(self.x_regions)
elif i == self.i_diversify:
# DIVERSIFY: Generate a new point in sparse design region
x1 = self.mem_long.generate_sparse(self.x_regions)
y1 = self.objective(x1)
else:
# Normally, choose the best candidate move
x1, y1 = ts.select_move(x0, y0, X, Y)
# Check for a pattern move every i_pattern steps
if np.mod(i, self.i_pattern):
x1 = ts.pattern_move(x0, y0, x1, y1)
# Add chosen point to short-term list (tabu)
self.mem_short.add(x1)
# Update current point before next iteration
x0, y0 = x1, y1
# After the loop return current point
return x0, y0
Test problem
To check everything is working correctly, we can apply the algorithm to a benchmark problem from the literature. This one, taken from Deb et al. (2002), is known as ‘CONSTR’. Minimise, \[ \bv{y} = f(\bv{x}) = \begin{bmatrix} x_1 \\ (1+x_2)/x_1\end{bmatrix}\ , \] subject to the constraints, \[ \begin{aligned} &0.1 \le x_1 \le 1.0 \ , \\ &0 \le x_2 \le 5 \ , \\ &9x_1 + x_2 \ge 6 \ , \\ &9x_1 - x_2 \ge 1 \ . \end{aligned} \]
In code, these look like,
def objective(x):
return np.column_stack((x[:, 0], (1.0 + x[:, 1]) / x[:, 0]))
def constrain(x):
return np.all(
(
x[:, 1] + 9.0 * x[:, 0] >= 6.0,
-x[:, 1] + 9.0 * x[:, 0] >= 1.0,
x[:, 0] >= 0.0,
x[:, 0] <= 1.0,
x[:, 1] >= 0.0,
x[:, 1] <= 5.0,
),
axis=0,
)
and we can run the search with just a few more lines,
x0 = np.atleast_2d([.8,6.]) # Initial guess
dx = np.array([0.2,2.]) # Initial step sizes
tol = dx/64. # Tolerance to halt optimisation
nx = 2 # Number of design parameters
ny = 2 # Number of objectives
ts = TabuSearch(objective, constrain, nx, ny, tol)
x_opt, y_opt = ts.search(x0, dx)
By plotting the medium and long term memories at each loop iteration, we can observe the progress of the optimiser exploring the Pareto front.
Epilogue
I discussed why the Tabu search is particularly suitable for aerodynamic optimisation problems, outlined the algorithm and presented the more interesting parts of my implementation. This post might be useful as a brief introduction for those new to the topic. More importantly, optimisation is a good tool to have in my toolbox, and by coding it up and writing this post the algorithm is ready for me to deploy on new problems in the future.