
James Brind

2D Gaussian process regression in scikit-learn

Programming something new is always easier if you have a working example of something similar. Recently, I went searching for an example of multi-dimensional Gaussian process regression in scikit-learn, but all I could find in their docs and elsewhere online were one-dimensional problems. This post plugs that gap.

After a brief primer on the theory involved, I will walk through a Python script that fits a Gaussian process to a two-dimensional function. Full code is available from my sourcehut.

Theory

Gaussian process regression works by applying Bayes' theorem to a distribution of functions. First, we establish some priors on the functions that might fit our data: a mean, a variance, and a degree of smoothness over a given length scale. Second, given some observed data points with a certain noise level, we apply Bayes' theorem to obtain a posterior distribution of functions. The posterior distribution has a mean that passes close to the data and a reduced variance near the data points. Finally, we use the posterior mean to predict unseen samples, and interpret the posterior variance as a measure of uncertainty in those predictions.
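
To make this concrete, for a zero prior mean, a kernel \(k\), training inputs \(X\) with noisy observations \(\mathbf{z}\) and noise variance \(\sigma_n^2\), the posterior mean and variance at a test point \(x_*\) are (Rasmussen and Williams, Chapter 2), $$ \bar{z}_* = \mathbf{k}_*^\mathsf{T}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{z}\ , \qquad \operatorname{var}(z_*) = k(x_*, x_*) - \mathbf{k}_*^\mathsf{T}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{k}_*\ , $$ where \(K = k(X, X)\) is the covariance matrix between training points and \(\mathbf{k}_* = k(X, x_*)\) is the vector of covariances between the training points and the test point.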

The posterior distribution can be evaluated analytically; however, for \(N\) data points it requires the expensive inversion of an \(N\)-by-\(N\) matrix. In practice, the length scale and noise level are unknown, and selecting their values entails an optimisation with many such evaluations. Gaussian process regression therefore does not scale well to large data sets.
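
The hyperparameters, the length scales and noise level, are chosen to maximise the log marginal likelihood of the training data, $$ \log p(\mathbf{z} \mid X) = -\tfrac{1}{2}\,\mathbf{z}^\mathsf{T}\left(K + \sigma_n^2 I\right)^{-1}\mathbf{z} - \tfrac{1}{2}\log\left|K + \sigma_n^2 I\right| - \tfrac{N}{2}\log 2\pi\ , $$ and each evaluation of this quantity involves the same \(N\)-by-\(N\) inversion (or an equivalent Cholesky factorisation). This is the optimisation that scikit-learn performs when we call fit later on.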

Gaussian processes are non-parametric models, so our priors are the only assumptions we have to make about the form of the function. For example, linear regression is less flexible — we must stipulate a linear trend which will not fit a quadratic function well.

A two-dimensional example

Suppose we have a few samples of a function of two variables \(z(x,y)\). We wish to interpolate \(z\) using Gaussian process regression over the available data. For this example, we consider the quadratic surface, $$ z(x,y) = (x-3)^2 + 2xy + (2y+3)^2 - 3\ . $$

Preliminaries

Before we begin, we will need some imports and a deterministic random seed:

import numpy as np

import matplotlib.pyplot as plt
import matplotlib.lines as mlines

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import train_test_split

np.random.seed(17)  # So you can reproduce these results at home

Let’s start by evaluating \(z\) over a regular grid to represent our ground truth:

nv = 20  # Number of points in each direction
xv = np.linspace(-5.,5.,nv)  # x-vector
yv = np.linspace(-5.,5.,nv)  # y-vector
x, y = np.meshgrid(xv, yv)  # x and y are (nv, nv) matrices
z = (x-3.)**2. + 2.*x*y + (2.*y+3.)**2. - 3.

Now, real data may be corrupted by a degree of numerical or measurement error, which we model here as Gaussian noise added to the ground truth:

noise_level = 5.  # Standard deviation of the added Gaussian noise
z_noisy = z + np.random.normal(size=x.shape) * noise_level

scikit-learn expects the inputs as a matrix with one row per sample and one column per variable, and the output as a single column vector, so we must reshape the grid:

X = np.column_stack((x.reshape(-1), y.reshape(-1)))
Z = z_noisy.reshape(-1,1)

Performing the fit

Before going any further, we need to specify our prior distribution over functions. The smoothness criterion is expressed in terms of a kernel, an object that encodes the desired correlation (or covariance) between pairs of points. We shall use a radial basis function, RBF, to ensure the fits are infinitely differentiable, with an unknown length scale for each of \(x\) and \(y\). Added to the RBF, we use a WhiteKernel with an unknown noise level to represent uncertainty in the data and explain uncorrelated, short-length-scale variations. For the ensuing fitting process, we must specify bounds and initial guesses for the length scales and noise level. Passing a tuple of length scale guesses, rather than a scalar, into the RBF constructor is all that is needed to extend the one-dimensional examples in the scikit-learn docs to two dimensions. In code, this all looks like:

guess_l = (2., 1.)  # In general, x and y have different scales
bounds_l = ((1e-1,100.),) * 2  # Same bounds for x and y
guess_n = 1.  # Initial guess for the noise variance
bounds_n = (1e-20, 10.)  # Bounds for the noise variance
kernel = (  # Kernel objects can simply be summed using +
    RBF(length_scale=guess_l, length_scale_bounds=bounds_l)
    + WhiteKernel(noise_level=guess_n, noise_level_bounds=bounds_n)
)

Before fitting, we extract 5% of our grid of points to use as training samples, the remainder becoming test data:

X_train, X_test, Y_train, Y_test = train_test_split(X, Z, test_size=0.95)

At last, we can do the fitting. Passing normalize_y=True scales the function values to zero mean and unit variance before fitting. This is equivalent to choosing priors: the mean of \(z\) is equal to the training sample mean, and the variance of \(z\) is equal to the training sample variance.

gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(X_train, Y_train)

We can inspect the fitted kernel parameters:

print(gpr.kernel_)
# RBF(length_scale=[7.86, 3.71]) + WhiteKernel(noise_level=0.00678)

The \(x\) length scale is about twice the \(y\) length scale, as might be expected from the form of \(z\). The fitted noise level looks much smaller than the value we prescribed, but remember that it is a variance in the normalised target units rather than a standard deviation in the original units; more training data would also give a better estimate of the noise level.
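
As a rough check (a sketch, assuming the fitted WhiteKernel is the second component, k2, of the summed kernel), we can undo the normalisation and express the fitted noise as a standard deviation in the original units:

noise_var_normalised = gpr.kernel_.k2.noise_level  # Fitted noise variance in normalised units
noise_std = np.sqrt(noise_var_normalised) * Y_train.std()  # Convert back to the original units
print(noise_std)  # Expect the same order of magnitude as the prescribed noise_level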

Plotting the results

With our fitted regressor object, we can evaluate the posterior mean and standard deviation over the input data, and convert back to a grid for plotting:

Zfit, Zstd = gpr.predict(X, return_std=True)
zstd = Zstd.reshape(x.shape)
zfit = Zfit.reshape(x.shape)

First, compare the true function with the fitted posterior mean:

# Set up figure
fig, ax = plt.subplots(figsize=(6., 4.))
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_xlim((-5,5))
ax.set_ylim((-5,5))
ax.set_aspect('equal')
ax.set_xticks((-5,0,5))
ax.set_yticks((-5,0,5))
ax.grid(False)

# Do the plotting
lev = np.linspace(0.,250.,6)
ax.contour(x,y,z,lev,colors='k')  # Truth
ax.plot(*X_train.T,'o',color="C1")  # Training samples
ax.contour(x, y, zfit, lev, colors='C0', linestyles='dashed')  # Posterior mean

# Legend
truth_line = mlines.Line2D([], [], color='black', label='True $z(x,y)$')
sample_line = mlines.Line2D([], [], color='C1', marker="o", linestyle="none", label='Train samples')
mean_line = mlines.Line2D([], [], color='C0', linestyle="--", label='Posterior mean')
ax.legend(handles=[truth_line, sample_line, mean_line],bbox_to_anchor=(1.05, 1), loc="upper left")

# Write out
plt.tight_layout()
plt.savefig('gpr_posterior_mean.svg')

The figure shows good agreement between the contours.
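
For a quantitative check to go with the visual one, here is a short sketch that scores the posterior mean on the held-out data (score returns the coefficient of determination \(R^2\)) and against the noise-free truth:

print(gpr.score(X_test, Y_test))  # R^2 of the predictions against the noisy held-out targets
print(np.sqrt(np.mean((zfit - z)**2)))  # RMS error of the posterior mean against the noise-free truth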

Second, plot the variation in posterior standard deviation.

# Set up figure
fig, ax = plt.subplots(figsize=(6., 4.))
ax.set_xlabel('$x$')
ax.set_ylabel('$y$')
ax.set_xlim((-5,5))
ax.set_ylim((-5,5))
ax.set_aspect('equal')
ax.set_xticks((-5,0,5))
ax.set_yticks((-5,0,5))
ax.grid(False)

# Do the plotting
ax.plot(*X_train.T,'o',color="C1")  # Training samples
lev = np.linspace(6.,16.,11)
hc = ax.contourf(x, y, zstd, lev)  # Posterior std
for hci in hc.collections:
  hci.set_edgecolor("face")  # Avoid pale gaps between filled levels in vector output

# Colorbar
hcb = plt.colorbar(hc)
hcb.ax.grid(False)
hcb.set_label('Posterior standard deviation')

# Write out
plt.tight_layout()
plt.savefig('gpr_posterior_std.svg')

The figure shows that the uncertainty increases with distance from the training samples.
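
If the uncertainty estimate is reasonable, the noise-free truth should lie within about two posterior standard deviations of the posterior mean over most of the domain. A quick sketch of that check:

inside = np.abs(zfit - z) < 2. * zstd  # True where the truth falls inside the two-sigma band
print(inside.mean())  # Fraction of grid points covered; expect a value near one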

Bonus one-liner

Here is a shell command that extracts all code blocks from a markdown file and writes them out to a new Python script:

sed -n '/```/,/```/p' blog-post.md | grep -v '```' > script.py

I used this command to generate the script associated with this post on sourcehut.

References

  • Gaussian Processes for Machine Learning, Rasmussen and Williams (2006). Comprehensive exposition of the theory behind GPs. The mathematical detail is not that important for us, but the Preface and Introduction give a good background.
  • UQLab Kriging Manual. Chapter 1, Theory, is a more digestible summary of the mathematics and gives a clear description of the choices we make to use GPs in practice.
  • scikit-learn GP docs