Spatial Interpolation: An Introduction to Kriging

History and Theory

Danie Krige, credit to www.quazoo.com

Danie Krige, credit to www.quazoo.com

In 1951 Danie Krige, a South African statistician, considered the problem of predicting concentration values for gold deposits in Witwatersrand given the location and value of boring samples.  Creating boreholes in the ground is rather expensive (in time and money) so we have the classic mathematician's goal:  extrapolating the most information from doing the least amount of work (drilling the least amount of holes). 

Figure 1: The borehole problem: boreholes are made in a region for the purpose of measuring where and how much gold is present. The height of the stems give the amount of gold present. Scales and units omitted for brevity. 

Figure 1: The borehole problem: boreholes are made in a region for the purpose of measuring where and how much gold is present. The height of the stems give the amount of gold present. Scales and units omitted for brevity. 

Now Krige new the first law of geography 18 years before it was written down: 

Everything is related to everything else – but near things are more closely related than distant things
— Waldo Tobler

The great contribution of Krige and later Georges Matheron in a more technical sense, was to translate this quote into mathematics. Consider the hard-earned boreholes of Figure 1 as given data pairs: (x,y) = (location, amount of gold); notice that the location x can be coordinates in 1,2, or 3 dimensional space. Let me write down some assumptions about this data: 

  1. The amount of gold follows a known average spatial trend.  (Let's say from empirical studies or from assumed regressions) 
  2. Deviations from this average trend are normally distributed and depends on how far away from other gold you stand.

How do you declare dependence on how far away something is? Well the distance between two points in space is measured by Pythagoras, since he isn't around much anymore use his formula:  distance between a and b  =  ||a - b||. So we need a function which depends on ||a-b|| to describe the spatial correlations of the deviations in assumption 2.   Here comes the technical framework; consider the mean trend as a linear regression of known functions in a vector p and deviations from such as a random process z which has a correlation function R which depends on the distance between two points

Krig_1.PNG

Notice that the spatial correlation is an a priori choice and ends up significantly effecting performance of this method. A theoretically tractable choice is the classic Gaussian relationship denoted below. 

Krig_3.PNG

A graph of this choice is seen on the right side for a particular choice of the shape parameter epsilon. This choice ends up being vital as well as it changes how sharply the dependence from the gold drops off; there are ways to statistically pick this value (Cross-Validation, Likelihood Maximization, etc. ) which are omitted here. 

The spatial correlation function R as a function of distance and the shape parameter is 3. A mathematical representation of the 1st law of geography.  

The solution to the statistical model posed boils down to a general weighted least squares solution.  For detailed codes, visit my codes section.

Numerical Experiment in 1D

Consider the following 1 dimensional test problem, the circles are given data and we are to approximate by Kriging the underlying red-lined function.  The Kriging interpolant was calculated and is shown on the right hand side below. 

The 1D test situation. The red line is the "truth" with red cirlces the location of the "gold boreholes" 

The 1D test situation. The red line is the "truth" with red cirlces the location of the "gold boreholes" 

The blue line is the  Kriging prediction, notice it perfect interpolates the red line at the data sites. 

The blue line is the  Kriging prediction, notice it perfect interpolates the red line at the data sites. 

As we are solving a statistical model, we can also include a confidence estimation of the prediction by assuming normality and calculating a usual z-score: 

The same figure above but with confidence intervals showing 2 standard deviations in either direction from the Kriging prediction. 

The same figure above but with confidence intervals showing 2 standard deviations in either direction from the Kriging prediction. 

Numerical Experiment in 3D

To stay true to the original gold problem explored by Krige, viability of the method should be explored while considering 3D coordinates as the data.  In practice, using vectorized MATLAB code makes this equivalent to solving the 1D case.  Below is shown the original data set from Figure 1 with the Kriging solution at the right. 

A three dimensional data set shown in red stems over the true surface to be estimated. 

A three dimensional data set shown in red stems over the true surface to be estimated. 

The Kriging prediction of the test surface using the red data points in 3D space.

The Kriging prediction of the test surface using the red data points in 3D space.


References

  1. Meshfree Approximation Methods with MATLAB, Fasshauer (2007)
  2. Interpolation of spatial data: some theory for kriging, Stein (2012)