Basic Geostats

Geostatistics is a relatively new and rapidly growing area in engineering, the earth sciences, and applied mathematics. The field is devoted to the application of statistical techniques in the study of spatially variable phenomena. Although geostatistics was first developed to improve ore reserve estimation in a mining context, it has grown in application to many other areas of the earth sciences.

A brief history
In the late 1960s and early 1970s, Georges Matheron began the use of geostatistics by applying it to the mining industry. Matheron was a professor at the School of Mines in Paris, France. By the late 1970s a graduate program had been established at the school. Two of Matheron's most noted students were Andre Journel, who is now with the Department of Petroleum Engineering at Stanford University, and Michel David, who is now with the Ecole Polytechnique in Montreal.

Today geostatistics is used around the world in mining and in petroleum reservoir calculations. Annual meetings are held to discuss new techniques and applications for geostatistics. These meetings have been held in both the U.S. and Canada, and recently a meeting was held in Guanajuato, Mexico. In addition, an International Geostatistics Congress is held every four years. The first of these was in Rome, Italy, in 1975.

The modern computer has greatly simplified geostatistical work, and several programs have been written to support it. GEO-EAS was the first program available; although it was a DOS program, it was considered very user friendly. In 1992 Andre Journel and Clayton Deutsch wrote GSLIB, and current versions of this program are still available today. In 1996 Yvan Pannatier published VARIOWIN, which is much like the old GEO-EAS but has a much larger storage capacity.

Geostatistical Concepts
There are a few key concepts in modern geostatistics. These include:

  • Random Function Model
  • Stationarity
  • Declustering
  • Variograms
  • Kriging
  • Simulation

Random Function Model
A random variable (RV), Z (denoted by an upper case letter), is a variable that can take a series of possible values as characterized by a probability density function (pdf) or, equivalently, a cumulative distribution function (cdf). Spatial dependence of the RV is denoted by Z(u), where u is a location within the domain A. A particular outcome at some location u is denoted by z(u) (lower case letter). The cdf characterizing the uncertainty in a RV Z(u) is:

$$F(u; z) = \operatorname{Prob}\{Z(u) \le z\}$$

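A random function (RF) is the set of dependent random variables, {Z(u), u ∈ A}. The RF is defined by its spatial law, the joint cdf of the random variables at N locations:

$$F(u_1, \ldots, u_N; z_1, \ldots, z_N) = \operatorname{Prob}\{Z(u_1) \le z_1, \ldots, Z(u_N) \le z_N\}$$

where N is the number of locations in domain A.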

Stationarity
The concept of stationarity is critical to inference. Stationarity allows the geostatistician to extend exploratory statistical analysis from limited data to the entire domain of interest. It is not a property of the phenomenon under study, but a decision made by the modeler: the multivariate distribution of the random function is assumed to be invariant under translation within the domain.

Declustering
Sample data are used to construct a global stationary histogram. Statistical measures of the distribution are estimated from the histogram; however, when sampling is spatially biased the histogram must be corrected. Declustering corrects for preferential sampling. In mining, the high-valued zones, or the areas that will be produced first, are of greatest economic importance and will be sampled more closely. Declustering can be performed in several ways:

Polygonal declustering weights each sample by its volume of influence. The denser the sampling in a given zone, the less influence each sample will have. The domain boundaries must be known to apply this method.

Cell declustering handles the problem of the domain boundary. The domain is divided into cells or blocks that receive equal weight, and every sample within a cell is assigned the same share of that weight. The more samples in a cell, the less influence each will have on the global statistics. The question is how to pick a cell size. Common practice is to run the algorithm with several cell sizes and select the one that minimizes the declustered mean when the high values are preferentially sampled (or maximizes it when the low values are).
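The weighting scheme described above can be sketched in a few lines. This is a minimal illustration, not a production routine: the function names, the grid origin, the cell size, and the sample data are all hypothetical, and only a single cell size is tried rather than the scan over several sizes used in practice.

```python
from collections import Counter


def cell_declustering_weights(xs, ys, cell_size, origin=(0.0, 0.0)):
    """Return one weight per sample; the weights sum to the number of samples."""
    # Index of the grid cell that each sample falls in.
    cells = [
        (int((x - origin[0]) // cell_size), int((y - origin[1]) // cell_size))
        for x, y in zip(xs, ys)
    ]
    counts = Counter(cells)      # number of samples in each occupied cell
    n_occupied = len(counts)
    n = len(cells)
    # Every occupied cell gets the same total weight, shared equally by its samples.
    return [n / (n_occupied * counts[c]) for c in cells]


def declustered_mean(values, weights):
    """Weighted mean of the sample values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: three clustered high values and two isolated low values.
xs = [1.0, 1.5, 2.0, 12.0, 25.0]
ys = [1.0, 1.5, 2.0, 12.0, 25.0]
grades = [9.0, 10.0, 11.0, 2.0, 3.0]

weights = cell_declustering_weights(xs, ys, cell_size=10.0)
naive_mean = sum(grades) / len(grades)
dc_mean = declustered_mean(grades, weights)
```

With these made-up samples the three clustered high grades share the weight of one cell, so the declustered mean falls below the equal-weighted mean.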

Ordinary kriging weights can also be used as a measure of influence. The samples that have more influence in estimating the points in the domain will have higher kriging weights. The kriging weights a sample receives over all estimated points are summed, standardized, and used as its declustering weight. The advantage of this method is that it accounts for both the configuration of the data and the spatial continuity.

Declustering changes only the weight that each sample value carries in the global distribution; it does not change the value itself. Therefore, these techniques cannot correct for sampling that did not cover the entire range of the variable.

Variograms
The most important bivariate statistic used in geostatistics is the variogram. The experimental variogram is estimated as half the average of the squared differences between data separated exactly by a distance vector h. In practice, we define angle and lag tolerances so that we can find a reasonable number of pairs approximately h apart. A careful choice of the number of lags, the lag separation distance, and the tolerances (vertical and horizontal angles and bandwidths) helps to obtain a reliable estimate of the variogram, although this is not always possible. Bad choices will generate noisier plots that are not representative of the underlying population.
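The estimator described above, half the average squared difference of pairs falling within a lag tolerance, can be sketched as follows. This is a simplified omnidirectional version that ignores angle tolerances and bandwidths; the function name, its parameters, and the five sample values are hypothetical.

```python
import math


def experimental_variogram(coords, values, lag, n_lags, lag_tol=None):
    """Omnidirectional experimental semivariogram.

    Returns one (mean pair distance, gamma, number of pairs) tuple per lag.
    """
    if lag_tol is None:
        lag_tol = lag / 2.0
    sq_sums = [0.0] * n_lags   # sum of squared differences per lag
    d_sums = [0.0] * n_lags    # sum of pair distances per lag
    counts = [0] * n_lags      # number of pairs per lag
    n = len(values)
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            for k in range(n_lags):
                if abs(d - (k + 1) * lag) <= lag_tol:
                    sq_sums[k] += (values[i] - values[j]) ** 2
                    d_sums[k] += d
                    counts[k] += 1
                    break  # assign each pair to a single lag only
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            gamma = sq_sums[k] / (2.0 * counts[k])
            result.append((d_sums[k] / counts[k], gamma, counts[k]))
        else:
            result.append((None, None, 0))
    return result


# Hypothetical samples on a regular line with unit spacing.
coords = [(float(i), 0.0) for i in range(5)]
values = [1.0, 2.0, 4.0, 3.0, 5.0]
result = experimental_variogram(coords, values, lag=1.0, n_lags=2)
```

Reporting the pair count with each lag makes it easy to spot lags whose estimate rests on too few pairs to be trusted.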

Variograms must be modelled before they can be incorporated into estimation or simulation algorithms. Models are considered licit if they are positive-definite, that is, if they are a valid measure of distance. The positive-definiteness constraint ensures that the estimation variance will be positive or zero; otherwise the mathematical model would not be valid, since a variance must be non-negative by definition. When more than one variable exists, cross-variograms measure their relationship in space, that is, how similar the variables at two locations are.
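As an illustration of a licit model, the spherical variogram is one widely used choice. The sketch below assumes a parameterization by nugget, total sill, and range; the function and parameter names are illustrative and not taken from any particular package.

```python
def spherical_variogram(h, nugget, sill, a):
    """Spherical semivariogram model.

    h      : separation distance (non-negative)
    nugget : discontinuity at the origin
    sill   : total sill, reached at and beyond the range
    a      : range
    """
    if h <= 0.0:
        return 0.0      # the variogram is zero at zero separation
    if h >= a:
        return sill     # no spatial correlation beyond the range
    r = h / a
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


# Hypothetical parameters: no nugget, unit sill, range of 10 distance units.
gamma_half_range = spherical_variogram(5.0, nugget=0.0, sill=1.0, a=10.0)
```

In practice the nugget, sill, and range are adjusted until the model curve passes acceptably through the experimental variogram points.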

Modelling direct variograms and cross-variograms jointly is even more demanding than modelling a single variogram. A valid model of coregionalization is required. This means that the direct and cross-variogram models must be consistent with each other and provide a measure of spatial correlation that makes physical sense. The positive-definiteness condition ensures that the estimation variance is non-negative when a cokriging system is solved.
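One common way to guarantee such consistency is the linear model of coregionalization, in which all direct and cross-variograms share the same nested structures and the matrix of sill contributions for each structure must be positive semi-definite. The sketch below checks that condition for two variables; it assumes this particular model, and the function name and sill values are hypothetical.

```python
def is_valid_lmc_2var(structures, tol=1e-12):
    """Check a two-variable linear model of coregionalization.

    structures : list of (b11, b12, b22) sill contributions, one tuple per
                 nested structure, where b11 and b22 belong to the direct
                 variograms and b12 to the cross-variogram.
    """
    for b11, b12, b22 in structures:
        # Direct sill contributions must be non-negative.
        if b11 < -tol or b22 < -tol:
            return False
        # The determinant of the 2x2 sill matrix must be non-negative.
        if b11 * b22 - b12 ** 2 < -tol:
            return False
    return True


# Hypothetical sills: a nugget structure and one spherical structure.
consistent = is_valid_lmc_2var([(0.2, 0.1, 0.3), (0.8, 0.5, 0.7)])
inconsistent = is_valid_lmc_2var([(0.2, 0.3, 0.3)])
```

The second example fails because the cross-variogram sill is too large relative to the two direct sills.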

Kriging
Kriging is a class of linear regression algorithms used in spatial estimation that minimize the estimation variance. To estimate an unsampled location, weights are assigned to the known data, and the estimate is a linear combination of the sample data values.

This class of algorithms consists of many different “flavours” of kriging: simple, ordinary, block, cokriging, disjunctive, universal, MultiGaussian, etc. As its name suggests, the most basic form of kriging is simple kriging (SK), in which the mean is assumed known and there are no constraints on the values of the weights. Another commonly used form is ordinary kriging (OK), in which the weights are constrained to sum to 1. Like OK, the other techniques are variations of the SK method and may account for potential trends in the data, the volume support of the data, etc.
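A minimal sketch of ordinary kriging at a single location follows, to make the role of the weights and the sum-to-1 constraint concrete. It assumes an exponential covariance model with made-up parameters, uses every sample rather than a search neighbourhood, and solves the system with a small Gaussian elimination routine; all names and data are hypothetical.

```python
import math


def covariance(h, sill=1.0, a=10.0):
    """Exponential covariance with practical range a (an assumed model)."""
    return sill * math.exp(-3.0 * h / a)


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ordinary_kriging(coords, values, target):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(values)
    # Data-to-data covariances, bordered by the unbiasedness constraint.
    lhs = [
        [covariance(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
        for i in range(n)
    ]
    lhs.append([1.0] * n + [0.0])
    # Data-to-target covariances, then the constraint value 1.
    rhs = [covariance(math.dist(c, target)) for c in coords] + [1.0]
    solution = solve(lhs, rhs)
    weights, mu = solution[:n], solution[n]   # mu is the Lagrange parameter
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = covariance(0.0) - sum(w * c for w, c in zip(weights, rhs[:n])) - mu
    return estimate, variance, weights


# Hypothetical samples placed symmetrically about the target location.
coords = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
values = [1.0, 3.0, 2.0]
estimate, variance, weights = ordinary_kriging(coords, values, (1.0, 1.0))
```

Dropping the last row and column of the system and working with residuals from a known mean would give the simple kriging variant.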

Simulation
Numerous simulation algorithms are available for the numerical modeling of natural phenomena. These techniques include Gaussian, indicator, p-field, direct, and simulated annealing methods, among others.

Gaussian simulation is one of the simplest geostatistical simulation algorithms, and for this reason it is the most commonly used method in practice. The technique relies on the properties of a Gaussian (normally distributed) variable and on an assumption of multiGaussianity, that is, that the multivariate distribution is also Gaussian.
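To illustrate how the multiGaussian assumption is exploited, the sketch below draws an unconditional realization by factoring the covariance matrix (Cholesky, or LU, simulation). This is a deliberately simple matrix approach suited to small grids, not the sequential algorithm normally used on large ones; the exponential covariance model, the node layout, and all names are assumptions.

```python
import math
import random


def covariance(h, sill=1.0, a=10.0):
    """Exponential covariance with practical range a (an assumed model)."""
    return sill * math.exp(-3.0 * h / a)


def cholesky(c):
    """Lower-triangular factor L of a positive-definite matrix, c = L L^T."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(c[i][i] - s)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def simulate_gaussian(coords, rng):
    """One unconditional realization of a zero-mean Gaussian random function."""
    n = len(coords)
    c = [
        [covariance(math.dist(coords[i], coords[j])) for j in range(n)]
        for i in range(n)
    ]
    low = cholesky(c)
    # Independent standard normal deviates, correlated through the factor L.
    w = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * w[k] for k in range(i + 1)) for i in range(n)]


# Hypothetical grid: 20 nodes along a line with unit spacing.
nodes = [(float(i), 0.0) for i in range(20)]
realization = simulate_gaussian(nodes, random.Random(42))
```

Calling the function again with a differently seeded generator yields another equally probable realization with the same covariance structure.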

Indicators
Outside of the Gaussian framework, indicator approaches are among the most commonly used methods. The non-parametric formalism of indicators was introduced in 1983 by A. G. Journel. This method permits the direct estimation of the conditional distribution at an unsampled location, that is, its distribution of uncertainty. It permits the random variable to have different spatial continuity for high and low values.
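The indicator transform at the core of this formalism can be sketched as follows: each datum is coded as 1 if it does not exceed a threshold and 0 otherwise, so that the average of the indicators estimates a cumulative probability. The data values and thresholds are hypothetical, and the averages shown are equal-weighted global proportions rather than kriged local estimates.

```python
def indicator_transform(values, threshold):
    """Code each value as 1 if it is at or below the threshold, else 0."""
    return [1 if v <= threshold else 0 for v in values]


# Hypothetical sample values and thresholds.
values = [0.5, 1.2, 3.4, 0.9, 2.8, 5.1]
thresholds = [1.0, 3.0, 5.0]

# One set of indicator data per threshold.
indicators = {t: indicator_transform(values, t) for t in thresholds}

# The mean indicator at each threshold is the proportion of data not
# exceeding it, that is, an estimate of the cdf at that threshold.
global_cdf = {t: sum(ind) / len(ind) for t, ind in indicators.items()}
```

Because each threshold has its own indicator data, each can be given its own variogram, which is how different spatial continuity for high and low values is accommodated.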

The indicator formalism requires the data to be coded directly as probabilities. A conditional cumulative distribution function is obtained by kriging the indicator transforms of the data at several thresholds at a location of interest. Simulation can be performed by including the previously simulated nodes into the conditioning information, and drawing from the distribution function. Several important advantages are derived from this basic idea of directly estimating the probabilities:

For an article on geostatistics written by Dr. Deutsch in Academic Press Encyclopedia:
