Normal score transformation

Some interpolation and simulation methods require the input data to be normally distributed (refer to Examine the distribution of your data for a list of these methods). The normal score transformation (NST) transforms your dataset so that it closely resembles a standard normal distribution. It does this by ranking the values in your dataset from lowest to highest and matching these ranks to equivalent ranks generated from a normal distribution. The steps are as follows: your dataset is sorted and ranked, an equivalent rank from a standard normal distribution is found for each rank from your dataset, and the normal distribution values associated with those ranks make up the transformed dataset. The ranking can be done using either the frequency distribution or the cumulative distribution of the dataset.

The following figures show histograms and cumulative distributions before and after a normal score transformation is applied:

Histograms before and after a normal score transformation

Cumulative distributions before and after a normal score transformation

Approximation methods

In Geostatistical Analyst, there are three approximation methods: direct, linear, and Gaussian kernels. The direct method uses the observed cumulative distribution, the linear method fits lines between each step of the cumulative distribution, and the Gaussian kernels method approximates the cumulative distribution by fitting a linear combination of component cumulative normal distributions. After the Geostatistical Wizard makes predictions on the transformed scale, it automatically transforms them back to the original scale.
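The difference between the direct and linear methods can be sketched in Python as two ways of evaluating a cumulative distribution from sorted data. This is only an illustration under stated assumptions, not the software's own code: the function names direct_cdf and linear_cdf and the sample values are hypothetical, and the step heights are assumed to be k / n for the k-th smallest value.

```python
from bisect import bisect_right

def direct_cdf(sorted_values, x):
    """Observed (step) cumulative distribution: fraction of values <= x."""
    return bisect_right(sorted_values, x) / len(sorted_values)

def linear_cdf(sorted_values, x):
    """Cumulative distribution with straight lines joining successive steps."""
    n = len(sorted_values)
    if x < sorted_values[0]:
        return 0.0
    if x >= sorted_values[-1]:
        return 1.0
    hi = bisect_right(sorted_values, x)   # number of values <= x
    lo = hi - 1
    x0, x1 = sorted_values[lo], sorted_values[hi]   # x0 <= x < x1
    frac = (x - x0) / (x1 - x0)           # position between the two steps
    return (lo + 1 + frac) / n

# Hypothetical sample values, sorted from lowest to highest.
sample = sorted([3.2, 0.4, 15.8, 1.1, 7.5])
step_p = direct_cdf(sample, 5.0)
line_p = linear_cdf(sample, 5.0)
print(step_p, line_p)
```

Both functions agree at the observed values. Between observations the direct version stays flat, while the linear version rises steadily toward the next step.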

The choice of approximation method depends on the assumptions you are willing to make and on how smooth you need the approximation to be. The direct method is the least smooth and makes the fewest assumptions, the linear method is intermediate, and the Gaussian kernels method has the smoothest reverse transformation but assumes that the data distribution can be approximated by a finite mixture of normal distributions. If this assumption is valid, the Gaussian kernels method produces the best results.
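The finite mixture assumption can be illustrated with a short Python sketch in which the cumulative distribution is a weighted sum of normal cumulative distributions. The component weights, means, and standard deviations below are hypothetical, and fitting them to data is not shown. The function names and the bisection used for the reverse transformation are assumptions for illustration, not the Geostatistical Analyst algorithm.

```python
from statistics import NormalDist

# Hypothetical two-component mixture: (weight, mean, standard deviation).
# The weights sum to 1.
components = [(0.6, 2.0, 1.0), (0.4, 8.0, 2.5)]

def mixture_cdf(x, components):
    """Cumulative distribution as a linear combination of normal CDFs."""
    return sum(w * NormalDist(mu, sigma).cdf(x) for w, mu, sigma in components)

def to_normal_score(x, components):
    """Forward transformation: data value to standard normal score."""
    return NormalDist().inv_cdf(mixture_cdf(x, components))

def from_normal_score(z, components, lo=-1e3, hi=1e3, tol=1e-9):
    """Reverse transformation by bisection on the mixture CDF.

    Assumes the answer lies between lo and hi.
    """
    target = NormalDist().cdf(z)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mixture_cdf(mid, components) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = to_normal_score(5.0, components)
x_back = from_normal_score(z, components)
print(z, x_back)
```

Because the mixture cumulative distribution is smooth and strictly increasing, the reverse transformation has no steps or kinks, which is the smoothness property described above.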


6/24/2013