# Estimating cross-covariance models for cokriging

When you have multiple datasets and want to use cokriging, you need to develop models for the cross-covariance. Because there are multiple datasets, the variables are tracked with subscripts: Z_{k}(**s**_{i}) denotes the random variable for the *k*^{th} data type at location **s**_{i}. The cross-covariance function between the *k*^{th} data type and the *m*^{th} data type is then defined to be

C_{km}(**s**_{i},**s**_{j}) = cov(Z_{k}(**s**_{i}), Z_{m}(**s**_{j})).

Here is a subtle and often confusing fact: C_{km}(**s**_{i},**s**_{j}) can be asymmetric, that is, C_{km}(**s**_{i},**s**_{j}) ≠ C_{mk}(**s**_{i},**s**_{j}) (notice the switch in the subscripts). To see why, look at the following example. Suppose you have data arranged in one dimension, along a line, such as the following:

The variables for types 1 and 2 are regularly spaced along the line. A thick red line indicates the highest cross-covariance, a green line less cross-covariance, a thin blue line the least cross-covariance, and no line indicates zero cross-covariance. This figure shows that Z_{1}(**s**_{i}) and Z_{2}(**s**_{j}) have the highest cross-covariance when **s**_{i} = **s**_{j}, and the cross-covariance decreases as **s**_{i} and **s**_{j} get farther apart. In this example, C_{km}(**s**_{i},**s**_{j}) = C_{mk}(**s**_{i},**s**_{j}). However, the cross-covariance can be "shifted":

Notice that C_{12}(**s**_{2},**s**_{3}) now has the minimum cross-covariance (thin blue line) while C_{21}(**s**_{2},**s**_{3}) has the maximum cross-covariance (thick red line), so here C_{km}(**s**_{i},**s**_{j}) ≠ C_{mk}(**s**_{i},**s**_{j}). Relative to Z_{1}, the cross-covariances of Z_{2} have been shifted -1 unit. In two dimensions, Geostatistical Analyst will estimate any shift in the cross-covariance between the two datasets if you click the shift parameters.
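The shifted example can be reproduced with a small Python sketch. Everything here is illustrative rather than taken from Geostatistical Analyst: the function names `kernel`, `c12`, and `c21` are hypothetical, locations are integers on a line, and the levels 3, 2, 1, and 0 are assumed stand-ins for the thick red, green, thin blue, and absent lines.

```python
def kernel(h):
    """Assumed symmetric cross-covariance profile as a function of offset h.

    3 = thick red, 2 = green, 1 = thin blue, 0 = no line (illustrative levels).
    """
    return max(0, 3 - abs(h))


def c12(si, sj, shift=0):
    """C_12(s_i, s_j) = cov(Z_1(s_i), Z_2(s_j)) on a 1D integer grid.

    With shift = 0 the peak is at s_j = s_i; with shift = -1 the peak
    moves to s_j = s_i - 1, as in the shifted example.
    """
    return kernel(sj - si - shift)


def c21(si, sj, shift=0):
    """C_21(s_i, s_j) = cov(Z_2(s_i), Z_1(s_j)), which equals C_12(s_j, s_i)."""
    return c12(sj, si, shift)


if __name__ == "__main__":
    for shift in (0, -1):
        print("shift", shift, "C12(s2,s3) =", c12(2, 3, shift),
              "C21(s2,s3) =", c21(2, 3, shift))
```

With `shift=0` the two functions agree at every pair of locations. With `shift=-1`, `c12(2, 3, -1)` gives the lowest nonzero level (1) while `c21(2, 3, -1)` gives the highest (3), which matches the asymmetry described above.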

The empirical cross-covariances are computed as follows:

Average[(z_{1}(**s**_{i}) - z̄_{1})(z_{2}(**s**_{j}) - z̄_{2})]

where z_{k}(**s**_{i}) is the measured value for the *k*^{th} dataset at location **s**_{i}, z̄_{k} is the mean for the *k*^{th} dataset, and the average is taken over all pairs **s**_{i} and **s**_{j} separated by a certain distance and angle. As with the semivariograms, Geostatistical Analyst shows both the empirical and fitted models for cross-covariance.
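As a rough illustration of the averaging formula, the following Python sketch computes an empirical cross-covariance in one dimension. It is not the Geostatistical Analyst implementation: the function name `empirical_cross_covariance` is hypothetical, both datasets are assumed to be measured at the same locations, and "distance and angle" is reduced to a signed lag s_j - s_i along the line (a real tool would bin pairs by distance and direction).

```python
import numpy as np


def empirical_cross_covariance(coords, z1, z2, lag, tol=1e-9):
    """Average of (z1(s_i) - mean1) * (z2(s_j) - mean2) over pairs with s_j - s_i == lag.

    Assumes 1D coordinates shared by both datasets; returns NaN if no pair
    is separated by the requested signed lag.
    """
    coords = np.asarray(coords, dtype=float)
    d1 = np.asarray(z1, dtype=float) - np.mean(z1)  # deviations from dataset 1 mean
    d2 = np.asarray(z2, dtype=float) - np.mean(z2)  # deviations from dataset 2 mean

    # sep[i, j] = s_j - s_i for every ordered pair of locations
    sep = coords[None, :] - coords[:, None]
    mask = np.abs(sep - lag) <= tol
    if not mask.any():
        return float("nan")

    # products[i, j] = (z1(s_i) - mean1) * (z2(s_j) - mean2)
    products = d1[:, None] * d2[None, :]
    return float(products[mask].mean())


if __name__ == "__main__":
    coords = [0, 1, 2, 3]
    z1 = [0, 1, 0, -1]
    z2 = [1, 0, -1, 0]  # made-up values, offset by one unit relative to z1
    for lag in (-1, 0, 1):
        print(lag, empirical_cross_covariance(coords, z1, z2, lag))
```

Because the lag is signed, the value at lag +1 need not equal the value at lag -1, which is how a shift between the two datasets shows up in the empirical values.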

Choosing different cross-covariance models, using compound cross-covariance models, and choosing anisotropy will all cause the theoretical model to change. You can make a preliminary choice of model by seeing how well it fits the empirical values. Changing the lag size and the number of lags and adding shifts will change the empirical cross-covariance surface, which will cause a corresponding change in the theoretical model. Geostatistical Analyst computes default values, but you should feel free to try different values and use validation and cross-validation to choose the best model.