# How Maximum Likelihood Classification works

The algorithm used by the Maximum Likelihood Classification tool is based on two principles:

• The cells in each class sample are normally distributed in the multidimensional space
• Bayes' theorem of decision making

The tool considers both the variances and covariances of the class signatures when assigning each cell to one of the classes represented in the signature file. Assuming that the distribution of a class sample is normal, a class can be characterized by its mean vector and covariance matrix. Given these two characteristics, the statistical probability of each cell value is computed for each class to determine the cell's membership in that class. When the default EQUAL a priori probability weighting option is specified, each cell is assigned to the class to which it has the highest probability of belonging.
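The paragraph above can be illustrated with a minimal sketch, assuming only two bands so that the covariance matrix can be inverted in closed form. The signature values, class names, and the functions `log_likelihood` and `classify` are hypothetical and are not part of the tool; the sketch only shows how a mean vector and a covariance matrix are enough to score a cell against each class under the normality assumption.

```python
import math

def log_likelihood(cell, mean, cov):
    """Log of the bivariate normal density at `cell` for one class signature."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # determinant of the 2x2 covariance matrix
    dx, dy = cell[0] - mean[0], cell[1] - mean[1]
    # Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + maha)

def classify(cell, signatures):
    """Assign the cell to the class with the highest likelihood (equal priors)."""
    return max(signatures, key=lambda name: log_likelihood(cell, *signatures[name]))

# Hypothetical signatures: class name -> (mean vector, covariance matrix).
signatures = {
    "lake":   ((20.0, 15.0), ((4.0, 1.0), (1.0, 3.0))),
    "forest": ((60.0, 90.0), ((25.0, 10.0), (10.0, 30.0))),
}

label = classify((24.0, 18.0), signatures)
```

Because every class is weighted equally here, the class with the highest likelihood is also the class with the highest probability of membership, which corresponds to the EQUAL option.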

If the likelihood of occurrence of some classes is higher (or lower) than the average, the FILE a priori option should be used with an Input a priori probability file. The weights for the classes with special probabilities are specified in the a priori file. In this situation, an a priori file assists in the allocation of cells that lie in the statistical overlap between two classes. These cells are more accurately assigned to the appropriate class, resulting in a better classification. This weighting approach to classification is referred to as the Bayesian classifier.
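As a rough illustration of how a priori weights affect cells in the overlap between two classes, the sketch below uses a single band and two hypothetical classes with overlapping normal distributions. The class statistics, the weights, and the function names are assumptions made for the example; they do not reflect the format of an actual a priori file.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def classify(x, classes, priors):
    """Pick the class maximizing prior * likelihood (proportional to the posterior)."""
    return max(classes, key=lambda name: priors[name] * normal_pdf(x, *classes[name]))

# Hypothetical single-band signatures: class name -> (mean, standard deviation).
classes = {"rangeland": (100.0, 10.0), "residential": (120.0, 10.0)}

equal = {"rangeland": 0.5, "residential": 0.5}     # comparable to EQUAL
weighted = {"rangeland": 0.2, "residential": 0.8}  # comparable to weights from a file

cell = 108.0  # lies in the statistical overlap between the two classes
with_equal = classify(cell, classes, equal)
with_weights = classify(cell, classes, weighted)
```

With equal weights the cell goes to the class whose mean is nearer; with the hypothetical weights favoring the second class, the same cell is assigned to that class instead.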

When the SAMPLE a priori option is chosen, the a priori probabilities assigned to all classes sampled in the input signature file are proportional to the number of cells captured in each signature. Consequently, classes that have fewer cells than the average in the sample receive weights below the average, and those with more cells receive weights greater than the average. As a result, the respective classes have fewer or more cells assigned to them.
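The proportional weighting described above amounts to simple arithmetic. The following sketch assumes hypothetical cell counts per signature; `sample_priors` is an illustrative helper, not a function of the tool.

```python
def sample_priors(cell_counts):
    """Weights proportional to the number of cells captured in each signature."""
    total = sum(cell_counts.values())
    return {name: count / total for name, count in cell_counts.items()}

# Hypothetical number of sample cells per class signature.
counts = {"lake": 200, "forest": 500, "rangeland": 300}

priors = sample_priors(counts)
average = 1 / len(counts)  # the weight every class would get if counts were equal
```

Here the class with fewer sample cells than average ends up with a weight below `average`, and the class with more sample cells ends up above it.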

When a maximum likelihood classification is performed, an optional output confidence raster can also be produced. This raster shows the levels of classification confidence. There are 14 levels of confidence, one for each valid reject fraction value. The first level of confidence, coded in the confidence raster as 1, consists of cells with the shortest distance to any mean vector stored in the input signature file; therefore, the classification of these cells has the highest certainty. The cells comprising the second level of confidence (cell value 2 on the confidence raster) would be classified only if the reject fraction is 0.99 or less. The lowest level of confidence has a value of 14 on the confidence raster and contains the cells that are most likely to be misclassified. Cells of this level will not be classified when the reject fraction is 0.005 or greater.
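The relationship between confidence levels and the reject fraction can be sketched as a lookup. Both the list of 14 reject fraction values and the assumption that they map to levels 1 through 14 in descending order are assumptions of this example; they were chosen to agree with the two cases stated above (level 2 and level 14) and should be checked against the tool reference.

```python
# Assumed: the 14 valid reject fraction values, in descending order, so that
# index 0 corresponds to confidence level 1 and index 13 to level 14.
REJECT_FRACTIONS = [0.995, 0.99, 0.975, 0.95, 0.9, 0.75, 0.5,
                    0.25, 0.1, 0.05, 0.025, 0.01, 0.005, 0.0]

def is_classified(confidence_level, reject_fraction):
    """True if a cell at this confidence level keeps its class assignment."""
    return reject_fraction <= REJECT_FRACTIONS[confidence_level - 1]

# Levels that would be left unclassified for a given reject fraction.
rejected_levels = [level for level in range(1, 15) if not is_classified(level, 0.005)]
```

Under these assumptions, a level 2 cell is classified for any reject fraction up to 0.99, and a level 14 cell is classified only when the reject fraction is 0.0.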

## Example

The following example shows the classification of a multiband raster with three bands into five classes. The five classes are dry riverbed, forest, lake, residential/grove, and rangeland. An output confidence raster will also be produced. The input raster bands are displayed below.

[Figure: Example inputs to Maximum Likelihood Classification]

The Maximum Likelihood Classification tool is used to classify the raster into five classes.

Settings used in the Maximum Likelihood Classification tool dialog:

• Input raster bands: redlands
• Input signature file: wedit.gsg
• Output multiband raster: mlclass_1
• Reject fraction: 0.01
• A priori probability weighting: EQUAL
• Input a priori probability file: apriori_file_1
• Output confidence raster: reject_ras

The classified raster appears as shown:

[Figure: Example output from Maximum Likelihood Classification]

Areas displayed in red are cells that have less than a 1 percent chance of being correctly classified. These cells are given the value NoData due to the 0.01 reject fraction used. The dry riverbed class is displayed as white, with the forest class as green, lake class as blue, residential/grove class as yellow, and rangeland as orange.

The list below is the value attribute table for the output confidence raster. It shows how many cells were classified at each level of confidence. Value 1 has a 100 percent chance of being correct; 3,033 cells were classified with that level of confidence. Value 5 has a 95 percent chance of being correct. Value 14 has a 0.5 percent chance of being correct (a fraction of 0.005); 10,701 cells fall in that level.

```
RECORD    VALUE    COUNT
1             1     3033
2             2     3061
3             3     9187
4             4    16717
5             5    37361
6             6   136420
7             7   269592
8             8   250863
9             9   105001
10           10    23598
11           11    11190
12           12    11546
13           13     3621
14           14    10701
```
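To put the counts in the table above into proportion, a short sketch such as the following could be used. The counts are copied from the table; the variable names are illustrative only.

```python
# Cell counts per confidence value, copied from the value attribute table.
counts = {
    1: 3033, 2: 3061, 3: 9187, 4: 16717, 5: 37361, 6: 136420, 7: 269592,
    8: 250863, 9: 105001, 10: 23598, 11: 11190, 12: 11546, 13: 3621, 14: 10701,
}

total = sum(counts.values())
shares = {value: count / total for value, count in counts.items()}
most_common = max(counts, key=counts.get)  # confidence value holding the most cells

for value, share in shares.items():
    print(f"{value:>2}  {counts[value]:>7}  {share:6.1%}")
```

In this example most cells fall in the middle confidence values, while the lowest level (value 14) accounts for roughly 1.2 percent of all cells.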

6/29/2011