Raster data statistics
Statistics are required for a raster dataset or mosaic dataset to perform some geoprocessing operations or certain tasks in ArcGIS Desktop applications (for example, ArcMap or ArcCatalog), such as applying a contrast stretch or classifying data.
For raster datasets, the statistical information, including a histogram, is stored in an associated auxiliary file if it cannot be stored internally. Once the auxiliary file has been created, the statistics within it will be reused for future procedures that require statistical information. Statistical information for mosaic datasets is stored internally. Statistics cannot be calculated on a raster catalog, but they can be calculated for each raster dataset it contains.
It is not essential to always calculate statistics; however, statistics need to be calculated before performing some geoprocessing analysis operations. In ArcMap and ArcCatalog, if statistics are not calculated, they will be calculated automatically when they are needed. For example, in ArcMap, when a raster dataset without statistics is first added to the data frame and statistics are needed to render the raster correctly, ArcMap functions will calculate default statistics and place these into an auxiliary file. If statistics are not calculated and you add your raster dataset or mosaic dataset to ArcMap, it may appear as a large black image.
You can modify the stretch parameters on the Layer Properties dialog box to use the statistics from the current display extent, or you can generate the statistics for the dataset. You can also modify the stretch parameters in the Display section on the Image Analysis Window. Creating statistics for rasters prior to their use in ArcMap is recommended so that you don't have to wait for statistics to be calculated when displaying the raster dataset.
The default display of a raster is improved in most cases if statistics have already been calculated, because a standard deviation stretch is applied to the raster when statistics are present. However, if the data shouldn't be displayed with a stretch by default, it is appropriate not to calculate statistics for the dataset.
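To illustrate why the stretch depends on statistics, here is a minimal sketch of a standard deviation stretch in plain Python. It is not the ArcMap implementation: the function name `stddev_stretch`, the default of two standard deviations, the use of the population standard deviation, and the 0 to 255 output range are all assumptions made for illustration.

```python
import statistics


def stddev_stretch(pixels, n_std=2.0, out_max=255):
    """Linearly map [mean - n_std*std, mean + n_std*std] onto [0, out_max].

    Values outside that interval are clipped to its ends first.
    All parameter names and defaults here are illustrative assumptions.
    """
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)  # population standard deviation
    low = mean - n_std * std
    high = mean + n_std * std
    if high == low:
        # Constant image: there is no range to stretch.
        return [0 for _ in pixels]
    stretched = []
    for value in pixels:
        clipped = min(max(value, low), high)
        stretched.append(round((clipped - low) / (high - low) * out_max))
    return stretched


# Example: a narrow range of values is spread over the display range.
band = [0, 50, 100, 150, 200]
display_values = stddev_stretch(band, n_std=1.0)
```

The mean and standard deviation are the only inputs the stretch needs, which is why precalculated statistics let the stretch be applied without scanning the pixels again.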
If statistics do not exist, they can be created in ArcCatalog or the Catalog window or by using the Calculate Statistics tool. There are two sets of parameters you can specify when calculating statistics: a skip factor and values to ignore.
Setting a skip factor speeds up the process of calculating statistics by skipping pixels. The default is a skip factor of 1 for both row and column, which means that every cell in the raster is used in the calculation, resulting in the most accurate statistics. It is recommended that you use a larger skip factor (such as 100) when you are calculating statistics on a large raster stored in ArcSDE or on a large mosaic dataset. This saves time because not every cell is examined. A skip factor is not used with all raster formats. The raster formats that will calculate statistics and take advantage of the skip factor include TIFF, IMG, NITF, DTED, RAW, ADRG, CIB, CADRG/ECRG, DIGEST, GIS, LAN, CIT, COT, ERMapper, ENVI DAT, BIL, BIP, BSQ, and geodatabase.
You can also specify one or more ignore values, which are the cell values you do not want used when calculating statistics, such as background values (for example, the edges of some satellite data) or NoData values.
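The effect of the skip factor and ignore values can be sketched in plain Python on a small raster held as a list of rows. This is an illustration only, not the Calculate Statistics tool: the function name `sampled_statistics`, its parameters, the choice to sample every n-th row and column starting from the first cell, and the use of the population standard deviation are assumptions.

```python
import math


def sampled_statistics(raster, x_skip=1, y_skip=1, ignore_values=()):
    """Compute min, max, mean, and standard deviation from sampled cells.

    Every y_skip-th row and every x_skip-th column is read, and any cell
    whose value is listed in ignore_values is left out of the calculation.
    """
    ignore = set(ignore_values)
    samples = [
        value
        for row in raster[::y_skip]
        for value in row[::x_skip]
        if value not in ignore
    ]
    if not samples:
        raise ValueError("no cells left after skipping and ignoring values")
    mean = sum(samples) / len(samples)
    variance = sum((v - mean) ** 2 for v in samples) / len(samples)
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": mean,
        "std": math.sqrt(variance),
    }


# A 4 x 4 raster where 0 stands in for a background value.
raster = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
full = sampled_statistics(raster)                        # every cell
skipped = sampled_statistics(raster, x_skip=2, y_skip=2)  # one cell in four
no_background = sampled_statistics(raster, ignore_values=[0])
```

With a skip factor of 2 in both directions, only a quarter of the cells are read, so the result is faster to compute but only approximates the full statistics.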
The statistics for a raster dataset or mosaic dataset can be viewed on the dataset's Properties dialog box. Below is an example showing the statistics for a thematic raster dataset, such as a land-use dataset. Statistics are calculated for each band; if there is more than one band in the raster dataset, the statistics for each band are present. You can see that the parameters used to build the statistics are listed. The statistics that are calculated include the minimum and maximum pixel values, as well as the mean and standard deviation of the calculated pixel values, and if the dataset is thematic, the number of classes is listed. If your dataset is continuous, there will be no classes.
You cannot recalculate statistics on a grid dataset, because they are stored within that file format and are always present. Statistics are calculated using every cell in the grid except for cells with the value of NoData.
Statistics within mosaic datasets
Statistics (and the histogram) are used to enable automated stretching of imagery and are important for some types of analysis. They can exist at three locations within a mosaic dataset:
- On the mosaic dataset itself
- With each source raster dataset
- On each raster item in the mosaic dataset after the functions have been applied
Mosaic dataset statistics
These statistics are applied to the entire mosaic dataset when it is displayed.
When you calculate statistics for a mosaic dataset, the base pixels are examined; that is, the source raster datasets with the smallest pixel sizes are examined and statistics are generated across the entire mosaic. Because this can involve a very large number of pixels, it is recommended that you use a skip factor. One way to identify a reasonable skip factor value is to divide the number of columns by 1,000 and use the integer quotient as the skip factor. However, if your mosaic dataset has overviews, statistics will be generated using the overviews. When building overviews, statistics will be generated automatically.
To calculate statistics on the mosaic dataset, right-click the mosaic dataset in the Catalog window and click Calculate Statistics to open the Calculate Statistics tool, or open the tool directly.
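The rule of thumb for choosing a skip factor can be written as a one-line calculation. The helper name `suggested_skip_factor` is hypothetical, and the lower bound of 1 is an assumption added here so that small mosaics fall back to the default skip factor rather than 0.

```python
def suggested_skip_factor(column_count):
    """Divide the number of columns by 1,000 and keep the integer quotient.

    The result is never less than 1, the default skip factor.
    """
    return max(1, column_count // 1000)


# Example: a mosaic dataset that is 250,000 columns wide.
skip = suggested_skip_factor(250_000)
```

The same value would typically be used for both the row and column skip factors, although that choice is also an assumption rather than something this topic specifies.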
Source raster dataset statistics
These are the statistics of the source raster datasets within the mosaic dataset. They are necessary if you plan to color balance the raster dataset.
Statistics are not automatically generated for each raster dataset in the mosaic dataset; however, when adding the raster data to a mosaic dataset, you can check Calculate Statistics to calculate the statistics for each source raster dataset if they don't already exist. Alternatively, you can use the Build Pyramids And Statistics tool: add the mosaic dataset as the input, then check the Calculate Statistics and Include Source Datasets options.
Raster item statistics
Each row in the mosaic dataset's attribute table represents a raster item in the mosaic dataset. There is not always a one-to-one relationship between the source raster datasets and the raster items in the mosaic dataset; therefore, they are considered separately. For example, a raster item may represent a pan-sharpened image that is created from two datasets. Each raster item can have its own function chain, which may alter the statistics significantly (thereby affecting the rendering); for example, the NDVI function, Arithmetic function, or Stretch function can alter the pixel values and change the statistics. As with the source raster datasets, statistics are not automatically generated for each raster item in the mosaic dataset.
To calculate statistics on the raster items in the mosaic dataset, do one of the following:
- Use the Build Pyramids And Statistics tool, check the Calculate Statistics option, and uncheck the Include Source Datasets option.
- Use the Synchronize Mosaic Dataset tool and check the Calculate Statistics option to calculate the statistics for each raster item. This tool honors selections, so the statistics can be computed for a subset of the complete mosaic dataset.
Statistics function and Stretch function
The Statistics function calculates focal statistics for each pixel based on a defined focal neighborhood; it does not produce the histogram and statistics discussed in this topic.
The Stretch function can be used to enhance an image by changing properties such as brightness, contrast, and gamma through multiple stretch types. By default, the statistics used by this function are retrieved from the data; however, you can enter your own statistics in the function's dialog box. If you do not specify your own statistics, you must make sure statistics have been calculated. Where the function is added determines which statistics are needed and, therefore, which tool you use to calculate them (as discussed above):
- If the Stretch function is added on the mosaic dataset, the statistics for the mosaic dataset need to be calculated.
- If the Stretch function is added as the first function in the raster's function chain, or is the first function in the chain to affect the pixel values, the raster dataset's statistics need to be calculated.
- If the Stretch function is added after functions that can affect the pixel values, the raster item's statistics need to be calculated.
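The three cases above can be summarized as a small lookup. The sketch below is only a restatement of the rules in this topic as Python; the function name `statistics_for_stretch` and its two boolean parameters are hypothetical and do not correspond to any ArcGIS API.

```python
def statistics_for_stretch(on_mosaic_dataset, earlier_functions_change_pixels=False):
    """Return (statistics needed, how to calculate them) for a Stretch function.

    on_mosaic_dataset: True if the function is added on the mosaic dataset
        rather than in an individual raster's function chain.
    earlier_functions_change_pixels: True if functions earlier in the raster's
        chain can affect the pixel values.
    """
    if on_mosaic_dataset:
        return ("mosaic dataset", "Calculate Statistics tool")
    if earlier_functions_change_pixels:
        return (
            "raster item",
            "Build Pyramids And Statistics tool with Include Source Datasets "
            "unchecked, or Synchronize Mosaic Dataset tool",
        )
    return (
        "source raster dataset",
        "Build Pyramids And Statistics tool with Include Source Datasets checked",
    )


# Example: Stretch placed after an NDVI function in a raster's function chain.
level, how = statistics_for_stretch(False, earlier_functions_change_pixels=True)
```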
Color balancing attempts to remove trends across images to make them look more seamless. Statistics must exist for the rasters within a mosaic dataset when using color balancing. If you attempt to use the Color Balance Mosaic Dataset tool or the Mosaic Color Correction window to color balance a mosaic dataset containing raster datasets that do not have statistics, color balancing fails to complete and a message reports that the statistics are missing.
Display properties (turning off a default stretch)
By default, when statistics exist, the application (for example, ArcMap) will apply a stretch to enhance the imagery. If you have prestretched (enhanced) imagery in your mosaic dataset, or you've used the Stretch function, you may not want the application to apply a default stretch. In this case, you can modify a property to turn off this default: open the mosaic dataset's properties, click the Defaults tab, then set the Is Preprocessed Data property value to Yes.