Desktop Help 10.0 - Cluster and Outlier Analysis (Anselin Local Moran's I) (Spatial Statistics)

Summary

Given a set of weighted features, identifies statistically significant hot spots, cold spots, and spatial outliers using the Anselin Local Moran's I statistic.

Learn more about how Cluster and Outlier Analysis (Anselin Local Moran's I) works

Usage

This tool creates a new Output Feature Class with the following attributes for each feature in the Input Feature Class: Local Moran's I index, z-score, p-value, and cluster/outlier type (COType). The field names of these attributes are also derived tool output values for potential use in custom models and scripts.
The z-scores and p-values are measures of statistical significance which tell you whether or not to reject the null hypothesis, feature by feature. In effect, they indicate whether the apparent similarity (a spatial clustering of either high or low values) or dissimilarity (a spatial outlier) is more pronounced than one would expect in a random distribution.
A high positive z-score for a feature indicates that the surrounding features have similar values (either high values or low values). The COType field in the Output Feature Class will be HH for a statistically significant (0.05 level) cluster of high values and LL for a statistically significant (0.05 level) cluster of low values.
A low negative z-score (e.g., < -1.96) for a feature indicates a statistically significant (0.05 level) spatial outlier. The COType field in the Output Feature Class will indicate if the feature has a high value and is surrounded by features with low values (HL) or if the feature has a low value and is surrounded by features with high values (LH).
The z-score is based on the randomization null hypothesis computation. For more information on z-scores, see What is a z-score? What is a p-value?
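The interpretation rules above can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: the function names are hypothetical, the p-value is the standard two-tailed normal p-value for a z-score, and treating "high" as "above the mean of the Input Field" is an assumption made for this sketch.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def classify(z, value, mean, alpha=0.05):
    """Return a COType-style code (hypothetical helper).

    Assumption: a feature counts as "high" when its value exceeds the mean.
    An empty string stands for "not statistically significant" in this sketch.
    """
    if two_tailed_p(z) >= alpha:
        return ""
    high = value > mean
    if z > 0:
        # Positive z-score: the feature resembles its neighbors (cluster).
        return "HH" if high else "LL"
    # Negative z-score: the feature differs from its neighbors (outlier).
    return "HL" if high else "LH"
```

For example, `classify(-2.5, 10, 5)` yields `"HL"`: a significant negative z-score for a feature whose value is above the mean.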
Calculations based on either Euclidean or Manhattan distance require projected data to accurately measure distances.
For line and polygon features, feature centroids are used in distance computations. For multipoints, polylines, or polygons with multiple parts, the centroid is computed using the weighted mean center of all feature parts. The weighting for point features is 1, for line features is length, and for polygon features is area.
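As a rough sketch of the weighted mean center described above (a hypothetical helper, not the tool's internal code), each part contributes its own centroid weighted by 1, its length, or its area:

```python
def weighted_mean_center(parts):
    """Weighted mean center of a multipart feature.

    parts: list of (x, y, weight) tuples, one per feature part, where
    (x, y) is the part's centroid and weight is 1 for points, the length
    for lines, or the area for polygons.
    """
    total = float(sum(w for _, _, w in parts))
    if total == 0:
        raise ValueError("total weight must be non-zero")
    x = sum(px * w for px, _, w in parts) / total
    y = sum(py * w for _, py, w in parts) / total
    return (x, y)
```

A two-part polygon with part centroids at (0, 0) and (10, 0) and areas 1 and 3 has its centroid pulled toward the larger part: `weighted_mean_center([(0, 0, 1), (10, 0, 3)])` gives `(7.5, 0.0)`.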
The Input Field should contain a variety of values. The math for this statistic requires some variation in the variable being analyzed; it cannot solve if all input values are 1, for example. If you want to use this tool to analyze the spatial pattern of incident data, consider aggregating your incident data.
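The idea behind aggregating incident data can be sketched in plain Python. This only loosely mirrors what a tool such as Collect Events does (counting incidents that share a location); the function names are hypothetical and no arcpy calls are involved.

```python
from collections import Counter

def count_coincident(points):
    """Count how many incidents fall on each (x, y) location."""
    return Counter(points)

def has_variation(values):
    """True when the values are not all identical, as the statistic requires."""
    return len(set(values)) > 1

# Raw incidents all carry an implicit value of 1, so there is no variation.
incidents = [(1, 1), (1, 1), (2, 3)]
counts = count_coincident(incidents)
```

Here `has_variation([1, 1, 1])` is `False`, while `has_variation(counts.values())` is `True` because the aggregated counts are 2 and 1.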
Your choice for the Conceptualization of Spatial Relationships parameter should reflect inherent relationships among the features you are analyzing. The more realistically you can model how features interact with each other in space, the more accurate your results are. Explore these recommendations. Here are some additional tips:
- FIXED_DISTANCE_BAND
  The default value for the Distance Band or Threshold Distance parameter ensures that each feature has at least one neighbor, and this is important. But often, this default will not be the most appropriate distance to use for your analysis.
  Learn more about the Distance Band or Threshold Distance parameter.
- INVERSE_DISTANCE or INVERSE_DISTANCE_SQUARED
  When 0 is entered for the Distance Band or Threshold Distance parameter, all features are considered neighbors of all other features; when this parameter is left blank, the default threshold distance is applied.
  Weights for distances less than 1 become unstable. The weighting for features separated by less than one unit of distance (common with geographic coordinate system projections) is 1.
  Caution:
  
  Analysis on features with a geographic coordinate system projection is not recommended when you select any of the inverse distance-based spatial conceptualization methods (INVERSE_DISTANCE, INVERSE_DISTANCE_SQUARED, or ZONE_OF_INDIFFERENCE).
  
  For these Inverse Distance options, any two points that are coincident are given a weight of 1 to avoid zero division. This ensures features are not excluded from analysis.
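The weighting rules in the tips above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names are hypothetical, a threshold of 0 is taken to mean "no cutoff", and a feature exactly at the threshold distance is treated as a neighbor.

```python
def inverse_distance_weight(distance, exponent=1, threshold=0):
    """Weight for INVERSE_DISTANCE (exponent=1) or INVERSE_DISTANCE_SQUARED (exponent=2).

    Assumption: threshold=0 means no cutoff is applied.
    """
    if threshold > 0 and distance > threshold:
        return 0.0  # beyond the cutoff: ignored
    if distance < 1:
        # Covers coincident features (distance 0) and separations under one
        # unit, both of which receive a weight of 1.
        return 1.0
    return 1.0 / distance ** exponent

def fixed_distance_weight(distance, threshold):
    """Weight for FIXED_DISTANCE_BAND: 1 inside the band, 0 outside."""
    return 1.0 if distance <= threshold else 0.0
```

With data in decimal degrees, most separations are below one unit, so nearly every pair would receive the same weight of 1; this is one way to see why geographic coordinates are discouraged for the inverse distance options.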
Additional options for the Conceptualization of Spatial Relationships parameter are available using the Generate Spatial Weights Matrix or Generate Network Spatial Weights tools. To take advantage of these additional options, use one of these tools to construct the spatial weights matrix file prior to analysis; select GET_SPATIAL_WEIGHTS_FROM_FILE for the Conceptualization of Spatial Relationships parameter; and, for the Weights Matrix File parameter, specify the path to the spatial weights file you created.
Map layers can be used to define the Input Feature Class. When using a layer with a selection, only the selected features are included in the analysis.

Note:

If this tool is part of a custom model tool, the HTML link will only appear in the Results window if it is set as a model parameter prior to running the tool.
For best display of HTML graphics, ensure your monitor is set for 96 DPI.

If you provide a Weights Matrix File with a .SWM or .swm extension, this tool is expecting a spatial weights matrix file created using either the Generate Spatial Weights Matrix or Generate Network Spatial Weights tools. Otherwise this tool is expecting an ASCII formatted spatial weights matrix file. In some cases, behavior is different depending on which type of spatial weights matrix file you use:

ASCII formatted spatial weights matrix files:
- Weights are used "as is". Missing feature-to-feature relationships are treated as zeros.
- If the weights are row standardized, results will likely be incorrect for analyses on selection sets. If you need to run your analysis on a selection set, convert the ASCII spatial weights file to a .swm file by reading the ASCII data into a table, then using the CONVERT_TABLE option with the Generate Spatial Weights Matrix tool.
.SWM formatted spatial weights matrix files:
- If the weights are row standardized, they will be restandardized for selection sets. Otherwise weights are used "as is".
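
To see why row standardized weights need restandardizing for a selection set, consider this small sketch. The dictionary layout and function names are assumptions made for illustration; they are not the .swm or ASCII file formats.

```python
def row_standardize(weights):
    """Divide each weight by its row sum.

    weights: dict mapping a feature ID to a dict of {neighbor ID: weight}.
    """
    result = {}
    for fid, row in weights.items():
        total = float(sum(row.values()))
        result[fid] = {}
        for nid, w in row.items():
            result[fid][nid] = w / total if total else 0.0
    return result

def restrict_to_selection(weights, selected):
    """Keep only relationships where both features are in the selection."""
    result = {}
    for fid, row in weights.items():
        if fid in selected:
            result[fid] = {}
            for nid, w in row.items():
                if nid in selected:
                    result[fid][nid] = w
    return result

weights = {1: {2: 1.0, 3: 1.0}, 2: {1: 1.0}, 3: {1: 1.0}}
standardized = row_standardize(weights)
subset = restrict_to_selection(standardized, set([1, 2]))
restandardized = row_standardize(subset)
```

After the selection drops feature 3, the row for feature 1 in `subset` sums to 0.5 rather than 1. Using those weights "as is" corresponds to the ASCII case; `restandardized` restores a row sum of 1, which corresponds to the .swm behavior described above.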

Running your analysis with an ASCII formatted spatial weights matrix file is memory intensive. For analyses on more than about 5000 features, consider converting your ASCII formatted spatial weights matrix file into a .swm formatted file. First put your ASCII weights into a formatted table (using Excel, for example). Next run the Generate Spatial Weights Matrix tool using CONVERT_TABLE for the Conceptualization of Spatial Relationships parameter. The output will be a .swm formatted spatial weights matrix file.
When this tool runs in ArcMap, the Output Feature Class is automatically added to the Table of Contents (TOC) with default rendering applied to the COType field. The rendering applied is defined by a layer file in <ArcGIS>/ArcToolbox/Templates/Layers. You can reapply the default rendering, if needed, by importing the template layer symbology.
The Modeling Spatial Relationships help topic provides additional information about this tool's parameters.

Caution:

When using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from non-shapefile inputs may store or interpret null values as zero. This can lead to unexpected results. See also Geoprocessing considerations for shapefile output.

Legacy:

In ArcGIS 9.2, the Global standardization option was removed. Global standardization returns the same results as no standardization. Models built with previous versions of ArcGIS that use the Global standardization option may need to be rebuilt.

Syntax

ClustersOutliers_stats (Input_Feature_Class, Input_Field, Output_Feature_Class, Conceptualization_of_Spatial_Relationships, Distance_Method, Standardization, Distance_Band_or_Threshold_Distance, {Weights_Matrix_File})

Parameter	Explanation	Data Type
Input_Feature_Class	The feature class for which cluster/outlier analysis will be performed.	Feature Layer
Input_Field	The numeric field to be evaluated.	Field
Output_Feature_Class	The output feature class to receive the results fields.	Feature Class
Conceptualization_of_Spatial_Relationships	Specifies how spatial relationships among features are conceptualized. INVERSE_DISTANCE —Nearby neighboring features have a larger influence on the computations for a target feature than features that are far away. INVERSE_DISTANCE_SQUARED —Same as INVERSE_DISTANCE except that the slope is sharper so influence drops off more quickly and only a target feature's closest neighbors will exert substantial influence on computations for that feature. FIXED_DISTANCE_BAND —Each feature is analyzed within the context of neighboring features. Neighboring features inside the specified critical distance receive a weight of 1, and exert influence on computations for the target feature. Neighboring features outside the critical distance receive a weight of zero and have no influence on a target feature's computations. ZONE_OF_INDIFFERENCE —Features within the specified critical distance of a target feature receive a weight of 1 and influence computations for that feature. Once the critical distance is exceeded, weights (and the influence a neighboring feature has on target feature computations) diminish with distance. POLYGON_CONTIGUITY_(FIRST_ORDER) —Only neighboring polygon features that share a boundary will influence computations for the target polygon feature. (Requires an ArcInfo license) GET_SPATIAL_WEIGHTS_FROM_FILE —Spatial relationships are defined in a spatial weights file. The path to the spatial weights file is specified in the Weights Matrix File parameter.	String
Distance_Method	Specifies how distances are calculated from each feature to neighboring features. EUCLIDEAN_DISTANCE —The straight-line distance between two points (as the crow flies) MANHATTAN_DISTANCE —The distance between two points measured along axes at right angles (city block); calculated by summing the (absolute) difference between the x- and y-coordinates	String
Standardization	Row standardization is recommended whenever the distribution of your features is potentially biased due to sampling design or an imposed aggregation scheme. NONE —No standardization of spatial weights is applied. ROW —Spatial weights are standardized; each weight is divided by its row sum (the sum of the weights of all neighboring features).	String
Distance_Band_or_Threshold_Distance	Specifies a cutoff distance for Inverse Distance and Fixed Distance options. Features outside the specified cutoff for a target feature are ignored in analyses for that feature. However, for Zone of Indifference, the influence of features outside the given distance is reduced with distance, while those inside the distance threshold are equally considered. The value entered should be in the units of the Output Coordinate System. For the Inverse Distance conceptualizations of spatial relationships, a value of 0 indicates that no threshold distance is applied; when this parameter is left blank, a default threshold value is computed and applied. This default value is the Euclidean distance that ensures every feature has at least one neighbor. This parameter has no effect when Polygon Contiguity or Get Spatial Weights From File spatial conceptualizations are selected.	Double
Weights_Matrix_File (Optional)	The path to a file containing spatial weights that define spatial relationships among features.	File
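
The two Distance_Method options and the default threshold described in the table can be sketched as follows. The function names are hypothetical, and the brute-force search is only an illustration of "the Euclidean distance that ensures every feature has at least one neighbor", not the tool's actual algorithm.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

def manhattan(a, b):
    """City-block distance: sum of absolute x and y differences."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def default_threshold(points):
    """Largest nearest-neighbor distance among the points.

    At this distance every point has at least one neighbor.
    Brute force, O(n^2); fine for a small illustration only.
    """
    if len(points) < 2:
        raise ValueError("at least two points are required")
    return max(
        min(euclidean(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5.0 and the Manhattan distance is 7. For the points (0, 0), (1, 0), and (5, 0), the most isolated point is 4 units from its nearest neighbor, so `default_threshold` returns 4.0.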

Code Sample

ClusterandOutlierAnalysis Example (Python Window)

The following Python Window script demonstrates how to use the ClusterandOutlierAnalysis tool.

import arcpy
arcpy.env.workspace = "c:/data/911calls"
arcpy.ClustersOutliers_stats("911Count.shp", "ICOUNT", "911ClusterOutlier.shp", "GET_SPATIAL_WEIGHTS_FROM_FILE", "EUCLIDEAN_DISTANCE", "NONE", "#", "euclidean6Neighs.swm")

ClusterandOutlierAnalysis Example (stand-alone Python script)

The following stand-alone Python script demonstrates how to use the ClusterandOutlierAnalysis tool.

# Analyze the spatial distribution of 911 calls in a metropolitan area
# using the Cluster-Outlier Analysis Tool (Anselin's Local Moran's I)

# Import system modules
import arcpy

# Set the environment property to overwrite outputs if they already exist
arcpy.env.overwriteOutput = True

# Local variables...
workspace = r"C:\Data\911Calls"

try:
    # Set the current workspace (to avoid having to specify the full path to the feature classes each time)
    arcpy.env.workspace = workspace

    # Copy the input feature class and integrate the points to snap
    # together at 500 feet
    # Process: Copy Features and Integrate
    cf = arcpy.CopyFeatures_management("911Calls.shp", "911Copied.shp",
                         "#", 0, 0, 0)

    integrate = arcpy.Integrate_management("911Copied.shp #", "500 Feet")

    # Use Collect Events to count the number of calls at each location
    # Process: Collect Events
    ce = arcpy.CollectEvents_stats("911Copied.shp", "911Count.shp", "Count", "#")

    # Add a unique ID field to the count feature class
    # Process: Add Field and Calculate Field
    af = arcpy.AddField_management("911Count.shp", "MyID", "LONG", "#", "#", "#", "#",
                     "NON_NULLABLE", "NON_REQUIRED", "#",
                     "911Count.shp")
    
    cf = arcpy.CalculateField_management("911Count.shp", "MyID", "[FID]", "VB")

    # Create Spatial Weights Matrix for Calculations
    # Process: Generate Spatial Weights Matrix... 
    swm = arcpy.GenerateSpatialWeightsMatrix_stats("911Count.shp", "MyID",
                        "euclidean6Neighs.swm",
                        "K_NEAREST_NEIGHBORS",
                        "#", "#", "#", 6) 

    # Cluster/Outlier Analysis of 911 Calls
    # Process: Local Moran's I
    clusters = arcpy.ClustersOutliers_stats("911Count.shp", "ICOUNT", 
                        "911ClusterOutlier.shp", 
                        "GET_SPATIAL_WEIGHTS_FROM_FILE",
                        "EUCLIDEAN_DISTANCE", "NONE",
                        "#", "euclidean6Neighs.swm")

except Exception:
    # If an error occurred when running the tool, print out the error message.
    print(arcpy.GetMessages())

Environments

Current Workspace, Scratch Workspace, Output Coordinate System, Qualified Field Names, Output has Z values, Default Output Z Value, Output has M values

Output Coordinate System: Feature geometry is projected to the Output Coordinate System prior to analysis, so values entered for the Distance Band or Threshold Distance parameter should match those specified in the Output Coordinate System. All mathematical computations are based on the spatial reference of the Output Coordinate System.

Licensing Information

ArcView: Yes

ArcEditor: Yes

ArcInfo: Yes

3/7/2012
