Ordinary Least Squares (OLS) (Spatial Statistics)
Summary
Performs global Ordinary Least Squares (OLS) linear regression to generate predictions or to model a dependent variable in terms of its relationships to a set of explanatory variables. Results are accessible from the Results window.
Learn more about how Ordinary Least Squares regression works
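
For orientation, the global linear model that OLS fits can be written in standard textbook notation as below. The symbols are conventional and are not names used by the tool: y is the dependent variable, X holds the explanatory variables plus an intercept column, and e is the vector of residuals written to the output feature class.

```latex
y = X\beta + \varepsilon, \qquad
\hat{\beta} = (X^{\top}X)^{-1}X^{\top}y, \qquad
\hat{y} = X\hat{\beta}, \qquad
e = y - \hat{y}
```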
Usage

Results from OLS regression are only trustworthy if your data and regression model satisfy all of the assumptions inherently required by this method. Consult the table, Common Regression Problems, Consequences, and Solutions in Regression Analysis Basics to ensure your model is properly specified.

Dependent and Explanatory variables should be numeric fields containing a variety of values. OLS cannot solve when variables have all the same value (all the values for a field are 9.0, for example). Linear regression methods, like OLS, are not appropriate for predicting binary outcomes (e.g., all of the values for the dependent variable are either 1 or 0).
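
A quick pre-check along these lines can catch unsuitable fields before the tool is run. This is only a sketch in plain Python: the function name check_regression_field and its is_dependent flag are hypothetical, and the values are assumed to have already been read from the field into a list.

```python
def check_regression_field(values, is_dependent=False):
    """Return a list of problems found for one candidate regression field."""
    problems = []
    distinct = set(values)
    if len(distinct) <= 1:
        # Every value is identical (for example, all 9.0): OLS cannot solve.
        problems.append("constant")
    elif is_dependent and distinct <= {0, 1}:
        # A dependent variable that only takes the values 0 and 1.
        problems.append("binary dependent variable")
    return problems


print(check_regression_field([9.0, 9.0, 9.0]))
print(check_regression_field([0, 1, 1, 0], is_dependent=True))
```

A 0/1 field is only flagged when it is the dependent variable; dummy explanatory variables such as SOUTH in the code sample are left alone.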

The Unique ID field links model predictions to each feature. Consequently, the Unique ID values must be unique for every feature, and typically should be a permanent field that remains with the feature class. If you don't have a Unique ID field, you can easily create one by adding a new integer field to your feature class table and calculating the field values to be equal to the FID/OID field. You cannot use the FID/OID field directly for the Unique ID parameter.
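
Before running the tool, it can be worth confirming that a candidate Unique ID field really is unique. The sketch below uses only the standard library; find_duplicate_ids is a hypothetical helper, and the ID values are assumed to have already been read from the field into a list.

```python
from collections import Counter


def find_duplicate_ids(id_values):
    """Return the sorted ID values that occur more than once."""
    counts = Counter(id_values)
    return sorted(value for value, n in counts.items() if n > 1)


# An empty result means every feature has its own ID.
print(find_duplicate_ids([1, 2, 3, 2, 5, 5]))
```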

Whenever there is statistically significant spatial autocorrelation of the regression residuals, the OLS model is considered misspecified and, consequently, results from OLS regression are unreliable. Be sure to run the Spatial Autocorrelation tool on your regression residuals to assess this potential problem. Statistically significant spatial autocorrelation of regression residuals almost always indicates a key missing explanatory variable.
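
To make the idea of residual autocorrelation concrete, the following sketch computes the textbook Global Moran's I index for a list of residuals. It is not the Spatial Autocorrelation tool's implementation and it does not produce the z-score or p-value needed to judge significance. The function name morans_i and the weights dictionary layout are assumptions made for this illustration.

```python
def morans_i(values, weights):
    """Textbook Global Moran's I.

    values  -- list of residuals, one per feature
    weights -- dict mapping (i, j) index pairs (i != j) to a spatial weight
    """
    n = len(values)
    mean = sum(values) / float(n)
    deviations = [v - mean for v in values]
    s0 = float(sum(weights.values()))  # sum of all spatial weights
    cross = sum(w * deviations[i] * deviations[j]
                for (i, j), w in weights.items())
    denom = sum(d * d for d in deviations)
    return (n / s0) * (cross / denom)


# Four features in a row; each feature is a neighbor of the adjacent ones.
chain = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1}
print(morans_i([1.0, 2.0, 3.0, 4.0], chain))    # similar values adjoin: positive
print(morans_i([1.0, -1.0, 1.0, -1.0], chain))  # alternating values: negative
```

Positive values suggest that similar residuals cluster together, which is the pattern that points to a missing explanatory variable.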

You should visually inspect the over- and underpredictions evident in your regression residuals to see if they provide clues about potential missing variables from your regression model. It sometimes helps to run Hot Spot Analysis on the residuals to help you visualize spatial clustering of the over- and underpredictions.

When misspecification is the result of trying to model nonstationary variables using a global model (OLS is a global model), then Geographically Weighted Regression may be used to improve predictions and to better understand the nonstationarity (regional variation) inherent in your explanatory variables.

When the result of a computation is infinity or undefined, the output for nonshapefiles will be Null; for shapefiles the output will be DBL_MAX = 1.7976931348623158e+308.
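
When post-processing shapefile output in Python, the DBL_MAX placeholder can be mapped back to a missing value. The sketch below assumes the values have already been read into Python floats; undefined_to_none is a hypothetical helper name. On platforms with IEEE 754 doubles, sys.float_info.max is the same number as the DBL_MAX literal quoted above.

```python
import sys

# Largest finite double; the literal 1.7976931348623158e+308 parses to it.
SHAPEFILE_UNDEFINED = sys.float_info.max


def undefined_to_none(value):
    """Map the shapefile 'infinity or undefined' placeholder to None."""
    if value >= SHAPEFILE_UNDEFINED:
        return None
    return value


print(undefined_to_none(1.7976931348623158e+308))
print(undefined_to_none(0.25))
```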
Model summary diagnostics are written to the OLS summary report and to the optional diagnostic output table. Both include diagnostics for the corrected Akaike Information Criterion (AICc), Coefficient of Determination, Joint F statistic, Wald statistic, Koenker's Breusch-Pagan statistic, and the Jarque-Bera statistic. The diagnostic table also includes uncorrected AIC and Sigma-squared values.
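
As an illustration of one of these diagnostics, the sketch below computes the Jarque-Bera statistic from a list of residuals using the standard textbook formula based on sample skewness and kurtosis. It is an assumption that the tool's reported value follows this exact form, and the function name jarque_bera is hypothetical. Larger values indicate residuals that depart further from a normal distribution.

```python
def jarque_bera(residuals):
    """Textbook Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(residuals)
    mean = sum(residuals) / float(n)
    # Central moments of the residuals (population form, divided by n).
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


print(jarque_bera([-1.0, 1.0, -1.0, 1.0]))
```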

The optional coefficient and/or diagnostic output tables, if they already exist, will be overwritten when the Geoprocessing Option to overwrite the outputs of geoprocessing operations is checked ON.

Map layers can be used to define the Input Feature Class. When using a layer with a selection, only the selected features are included in the analysis.
The primary output for this tool is the OLS summary report which is written to the Results window. Right-clicking on the Messages entry in the Results window and selecting View will display the OLS summary report in a Message dialog box.
The OLS tool also produces an output feature class and optional tables with coefficient information and diagnostics. All of these are accessible from the Results window. The output feature class is automatically added to the Table of Contents, with a hot/cold rendering scheme applied to model residuals. A full explanation of each output is provided in Interpreting OLS results.
If this tool is part of a custom model tool, the optional tables will only appear in the Results window if they are set as model parameters prior to running the tool.
When using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from nonshapefile inputs may store or interpret null values as zero. This can lead to unexpected results. See also Geoprocessing considerations for shapefile output.
Syntax
Parameter  Explanation  Data Type
Input_Feature_Class  The feature class containing the dependent and independent variables for analysis.  Feature Layer
Unique_ID_Field  An integer field containing a different value for every feature in the Input Feature Class.  Field
Output_Feature_Class  The output feature class to receive dependent variable estimates and residuals.  Feature Class
Dependent_Variable  The numeric field containing values for what you are trying to model.  Field
Explanatory_Variables  A list of fields representing explanatory variables in your regression model.  Field
Coefficient_Output_Table (Optional)  The full pathname to an optional table that will receive model coefficients, standard errors, and probabilities for each explanatory variable.  Table
Diagnostic_Output_Table (Optional)  The full pathname to an optional table that will receive model summary diagnostics.  Table
Code Sample
The following Python Window script demonstrates how to use the OrdinaryLeastSquares tool.
import arcpy
arcpy.env.workspace = r"c:\data"
arcpy.OrdinaryLeastSquares_stats("USCounties.shp", "MYID", "olsResults.shp",
                                 "GROWTH", "LOGPCR69;SOUTH;LPCR_SOUTH;PopDen69",
                                 "olsCoefTab.dbf", "olsDiagTab.dbf")
The following standalone Python script demonstrates how to use the OrdinaryLeastSquares tool.
# Analyze the growth of regional per capita incomes in US
# Counties from 1969 - 2002 using Ordinary Least Squares Regression

# Import system modules
import arcpy

# Set the geoprocessing environment to overwrite existing outputs
arcpy.env.overwriteOutput = True

# Local variables...
workspace = r"C:\Data"

try:
    # Set the current workspace (to avoid having to specify the full path
    # to the feature classes each time)
    arcpy.env.workspace = workspace

    # Growth as a function of {log of starting income, dummy for South
    # counties, interaction term for South counties, population density}
    # Process: Ordinary Least Squares...
    ols = arcpy.OrdinaryLeastSquares_stats("USCounties.shp", "MYID",
                        "olsResults.shp", "GROWTH",
                        "LOGPCR69;SOUTH;LPCR_SOUTH;PopDen69",
                        "olsCoefTab.dbf", "olsDiagTab.dbf")

    # Create Spatial Weights Matrix (Can be based off input or output FC)
    # Process: Generate Spatial Weights Matrix...
    swm = arcpy.GenerateSpatialWeightsMatrix_stats("USCounties.shp", "MYID",
                        "euclidean6Neighs.swm", "K_NEAREST_NEIGHBORS",
                        "#", "#", "#", 6)

    # Calculate Moran's Index of Spatial Autocorrelation for
    # OLS Residuals using a SWM File.
    # Process: Spatial Autocorrelation (Morans I)...
    moransI = arcpy.SpatialAutocorrelation_stats("olsResults.shp", "Residual",
                        "NO_REPORT", "GET_SPATIAL_WEIGHTS_FROM_FILE",
                        "EUCLIDEAN_DISTANCE", "NONE", "#",
                        "euclidean6Neighs.swm")

except arcpy.ExecuteError:
    # If an error occurred when running a tool, print out the error messages.
    print(arcpy.GetMessages())