Ordinary Least Squares (OLS) (Spatial Statistics)
Summary
Performs global Ordinary Least Squares (OLS) linear regression to generate predictions or to model a dependent variable in terms of its relationships to a set of explanatory variables. Results are accessible from the Results window.
Learn more about how Ordinary Least Squares regression works
Illustration
Ordinary Least Squares Regression: predicted values in relation to observed values.
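As a rough illustration of the relationship between predicted and observed values, the following pure-Python sketch fits a one-variable OLS model. It is not the tool's implementation. The function names are hypothetical, only a single explanatory variable is handled, and the residual is assumed to be the observed value minus the predicted value.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / float(n)
    mean_y = sum(y) / float(n)
    # Sum of squared deviations of x; zero means x never varies.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation; OLS cannot be solved")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_and_residuals(x, y, intercept, slope):
    """Return predicted values and residuals (observed minus predicted)."""
    predicted = [intercept + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]
    return predicted, residuals


def r_squared(y, residuals):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(y) / float(len(y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum(r ** 2 for r in residuals)
    return 1.0 - sse / sst


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = [3, 5, 7, 9]
    b0, b1 = fit_simple_ols(x, y)
    predicted, residuals = predict_and_residuals(x, y, b0, b1)
    print(b0, b1, r_squared(y, residuals))
```

A model with several explanatory variables needs a matrix solution, which the tool handles for you.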
Usage
-
Results from OLS regression are only trustworthy if your data and regression model satisfy all of the assumptions inherently required by this method. Consult the table, Common Regression Problems, Consequences, and Solutions in Regression Analysis Basics to ensure your model is properly specified.
-
Dependent and Explanatory variables should be numeric fields containing a variety of values. OLS cannot solve when variables have all the same value (all the values for a field are 9.0, for example). Linear regression methods, like OLS, are not appropriate for predicting binary outcomes (e.g., all of the values for the dependent variable are either 1 or 0).
-
The Unique ID field links model predictions to each feature. Consequently, the Unique ID values must be unique for every feature, and typically should be a permanent field that remains with the feature class. If you don't have a Unique ID field, you can easily create one by adding a new integer field to your feature class table and calculating the field values to be equal to the FID/OID field. You cannot use the FID/OID field directly for the Unique ID parameter.
-
Whenever there is statistically significant spatial autocorrelation of the regression residuals, the OLS model is considered misspecified and, consequently, results from OLS regression are unreliable. Be sure to run the Spatial Autocorrelation tool on your regression residuals to assess this potential problem. Statistically significant spatial autocorrelation of regression residuals almost always indicates a key missing explanatory variable.
-
You should visually inspect the over- and underpredictions evident in your regression residuals to see if they provide clues about potential missing variables from your regression model. It sometimes helps to run Hot Spot Analysis on the residuals to visualize spatial clustering of the over- and underpredictions.
-
When misspecification is the result of trying to model nonstationary variables using a global model (OLS is a global model), then Geographically Weighted Regression may be used to improve predictions and to better understand the nonstationarity (regional variation) inherent in your explanatory variables.
-
When the result of a computation is infinity or undefined, the output for non-shapefiles will be Null; for shapefiles the output will be -DBL_MAX = -1.7976931348623158e+308.
-
Model summary diagnostics are written to the OLS summary report and to the optional diagnostic output table. Both include diagnostics for the corrected Akaike Information Criterion (AICc), Coefficient of Determination, Joint F statistic, Wald statistic, Koenker's Breusch-Pagan statistic, and the Jarque-Bera statistic. The diagnostic table also includes uncorrected AIC and Sigma-squared values.
-
The optional coefficient and/or diagnostic output tables, if they already exist, will be overwritten when the Geoprocessing Option to overwrite the outputs of geoprocessing operations is checked ON.
-
Map layers can be used to define the Input Feature Class. When using a layer with a selection, only the selected features are included in the analysis.
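The input requirements in the usage notes above (varied numeric values, a non-binary dependent variable, and a Unique ID that differs for every feature) can be checked before running the tool. The following is a minimal sketch, not part of the tool. The function names are hypothetical, and `add_unique_id_field` assumes that `arcpy` is available and that the `PYTHON_9.3` expression type keyword is accepted by your ArcGIS version.

```python
def describe_variable(values):
    """Return 'constant', 'binary', or 'ok' for a list of numeric values."""
    distinct = set(values)
    if len(distinct) <= 1:
        return "constant"   # OLS cannot solve with a constant field
    if distinct <= {0, 1}:
        return "binary"     # not appropriate as an OLS dependent variable
    return "ok"


def duplicate_ids(ids):
    """Return the sorted ID values that occur more than once."""
    seen = set()
    dupes = set()
    for value in ids:
        if value in seen:
            dupes.add(value)
        else:
            seen.add(value)
    return sorted(dupes)


def add_unique_id_field(feature_class, field_name="MYID"):
    """Add an integer field and copy the FID/OID values into it.

    Assumption: arcpy is installed; it is imported here so the
    checks above can run without ArcGIS.
    """
    import arcpy
    oid_field = arcpy.Describe(feature_class).OIDFieldName
    arcpy.AddField_management(feature_class, field_name, "LONG")
    arcpy.CalculateField_management(
        feature_class, field_name,
        "!{0}!".format(oid_field), "PYTHON_9.3")


if __name__ == "__main__":
    print(describe_variable([9.0, 9.0, 9.0]))
    print(describe_variable([0, 1, 1, 0]))
    print(duplicate_ids([1, 2, 2, 3]))
```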
Note:
The primary output for this tool is the OLS summary report which is written to the Results window. Right-clicking on the Messages entry in the Results window and selecting View will display the OLS summary report in a Message dialog box.
The OLS tool also produces an output feature class and optional tables with coefficient information and diagnostics. All of these are accessible from the Results window. The output feature class is automatically added to the Table of Contents, with a hot/cold rendering scheme applied to model residuals. A full explanation of each output is provided in Interpreting OLS results.
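The usage notes recommend checking the residuals for spatial autocorrelation and inspecting over- and underpredictions. The sketch below shows the idea in pure Python and is not a substitute for the Spatial Autocorrelation or Hot Spot Analysis tools. It computes only the Moran's I index, without the z-score and p-value needed to judge significance. It assumes binary neighbor weights supplied as a hypothetical dictionary, and a residual defined as observed minus predicted.

```python
def morans_i(values, neighbors):
    """Moran's I for values with binary weights.

    neighbors maps each index to a list of neighboring indices.
    """
    n = len(values)
    mean = sum(values) / float(n)
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    s0 = 0.0    # sum of all weights
    num = 0.0   # weighted cross-products of deviations
    for i, nbrs in neighbors.items():
        for j in nbrs:
            s0 += 1.0
            num += dev[i] * dev[j]
    if denom == 0 or s0 == 0:
        raise ValueError("values must vary and at least one weight is needed")
    return (n / s0) * (num / denom)


def label_prediction(residual):
    """Label a residual, assuming residual = observed - predicted."""
    if residual > 0:
        return "under"   # model predicted less than was observed
    if residual < 0:
        return "over"    # model predicted more than was observed
    return "exact"


if __name__ == "__main__":
    # Four features along a line; each neighbors the adjacent ones.
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(morans_i([1, 1, -1, -1], chain))   # clustered residuals
    print(morans_i([1, -1, 1, -1], chain))   # alternating residuals
```

Positive index values point toward clustering of similar residuals, and negative values toward alternation. Whether either is statistically significant requires the tool's z-score and p-value.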
Note:
If this tool is part of a custom model tool, the optional tables will only appear in the Results window if they are set as model parameters prior to running the tool.
Caution:
When using shapefiles, keep in mind that they cannot store null values. Tools or other procedures that create shapefiles from non-shapefile inputs may store or interpret null values as zero. This can lead to unexpected results. See also Geoprocessing considerations for shapefile output.
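Because shapefile output uses -DBL_MAX in place of Null for infinite or undefined results, downstream code may want to convert that sentinel before computing statistics. The following is a minimal sketch with hypothetical names. It assumes the values have already been read into a Python list and that the sentinel equals the negative of the largest double, `-sys.float_info.max`.

```python
import sys

# Assumption: the shapefile sentinel equals the negated largest double.
SHAPEFILE_UNDEFINED = -sys.float_info.max


def undefined_to_none(values):
    """Replace the shapefile 'undefined' sentinel with None."""
    return [None if v == SHAPEFILE_UNDEFINED else v for v in values]


if __name__ == "__main__":
    raw = [0.25, SHAPEFILE_UNDEFINED, 0.0]
    print(undefined_to_none(raw))
```

Zeros are left untouched because a stored zero cannot be distinguished from a null that was converted to zero when the shapefile was created.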
Syntax
Parameter | Explanation | Data Type |
Input_Feature_Class | The feature class containing the dependent and independent variables for analysis. | Feature Layer |
Unique_ID_Field | An integer field containing a different value for every feature in the Input Feature Class. | Field |
Output_Feature_Class | The output feature class to receive dependent variable estimates and residuals. | Feature Class |
Dependent_Variable | The numeric field containing values for what you are trying to model. | Field |
Explanatory_Variables | A list of fields representing explanatory variables in your regression model. | Field |
Coefficient_Output_Table (Optional) | The full pathname to an optional table that will receive model coefficients, standard errors, and probabilities for each explanatory variable. | Table |
Diagnostic_Output_Table (Optional) | The full pathname to an optional table that will receive model summary diagnostics. | Table |
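The code samples in this document pass Explanatory_Variables as a single semicolon-delimited string. A small helper such as the following sketch can build that string from a Python list. The helper name is hypothetical, and rejecting duplicate field names is a choice made here rather than documented tool behavior.

```python
def join_explanatory_variables(field_names):
    """Build the semicolon-delimited field list used in the samples."""
    cleaned = [name.strip() for name in field_names if name.strip()]
    if not cleaned:
        raise ValueError("at least one explanatory variable is required")
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("explanatory variable names must not repeat")
    return ";".join(cleaned)


if __name__ == "__main__":
    fields = ["LOGPCR69", "SOUTH", "LPCR_SOUTH", "PopDen69"]
    print(join_explanatory_variables(fields))
```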
Code Sample
The following Python Window script demonstrates how to use the OrdinaryLeastSquares tool.
import arcpy
arcpy.env.workspace = r"c:\data"
arcpy.OrdinaryLeastSquares_stats("USCounties.shp", "MYID", "olsResults.shp",
                                 "GROWTH", "LOGPCR69;SOUTH;LPCR_SOUTH;PopDen69",
                                 "olsCoefTab.dbf", "olsDiagTab.dbf")
The following stand-alone Python script demonstrates how to use the OrdinaryLeastSquares tool.
# Analyze the growth of regional per capita incomes in US
# counties from 1969 to 2002 using Ordinary Least Squares Regression

# Import system modules
import arcpy

# Set the geoprocessing environment to overwrite existing outputs
arcpy.env.overwriteOutput = True

# Local variables...
workspace = r"C:\Data"

try:
    # Set the current workspace (to avoid having to specify the full path
    # to the feature classes each time)
    arcpy.env.workspace = workspace

    # Growth as a function of {log of starting income, dummy for South
    # counties, interaction term for South counties, population density}
    # Process: Ordinary Least Squares...
    ols = arcpy.OrdinaryLeastSquares_stats("USCounties.shp", "MYID",
                        "olsResults.shp", "GROWTH",
                        "LOGPCR69;SOUTH;LPCR_SOUTH;PopDen69",
                        "olsCoefTab.dbf", "olsDiagTab.dbf")

    # Create Spatial Weights Matrix (can be based on the input or output FC)
    # Process: Generate Spatial Weights Matrix...
    swm = arcpy.GenerateSpatialWeightsMatrix_stats("USCounties.shp", "MYID",
                        "euclidean6Neighs.swm",
                        "K_NEAREST_NEIGHBORS",
                        "#", "#", "#", 6)

    # Calculate Moran's Index of Spatial Autocorrelation for
    # OLS residuals using a SWM file.
    # Process: Spatial Autocorrelation (Morans I)...
    moransI = arcpy.SpatialAutocorrelation_stats("olsResults.shp", "Residual",
                        "NO_REPORT", "GET_SPATIAL_WEIGHTS_FROM_FILE",
                        "EUCLIDEAN_DISTANCE", "NONE", "#",
                        "euclidean6Neighs.swm")

except arcpy.ExecuteError:
    # If an error occurred when running a tool, print out the error messages.
    print(arcpy.GetMessages())