Geoprocessing considerations for ArcSDE data

Dynamic creation of ArcSDE connection files

ArcSDE connection files can be created as needed using the Create ArcSDE Connection File tool. This tool allows you to automate the creation of ArcSDE connection files as you need them rather than having to prepare connection files prior to running tools that access ArcSDE data. You can get the connection properties from any source you want, such as a text file, encoded XML stream, or secure database, and pass these to the Create ArcSDE Connection File tool to create the required ArcSDE connection file. When you are finished with the connection file, you can immediately delete it using the Delete tool or the appropriate operating system delete command.

CautionCaution:

Geoprocessing history and results store all parameters passed to tools. Since your connection information is entered in the Create ArcSDE Connection File tool parameters, these too will be stored unless you disable history logging. Each time a tool is executed, whether it is a system, model, or script tool, a record of the execution is written to a history log file. The information contained in the log file is essentially the same as that found in the Results window. To prevent connection information (password or any other secure information entered into a parameter) from being stored in the history log or the Results window when the Create ArcSDE Connection File tool is run, disable history logging. See History log files for details.

Performance of geoprocessing tools when using ArcSDE data

Geoprocessing does not have control over the actual performance of the ArcSDE server, the geodatabase, or the database. Proper tuning of the database and the design and implementation of the application using the database (ArcSDE and the geodatabase in this case) are the leading factors in getting the best performance when using ArcSDE data. However, there are a few things to take into consideration when using ArcSDE data in geoprocessing tools to avoid some workflow pitfalls that may affect the performance of your tools and workflows.

Geoprocessing tools are mostly data creation/loading operations

Most geoprocessing tools create new data. So, for the most part, geoprocessing output should be viewed as a data loading operation when it comes to sending tool output to ArcSDE. It is important to plan all data loading operations so as to not adversely affect the database as a whole. Most database administrators (and users) will not be very happy with you when your Union tool output contains 50 million records and is loading the data to the ArcSDE database during the morning rush hour on the network and database. Overly large data loading operations could impact every user of the database (which in some cases could be hundreds or thousands of users) as well as impacting any user of the network the data is being sent across. If the inputs to your geoprocessing operation are very large or are being combined (such as when using the overlay tools) to create a very large output feature class, you want to consider scheduling the creation of this output at a time when it will least affect the network and database load.

NoteNote:

In most cases, it may be advantageous to have your tools create output in a file geodatabase. Once the output has been validated as correct in the file geodatabase, you can transfer the data to ArcSDE (using, for example, Copy Features or Feature Class to Feature Class) at a time when it has the least effect on the network and database performance.

It also may be necessary to plan the loading of geoprocessing output to make sure it is stored as efficiently as possible for retrieval and update. How the data is going to be used may determine how it should be best stored in the database. ArcSDE configuration keywords can be created to define how the data should be stored. Usually you cannot determine the best storage configuration until after analyzing the output data, so sending the output to a file geodatabase first will give you the opportunity to analyze the output data and avoid loading the data into ArcSDE unnecessarily. Once the output in the file geodatabase has been analyzed to determine the best storage parameters to use for the type and use of the data, the output can be loaded into ArcSDE using an appropriate configuration keyword.

When NOT to store your data in ArcSDE

Intermediate data is created when running models and scripts that have many steps to come to a final output. If your data is in ArcSDE and you send all your intermediate data to ArcSDE also, you could create a lot of network traffic/contention as well as cause the load on the database server to go up unnecessarily. It is recommended that all intermediate data be sent to a file geodatabase or, in the case of simple feature classes, to an in-memory feature class.

If the output of your analysis is temporary or is not to be shared widely, store it somewhere other than ArcSDE. See Types of geodatabases for a comparison of the different types of geodatabases for help in deciding which type of geodatabase is best suited for your needs.

Spatial index considerations for geoprocessing tools that edit existing data

Spatial indexes are used in ArcGIS to quickly locate features in feature classes. Whenever a feature is inserted or deleted, the spatial index needs to be updated. When creating a new output, geoprocessing defers the creation of the new feature classes' spatial index until after all data has been loaded. When editing existing feature classes, there are two options for when the spatial index is updated. You can leave the spatial index in place and let it be updated after every edit, or you can remove the spatial index prior to performing the edits and have it be updated once after all edits are complete.

Spatial indexes and geoprocessing tools

A few of the geoprocessing tools can edit existing data:

When using geoprocessing tools that edit existing data, an environment setting, MaintainSpatialIndex, is available to provide some level of control over how the ArcSDE spatial index is dealt with during insert, delete, and update operations (referred to as edits). There are two options available that may provide an improvement in performance for edits to existing data. You can set MaintainSpatialIndex to true in order to leave the spatial index in place and let ArcSDE automatically update it as the edits are taking place. Or you can set it to false, which will cause geoprocessing tools that perform edits to existing data to drop the ArcSDE spatial index before performing any edits. You will have to read the spatial index after all the edits have been completed.

Spatial indexes when using update and insert cursors

Update and insert cursors also work with existing data. When using update and insert cursors to edit data, you may want to use the Remove Spatial Index tool to remove the ArcSDE spatial index before edit operations that will involve a very large number of rows. Once the edits are complete, you can then use the Add Spatial Index tool to re-create the spatial index. This may improve the performance of the overall edit operation when a very large number of rows are being edited. See An overview of spatial indexes in the geodatabase for help using spatial indexes.

ArcSDE data guidelines when writing scripts

Relational database management systems (RDBMS) such as Oracle, SQL Server, IBM DB2, and PostgreSQL require object names to be fully qualified, especially if the data you are accessing is outside of your schema. Using fully qualified names enables you to unambiguously qualify the name of an object so the correct object (feature class, relational table, table, and so on) is used. Each database has a slightly different standard when fully qualifying an object name. Consult your database documentation for details.

Use the fully qualified table or feature class name

When an unqualified table or feature class name is passed to a tool, geoprocessing automatically qualifies the name using the currently connected user name from the connected workspace that was created using the ArcSDE connection file. If a script needs to access data from a different user other than the connected user, it should fully qualify the name of the table or feature class to avoid the geoprocessing tool qualifying it using the connected user. Otherwise, a failure may occur or the tool may use the wrong data.

Shows the use of a fully qualified feature class that is owned by a user other than the connected user. The toolbox database user must have select permissions on the map user's data.

import arcpy
# Create and ArcSDE connection file that connects as the toolbox database user
arcpy.CreateArcSDEConnectionFile_management(r'C:\temp',r'toolboxuser.sde','gpserver','5151','',"DATABASE_AUTH",'toolbox','toolbox')

# Perform a union operation using data owned by the connected user, toolbox,
# and another feature class owned by the map database user.
arcpy.Union_analysis(r'C:\temp\toolboxuser.sde\toolbox.states;C:\temp\toolboxuser.sde\map.counties',r'C:\temp\toolboxuser.sde\statesCountiesUnion')

Each database is slightly different in how it expects an object to be fully qualified. See your DBMS SQL documentation for details.

Use fully qualified field names

Fields are also objects in the database, and when accessing objects not owned by your connected user, you should access fields using their fully qualified name. Anywhere you build a complex SQL statement fully qualifying field names will help ensure your SQL works as required. Once again, each database is slightly different in how it fully qualifies its objects. See your DBMS SQL documentation for details.

Geoprocessing with versioned data

Geodatabase versioning is only available in Enterprise geodatabases. The Enterprise geodatabase is ArcSDE. Geoprocessing tools have two ways of accessing versioned data depending on whether the ArcSDE data is passed as a dataset path or a layer name.

When using feature classes, geoprocessing always uses the connection information in the connection file that is part of the path to the ArcSDE feature class to open the feature class. This means you are bound by the version set in the connection properties of the .sde file when accessing the data.

If you add your data to ArcMap, the data is represented as feature layers or table views. When a geoprocessing tool identifies an input as a feature layer or table view, it does not go back to the ArcSDE connection file to reopen the feature class using the connection properties in the connection file. Instead it accesses the feature class using the layer's already open workspace. If any changes, such as what version the workspace is pointing to, have been made to the workspace, the tool honors them. If you use the Change Version tool in ArcMap on ArcSDE data in the ArcMap table of contents, the geoprocessing tool uses the version of the data the workspace is connected to after the Change Version tool is called.

Using versioned feature classes with geoprocessing tools

When accessing feature classes directly in geoprocessing tools using a path that includes the ArcSDE connection file (.sde), you manage the version your feature classes are connected to as follows:

  1. Reference your feature classes using a path to an ArcSDE connection file.
  2. Create a new version using the Create Version tool. When you create a version this way, the new version will be a child of the version listed in the connection file passed in as an argument to the tool.
  3. Create a new connection file using the Create ArcSDE Connection File tool, making sure to use the newly created version in the Version parameter of the tool.
  4. Use this new connection file to access your feature class while connected to the new version.
  5. To return to using the feature class while connected to the parent version, use the original connection file.
TipTip:

When naming ArcSDE connection files, it may be helpful if you name the .sde file in such a way as to clearly identify the work being done using each connection file you create:

  • Using the connection parameters: gpserver5151toolboxVersion1.sde (server name + port + user name + version)
  • Using the GIS analyst's name that will be using the .sde file: Ken.sde
  • Naming the .sde file after the project it will be used for: waterDeptJune2010Upgrade.sde

Whatever naming convention you do use, make sure the file name is clear and does not break any security policies in your organization.

Python script showing workflow for using versioned feature classes

# VersionedFeatureClassExample.py 
# Description: Simple example showing how to access versioned feature classes in geoprocessing tools.
# Author: ESRI

# import system modules
import arcpy
import sys, os

# Set variables
sdeConnFilePath = sys.path[0] + os.sep + "sdeconnectionfiles"
analysisVersion = "TOOLBOX.proposedStreets2k9"
defaultVersionConnection = r'gpserver5151toolboxDEFAULT.sde'
proposedStreetsVersion = r'gpserver5151toolboxproposedStreets2k9.sde'

# Perform analysis on the Default version to determine current number of streams within 100 meters of streets.
inputFC = sdeConnFilePath + os.sep + defaultVersionConnection + os.sep + r'TOOLBOX.Redlands\TOOLBOX.streams'
selectFC = sdeConnFilePath + os.sep + defaultVersionConnection + os.sep + r'TOOLBOX.Redlands\TOOLBOX.street'
outputFC = sdeConnFilePath + os.sep + defaultVersionConnection + os.sep + r'StreamsNearStreets'

arcpy.MakeFeatureLayer_management(arcpy.SelectLayerByLocation_management(inputFC,"WITHIN_A_DISTANCE",selectFC,"100 Meters","NEW_SELECTION",'#'), outputFC,'','','')
print "Streams within 100 Meters of current streets: " + str(arcpy.GetCount_management(outputFC))

# Create an ArcSDE connection file for connecting to the proposedStreets2k9 version
folderName = sdeConnFilePath
fileName = proposedStreetsVersion
serverName = "gpserver"
serviceName = "5151"
databaseName = ""
authType = "DATABASE_AUTH"
username = "toolbox"
password = "toolbox"
saveUserInfo = "SAVE_USERNAME"
versionName = analysisVersion
saveVersionInfo = "SAVE_VERSION"

arcpy.CreateArcSDEConnectionFile_management (folderName, fileName, serverName, serviceName, databaseName, authType, username, password, saveUserInfo, versionName, saveVersionInfo)

# Perform the same analysis on the proposedStreets2k9 version to see the effect of the proposed changes.
inputFC = sdeConnFilePath + os.sep + proposedStreetsVersion + os.sep + r'TOOLBOX.Redlands\TOOLBOX.streams'
selectFC = sdeConnFilePath + os.sep + proposedStreetsVersion + os.sep + r'TOOLBOX.Redlands\TOOLBOX.street'
outputFC = sdeConnFilePath + os.sep + defaultVersionConnection + os.sep + r'NewStreamsNearStreets'

arcpy.MakeFeatureLayer_management(arcpy.SelectLayerByLocation_management(inputFC,"WITHIN_A_DISTANCE",selectFC,"100 Meters","NEW_SELECTION",'#'), outputFC,'','','')
print "Streams projected to be within 100 Meters of streets after proposed street additions:" + str(arcpy.GetCount_management(outputFC))

Accessing versioned feature classes using feature layers/table views with geoprocessing tools

The workflow to use when accessing versioned feature classes using feature layers or table views is as follows:

  1. Create a layer using the Make Feature Layer or Make Table View tool for all your inputs. Make sure to use the ArcSDE connection file that has the parent or starting version specified in the Version property of the connection file.
  2. Use these layers in all your tools.
  3. When you want to change the version you are working on, use the Change Version tool.
  4. Continue to use the feature layers in your tools. Geoprocessing honors the version the open workspace is pointing to after the Change Version tool is called.
  5. Use the Change Version tool as required by your project to change to other versions to perform your analysis.

Python script showing workflow for using versioned feature layers

# VersionedFeatureLayersExample.py 
# Description: Simple example showing how to access versioned feature classes using layers.
# Author: ESRI

# import system modules
import arcpy
from arcpy import env
import sys, os

# Set Environments
env.workspace = sys.path[0] + os.sep + "gpserver5151toolboxDEFAULT.sde"

# Create the layers
arcpy.MakeFeatureLayer_management(r'TOOLBOX.Redlands\TOOLBOX.street', 'RedlandsStreets')
arcpy.MakeFeatureLayer_management(r'TOOLBOX.Redlands\TOOLBOX.streams', 'RedlandsStreams')

# Perform analysis on the Default version to determine current number of streams within 100 meters of streets.
arcpy.MakeFeatureLayer_management(arcpy.SelectLayerByLocation_management("RedlandsStreams","WITHIN_A_DISTANCE","RedlandsStreets","100 Meters","NEW_SELECTION",'#'), 'StreamsNearStreets','','','')
print "Streams within 100 Meters of current streets: " + str(arcpy.GetCount_management("StreamsNearStreets"))

# Change to the development version
arcpy.ChangeVersion_management('RedlandsStreets','TRANSACTIONAL', 'TOOLBOX.proposedStreets2k9','')

# Perform the same analysis on the development version to see the effect of the proposed changes.
arcpy.MakeFeatureLayer_management(arcpy.SelectLayerByLocation_management("RedlandsStreams","WITHIN_A_DISTANCE","RedlandsStreets","100 Meters","NEW_SELECTION",'#'), 'NewStreamsNearStreets','','','')
print "Streams projected to be within 100 Meters of streets after proposed street additions: " + str(arcpy.GetCount_management("NewStreamsNearStreets"))

ArcSDE data and sharing tools

The recommended way of sharing tools, whether across the network or in geoprocessing services, is to use a tool share folder structure, as illustrated below. The required ArcSDE connection file is copied or created in the ToolData folder along with any other data required for the tool to successfully run.

The tool share folder structure

Model and script tools you develop should be written to access the ArcSDE connection file found in the ToolData folder using a relative path to the connection file.

Another method of providing access is to store the ArcSDE connection file at a location that is accessible on the network. Your tool then accesses this ArcSDE connection file using a UNC path.

You may have noticed this is really no different than how you would share other geodatabase data, such as file geodatabase data. For file geodatabases, you also have to provide the file geodatabase folder (equivalent to providing the ArcSDE connection file) along with the tools or provide access to the file geodatabase on the network and have your tools access the data using UNC paths.

Learn more about sharing tools


5/14/2012