Updating and fixing data sources with arcpy.mapping
There are numerous reasons why data sources need to be repaired or redirected to different locations. The idea of making these changes manually in every affected map document can be overwhelming. Methods are available with the arcpy.mapping scripting environment that make it possible to automate these changes without even having to open a map document. You have control of updating data sources for individual layers, or you can update all layers at once that have a common workspace. Brief discussions are available with each method's help topic, but this document is intended to summarize and compare them all.
There are essentially two different types of methods that work on MapDocument, Layer, and TableView classes. The method names change slightly depending upon the class and the parameters available.
- findAndReplaceWorkspacePath or findAndReplaceWorkspacePaths allows you to substitute an entire or partial string for a layer or table's workspace path. It can't be used if the workspace type or dataset name has changed. It is ideal for scenarios where drive letters change, switch to UNC paths, update SDE connection file information, and so on. See the Common Scenarios section below.
- replaceWorkspaces or replaceDataSource allows you to change the workspace path, workspace type, and / or change the dataset name.
The capabilities of these methods overlap; therefore, there are multiple ways of updating a layer or table's workspace or dataset. The purpose of this document is to discuss all the methods and suggest best practices for many common scenarios.
Several definitions are provided to summarize the terms being used in this document:
- A workspace is a container for data. A workspace can be a folder that contains items like shapefiles, CAD files, or rasters, or it can be an ArcInfo coverage, personal geodatabase, file geodatabase, connection to SDE, and so on.
- A workspace path is the system path to the workspace. In the case of file-based data (for example, CAD, shapefiles, rasters), it includes the name of the folder that contains the data. With file-based geodatabases, it includes the name of the geodatabase. SDE workspace paths are defined by the system path to the SDE connection file.
- A workspace type is an ArcGIS supported data model format. For example, shapefile, personal geodatabase, file geodatabase, CAD, SDE, and so on.
- A dataset is the feature class or table that is in a workspace. When a dataset name is required, it is the name of the object in the workspace, not the way it appears in a map document's table of contents.
- A data source for a layer or table is the combination of the workspace and dataset.
Methods for updating and fixing data sources
- MapDocument.findAndReplaceWorkspacePaths(find_workspace_path, replace_workspace_path, {validate})
- The findAndReplaceWorkspacePaths method on the MapDocument object is intended to perform a global find and replace of workspace paths for all layers and tables in a map document that share that workspace path. This method can be used to replace the path to multiple workspace types at once (for example, CAD files, shapefiles, and file geodatabases).
- MapDocument.replaceWorkspaces(old_workspace_path, old_workspace_type, new_workspace_path, new_workspace_type, {validate})
- The replaceWorkspaces method on the MapDocument object is similar to the findAndReplaceWorkspacePaths method but also allows you to switch from one workspace type to another (for example, personal geodatabase to file geodatabase). Generally, this method works on one workspace type at a time. If more than one workspace type needs to be replaced, use the method multiple times for each workspace type. See the "General usage notes" section below for other alternative usages.
- Layer.findAndReplaceWorkspacePath(find_workspace_path, replace_workspace_path, {validate})
- The findAndReplaceWorkspacePath method on the Layer object is intended to perform a find and replace of the workspace path for a single layer in a map document or a layer file.
- Layer.replaceDataSource(workspace_path, workspace_type, dataset_name, {validate})
- The replaceDataSource method on the Layer object is similar to the findAndReplaceWorkspacePath method, but in addition to being able to change a workspace type, the dataset can also be changed. This is the only method that enables you to replace a layer's dataset within a workspace. This is especially useful when the name of a dataset is different in the new workspace.
- TableView.findAndReplaceWorkspacePath(find_workspace_path, replace_workspace_path, {validate})
- The findAndReplaceWorkspacePath method on the TableView object is intended to perform a find and replace of the workspace path for a single table in a map document.
- TableView.replaceDataSource(workspace_path, workspace_type, dataset_name, {validate})
- The replaceDataSource method on the TableView object is similar to the findAndReplaceWorkspacePath method, but in addition to being able to change a workspace type, the dataset can also be changed. This is the only method that enables you to replace a table's dataset within a workspace. This is especially useful when the name of a dataset is different in the new workspace.
General usage notes
- The ListBrokenDataSources() function is useful for determining which layers or tables are broken in a map document or layer file.
- These methods work recursively through all layers and tables. This means that joined and related tables are also searched and updated (see "Known limitations" section below).
- You can find and replace all or part of a workspace path. If replacing only a portion of the path, ensure it is a unique string. For example, finding only a letter C and replacing it with a letter D for a drive gives a less predictable result than finding r"C:\" and replacing it with r"D:\".
- Using a layer name to isolate a data source is not foolproof because it is possible to have multiple data sources with the same layer name. It may be necessary to use other layer properties to uniquely isolate a layer.
- Do not include the names of geodatabase feature datasets in the workspace path. Feature datasets are part of the workspace. If a feature class, for example, is moved from being a stand-alone feature class into a geodatabase feature dataset, a map document will still open without the layer being broken.
- The workspace path parameters should not include the layer or table's dataset name; therefore, many of the methods above won't work if a layer or table's dataset name has changed. The replaceDataSource method on the Layer and TableView objects does accommodate this scenario.
- When switching workspaces via the replaceWorkspaces method, the dataset names must be identical. For example, a shapefile called Highways.shp can be redirected to a file geodatabase workspace only if the dataset name in the file geodatabase is also called Highways. Use the replaceDataSource method on the Layer or TableView objects if the dataset name is different.
- If an empty string ("") is used for the old_workspace_path parameter and "NONE" is used for the old_workspace_type parameter, all data sources are matched to the new workspace. This allows you to redirect multiple workspaces to a single workspace using a single method. Only an empty string can be used with the keyword NONE.
The optional validate parameter
- This parameter allows you to first test to see if the new data source exists before changing or replacing a layer or table's data source.
- If validate is True and the new data source exists and is valid, the original data source will be updated to the new data source; otherwise, it will remain pointing to the original data source and nothing will be updated.
- If validate is False, the new data source does not have to be valid, that is, it might not already exist. This can be useful for scenarios that require data sources to be updated ahead of the data being created. Note, in these cases, the data would appear broken in the associated map documents or layer files.
- When using the findAndReplaceWorkspacePath or findAndReplaceWorkspacePaths methods, the validate parameter applies to the replace_workspace_path parameter.
- When using the replaceWorkspacePaths method, the validate parameter applies to the new_workspace_path and new_workspace_type parameters.
- When using the replaceDataSource method, the validate parameter applies to the dataset_name parameter.
Working with SDE connections
- The workspace path for SDE connections is the path to the SDE connection (.sde) file. You can provide the full path to the file, or in the case where the connection file appears in the Database Connections folder, you can use that string in the path as well, for example, find_workspace_path=r"Database Connections\myConnectionFile.sde"
- The SDE connection file information used in the script must be identical to the SDE connection that was used to add data to a map document or layer file. For example, you can have two identical SDE connection files in two separate locations. If the data was added to a map using the first connection file path, but you use the second connection file path in the script, the data sources won't be updated as expected.
- The dataSource property on the Layer and TableView objects provides a way to determine the path that is being used in the map document or layer file. Use this value to help determine which SDE connection file path should be used. Here is an example:
lyr.findAndReplaceWorkspacePath(find_workspace_path=lyr.dataSource, replace_workspace_path=r"C:\Newpath\To\SDE_ConnectionFile.sde")
- The dataSource property on the Layer and TableView objects provides a way to determine the path that is being used in the map document or layer file. Use this value to help determine which SDE connection file path should be used. Here is an example:
- When removing password information from SDE connections so that it is no longer persisted in a map document or layer file, it is necessary to set validate=False. This must be done because if you attempt to use an SDE connection file that does not persist all the appropriate connection information, the connection will obviously fail. Setting validate=False will force the use of the new connection file, and the user will be prompted for the login information when they open a map document that contains the SDE data.
Known limitations
- Schematic datasets cannot be updated due to the hidden tables that are part of their design.
- Database server connections that connect directly to SQL Server Express databases are currently not supported. As a workaround, create a spatial database connection to the SQL Server Express database and use that instead.
- Joins and relates associated with raster layers will not be updated.
- SQL queries are not automatically updated. Different workspaces (for example, personal versus file geodatabase) use different SQL syntax for layer definition queries, and this syntax is not automatically updated. The syntax can be corrected by updating the Layer object's definitionQuery property. Simple Python string functions are available to perform basic search and replace operations. Some examples are as follows:
- For personal geodatabases, field names are enclosed in brackets ([]), whereas file geodatabases are enclosed in quotation marks ("").
-
Wildcard characters for personal geodatabases are asterisk (*) and question mark (?), whereas wildcards for file geodatabases are percent symbol (%) and underscore (_).
-
String searches are case insensitive for personal geodatabases but are case sensitive for file geodatabases.
-
Personal geodatabases use UCASE and LCASE to convert string case. File geodatabases use UPPER and LOWER.
-
Dates and times in personal geodatabases are delimited using the pound sign (#), while file geodatabases are preceded by the word date.
- Layer label expressions are also not automatically updated. These, too, will need to be changed to support the appropriate SQL syntax. The syntax can be corrected by updating the layer's LabelClass SQLQuery or expression properties.
- Saved selections within a map document are cleared when a layer's data source is updated (this is also the case when using Set Data Source via the Layer Properties dialog box within ArcMap). There is currently no workaround for this, other than manually re-creating the map document feature selections.
Common scenarios
File-based data is moved to a different folder or located on a different network drive
Either the mapped network drive letter changes, the folder structure for the data's location changes, or something else occurs that breaks the link to the existing data that needs to be updated. In this scenario, it is simply a matter of redirecting the same data sources to the new folder location or drive name. This scenario applies to nearly all file-based data structures including spatial database connection files (for example, shapefiles, CAD data, personal and file geodatabases, file-based rasters).
In this scenario, the data was located directly under the C:\Project\Data folder but was moved into a subfolder called Data2. This script updates a single map document.
import arcpy mxd = arcpy.mapping.MapDocument(r"C:\Project\Project.mxd") mxd.findAndReplaceWorkspacePaths(r"C:\Project\Data", r"C:\Project\Data2") mxd.saveACopy(r"C:\Project\Project2.mxd") del mxd
Change a local path to a UNC path
In this scenario, a user wants to share map documents with other staff that have access to the user's drive and doesn't want staff to have to copy the data to their local drives for them to open the map document successfully.
This scenario involves changing a local system path to a UNC path. This script updates all the map documents in a folder.
import arcpy, os folderPath = r"C:\Project" for filename in os.listdir(folderPath): fullpath = os.path.join(folderPath, filename) if os.path.isfile(fullpath): basename, extension = os.path.splitext(fullpath) if extension.lower() == ".mxd": mxd = arcpy.mapping.MapDocument(fullpath) mxd.findAndReplaceWorkspacePaths(r"C:\Project\Data", r"\\ComputerName\Project\Data") mxd.save() del mxd
SDE connection properties are changed
There are numerous reasons why this may happen—perhaps the server name and/or port number is changed; a connection includes login credentials and, for security reasons, they need to be removed; the password is changed; connections change from a three-tier connection to a two-tier direction connection; or you want to point layers to a different version. Regardless of the specific scenario, when modifying spatial database connection properties, all you need to do is pass in the path to the connection file.
In this scenario, a user wants to change the geodatabase version for all map layers. The user creates a new SDE connection file and replaces the original connection file with the new connection file.
import arcpy mxd = arcpy.mapping.MapDocument(r"C:\Project\Project_default.mxd") mxd.findAndReplaceWorkspacePaths(r"C:\Project\Connection to Default.sde", r"C:\Project\Connection to Version1.sde") mxd.saveACopy(r"C:\Project\Project_V1.mxd") del mxd
In this scenario, a user wants to remove the password information saved within a map document. The data sources in the map document came from an SDE connection file where the password information was saved with the connection information. Next, the user created a new SDE connection file to the same database but this time did not save the password information. In the script below, the validate parameter must be set to False for the password information to be successfully removed. After the script is run, a user will need to log in to open the resulting map document.
import arcpy mxd = arcpy.mapping.MapDocument(r"C:\Project\Project_default.mxd") mxd.findAndReplaceWorkspacePaths(r"C:\Project\Connection with password info saved.sde", r"C:\Project\Connection with no password info saved.sde", False) mxd.saveACopy(r"C:\Project\Project_NP.mxd") del mxd
Data is migrated from one workspace type to another workspace type
Some common scenarios involve migrating data, such as shapefiles and personal geodatabases, into a file geodatabase or from a file geodatabase into an enterprise SDE connection. Because geodatabase workspaces have subtle differences, it is important to review the "Known limitations" section above to better understand the additional items that may need to be addressed.
This scenario involves updating two different workspaces in a map document. First, shapefiles are redirected to a file geodatabase called Parcels.gdb. Second, layers from a personal geodatabase are redirected to another file geodatabase called Transportation.gdb.
import arcpy mxd = arcpy.mapping.MapDocument(r"C:\Project\Project.mxd") mxd.replaceWorkspaces(r"C:\Project\Data", "SHAPEFILE_WORKSPACE", r"C:\Project\Data\Parcels.gdb", "FILEGDB_WORKSPACE") mxd.replaceWorkspaces(r"C:\Project\Data\Transportation.mdb", "ACCESS_WORKSPACE", r"C:\Project\Data\Transportation.gdb", "FILEGDB_WORKSPACE") mxd.saveACopy(r"C:\Project\Project2.mxd") del mxd
This example is identical to the example above except that all data sources are directed to a single file geodatabase. Multiple workspace types can be redirected into a single workspace type if a value of NONE is used in place of the old_workspace_type parameter.
import arcpy mxd = arcpy.mapping.MapDocument(r"C:\Project\Project.mxd") mxd.replaceWorkspaces("", "NONE", r"C:\Project\Data\BackgroundData.gdb","FILEGDB_WORKSPACE") mxd.saveACopy(r"C:\Project\Project2.mxd")
The following scenario redirects all layers from a personal geodatabase to a file geodatabase. It also fixes the SQL expressions for a layer's definitionQuery property and a label class's SQLQuery property. For example, personal geodatabases have square brackets ([]) around field names, whereas file geodatabases use double quotes (""). Wildcard characters are also replaced.
import arcpy mxd = arcpy.mapping.MapDocument(r"C:\Project\Project.mxd") mxd.replaceWorkspaces(r"C:\Project\Data\Parcels.mdb", "ACCESS_WORKSPACE", r"C:\Project\Data\Parcels.gdb", "FILEGDB_WORKSPACE") for lyr in arcpy.mapping.ListLayers(mxd): if lyr.supports("DEFINITIONQUERY"): lyr.definitionQuery = lyr.definitionQuery.replace("[", "\"") lyr.definitionQuery = lyr.definitionQuery.replace("]", "\"") lyr.definitionQuery = lyr.definitionQuery.replace("*", "%") if lyr.supports("LABELCLASSES"): for lblClass in lyr.labelClasses: lblClass.SQLQuery = lblClass.SQLQuery.replace("[", "\"") lblClass.SQLQuery = lblClass.SQLQuery.replace("]", "\"") lblClass.SQLQuery = lblClass.SQLQuery.replace("*", "%") mxd.saveACopy(r"C:\Project\Project2.mxd") del mxd
Individual datasets are moved to a new folder location
There are scenarios where just individual datasets are moved and not entire workspaces. Rather than attempting to update all layers in a map document, it sometimes makes sense just to focus on the individual items that have moved.
In this scenario, an individual feature class called MapIndex was moved from one file geodatabase in a folder called Data to a copy of the file geodatabase in a different folder called Data2.
import arcpy mxd = arcpy.mapping.MapDocument(r"C:\Project\Project.mxd") for lyr in arcpy.mapping.ListLayers(mxd): if lyr.supports("DATASOURCE"): if lyr.dataSource == r"C:\Project\Data\Parcels.gdb\MapIndex": lyr.findAndReplaceWorkspacePath(r"Data", r"Data2") mxd.saveACopy(r"C:\Project\Project2.mxd") del mxd
Individual datasets are moved in and out of a geodatabase feature dataset
Geodatabase feature datasets are part of a workspace and their names should not be included in the workspace path. If a feature class is moved into or out of a geodatabase feature dataset within the same workspace, nothing should be done in terms of updating map documents or layer file data sources. If a dataset is moved into a different workspace, but of the same workspace type, provide a path to the new workspace, without the geodatabase feature dataset name. If the workspace type is different, using findAndReplaceWorkspacePath on a layer will not work and you will need to use replaceDataSource instead.
In this scenario, a feature class was moved from being stand alone in one file geodatabase into a feature dataset in another file geodatabase.
import arcpy mxd = arcpy.mapping.MapDocument(r"C:\Project\Project.mxd") for lyr in arcpy.mapping.ListLayers(mxd): if lyr.supports("DATASOURCE"): if lyr.dataSource == r"C:\Project\Data\Parcels.gdb\MapIndex": lyr.findAndReplaceWorkspacePath(r"Parcels.gdb", r"Transportation.gdb") mxd.saveACopy(r"C:\Project\Project2.mxd") del mxd
A feature class is renamed
Schema changes are common, and sometimes the name of a feature class is changed. In this scenario, all layers that point to that dataset will necessarily be broken.
In this scenario, a feature class was renamed from MajorRoads to Highways, therefore breaking individual layers in a map document or layer file. The script will evaluate only the broken data sources using the ListBrokenDataSources() function and perform the appropriate fix.
import arcpy mxd = arcpy.mapping.MapDocument(r"C:\Project\Project.mxd") for lyr in arcpy.mapping.ListBrokenDataSources(mxd): if lyr.supports("DATASOURCE"): if lyr.dataSource == r"C:\Project\Data\Transportation.gdb\MajorRoads": lyr.replaceDataSource(r"C:\Project\Data\Transportation.gdb", "FILEGDB_WORKSPACE", "Highways") lyr.name = "Highways" mxd.saveACopy(r"C:\Project\Project2.mxd") del mxd