Managing intermediate data in shared models
Intermediate data is created and deleted when a model tool is run. In the majority of cases, you do not have to be concerned about intermediate data, even with shared models. In some cases, however, model tools cannot create intermediate data: the model fails, and the error messages you (or your user) receive may not point directly to the problem. For example, the message may contain "Unexpected error" when, in fact, the tool simply cannot write its output because the folder or geodatabase does not exist.
Another problem you may run into if you don't consider your intermediate data location is performance. Sending your intermediate data to a remote machine or to an ArcSDE geodatabase should be avoided. The overhead of communicating across the network with the remote file system or remote database is often too costly and may cause your model performance to slow down significantly.
The simplest technique to ensure that your models can create and delete intermediate data is the following:
- Make all your intermediate data variables managed data.
- Set the scratch workspace environment to an existing file geodatabase.
The steps are as follows:
- From the Catalog window, right-click the model tool and click Edit. This opens ModelBuilder.
- In ModelBuilder, right-click all intermediate data variables and choose Managed. A check mark appears next to Managed.
- Save and close the model.
- From the Geoprocessing menu, choose Environments and click Workspace.
- For the Scratch Workspace setting, specify a new file geodatabase or leave it pointing to the default geodatabase.
- Click OK.
- Execute your model by double-clicking the model tool in the Catalog window, entering parameters, then clicking OK. If there are still errors, double-check that you've made all intermediate data variables managed and that the scratch workspace is set properly. If there are still problems, either one of the following is true:
  - The problem is not with intermediate data.
  - The scratch workspace environment is being reset by the model. Check the model environment settings:
    - In the Catalog window, right-click your model tool and click Properties.
    - On the properties dialog box, click the Environments tab.
    - If the check box next to Workspace is checked, click the plus sign.
    - Highlight Scratch Workspace by clicking it, then click the Values button.
    - If the Scratch Workspace setting contains a path to a workspace, check that the workspace exists. (You can check this by clicking browse and browsing to the location.) If the workspace doesn't exist, enter the path to an existing workspace.
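The existence check in the last step can also be done outside the dialog box. The following is a minimal sketch, not part of ArcGIS: the function name check_scratch_workspace is hypothetical, and it relies only on the fact that file geodatabases and shapefile workspaces are folders on disk. It does not handle personal or ArcSDE geodatabases.

```python
import os


def check_scratch_workspace(path):
    """Return a short diagnosis for a scratch workspace path (hypothetical helper)."""
    if not path:
        return "not set"
    # A file geodatabase and a shapefile workspace are both folders on disk,
    # so a plain directory test is enough to detect a missing workspace.
    if not os.path.isdir(path):
        return "missing"
    if path.rstrip("\\/").lower().endswith(".gdb"):
        return "file geodatabase"
    return "folder"
```

A result of "missing" corresponds to the situation described above, where the model fails because the folder or geodatabase does not exist.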
How models manage intermediate data
You need to make sure that any model you share has a location to create and delete its intermediate data. The simplest method is to make all your intermediate data managed data, as described in the steps above.
The remainder of this topic contains an in-depth look at how geoprocessing manages intermediate data and gives you the insight you need to troubleshoot problems. The sections below describe the following:
- How output paths are autogenerated
- How ModelBuilder keeps track of whether you have altered any data variable
- How the Managed option can be used for intermediate data
- How you can use variable substitution for output paths
- How ArcGIS Server overrides the scratch workspace environment
- How the ToolShare folder structure is designed to help you manage intermediate data
- How scratch data can be written to the in_memory workspace
- How scratch workspace environments are used in scripting
How output paths are autogenerated
When you open a tool and provide input datasets, the location of the output data is automatically generated. This autogenerated name is constructed using the location of the scratch workspace if it is set or, if it is not set, the location of the current workspace. The output data variable will contain the autogenerated name regardless of whether the variable eventually becomes intermediate data, managed data, or a tool parameter.
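The workspace choice described above can be sketched in a few lines. This is only an illustration of the rule (scratch workspace if set, otherwise current workspace), not the actual ArcGIS implementation. The function name, the example paths, and the output name roads_Buffer are hypothetical, and the sketch returns None when neither workspace is set because that case is not covered here.

```python
import posixpath


def autogenerate_output(base_name, scratch_workspace=None, current_workspace=None):
    """Pick the workspace for an autogenerated output path (illustrative only)."""
    # Prefer the scratch workspace; fall back to the current workspace.
    workspace = scratch_workspace or current_workspace
    if workspace is None:
        # Assumption of this sketch: nothing to build a path from.
        return None
    return posixpath.join(workspace, base_name)


# Scratch workspace is set, so it wins over the current workspace.
with_scratch = autogenerate_output(
    "roads_Buffer", "C:/work/scratch.gdb", "C:/work/data.gdb")

# Scratch workspace is not set, so the current workspace is used.
without_scratch = autogenerate_output(
    "roads_Buffer", None, "C:/work/data.gdb")
```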
When you distribute your models, recipients will surely have different settings for the scratch or current workspace, and they will want their environment settings to apply. That is, when they open and run the tool dialog box for your model, they want all intermediate data to be written to their scratch workspace as set in their environments. This will occur as long as you don't alter the autogenerated name in your data variables, as described next.
The altered state of data variables
Whenever you modify the value of a variable within ModelBuilder, it is considered altered. Once a variable is altered, ArcGIS operates on the principle that you want to use the altered value and will never again modify it. If the altered variable contains a path to folders or workspaces that do not exist on another user's computer, the model will fail.
If the variable is an output dataset, and its value is empty or unaltered, geoprocessing tools will autogenerate a path. You want to take advantage of this fact and leave output dataset parameters unaltered so that geoprocessing will autogenerate a path for you.
In ModelBuilder, there is no way for you to determine if a data variable is considered altered, but you can reset the altered state of a variable by deleting (blanking out) the existing value and validating the entire model. Validation will then see that the output value is blank, will autogenerate a new name for intermediate data, then mark the data variable as unaltered. A better method, however, is to set the variable to Managed, as described next.
Using managed data
You might choose to have ModelBuilder manage the location of intermediate data (using the autogeneration logic described above). You can set a data variable to be managed by right-clicking the variable and clicking the Managed option. Once you've set a variable to be managed, you cannot change the output path within ModelBuilder (the parameter control will always be unavailable). This means that managed data cannot have its altered status changed and will have a new autogenerated path for the data each time the model executes.
Custom script tools may or may not provide an autogenerated output path. If they don't provide one, you can use variable substitution in ModelBuilder for your output paths, for example, %scratchworkspace%/temp.
The main issue with using variable substitution is that you rarely know if %scratchworkspace% will be a system folder or a geodatabase when the tool is executed. If, when you built your model in ModelBuilder, your scratch workspace was a shapefile workspace (a folder), ModelBuilder would have automatically appended .shp to the feature class name (that is, you entered %scratchworkspace%/temp and ModelBuilder automatically replaced it with %scratchworkspace%/temp.shp). At a later time, if you change your scratch workspace to a file geodatabase and run the model tool using its dialog box, the model fails because it is trying to write temp.shp to the file geodatabase, and names in a geodatabase cannot contain special characters, such as the dot found in .shp.
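The failure described above can be reproduced with a small stand-alone sketch. It imitates %name% substitution with a regular expression and applies a deliberately simplified naming rule (a letter followed by letters, digits, or underscores). Both functions are hypothetical illustrations, not ArcGIS functions, and the workspace paths are made up.

```python
import re


def expand_variables(template, variables):
    """Replace %name% tokens (case-insensitive) with values from a dict."""
    def lookup(match):
        key = match.group(1).lower()
        # Leave unknown tokens untouched rather than guessing a value.
        return variables.get(key, match.group(0))
    return re.sub(r"%(\w+)%", lookup, template)


def is_valid_gdb_name(name):
    """Simplified check: starts with a letter, then letters, digits, underscores."""
    return re.fullmatch(r"[A-Za-z]\w*", name) is not None


# The path stored in the model after ModelBuilder appended .shp.
template = "%scratchworkspace%/temp.shp"

# Scratch workspace is a folder: the shapefile path is fine.
folder_path = expand_variables(template, {"scratchworkspace": "C:/work/scratch"})

# Scratch workspace is a file geodatabase: the dataset name is now invalid.
gdb_path = expand_variables(template, {"scratchworkspace": "C:/work/scratch.gdb"})
gdb_name_ok = is_valid_gdb_name(gdb_path.rsplit("/", 1)[-1])
```

Here gdb_name_ok is False because the dataset name temp.shp contains a dot, which is the condition that makes the model fail.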
There are only two cases where you can safely predict the type of scratch workspace:
- When the tool is running on ArcGIS Server
- If you use the recommended ToolShare folder structure that contains a scratch folder
Both cases are examined in more detail below.
ArcGIS Server scratch workspaces
When a server tool is executed on the server, ArcGIS Server creates a unique job folder for the tool to use. Inside this job folder is a folder named scratch, and within this folder is a file geodatabase named scratch.gdb.
ArcGIS Server sets the application-level scratch workspace environment to the location of this unique scratch folder. It ignores the tool-, model-, or model process-level settings for scratch workspace. When the server tool is run, the location of any intermediate or managed output data variable will be reset to use the job folder's scratch workspace, unless the data variable is not managed and has been altered.
Since ArcGIS Server always creates this scratch folder with a scratch geodatabase and sets the scratch workspace environment to the scratch folder, you can safely use variable substitution for all output paths.
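As an illustration, the sketch below builds the two kinds of output path that fit this layout: one inside scratch.gdb for geodatabase feature classes and one directly in the scratch folder for shapefiles. The function name and the output name clipped are hypothetical. Only the folder layout (a scratch folder containing scratch.gdb) comes from the description above.

```python
def server_output_templates(name):
    """Return variable-substitution paths that match the server job folder layout."""
    return {
        # Written into the file geodatabase inside the scratch folder.
        "feature_class": "%scratchworkspace%/scratch.gdb/" + name,
        # Written as a shapefile directly into the scratch folder.
        "shapefile": "%scratchworkspace%/" + name + ".shp",
    }


templates = server_output_templates("clipped")
```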
Using the share folder structure
The topic "A structure for sharing tools" describes a recommended folder structure, called the ToolShare folder.
This ToolShare folder structure works well for sharing tools, whether you are sharing on a LAN or publishing to ArcGIS Server.
Note that like the unique job folder created by ArcGIS Server, the ToolShare folder contains a scratch folder and a scratch.gdb. You can set up your model so that its intermediate data is always written to this scratch folder, as follows:
- Set the model environment scratch workspace to the scratch folder within the share folder.
- Set the toolbox to use relative paths.
- In your model, use variable substitution for any intermediate data.
Using %scratchworkspace% in a model parameter will take the application-level scratch workspace, not the model-level scratch workspace, so you only want to use this technique for non-parameter data variables, such as intermediate data.
If you use this technique when sharing your toolbox across a LAN, any execution of your tools will write intermediate data to this scratch folder. The following configuration is an example:
- A user on \\pondermatic adds a folder connection to your share folder \\cogitator\GPTools.
- He or she opens and executes a tool within the RetailFunctions toolbox.
- Since you stored the RetailFunctions toolbox with relative paths, the location of the scratch workspace, %scratchworkspace%, expands to \\cogitator\GPTools\scratch and all intermediate data is written to \\cogitator\GPTools\scratch.
Whether you want to use this technique for sharing across a LAN is up to you. The first consideration is whether you grant permissions to other users to write data to your shared folder. The second is that writing data across a LAN is generally slower than writing to a local disk. It is preferable to use the scratch workspace environment set by the tool user. However, as noted above, you don't know if the user has set his or her scratch workspace to a folder or a geodatabase. Using this technique, you know the type of scratch workspace.
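The expansion in the last step is ordinary relative path resolution. The sketch below shows it with Python's ntpath module, which applies Windows path rules on any platform. The subfolder name Toolboxes and the relative path ..\scratch are assumptions made for illustration. Only the share \\cogitator\GPTools and its scratch folder come from the example above.

```python
import ntpath


def resolve_relative(toolbox_folder, relative_path):
    """Resolve a stored relative path against the folder that holds the toolbox."""
    return ntpath.normpath(ntpath.join(toolbox_folder, relative_path))


# Hypothetical location of the RetailFunctions toolbox inside the share.
toolbox_folder = r"\\cogitator\GPTools\Toolboxes"

# The scratch workspace is stored relative to the toolbox, so it resolves
# to the same place no matter which machine opens the share.
scratch = resolve_relative(toolbox_folder, r"..\scratch")
```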
Writing scratch data to the in_memory workspace
Geoprocessing provides an in-memory workspace where you can write features and tables.
Only tables and simple feature classes (points, lines, polygons) can be written to the in_memory workspace. The in_memory workspace does not support extended geodatabase elements such as subtypes, domains, representations, topologies, geometric networks, and network datasets.
Do not set your scratch workspace environment to the in_memory workspace. Use in_memory only for outputs that you know to be simple features and tables.
Be very careful when using the in_memory workspace: write only datasets that you know will be small.
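These rules can be summarized as a small decision helper. This is a sketch only: the function name, the dataset type strings, and the expected_small flag are hypothetical, and deciding what counts as small is left to the caller.

```python
# Dataset kinds the text above allows in the in_memory workspace
# (labels chosen for this sketch).
SIMPLE_TYPES = {"featureclass", "table"}


def choose_output_workspace(dataset_type, expected_small, scratch_workspace):
    """Return "in_memory" only for small, simple outputs; otherwise the scratch workspace."""
    if dataset_type.lower() in SIMPLE_TYPES and expected_small:
        return "in_memory"
    # Anything large or not a simple feature class or table goes to disk.
    return scratch_workspace
```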
Scratch data in scripting
In your scripts, you often need to construct a location to write scratch data.
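One possible approach is sketched below in plain Python. In a real script the scratch workspace value would typically come from the geoprocessing environment (for example, arcpy.env.scratchWorkspace). Here it is passed in as an argument so the logic can be read and run on its own. The function names are hypothetical, falling back to the system temp folder is an assumption of this sketch, and personal and ArcSDE geodatabases are not handled.

```python
import os
import tempfile


def scratch_location(scratch_workspace=None):
    """Return (workspace, is_geodatabase) for writing scratch data."""
    # Assumption: fall back to the system temp folder when nothing is set.
    workspace = scratch_workspace or tempfile.gettempdir()
    if workspace.rstrip("\\/").lower().endswith(".gdb"):
        return workspace, True
    # A scratch folder may contain scratch.gdb, as in the ArcGIS Server
    # job folder and the ToolShare folder; prefer it when it exists.
    candidate = os.path.join(workspace, "scratch.gdb")
    if os.path.isdir(candidate):
        return candidate, True
    return workspace, False


def scratch_dataset_path(base_name, scratch_workspace=None):
    """Build an output path whose name suits the type of scratch workspace."""
    workspace, is_gdb = scratch_location(scratch_workspace)
    # Shapefiles need the .shp extension; geodatabase names must not have it.
    name = base_name if is_gdb else base_name + ".shp"
    return os.path.join(workspace, name)
```

Building the name after inspecting the workspace avoids the temp.shp failure described earlier in this topic, because the extension is added only when the scratch workspace is a folder.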