Geoprocessing considerations for shapefile output

Over the years, ESRI has developed three main data formats for storing geographic information—coverages, shapefiles, and geodatabases. Shapefiles were developed to provide a simple, nontopological format for storing geographic and attribute information. Because of the simplicity of shapefiles, they are a very popular open data transfer format. While shapefiles may seem to be an easy choice because of their simplicity, there are limitations in their use that geodatabases address. When using shapefiles, you should be aware of their limitations. In broad general terms,

These issues (and more) mean that shapefiles are an extremely poor choice for active database management—they do not handle the modern life cycle of data creation, editing, versioning, and archiving.

When should I use a shapefile?

When should I not use a shapefile?

With some exceptions that are noted below, shapefiles are acceptable for storing simple feature geometry. However, shapefiles have serious problems with attributes. For example, they cannot store null values, they round up numbers, they have poor support for Unicode character strings, they do not allow field names longer than 10 characters, and they cannot store both a date and time in a field. These are just the main issues. Additionally, they do not support capabilities found in geodatabases, such as domains and subtypes. So unless you have very simple attributes and require no geodatabase capabilities, do not use shapefiles.
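As an illustration of the 10-character field name limit, the following Python sketch flags names that are too long and names that would become identical if shortened. The function name `check_field_names` is hypothetical, and plain truncation to the first 10 characters is an assumption made for the example; a given tool may shorten names differently.

```python
# Sketch: flag attribute field names that exceed the 10-character limit
# of a shapefile's attribute table.
MAX_FIELD_NAME = 10  # limit stated in the text above


def check_field_names(names):
    """Return (too_long, collisions) for a list of field names.

    too_long   -- names longer than 10 characters
    collisions -- shortened names shared by more than one original name,
                  ASSUMING plain truncation to the first 10 characters
    """
    too_long = [n for n in names if len(n) > MAX_FIELD_NAME]
    by_truncated = {}
    for n in names:
        by_truncated.setdefault(n[:MAX_FIELD_NAME], []).append(n)
    collisions = {k: v for k, v in by_truncated.items() if len(v) > 1}
    return too_long, collisions


fields = ["PARCEL_ID", "population_2010", "population_2020", "Owner"]
too_long, collisions = check_field_names(fields)
print(too_long)
print(collisions)
```

Running a check like this before converting a geodatabase feature class makes it easier to rename fields deliberately rather than discover shortened names afterward.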

Shapefile components and file extensions

Shapefiles are stored in three or more files that all have the same prefix and are stored in the same system folder (shapefile workspace). You will see the individual files when viewing the folder in Windows Explorer, but not when viewing it in ArcCatalog.

| Extension | Description | Required? |
| --- | --- | --- |
| .shp | The main file that stores the feature geometry. No attributes are stored in this file, only geometry. | Yes |
| .shx | A companion file to the .shp that stores the position of individual feature IDs in the .shp file. | Yes |
| .dbf | The dBASE table that stores the attribute information of features. | Yes |
| .sbn and .sbx | Files that store the spatial index of the features. | No |
| .atx | Created for each dBASE attribute index created in ArcCatalog. | No |
| .ixs and .mxs | Geocoding index for read-write shapefiles. | No |
| .prj | The file that stores the coordinate system information. | No |
| .xml | Metadata for ArcGIS; stores information about the shapefile. | No |

Shapefile extensions
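Because a shapefile is a set of files sharing one prefix, it can be useful to confirm that the required components are present before copying or sending it. The sketch below uses only the Python standard library; the function name `shapefile_components` and the sample prefix `roads` are hypothetical, and matching files by "prefix plus a dot" is a simplifying assumption.

```python
import tempfile
from pathlib import Path

REQUIRED = {".shp", ".shx", ".dbf"}  # marked "Yes" in the table above
OPTIONAL = {".sbn", ".sbx", ".atx", ".ixs", ".mxs", ".prj", ".xml"}


def shapefile_components(folder, prefix):
    """Return (present, missing_required) sets of extensions for one shapefile."""
    present = set()
    for path in Path(folder).iterdir():
        # Assumption: every component file name starts with "<prefix>.".
        if path.is_file() and path.name.lower().startswith(prefix.lower() + "."):
            ext = path.suffix.lower()
            if ext in REQUIRED | OPTIONAL:
                present.add(ext)
    return present, REQUIRED - present


# Usage: build a throwaway folder with an incomplete shapefile and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("roads.shp", "roads.dbf", "roads.prj"):
        (Path(tmp) / name).touch()
    present, missing = shapefile_components(tmp, "roads")
    print(sorted(present), sorted(missing))
```

In this example the .shx file is reported as missing, so the set of files would not form a usable shapefile.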

Geometry limitations

Attribute limitations

| Data type containing null value | Shapefile representation |
| --- | --- |
| Number (when a tool requires a NULL, infinity, or NaN (Not a Number) to be output) | -1.7976931348623158e+308 (IEEE standard for the maximum negative value) |
| Number (all other geoprocessing tools) | 0 |
| Text | "" (blank, no space) |
| Date | Stored as zero, but displays "<null>" |

Shapefile representation of null
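The substitutions in the table above can be sketched in Python to show what happens to a null on the way into a shapefile, and why it cannot always be recovered on the way out. The function names and the `field_type` labels are hypothetical and chosen for illustration; only the stand-in values come from the table.

```python
# Value from the table above: written when a tool must output NULL,
# infinity, or NaN for a numeric field.
NUMERIC_NULL_SENTINEL = -1.7976931348623158e+308


def to_shapefile_value(value, field_type, needs_true_null=False):
    """Map a Python None to the stand-in value a shapefile would hold."""
    if value is not None:
        return value
    if field_type == "number":
        return NUMERIC_NULL_SENTINEL if needs_true_null else 0
    if field_type == "text":
        return ""  # blank, no space
    if field_type == "date":
        return 0   # stored as zero, although it displays as "<null>"
    raise ValueError("unknown field type: %r" % field_type)


def from_shapefile_number(value):
    """Recover None only from the sentinel; a stored 0 stays 0.

    A 0 may have been a real zero or a substituted null, so the
    distinction is lost once the data is in a shapefile.
    """
    return None if value == NUMERIC_NULL_SENTINEL else value


print(to_shapefile_value(None, "number"))
print(from_shapefile_number(NUMERIC_NULL_SENTINEL))
```

The second function shows the practical cost: a null replaced by 0 is indistinguishable from a genuine zero when the data is read back.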

Unsupported capabilities

Shapefiles have no extended data types at either the workspace or feature class level. Any conversion to shapefile from a geodatabase feature class or other format will result in the loss of the following:

Shapefiles and geoprocessing

Any geoprocessing tool that outputs a feature class allows you to choose either a shapefile or geodatabase feature class as the output format. Similarly, a tool that outputs a table allows you to choose either a dBASE file (.dbf) or a geodatabase table as the output. You should always be aware of which format you use and the consequences of converting a geodatabase input to a shapefile output.

Geoprocessing tools autogenerate an output feature class or table for you. This autogenerated output is based on a number of factors as described in Using the current and scratch workspace environments. If your scratch workspace environment is set to a system folder, and not a geodatabase, the autogenerated output feature class will be a shapefile and the autogenerated output table will be a dBASE file, as illustrated below.

Shapefile and dBASE output

Set your scratch workspace to a file geodatabase so that the autogenerated output is written to a file geodatabase, not a shapefile or .dbf table.
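The rule described above (folder workspace gives shapefile or dBASE output, geodatabase workspace gives geodatabase output) can be sketched as a small check to run before a script starts. This is a simplified illustration, not how the geoprocessing framework decides: the function name is hypothetical, and treating any path ending in .gdb as a file geodatabase is an assumption that ignores other geodatabase types.

```python
def autogenerated_output_kind(scratch_workspace):
    """Guess the format of autogenerated output from the scratch workspace path.

    ASSUMPTION: a path ending in ".gdb" is a file geodatabase; any other
    path is treated as a system folder.
    """
    path = scratch_workspace.rstrip("/\\").lower()
    return "geodatabase" if path.endswith(".gdb") else "shapefile"


for workspace in (r"C:\data\scratch.gdb", r"C:\data\scratch"):
    print(workspace, "->", autogenerated_output_kind(workspace))
```

In an ArcGIS Python script the scratch workspace itself is set through the `arcpy.env.scratchWorkspace` environment setting; a check like the one above could warn when that setting points at a plain folder.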

Learn more about geoprocessing environments

Because shapefiles write quickly, they are often used for intermediate data in models to speed up model execution. However, writing to a file geodatabase is almost as fast as writing to a shapefile, so unless execution speed is critical, use a file geodatabase for intermediate and output data. If you do use shapefiles, be aware of their limitations as described above and only use shapefiles for simple features and attributes. An alternative to using shapefiles for intermediate data is to write features to the in_memory workspace.

Learn more about the in_memory workspace

Spatial reference and shapefiles

The topic Spatial reference and geoprocessing discusses the importance of spatial reference properties when using geoprocessing tools. A number of geoprocessing environments control the spatial reference used by tools. The following environments are NOT honored when the output of a tool is a shapefile:


2/11/2011