Geoprocessing considerations for shapefile output
Over the years, ESRI has developed three main data formats for storing geographic information: coverages, shapefiles, and geodatabases. Shapefiles were developed to provide a simple, nontopological format for storing geographic and attribute information. Because of this simplicity, they are a very popular open data transfer format. While shapefiles may seem an easy choice, there are limitations in their use that geodatabases address, and you should be aware of them when using shapefiles. In broad terms:
- Geographic data is more than the simple features and attributes that a shapefile can store. For example, there are annotation, attribute relationships, topology relationships, attribute domains and subtypes, coordinate precision and resolution, and numerous other capabilities that are supported in geodatabases but not in shapefiles.
- Because shapefiles are an open format popular for data transfer, many non-ESRI software packages output shapefiles. (You can find the shapefile format specification at http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf.) Unfortunately, these packages do not always do a good job of creating properly formatted shapefiles. You may have already experienced the frustration of receiving corrupted shapefiles from another source.
- Shapefiles make use of the dBASE file format (.dbf file) to store attributes. dBASE is a non-ESRI format developed in the early 1980s and was, at that time, the most popular format for storing tables of attributes. However, time has passed it by, and there have been a number of data representation improvements since then, such as the Unicode standard, which supports most of the world's writing systems. This is one reason why shapefiles do not work well for storing information in a language other than English.
These issues (and more) mean that shapefiles are an extremely poor choice for active database management—they do not handle the modern life cycle of data creation, editing, versioning, and archiving.
When should I use a shapefile?
- When exporting data for use in a non-ESRI software application
- When exporting data for use in ArcView 3 or ArcInfo Workstation
- When you need to write simple features and attributes quickly, such as for ArcGIS Server geoprocessing services. However, you must be aware of the limitations detailed below.
When should I not use a shapefile?
With some exceptions that are noted below, shapefiles are acceptable for storing simple feature geometry. However, shapefiles have serious problems with attributes. For example, they cannot store null values, they can round numbers, they have poor support for Unicode character strings, they do not allow field names longer than 10 characters, and they cannot store both a date and time in a field. These are just the main issues. Additionally, they do not support capabilities found in geodatabases, such as domains and subtypes. So unless you have very simple attributes and require no geodatabase capabilities, do not use shapefiles.
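As a rough illustration of what "very simple attributes" means in practice, the sketch below scans a set of attribute rows for three of the problems named above: null values, field names longer than 10 characters, and values that carry both a date and a time. The function name `shapefile_attribute_issues` and the row layout (a list of dictionaries) are hypothetical and chosen only for this example. It is not an ArcGIS API.

```python
from datetime import datetime, time

MAX_FIELD_NAME_LENGTH = 10  # shapefile (.dbf) field name limit


def shapefile_attribute_issues(rows):
    """Return human-readable warnings for attribute data that a shapefile
    cannot represent faithfully.

    `rows` is a list of dicts mapping field name -> value (a hypothetical
    layout used only for this sketch).
    """
    issues = []

    # Field names longer than 10 characters cannot be kept as-is.
    field_names = sorted({name for row in rows for name in row})
    for name in field_names:
        if len(name) > MAX_FIELD_NAME_LENGTH:
            issues.append(f"field name '{name}' exceeds 10 characters")

    for index, row in enumerate(rows):
        for name, value in row.items():
            if value is None:
                # Nulls are replaced by a stand-in value on export.
                issues.append(f"row {index}: '{name}' is null")
            elif isinstance(value, datetime) and value.time() != time(0, 0):
                # A single field cannot hold both a date and a time.
                issues.append(f"row {index}: '{name}' holds both a date and a time")
    return issues
```

An empty result does not guarantee a lossless export; it only means none of these three checks fired.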
Shapefile components and file extensions
Shapefiles are stored in three or more files that all have the same prefix and are stored in the same system folder (shapefile workspace). You will see the individual files when viewing the folder in Windows Explorer, not in ArcCatalog.
Extension | Description | Required? |
---|---|---|
.shp | The main file that stores the feature geometry. No attributes are stored in this file, only geometry. | Yes |
.shx | A companion file to the .shp that stores the position of individual feature IDs in the .shp file. | Yes |
.dbf | The dBASE table that stores the attribute information of features. | Yes |
.sbn and .sbx | Files that store the spatial index of the features. | No |
.atx | Created for each dBASE attribute index created in ArcCatalog. | No |
.ixs and .mxs | Geocoding index for read-write shapefiles. | No |
.prj | The file that stores the coordinate system information. | No |
.xml | Metadata for ArcGIS; stores information about the shapefile. | No |
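When a shapefile arrives from another source, a quick check that the three required component files are present can catch incomplete transfers early. The following is a minimal sketch using only the Python standard library; the function name `missing_required_components` is hypothetical, and the check assumes the component files use lowercase extensions and sit in the same folder as the .shp file.

```python
from pathlib import Path

# Required components, per the table above.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_required_components(shp_path):
    """Return the required extensions that have no file next to `shp_path`.

    Only existence is checked; the contents of the files are not validated.
    """
    shp = Path(shp_path)
    # with_suffix swaps only the final extension, so "roads.v2.shp"
    # is paired with "roads.v2.shx" and "roads.v2.dbf".
    return [ext for ext in REQUIRED_EXTENSIONS
            if not shp.with_suffix(ext).exists()]
```

For example, `missing_required_components("C:/data/roads.shp")` returns an empty list when all three files exist, and a list such as `[".shx"]` when the index file was left behind.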
Geometry limitations
- There is a 2 GB size limit for any shapefile component file, which translates to a maximum of roughly 70 million point features. The actual number of line or polygon features you can store in a shapefile depends on the number of vertices in each line or polygon (a vertex is equivalent to a point).
- Shapefiles do not contain an x,y tolerance like geodatabase feature classes. The x,y tolerance is the minimum distance between coordinates before they are considered equal. This x,y tolerance is used when evaluating relationships between features within the same feature class or between several different feature classes. It is also used extensively when editing features. If you are performing any sort of operation involving comparison between features, such as use of Overlay tools, the Clip tool, the Select Layer By Location tool, or any tool that takes two or more feature classes as input, you should be using geodatabase feature classes (which have an x,y tolerance) rather than shapefiles.
- A shapefile may take up three to five times as much space as a file geodatabase or SDE geodatabase, because geodatabases use shape compression methods and shapefiles do not.
- Shapefiles support multipatches but lack support for the following advanced multipatch capabilities:
  - Texture coordinates
  - Textures and part color
  - Lighting normals
- The spatial index for a shapefile is inefficient compared to that of a geodatabase feature class. This means that spatial queries (such as selecting features within a polygon) take longer compared to a geodatabase feature class. This inefficiency is only noticeable when dealing with large numbers of features.
- Parametrically defined curves (also known as circular arc curves) are not supported on shapefiles. Parametric curves are created by editing geodatabase feature classes, as described in Creating a curve. Circular arc curves use a mathematical formula to draw the curve. If you export a geodatabase feature class containing circular arc curve features to a shapefile, the curved features are transformed to simple line features with closely spaced vertices to capture the curved shape.
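The relationship between the 2 GB component limit and the number of features can be checked with a short calculation. The sketch below uses the record layout from the shapefile format specification (a 100-byte file header, an 8-byte header per record, 20 bytes of content for a point, and 44 + 4 × parts + 16 × vertices bytes of content for a polygon or polyline). It assumes that 2 GB means 2^31 bytes and considers only the .shp file, not the .dbf or .shx files, so the results are upper bounds rather than exact limits.

```python
SHP_FILE_HEADER_BYTES = 100        # fixed-size header at the start of a .shp file
RECORD_HEADER_BYTES = 8            # record number + content length
POINT_CONTENT_BYTES = 4 + 8 + 8    # shape type + X + Y
SIZE_LIMIT_BYTES = 2**31           # assumption: "2 GB" taken as 2^31 bytes


def max_point_records(limit=SIZE_LIMIT_BYTES):
    """Upper bound on point features that fit in one .shp file."""
    record_bytes = RECORD_HEADER_BYTES + POINT_CONTENT_BYTES
    return (limit - SHP_FILE_HEADER_BYTES) // record_bytes


def max_polygon_records(vertices_per_feature, parts_per_feature=1,
                        limit=SIZE_LIMIT_BYTES):
    """Upper bound on polygon (or polyline) features of a given size.

    Content = shape type (4) + bounding box (32) + part count (4)
    + point count (4) + 4 bytes per part + 16 bytes per vertex.
    """
    content = 44 + 4 * parts_per_feature + 16 * vertices_per_feature
    return (limit - SHP_FILE_HEADER_BYTES) // (RECORD_HEADER_BYTES + content)


if __name__ == "__main__":
    print("points:", max_point_records())
    print("100-vertex polygons:", max_polygon_records(100))
```

The point result lands in the tens of millions, the same order of magnitude as the figure quoted above, and the polygon result shrinks quickly as the vertex count per feature grows.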
Attribute limitations
- Unlike other formats, shapefiles store numeric attributes in character format rather than binary format. For real numbers (that is, numbers containing decimal places), this may lead to rounding errors. This limitation does not apply to shape coordinates, only attributes. The following table summarizes the field width for each attribute data type.
Field widths in a dBASE file

Geodatabase data type | dBASE field type | dBASE field width (number of characters) |
---|---|---|
Object ID | Number | 9 |
Short Integer | Number | 4 |
Long Integer | Number | 9 |
Float | Float | 13 |
Double | Float | 13 |
Text | Character | 254 |
Date | Date | 8 |
- The dBASE file standard supports only ANSI characters in field names and values. ESRI has added extensive Unicode support for dBASE files to allow you to store Unicode field names and values, but this additional support resides only in ArcGIS and is not available in non-ESRI applications. Supporting Unicode in dBASE is an ongoing effort at ESRI, meaning that issues continue to be found and resolved.
  Note: If you have to support Unicode in your field names or field values, we strongly suggest that you use geodatabases rather than shapefiles.
- Date fields support either the date or the time, but not both in the same field.
- Null values are not supported in shapefiles. If a feature class containing nulls is converted to a shapefile, then the null values will be changed into the following:
Data type containing null value | Shapefile representation |
---|---|
Number, when a tool requires a NULL, infinity, or NaN (Not a Number) to be output | -1.7976931348623158e+308 (IEEE standard for the maximum negative value) |
Number (all other geoprocessing tools) | 0 |
Text | "" (blank, no space) |
Date | Stored as zero, but displays "<null>" |
- Field names cannot be longer than 10 characters.
- The maximum record length for an attribute is 4,000 bytes. The record length is the number of bytes used to define all the fields, not the number of bytes used to store the actual values.
- The maximum number of fields is 255. A conversion to shapefile will convert the first 255 fields if this limit is exceeded.
- The dBASE file must contain at least one field. When you create a new shapefile or dBASE table, an integer ID field is created as a default.
- dBASE files do not support blob, GUID, global ID, coordinate ID, or raster field types.
- dBASE files have little SQL support aside from a WHERE clause.
- Attribute indexes are deleted when you save edits, and you must re-create them from scratch.
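The rounding caused by storing numbers as characters can be illustrated with a small round trip. The sketch below formats a double-precision value as fixed-point text that fits the 13-character width listed above for Float and Double fields, then parses the text back. The function name `to_fixed_width_text` is hypothetical, and plain fixed-point formatting is an assumption made for illustration; the exact formatting rules ArcGIS applies when writing a .dbf file are not described here.

```python
def to_fixed_width_text(value, width=13):
    """Format `value` as decimal text no wider than `width` characters,
    keeping as many decimal places as will fit.

    Assumption: simple fixed-point formatting, used here only to
    illustrate the effect of character storage.
    """
    for decimals in range(width, -1, -1):
        text = f"{value:.{decimals}f}"
        if len(text) <= width:
            return text
    raise ValueError(f"{value!r} does not fit in {width} characters")


def round_trip_error(value, width=13):
    """Absolute difference between `value` and its text round trip."""
    return abs(float(to_fixed_width_text(value, width)) - value)


if __name__ == "__main__":
    original = 1234.56789012345
    stored = to_fixed_width_text(original)
    print(stored)                      # text as it would sit in the field
    print(float(stored) == original)   # False: trailing digits were lost
    print(round_trip_error(0.5))       # 0.0: short values survive intact
```

Values with few significant digits survive unchanged; values that need more digits than the field width allows come back slightly different, which matters if the attribute is later used in comparisons or joins.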
Unsupported capabilities
Shapefiles have no extended data types at either the workspace or feature class level. Any conversion to shapefile from a geodatabase feature class or other format will result in the loss of the following:
- Subtypes
- Attribute domains
- Geometric networks
- Topologies
- Annotation
Shapefiles and geoprocessing
Any geoprocessing tool that outputs a feature class allows you to choose either a shapefile or geodatabase feature class as the output format. Similarly, a tool that outputs a table allows you to choose either a dBASE file (.dbf) or a geodatabase table as the output. You should always be aware of which format you use and the consequences of converting a geodatabase input to a shapefile output.
Geoprocessing tools autogenerate an output feature class or table for you. This autogenerated output is based on a number of factors as described in Using the current and scratch workspace environments. If your scratch workspace environment is set to a system folder, and not a geodatabase, the autogenerated output feature class will be a shapefile or dBASE file, as illustrated below.
It is suggested that you set your scratch workspace to a file geodatabase so that the autogenerated output is written to a file geodatabase, not a shapefile or .dbf table.
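Before running a model or script, it can be useful to confirm that the scratch workspace path actually points at a file geodatabase rather than a plain folder. The sketch below is a simple path-based guard in standard-library Python. The function names are hypothetical, and the check relies on the assumption that a file geodatabase path ends in `.gdb`. It does not call ArcGIS and does not recognize other geodatabase types, so anything that is not a `.gdb` path is treated as a system folder.

```python
from pathlib import PureWindowsPath


def is_file_geodatabase_path(workspace):
    """True if the path ends in .gdb (assumed file geodatabase naming)."""
    return PureWindowsPath(workspace).suffix.lower() == ".gdb"


def expected_autogenerated_format(scratch_workspace):
    """Describe the autogenerated output format for a scratch workspace.

    Simplification: only .gdb paths are treated as geodatabases; every
    other path is assumed to be a system folder.
    """
    if is_file_geodatabase_path(scratch_workspace):
        return "file geodatabase feature class or table"
    return "shapefile or dBASE table"
```

For instance, `expected_autogenerated_format(r"C:\data\scratch.gdb")` reports a file geodatabase output, while `expected_autogenerated_format(r"C:\data\scratch")` reports a shapefile or dBASE table.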
Learn more about geoprocessing environments
Because shapefiles write quickly, they are often used to write intermediate data in models, since this makes for faster model execution. However, writing to a file geodatabase is almost as fast as writing to a shapefile, so unless execution speed is critical, you should always use a file geodatabase for intermediate and output data. If you do use shapefiles, be aware of their limitations as described above and only use shapefiles for simple features and attributes. An alternative to using shapefiles for intermediate data is to write features to the in_memory workspace.
Learn more about the in_memory workspace
Spatial reference and shapefiles
The topic Spatial reference and geoprocessing discusses the importance of spatial reference properties when using geoprocessing tools. There are a number of geoprocessing environments that control the spatial reference used by tools. The following environments are not honored when the output of a tool is a shapefile: