Spatial indexes generated by the DB2 Spatial Extender

The IBM DB2 Spatial Extender provides a three-tiered grid spatial index to allow applications to query the two-dimensional geographic data contained in a spatial column and quickly identify all geometries lying within a given extent.

The two-dimensional spatial index differs from the traditional hierarchical B-tree index provided by DB2. The B-tree index may not be applied to a spatial column because the two-dimensional characteristic of the spatial column requires the structure of a spatial index. For the same reason, you may not apply a spatial index to a nonspatial column, and a spatial index may not be applied to a composite column of any kind.

For this reason, the spatial index's CREATE INDEX syntax includes the additional USING clause, which directs DB2 to use the Spatial Extender's spatial index rather than a B-tree index. The full syntax is as follows:

CREATE INDEX <index_name> 
ON <table> (<spatial column>)
USING db2gse.spatial_index (<grid level 1>, [grid level 2], [grid level 3])

The addition of the USING clause distinguishes the spatial index from the B-tree index. The db2gse schema name must qualify the spatial_index extension name because this statement does not follow the current function path.

Because of the simple nature of the data that a B-tree was designed to index, the database designer merely directs DB2 to create the index on one or more table columns. However, since spatial data is complex, it requires the designer to understand its relative size distribution. The designer must determine the optimum size and number of the spatial index's grid levels.

The grid levels ([grid level 1], [grid level 2], [grid level 3]) are entered by increasing cell size. Thus, the second level must have a larger cell size than the first and the third a larger cell size than the second. The first grid level is mandatory, but you may disable the second and third with a zero value (0).

How the Spatial Extender generates a spatial index

The DB2 Spatial Extender constructs a spatial index as follows:

  1. The Spatial Extender intersects each geometry's envelope with the grid, beginning with the first level.
  2. If less than four intersections occur with the first grid level, the Spatial Extender enters the geometry ID and the intersecting grid cell IDs in the spatial index and continues with the next geometry.
  3. If the Spatial Extender detects more than four intersections, it intersects the geometry with the second grid level. If you have not enabled the second grid level, the Spatial Extender enters the geometry ID and grid cell IDs in the spatial index and continues with the next geometry.
  4. If less than four intersections occur with the second grid level, the Spatial Extender enters the geometry ID and the intersecting grid cell IDs in the spatial index and continues with the next geometry.
  5. If the Spatial Extender detects more than four intersections, it intersects the geometry with the third grid level. If you have not enabled the third grid level, the Spatial Extender enters the geometry ID and grid cell IDs in the spatial index and continues with the next geometry.
  6. The Spatial Extender enters the geometry ID and the intersecting grid cell IDs of the third level in the spatial index and continues with the next geometry.

The Spatial Extender does not actually create a polygon grid structure of any sort. The Spatial Extender manifests each grid level parametrically by defining the origin as the x,y offsets of the column's spatial reference system extending into positive coordinate space. Using a parametric grid, the Spatial Extender generates the intersections mathematically.

How the Spatial Extender uses the spatial index

The Spatial Extender uses a spatial index to improve the performance of a spatial query. The box query—the most basic and probably the most popular spatial query—returns geometries of a spatial column that intersect a user-defined box. If a spatial index does not exist, the Spatial Extender must compare all of a spatial column's geometries with the box.

Using the spatial index, the Spatial Extender identifies index grid entries that intersect the box. Since the spatial index is ordered on a grid, the Spatial Extender quickly obtains a list of candidate geometries. This process is referred to as the first pass.

A second pass goes through the list of candidate geometries and disqualifies any geometries that have envelopes that do not intersect the box.

A third pass compares the actual coordinates of the candidate geometry with the box to determine whether or not the geometry intersects the box. This last complex process of comparison operates on a subset of the table rows, significantly reduced by the first two passes.

All spatial queries perform the three passes with the exception of the EnvelopesIntersect function, which performs only the first two passes. The EnvelopesIntersect function was designed for display operations that use display driver clipping routines and that don't require the granularity of the third pass.

Optimum grid cell sizes

Selecting the grid cell size is complicated by the fact that envelopes of irregularly shaped geometries do not fit neatly within a grid cell. Because of this irregularity, some geometry envelopes intersect several grids, while others fit inside a single grid cell. Conversely, grid cells may intersect several geometry envelopes, depending on the spatial distribution of the data.

A spatial index performs well when you enable the correct number of levels and their grid cell sizes to fit the data. To simplify this discussion, first consider a spatial column containing geometry of a uniform size. In this case, it is not necessary to create a multileveled spatial index, since a single grid level will suffice. Create a spatial index with a single grid level whose grid cell size is 1.5 times the size of the average geometry envelope.

TipTip:

Since point data has a small envelope, the grid size could also be small.

The general rule is that the grid size should be about one-tenth of the typical query window size. For point data, only a single grid level should be necessary.

While testing your application, you may find that it performs better with a larger grid cell size because each grid cell references more geometries, enabling the first pass to discard nonqualifying geometries faster. However, if you continue to increase the grid cell size, performance deteriorates as the number of geometries filtered by the second pass increases.

DB2 Spatial Extender provides a utility, the Index Advisor, that lets you create a simulated grid index and tune this index into a model for a real index. It also determines whether to retain or replace an existing grid index.

Following is an example of how to use the Index Advisor to return detailed information about an existing grid index. In this example, the grid index's fully qualified name is mydb.myindex:

gseidx connect to mydb user test using test get geometry statistics for index mydb.myindex detail

Both shp2sde and cov2sde use a similar algorithm to calculate the default spatial index for grid size level 1 when the optional –g option is not present. The defaults for grid size level 2 and level 3 are always set to ZERO, where shp2sde is based on the shapefile's extent and cov2sde is based on the ArcInfo coverage's extent. For details on using the shp2sde or cov2sde commands, see the ArcSDE Administration Command Reference provided with ArcSDE.

Selecting the number of levels

Few spatial columns contain geometry of the same relative size. However, geometries of most spatial columns can be grouped into size intervals. For instance, consider a spatial column of county parcels containing a vast number of small parcels clustered in the urban areas surrounded by a few large rural parcels. These situations are common and require the use of a multilevel spatial index. To select the grid cell sizes of each level, determine the intervals of geometry envelope sizes. Create a spatial index with grid-level cell sizes slightly larger than each interval. Test the index by performing queries against the spatial column using your application. Try adjusting the grid sizes up or down slightly to determine whether an appreciable improvement in performance can be obtained.

For further details on this topic, refer to chapter 11, "Using indexes and views to access spatial data", in the IBM DB2 Spatial Extender User's Guide and Reference.


8/19/2013