Loading large raster datasets into ArcSDE
This topic applies to ArcEditor and ArcInfo only.
Given the nature of the data, rasters tend to be large. Raster datasets and raster catalogs are hardly ever less than a few gigabytes (GB) and can occupy several terabytes (TB) within your database management system (DBMS). Coping with the large size of raster data challenges even the most experienced database administrator (DBA). This topic will provide you with the basic steps for loading large amounts of raster data into a raster dataset or raster catalog. It is assumed that you have already made the decision to store your rasters in an ArcSDE geodatabase.
There are three main processes to building large raster databases: preparing to load the data, loading the data, and verifying and validating the final product. This involves the following tasks:
Preparing to load the data
- Configuring your system
- Estimating the size of the DBMS storage space
- Allocating the DBMS storage space
Loading the data
- Preparing the source data
- Creating the raster object
- Loading the raster data into the raster
- Building the DBMS statistics
- Building the raster statistics
Verifying and validating
- Viewing the finished product
Configuring your system
Configuring your system consists of two objectives: configuring the DBMS and configuring ArcSDE.
Configuring your DBMS parameters
The configuration of each DBMS supported by ESRI is unique, but the basic goal is the same—increase the data throughput into the database as much as possible. To load vast amounts of raster data, the DBMS should be optimized for write performance; however, a DBMS system that is optimized to load several hundred gigabytes or more of raster data can impede the performance of other applications using the same resources to accomplish other tasks. Therefore, it is recommended that if the DBMS is not dedicated to loading raster data, you may need to consider reconfiguring the system and loading the raster data during off-hours to avoid impacting the work of others or creating a separate instance of the DBMS dedicated to loading the raster data.
There are guidelines for the storage of raster data in SQL Server, Oracle, DB2, and Informix. The information listed below highlights the information that can be found in each of these guides.
Configuring your SQL Server DBMS parameters
- Govern the resources consumed by SQL Server only if there are other applications running on the server.
- When possible, set the database recovery model to simple. This is because the recovery to a specific point in time is not necessary during the loading of the raster datasets, and having all the logging continue makes the logs unnecessarily large.
- Deactivate autoclose and autoshrink, since these can have a negative impact on performance.
To learn more, see Raster datasets and raster catalogs in a geodatabase in SQL Server.
Configuring your Oracle DBMS parameters
- Configure the checkpoint interval to occur only during an online redo log switch. The Oracle DBAs set the Oracle initialization parameters LOG_CHECKPOINT_INTERVAL and LOG_CHECKPOINT_TIMEOUT to 0. Doing so forces the checkpoint to occur during a log switch.
- Increase the size of the online redo log files to at least 1 GB each.
- Increase the data block buffer cache to hold the dirty blocks by increasing the size of the DB_BUFFER_CACHE.
- Create the Oracle database with a data block size of 8 KB. This is the optimum data block size for storing BLOB data, which has become the underlying default storage type of all ArcGIS binary data. Using a large data block size of 16 KB or 32 KB will likely result in a waste of space in the BLOB log segments. For a thorough understanding of BLOB storage, consult your Oracle documentation.
To learn more, see Raster datasets and raster catalogs in a geodatabase stored in Oracle.
Configuring your DB2 DBMS parameters
- Create separate buffer pools for the raster table spaces.
- Create a large buffer pool for the table space storing the raster blocks table.
- If DB2 is installed on an AIX platform, set the following parameters:
- db2set DB2_MMAP_READ=OFF
- db2set DB2_MMAP_WRITE=OFF
To learn more, see Raster datasets and raster catalogs in a geodatabase in DB2.
Configuring your Informix DBMS parameters
Configure the onconfig parameters:
- Set BUFFERS large enough to stay ahead of the cleaners.
- Set the LOGSIZE to 100000.
- Make sure that the physical log is not created in the rootdbs dbspace.
- Set the LOG_BACKUP_MODE to continuously back up the logical logs.
- Set the LOGSMAX to 100.
- Set RA_PAGES to 125.
- Set RA_THRESHOLD to 85.
- Set RESIDENT to -1.
- Set the NETTYPE parameter to favor remote connections if you intend to use direct connections.
To learn more, see Raster datasets and raster catalogs in a geodatabase in Informix.
Configuring the ArcSDE server
ArcSDE uses transport buffers to transfer data between the ArcSDE client and the ArcSDE server. During a write operation, when the client-side data buffer reaches its threshold, it is flushed to the ArcSDE server. While the ArcSDE server processes that buffer, the ArcSDE client accepts more data into its buffer. The client sends its buffer again once the threshold is reached, but only if the ArcSDE server is waiting. For raster data, the size of the transport buffer is controlled by the ArcSDE RASTERBUFSIZE server parameter. By default, the parameter is set to 200 KB, which is adequate for most loading operations. ArcSDE allocates the amount of memory specified by RASTERBUFSIZE to double buffers on both the ArcSDE client and server. Therefore, if you use the recommended 200 KB default value, 400 KB are allocated on the client, and 400 KB are allocated on the server. In addition to the transport buffers, ArcSDE also allocates another three buffers on the server for data read and written to and from the DBMS. This increases the total amount of memory allocated on the server to 1000 KB. If you are using direct connect, the functionality of the ArcSDE client and server operates on the client machine; therefore, the memory allocated by a direct connection is seven times the size specified by RASTERBUFSIZE. You should only change the size of the RASTERBUFSIZE if its default size is not large enough to fit the uncompressed size of a raster tile. The uncompressed size of a raster tile can be determined by multiplying the tile width by the tile height and adjusting for the pixel depth. For instance, if you use the default values of 128 by 128 for the tile width and tile height, the product of these two values is 16,384. Multiply this product by the pixel depth factor from the table below to obtain the raster tile's uncompressed size. If the pixel depth is 32, the uncompressed size would be calculated as 65,536 bytes. This is well under the 200 KB default value. However, if you decide to increase the tile size to 256 by 256, it would increase the uncompressed size to 262,144 bytes. Without increasing the size of the RASTERBUFSIZE parameter, you would risk getting an SE_RASTER_BUFFER_TOO_SMALL (-294) error. In this case, you would need to increase the size of the RASTERBUFSIZE parameter to approximately 300 KB.
Pixel depth factor
sdeconfig -o alter -v RASTERBUFSIZE=10240000 -u sde -p esri
The change is actually made to the SDE.SERVER_CONFIG table, which stores the new value. The value affects all ArcSDE sessions that connect after the parameter has been altered.
You should not increase the RASTERBUFSIZE parameter beyond what is required to hold the uncompressed size of a raster tile. Setting the parameter too large can actually be detrimental to the flow of raster data through the system.
The construction of the reduced-resolution pyramid is a multithreaded, processor-intensive operation. This task is performed by the ArcSDE server functionality. By positioning the server functionality on the machine with the most CPU power within your client/server architecture, you will achieve a higher throughput of raster data into the database. CPU power is determined by multiplying the number of CPUs by their processor speed.
The server functionality can either be run stand alone as it is when you connect to an ArcSDE service or be embedded in the client application (for example, ArcCatalog), as it is when you make a direct connection.
The decision of whether or not to use a direct connection when loading raster data depends on whether the CPU power is greatest on the client machine or the server hosting the ArcSDE service. To build the pyramids on the client machine, if it is more powerful, use a direct connection.
You must also consider the capacity of the server machine hosting the ArcSDE service. It is possible that it is more powerful than the client machine, but it may already be near its limit servicing requests from other applications. If the server machine is not reaching its limit and is more powerful than the client machine, then connect to the ArcSDE service.
Estimating the size of the DBMS storage space
Organizing your DBMS storage space prior to starting a large raster data loading project is recommended. To do so, you will need to obtain an estimate of the space required to store the raster data in the database as well as any space required to stage the image files before they are loaded. Once you have an estimate of the required disk space, you can allocate the DBMS space, set the ArcSDE DBTUNE parameters, and begin loading your raster data.
The raster BLK table is by far the largest of the raster's tables and indexes. It is typically 150 times larger than the next largest DBMS object, which is the raster blocks table's composite index. The remainder of the raster object's tables and indexes are thousands of times smaller and are generally grouped together with the raster blocks table's composite index into a single DBMS storage space. Since the raster blocks table is so much larger than all the other objects, the task of estimating the size of the raster data is focused on it.
For raster datasets and raster catalogs the pixel data is stored inside the DBMS in the BLK table. On the other hand, mosaic datasets reference the raster datasets and do not store the pixel data in the BLK table. The BLK table will remain empty unless the mosaic dataset overviews are stored in the DBMS.
There are two basic methods of estimating the amount of space that a raster blocks table will occupy in the database. One method involves loading a few sample images and extrapolating the total amount of space the raster blocks table is expected to occupy in the database from the amount of space occupied by the sample. Another method attempts to calculate the space required based on a formula that accepts as input the expected raster objects' properties. The first method generally provides a better estimate than the latter, but the formula method is useful in cases where the source data is not readily available prior to the construction of the database. To accurately estimate the size of a raster blocks table, you must have already determined the following properties that will affect the storage size of a raster blocks table:
Estimating the size of the raster blocks table based on a sample
Create a small raster object by loading a number of sample raster datasets with ArcCatalog. The raster object should have the same compression type, compression quality, pixel depth, number of bands, and tile size as the final raster object that it will represent. These raster properties affect storage size of a raster object within the database.
After loading sample raster data with ArcCatalog, determine how much database space has been occupied by the sample raster object blocks table.
For Oracle, this is done by querying the BLOCKS column of the Oracle USER_TABLES system table for the raster object's raster blocks table. To do that, you will first need to determine the name of the raster object's raster blocks table, which, in turn, requires you to obtain the rastercolumn_id from the SDE.RASTER_COLUMN table.
The following examples in this document assume the name of your sample raster object's business table is EARTH.
- Obtain the rastercolumn_id for the raster object.
SQL> select rastercolumn_id from sde.raster_columns where table_name = 'EARTH'; RASTERCOLUMN_ID =--------------- 14
- Obtain the size of the raster blocks table with the following query.
SQL> exec dbms_stats.gather_table_stats (NULL,'EARTH'); SQL> select sum(length(rasterband_id)) + sum(length(rrd_factor)) + sum(length(row_nbr)) + sum(length(col_nbr)) + sum(length(block_data)) "total size" from sde_blk_14; total size =----------- 762707968
The total size of the raster blocks table is 762,707,968 bytes, or approximately 727 MB.
Determine the ratio of the sample raster dataset's size to the total size of all source raster dataset files that you are about to load. The size of this sample is 2 GB, while the data vendor has provided a total size of all raster dataset files of approximately 3 TB.
First, convert the 3 terabytes of total file size to gigabytes:
3 TB * 1024 = 3072 GB
The total raster dataset size to sample size ratio is calculated as
total size of all rasters / sample raster size = total size to sample size
3072 / 2 = 1536
- Multiplying the ratio of total raster dataset size to sample size by the size of the sample stored in the database provides a rough estimate of the total amount of disk space required to store all the raster dataset files in the raster object.
727 MB * 1536 = 1,116,672 MB = 1090 GB = 1.06 TB
To determine the sample size in SQL Server, you first obtain the sample size, then move to step 3 above.
- Obtain the rastercolumn_id for the raster.
1> select rastercolumn_id 2> from sde.sde_raster_columns 3> where table_name = 'EARTH'; rastercolumn_id =-------------- 3
- Use the ID to determine the space used by the sample data.
1> exec sp_spaceused 'sde_blk_3'; 2> go name rows reserved data index_size used =-------------------------------------------------------------- SDE_blk_3 4233 29712 KB 29632 KB 16 KB 64 KB
In this case, the original raster dataset size was 50 MB, but once loaded, the amount of space used is 29,632 KB + 16 KB = 29,648 KB, or approximately 29 MB.
Estimating the size of the raster blocks table based on a formula
The formula method that follows provides a rough estimate of the size of the raster blocks table. If the source data is available either in part or in total, you should use the previously described method to obtain an estimate of your raster blocks table, because it is a more accurate method.
- Obtain a rough estimate of the extent in either meters or feet. For raster datasets with a pixel resolution in meters (for example, 5 meters, 15 meters, or 150 meters), obtain the estimate in those units. Likewise, if the resolution is expressed in feet or inches, then obtain the extent in those units. A good way to obtain the extent is to create a polygon outline of the image extent using ArcMap. You need to eliminate as much void space as possible, since the geodatabase does not store empty blocks.
Convert the extent to pixels.
(Extent of raster in pixel units) / (pixel resolution) = Number of pixels
For source raster datasets with a resolution of 15 meters and the extent of 450 square kilometers, the conversion to pixels would be calculated as follows:
(km2 to m2 conversion factor) / (the pixel resolution in m2) = pixels (450 km2 * 1,000,000) / (15 x 15) = 2,000,000 pixels
As another example, consider 50 square miles of 1-foot resolution. Estimated pixel extent is approximately as follows:
(mi2 to ft2 conversion factor) / (the pixel resolution in ft2) = pixels (50 mi2 * 52802)/ (1 x 1) = 1,393,920,000 pixels
- Multiply by the number of bands. If it is a single-band grayscale or black-and-white raster dataset, you can skip this step. For this example, assume the 15-meter data from step 2 has three bands (representing RGB).
2,000,000 pixel extent * 3 bands = 6,000,000 pixels
- Convert to bytes by applying the pixel depth factor from the table.
List of pixel byte factors for pixel depths
8-bit data: 6,000,000 pixels * 1 = 6,000,000 Bytes / 10242 = 5.7 MB 16-bit data: 6,000,000 pixels * 2 = 12,000,000 Bytes / 10242 = 11.4 MB
- Add the reduced-resolution pyramid. The reduced-resolution pyramid increases the size of the raster by one-third. Therefore, multiply the byte count from step 4 by 1.33.
5.72 MB * 1.33 = 7.67 MB
- Apply the raster compression. The following table provides an example of the expected savings for a given compression and quality. The exact values will vary depending on the data being used.
Compression factors for compression types
JPEG 2000 80/100
JPEG 2000 60/100
JPEG 2000 55/100
JPEG 2000 50/100
JPEG 2000 45/100
- Extending the example in step 5 and applying a JPEG compression with a quality of 50 percent, multiply the byte count by the 0.1 compression factor.
7.67 MB * 0.1 = 0.77 MB
- Increase the estimate for DBMS overhead and the raster blocks table integer columns. The value from step 7 is an estimate of the amount of absolute space required for pixel data stored in the raster blocks table. The pixels are divided into blocks according to the tile size that you select when you create the raster object. These blocks of pixels are stored in the database data blocks. The raster blocks will not fit perfectly given the variability of compression. There will always be a certain amount of unused block space. In addition, the raster blocks table also stores four integer columns, along with the pixel BLOB column, to uniquely identify each block. The integer columns also add overhead to the storage. Increase the estimate in step 7 by at least 10 percent to account for the DBMS overhead and the raster blocks table integer columns.
Allocating the DBMS storage space
How you create the DBMS storage space depends on how you need to manage it. Some DBMSs allow you to move the data files of a database, so you may want to store all the raster tables and indexes in a single space. If transporting the data is not a concern, create storage space according to size: a large space for the blocks table and its index and a smaller space for the other raster tables and their indexes.
Some DBMSs allow you to autogrow the DBMS space; however, you should preallocate the space to ensure that you will not encounter problems during the load. Since you have gone to the trouble of estimating the space requirements of the data, you might as well allocate space for it.
Creating the Oracle DBMS storage space
- Create the nonraster blocks table space.
create tablespace earth datafile 'd:\oradata\earth.dbf' size 500M extent management local uniform size 1M;
- Create the raster blocks table space.
create tablespace earth_blocks datafile 'e:\oradata\earth_blocks.dbf' size 50000M extent management local segment space management manual uniform size 100M;
Creating the SQL Server DBMS storage space
- Create the SQL Server database large enough to store the entire raster object.
Creating the DB2 DBMS storage space
- Create a table space in which to store all the nonraster blocks data.
create tablespace earth managed by database using (file 'd:\earth.dat' 500000);
- Create a table space in which to store the raster blocks table data.
create long tablespace earth_blocks managed by database using (file 'e:\earth_blocks.dat' 50000000);
Creating the Informix DBMS storage space
- Create the dbspace to store all nonraster blocks tables.
onspaces -c -d earth -p d:\earth.dbs -o 0 -s 5000
- Create the dbspace(s) to store the raster blocks table.
onspaces -c -d earth_blocks -p e:\earth_blocks.dbs -o 0 -s 50000000
Configuring the DBTUNE keywords
The SDE.DBTUNE table holds the storage parameters used by ArcSDE to create tables and indexes within the database. The storage parameters are organized by keyword. By specifying a DBTUNE keyword during the construction of a geodatabase object, such as a raster, you specify how you want the tables and indexes of the geodatabase object to be created and in which database storage unit they are to be located.
There is a DBTUNE storage parameter for each of the raster's table and index combinations. You will edit the SDEHOME/etc/dbtune.sde file by appending a new keyword followed by the specification for each storage parameter. Examples of this are shown below. Then you will use the sdedbtune import operation to update the SDE.DBTUNE table.
Depending on the DBMS, the storage parameters may vary. The 11 Oracle DBTUNE storage parameters are listed below; however, many of these are common to the other DBMSs.
- The storage parameter for the business table is B_STORAGE, and if the business table has an ArcSDE registered row ID column, the storage parameter for it is B_INDEX_ROWID. The business table for a raster dataset contains a single row and is, therefore, very small. The business table for a raster catalog is typically small, holding one row for each raster dataset.
Examples of the B_STORAGE and B_INDEX_ROWID storage parameters are as follows:
B_STORAGE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" B_INDEX_ROWID "PCTFREE 0 INITRANS 4 TABLESPACE EARTH"
- The storage of the raster table is controlled by the RAS_STORAGE parameter, while the storage of the raster_id index on that table is controlled by the RAS_INDEX_ID parameter. The raster table of a raster dataset contains a single row, and it is very small. The raster table for a raster catalog is typically small, storing only one row for each raster dataset.
Examples of the RAS_STORAGE and RAS_INDEX_ID storage parameters are as follows:
RAS_STORAGE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" RAS_INDEX_ID "PCTFREE 0 INITRANS 4 TABLESPACE EARTH"
- The raster band table is stored according to the BND_STORAGE storage parameter. The BND_INDEX_ID controls the storage of the rasterband_id index, and the BND_INDEX_COMPOSITE storage parameter controls the storage of the composite index on the sequence_nbr and raster_id columns. The raster bands table is typically small and only slightly larger than the raster object's raster or business table, storing one record for each raster band. For a raster catalog storing 1,000 three-band raster datasets, the raster bands table will have 3,000 records.
Examples of the BND_STORAGE, BND_INDEX_COMPOSITE, and BND_INDEX_ID storage parameters are as follows:
BND_STORAGE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" BND_INDEX_COMPOSITE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" BND_INDEX_ID "PCTFREE 0 INITRANS 4 TABLESPACE EARTH"
- The AUX_STORAGE storage parameter controls the storage of the raster auxiliary table, while the AUX_INDEX_COMPOSITE storage parameter controls the storage of the table's composite index. The number of records in the raster auxiliary table varies depending on the optional raster band data stored. For efficiency, the NoData bit mask is stored for each raster band, but only if the bit mask is less than 10 MB; otherwise, it is not stored, and ArcSDE must retrieve the NoData bit mask from the pixel blocks stored in the raster blocks table. It also stores the raster statistics, if they have been calculated, and the color map, if one exists. It is possible that for a given raster object, the raster auxiliary table can be empty or contain more records than the raster bands table. It is usually about the same size as the raster bands table.
Examples of the AUX_STORAGE and AUX_INDEX_COMPOSITE storage parameters are as follows:
AUX_STORAGE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" AUX_INDEX_COMPOSITE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH"
- The BLK_STORAGE controls the storage of the raster blocks table, which is the largest of the raster object's table and index. The raster blocks table stores the pixel data as defined by the tile size when the raster object is created. The BLK_INDEX_COMPOSITE controls the storage of the raster blocks table's composite index, which indexes the four integer columns rasterband_id, row_nbr, col_nbr, and rrd_factor. The raster blocks table is approximately 150 times larger than its index. However, since the raster blocks table can become very large, it is practical to store the raster blocks table index in the same table space as the raster blocks table.
Examples of the BLK_STORAGE and BLK_INDEX_COMPOSITE storage parameters are as follows:
BLK_STORAGE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH_BLOCKS STORAGE (INITIAL 500M MINEXTENTS 10) LOB( BLOCK_DATA) STORE AS (ENABLE STORAGE IN ROW CHUNK 8K RETENTION CACHE)" BLK_INDEX_COMPOSITE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH_BLOCKS"
The footprints of the raster dataset and raster catalog are stored in the spatial column. If the spatial column is created as an ArcSDE binary type, you will also need to configure the following DBTUNE storage parameters:
- S_STORAGE—The spatial index table
- S_INDEX_ALL—The coverage index, which lists all the spatial index table columns
- S_INDEX_SP_FID—The spatial index table's fid column index
Examples of how the DBTUNE keyword would appear in the DBTUNE file for each DBMS are shown below. The double pound sign (##) signifies the beginning of the EARTH DBTUNE keyword, while END signifies its termination.
SQL Server DBTUNE keywords example
##EARTH_15M B_CLUSTER_ROWID 0 B_CLUSTER_SHAPE 1 B_CLUSTER_RASTER 0 B_INDEX_ROWID "WITH FILLFACTOR = 100" B_INDEX_SHAPE "WITH FILLFACTOR = 100" B_INDEX_RASTER "WITH FILLFACTOR = 100" B_STORAGE "" B_TEXT_IN_ROW 256 RAS_STORAGE "" RAS_CLUSTER_ID 1 RAS_INDEX_ID "WITH FILLFACTOR = 100" BND_STORAGE "" BND_CLUSTER_ID 0 BND_INDEX_ID "WITH FILLFACTOR = 100" BND_CLUSTER_COMPOSITE 0 BND_INDEX_COMPOSITE "WITH FILLFACTOR = 100" AUX_STORAGE "" AUX_CLUSTER_COMPOSITE 1 AUX_INDEX_COMPOSITE "WITH FILLFACTOR = 100" BLK_STORAGE "" BLK_CLUSTER_COMPOSITE 1 BLK_INDEX_COMPOSITE "WITH FILLFACTOR = 100" END
Oracle DBTUNE keywords example
##EARTH_15M RAS_STORAGE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" RAS_INDEX_ID "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" BND_STORAGE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" BND_INDEX_COMPOSITE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" BND_INDEX_ID "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" AUX_STORAGE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" AUX_INDEX_COMPOSITE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH" BLK_STORAGE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH_BLOCKS STORAGE (INITIAL 500M MINEXTENTS 10) LOB( BLOCK_DATA) STORE AS (ENABLE STORAGE IN ROW CHUNK 8K RETENTION CACHE)" BLK_INDEX_COMPOSITE "PCTFREE 0 INITRANS 4 TABLESPACE EARTH_BLOCKS" UI_TEXT"" END
DB2 DBTUNE keywords example
##EARTH_15M AUX_STORAGE "IN EARTH INDEX IN EARTH" BLK_STORAGE "IN EARTH INDEX IN EARTH LONG IN EARTH_BLOCKS" BND_STORAGE "IN EARTH INDEX IN EARTH" RAS_STORAGE "IN EARTH INDEX IN EARTH" END
Informix DBTUNE keywords example
##EARTH_15M RAS_STORAGE "EXTENT SIZE 16 NEXT SIZE 16 LOCK MODE ROW IN EARTH" RAS_INDEX_ID "FILLFACTOR 90 IN EARTH" BND_STORAGE EXTENT SIZE 16 NEXT SIZE 16 LOCK MODE ROW IN EARTH" BND_INDEX_COMPOSITE "FILLFACTOR 90 IN EARTH" BND_INDEX_ID "FILLFACTOR 90 IN EARTH" AUX_STORAGE "EXTENT SIZE 16 NEXT SIZE 16 LOCK MODE ROW IN EARTH" AUX_INDEX_COMPOSITE "FILLFACTOR 90 IN EARTH" BLK_STORAGE "EXTENT SIZE 1000 NEXT SIZE 1000 LOCK MODE ROW IN EARTH_BLOCKS" BLK_INDEX_COMPOSITE "FILLFACTOR 90 IN EARTH" END
Preparing the source data
Make the raster dataset source files available by staging them on disk drives accessible to ArcCatalog or the ArcObjects scripts used to load the data into the geodatabase.
CD or DVD
If the raster datasets are on a CD or DVD, copy them to a fast hard drive. Local hard drives are preferred to network hard drives. Loading files directly from the CD or DVD device is not advised, since the access time is much slower than hard drives.
If you are loading from a tape silo, the raster dataset files to be loaded should be staged to the online cache before you commence loading.
Organizing the source data
Image files should be grouped together into separate folders for the load operation that will consume them. Doing so allows the loading tool to open all the files within a folder and batch load them rather than load each file individually. For example, you can mosaic several raster datasets into a single raster dataset; load each raster dataset input into a single, stand-alone raster dataset; group raster datasets into a raster catalog; or create a raster catalog using multiple, mosaicked raster datasets. Operating systems have different limits on the number of opened files. Therefore, if you reach this limit, you will need to create additional folders and start another load process.
Applying a projection to your source data
Two basic workflows exist to project raster data. You can either apply the projection by creating another copy of the source image file or apply the projection as you insert the source image file into the geodatabase.
Testing has shown that more often than not, the best workflow is to apply the projection to a copy of the raster dataset source file and insert the projected raster dataset source into the geodatabase. Projecting the raster dataset source file while you load may actually be slightly faster than creating an intermediate projected source file, but if you are using several processes to insert many raster dataset sources in parallel, you are better off staging the projected raster dataset source files on the client, then loading them to the server because of the large amount of time required to project the raster data. By having many processes creating projected raster dataset source files and a few processes inserting the projected files into the geodatabase, you actually maximize the resources of client and server machines rather than have an idle server with bursts of activity that can occasionally overwhelm the server's resources.
Creating the raster object and loading the raster data
When you are not just storing multiple, stand-alone raster datasets, it is recommended that you create the raster catalog or raster dataset that will contain the raster datasets you are loading. However, in all situations, you need to be aware of the raster dataset properties you will be loading and the properties they will have once they are loaded.
You may have determined these properties when estimating the storage size requirements or when you allocated the storage space. You may have also edited the dbtune to create a keyword specifying all the storage parameters. If not, you need to determine these properties at this point.
There are four raster geodatabase settings that need to be considered: pyramids, raster statistics, compression, and tile size.
Pyramiding affects the display performance.
Pyramids can be built on a raster dataset as raster data is being mosaicked into the raster dataset, or they can be built when the loading is complete. Building the pyramids after the mosaicking is completed is the fastest method. ArcGIS allows partial pyramid construction, which rebuilds only the part of the pyramid overlapped by the source data during a mosaic operation. This helps when updating a mosaicked raster dataset, because if a new raster dataset is added, the entire raster dataset does not need to rebuild pyramids. However, if you update the data at the raster dataset's origin (pyramid reference point), the pyramid needs to be rebuilt for the entire raster dataset.
The raster dataset's origin is the raster dataset's upper leftmost coordinate. The construction of the pyramid begins at this coordinate and proceeds to the right and down. Mosaicking data to the left or above the raster dataset's origin requires ArcSDE to shift this point so that it remains the upper leftmost coordinate. Shifting the existing raster dataset's origin requires ArcSDE to rebuild the pyramid. Rebuilding the pyramid can be a costly operation, especially if the raster dataset has grown due to a number of raster dataset source files (or other raster datasets) that have already been mosaicked to it.
Because rebuilding the pyramid is a time-consuming operation, you should identify the raster dataset's upper left raster coordinate through analysis of your source data and enter it when you create the raster dataset. ArcSDE allows you to set a pyramid reference point when the raster object is created rather than use the upper left coordinate of the first raster dataset that is inserted. Therefore, it is possible to avoid shifting the raster dataset's origin by setting a pyramid reference point when you create the raster dataset.
Statistics are required for a raster dataset to perform some geoprocessing operations or certain tasks in ArcMap or ArcCatalog such as applying a contrast stretch or classifying data.
Compression is determined by the type of raster data along with the planned use.
ArcSDE subdivides raster datasets into one or more tiles for storage. Each tile is stored as binary data in the raster storage table. If a raster dataset is multiband, pixels from coincident raster blocks in different bands are stored in their own rows in the table.
Tiling raster datasets improves performance. For example, if you zoom in to a small area containing only four tiles, ArcSDE only needs to fetch four of the rows in the raster blocks table. Without partitioning the raster into tiles, ArcSDE would need to fetch the entire raster dataset.
Tiles are sized in terms of the number of rows and columns from which each tile is created. For example, a tile size of 128 by 128 is actually 128 pixels by 128 pixels. Large tiles result in large BLOB fields but fewer rows in the table, while smaller tiles result in small BLOB fields but many rows in the table. The tile size is set to 128 by 128 pixels by default. If the data being loaded is not three-band color imagery, it is possible that modifying the tile size could reduce the storage space requirements by efficiently fitting tiles into database blocks. To determine the best possible size, benchmarking with representative rasters is required. Generally speaking, larger tile sizes produce higher compression ratios and result in smaller tiles being stored in the database.
You can create raster objects using a geoprocessing tool (for example, in ArcCatalog), sderaster, or ArcObjects. Raster objects created by the geoprocessing tools are empty; they have properties but do not contain any pixel data, and they are geodatabase aware. To create a raster dataset, see Creating a raster dataset in an ArcSDE geodatabase. To create a raster catalog, see Creating a raster catalog in an ArcSDE geodatabase.
Raster objects created by sderaster always contain pixel data. They are not geodatabase aware; they lack the footprint column; and for the mosaic operation, the world files must align perfectly. For information on creating rasters with ArcObjects, see the ESRI Developer Network for samples from the Code Exchange for raster data that you can modify and use. For example, you can copy and modify the Create Geodatabase Raster Dataset sample to create a new stand-alone raster dataset. The Mosaic Rasters In A Directory and Subdirectories To An ArcSDE Raster samples provide you with the sample code to mosaic all the raster dataset sources within a directory or folder to a given stand-alone raster dataset. A raster catalog can be created using the Create Geodatabase Raster Catalog sample. You can combine this sample with the sample Loading Raster Datasets In A Workspace To A GDB Raster Catalog, which inserts raster dataset source files into a raster catalog.
Although using the ArcObjects software requires that you or a user at your location possess the skills to modify and execute the sample code, the potential benefits are as follows:
- Referencing a directory that contains many raster dataset source files rather than having to enter the raster dataset sources one at a time
- Adding flow of control to gracefully handle exceptions such as a corrupted source image file
Raster data can be loaded into a geodatabase in several ways: by using Import raster datasets (geodatabase context menu), the Copy Raster tool (geoprocessing tools), or Load Data (ArcCatalog dataset context menu). For more information on loading raster data into a geodatabase, see Importing raster datasets.
For information on loading raster data using sderaster, see Loading rasters in ArcSDE using sderaster.
Building statistics on the DBMS and raster can improve the raster display performance and appearance.
It is not always necessary to build DBMS statistics; however, all the supported ArcSDE DBMSs use cost-based optimization. Cost-based optimization uses statistics previously gathered from the DBMS objects to determine the best execution plan. SQL Server gathers statistics automatically as the data is loaded, but for all other DBMSs, you must generate DBMS statistics at least on the blocks table.
To build the DBMS statistics in ArcCatalog, do the following:
- Right-click the raster object and click Analyze.
- On the Analyze Components dialog box, make sure that the Raster Table component is checked.
- Click OK.
You can use the sdetable update_dbms_stats operation to generate statistics, as shown in the following example:
c:\>sdetable -o update_dbms_stats -t earth -m estimate -u mark -p mark -i 9000
The following is an Oracle example that quickly generates the statistics on the blocks table:
SQL> select rastercolumn_id from sde.raster_columns where table_name = 'EARTH'; RASTERCOLUMN_ID ----------------------- 1 SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(NULL, 'SDE)BLK_1; NULL, 1);
Raster statistics can be generated on the raster datasets and are stored in the ArcSDE AUX table. Raster statistics include the minimum, maximum, mean, and standard deviation. Generating the raster statistics on the base pyramid level can require a significant amount of time for very large raster datasets.
Raster statistics are necessary for data that must be statistically stretched before the objects captured in the raster can be viewed by the human eye.
To learn how to generate raster statistics, see Raster dataset statistics.
Viewing the raster data
The raster data should be viewed using the application that will be used for the project the raster was designed for, such as ArcMap, ArcCatalog, ArcGlobe, or ArcScene. If the pyramid was constructed and the DBMS statistics are current, the raster should display quickly.
If it takes a long time to display, noncurrent or nonexistent DBMS statistics are generally the problem. It is especially important that the DBMS statistics exist on the blocks table. Refer to the previous section, Building statistics, for instructions on building DBMS statistics.
With the advent of partial pyramid construction, it is now possible to view a raster dataset during a large mosaic operation. However, while viewing a raster dataset during a mosaic operation, do not become alarmed if it periodically disappears and reappears when viewed at low resolution. This behavior is a product of partial pyramid construction, a result of the updated upper levels of the pyramid being deleted and reinserted.