Minimize disk I/O contention in Oracle
Of all the configurable components of a geodatabase, storage is perhaps the most frequently and extensively customized. Likewise, each database administrator (DBA) has a preferred method for organizing both the logical and physical storage structures of the Oracle database, based on research and techniques that have proven successful in the past.
You have broad flexibility to design a storage model for your geodatabase that fits the specific needs of your data, applications, and existing management policies. ArcSDE has few strict storage requirements. You may choose to deploy an entry-level computer with a single data disk and one tablespace for GIS data, or a high-end server with dozens of disk arrays and hundreds of Oracle files, each successfully supporting their intended environments. Fortunately, you can adapt both ArcSDE and Oracle to take advantage of whatever resources are available to run your geodatabase.
To minimize disk I/O contention for Oracle databases, you can position frequently accessed files on separate disks, when possible, and group on the same disks frequently accessed files with infrequently accessed files. To do this,
- Estimate the size of all the database components and determine their relative rates of access.
- Position the components given the amount of disk space available and the size and number of disk drives.
Diagramming the disk drives and labeling them with the components help keep track of the location of each component. Have the diagram handy when you create your database.
Some recommendations on how to avoid resource contention on an ArcSDE geodatabase stored in Oracle are listed below. For an explanation of the Oracle components discussed here, such as tablespaces and segments, consult your Oracle documentation.
-
Keep your database design as simple as possible.
This does not mean that your design must be inherently simple, only that complexity in the design should stem from complexity in the data you are modeling, rather than arbitrary decisions.Start with a single tablespace for your geodatabase, then justify and document each additional tablespace you create. Documenting the purpose of each tablespace will not only prove useful for anyone working on the system in the future (yourself included) but will also force you to consider more carefully the benefit of each additional tablespace you need to manage.
-
Separate system segments from user segments.
Keeping user segments, such as feature classes, separate from system segments, such as the ArcSDE and Oracle data dictionaries, simplifies quota management and avoids fragmentation that can lead to decreased performance. Furthermore, it is logically easier to monitor activity at the tablespace level when these segments are stored independently.
-
Separate data for different projects.
Using dedicated tablespaces for different projects, departments, or other logical entities can facilitate monitoring and management. Restore operations affecting one project's tablespace will not take another project offline. You can assign unlimited quotas on a set of tablespaces to users in one department without the risk of them exhausting space for another department. Linking I/O activity and file growth to teams or individuals is easier when those entities use their own tablespaces.
-
Separate large segments from small segments.
You can enforce extent sizes at the tablespace level rather than only at the segment level. Using a uniform extent size for all segments in a tablespace eliminates free space fragmentation and encourages high performance. However, this requires grouping segments by extent size to balance the number of extents per segment, leading to wasted space from excessive extent sizes.Extent sizes of 128 KB for small tables, 1 MB for large feature classes, and 128 MB for large rasters are reasonable, though you can customize these values based on your own environment and research.
-
Separate read-only data from writable data.
If a tablespace contains entirely read-only data, you can put the tablespace explicitly in read-only mode. This reduces the volume of data that you need to back up regularly. Read-only data files are also excellent candidates for storage on redundant array of independent disks (RAID) 5 arrays because they will benefit from striping during read access and, since they are read-only, they won't decrease array performance with excessive write activity.
-
Use multiple disks or arrays to store files.
Important Oracle files, such as control files, online redo logs, and archived redo logs, should be multiplexed, or mirrored, by the Oracle software to provide maximum protection.
Even on database servers with a single-volume storage array, a stand-alone internal disk is usually available for storing the operating system, page file, and ArcSDE and Oracle executables. Use this disk for storing multiplexed control and redo log files.Caution:Control files record critical information such as a list of files that participate in the database. Online and archived redo log files track changes to the database for recovery purposes. It is difficult or impossible to fully recover a database without current copies of these files.
-
If using RAID storage, use RAID types appropriately.
A RAID is a class of storage management services. There are several generic RAID strategies. These strategies are denoted by a number, or RAID level. They are as follows:RAID 0, or striping, stores small pieces of the same file system across multiple physical disks in units called stripes. The advantage of striping is improved performance. By spreading a file system's contents across multiple devices, the RAID controller can read from and write to multiple disks at the same time. The disadvantage to RAID 0 is that when any one disk in the RAID 0 array goes offline, the entire array is unavailable.
RAID 1, or mirroring, stores a duplicate copy of everything written to one disk on a second disk. The advantage of mirroring is data protection. A database can lose one disk with no data loss and no degradation of service during the crash event. For this reason, RAID 1 is used with many geodatabases, especially those with high availability requirements. The disadvantage to mirroring is cost. Because data is stored twice, twice as many disks are necessary to store the same information, compared to stand-alone or striped disks. A small write penalty is also involved due to the need to write duplicate data.RAID 10, also called RAID 1+0, combines the advantages of both RAID levels 1 and 0. RAID 10 arrays stripe data across sets of mirrored disks. This provides the performance advantage of RAID 0 and the data protection advantage of RAID 1. RAID 10, and vendor-specific implementations based on the RAID 10 strategy, offer the best I/O performance for busy geodatabases.
RAID 10 is expensive because it requires additional hardware to store the mirrored data. You may consider using RAID 10 selectively to protect and provide high performance for your busiest files, choosing other RAID or stand-alone configurations for read-only, archived, or less frequently modified data.If you can afford to use RAID 10 for all your database storage, do so. Otherwise, consider mixing various RAID and stand-alone disk configurations to achieve the best reliability and performance for the hardware at your disposal.Whenever possible, store write-intensive files on RAID 1 or RAID 10 devices. Such files include redo logs and data files for undo tablespaces. If necessary, use stand-alone disks with Oracle multiplexing and a comprehensive backup strategy.
RAID 5, also called striping with rotating parity, requires the equivalent of only one additional disk in the entire array for storing redundant information. To accomplish this, RAID 5 stripes data across multiple disks, then stores one additional chunk of parity information for the whole stripe. If one disk in the array goes offline, the RAID processor can reconstruct the missing data from the remaining disks and the parity information.The advantages of RAID 5 are improved read performance through the use of striping, and low-cost redundancy by using parity information rather than full mirroring. Because geodatabases tend to be read intensive, RAID 5 is well suited for geodatabase applications, especially when moderately high availability is required.
However, because RAID 5 does not store fully redundant information, it is more susceptible to data loss than RAID 10. Although such occurrences are rare, if two disks in a RAID 5 array go offline simultaneously, the array does not have enough remaining information to reconstruct the missing data, so the whole volume must go offline. Performance can also suffer in two cases. First, when data is written to the array, parity information must also be computed and stored. Second, when a drive is offline, read and write performance decrease dramatically while the RAID processor reconstructs data for the missing disk on the fly. For a highly active geodatabase, the available throughput during these events may be insufficient to provide an acceptable level of service.
-
Utilize Oracle automated storage management.
Oracle 10g introduced the automated storage management (ASM) feature. ASM is essentially a RAID system that is optimized for and dedicated to servicing Oracle databases. ASM uses an Oracle instance to broker data I/O requests to a group of raw partitions that it manages as a disk group. Disk groups can provide striping, mirroring, and a special mirroring called high redundancy that stores three copies of data instead of the normal two.
-
Use a small PCTFREE value for read-only data.
Oracle allows you to reserve a certain amount of free space in each data block when inserting new data into a table. Once the free space limit for a block is reached, Oracle will not insert additional data into that block.This free space can be used only by updates to existing rows stored in that block. By reserving space for update operations only, you can prevent rows from exceeding the available space in their original block and having to migrate to a new block. Migration is an expensive operation for the row, both at update time and when accessed in subsequent operations such as queries.
Many geodatabases are updated infrequently, either because the tables are static—as is largely the case with rasters—or because they are edited in a multiversioned workflow, in which case pairs of delete and insert operations substitute for actual updates.
Therefore, to maximize the amount of space in each block used for storing geodatabase data, ArcSDE is preconfigured to reserve no free space by setting the PCTFREE 0 clause in the DBTUNE storage strings. If you want to reserve free space for SQL updates by custom applications, alter the PCTFREE values in the default DBTUNE configuration. To learn more about DBTUNE configuration, see What is the DBTUNE table?, What are DBTUNE configuration keywords and parameters?, and DBTUNE configuration parameters in Oracle.