Geodatabase transaction management
Transactions are packages of work that make changes to databases. Geographic information system (GIS) databases, like other database applications, must support update transactions that enforce data integrity and application behavior. In many cases, users can use the database management system's (DBMS) transaction framework for managing edits and updates to geodatabases.
However, GIS users universally also have some specialized transactional requirements. For example:
- Often, multiple records are updated in unison as a single transaction.
- Numerous transactions must span long periods of time (sometimes days and months, not just seconds or minutes).
Additionally, users need to be able to undo and redo changes. Editing sessions can span several hours or even days. Often, the edits must be performed in a system that is disconnected from the central, shared database.
Because GIS workflow processes may span days or months, the GIS database must remain continuously available for daily operations, where each user might have a personal view or state of the shared GIS database. In a multiuser database, the GIS transactions must be managed on the DBMS's short transaction framework. The ArcSDE technology plays a key role during these operations by managing the high-level, complex GIS transactions on the simple DBMS transaction framework.
GIS users have many cases in which long transaction workflows are critical. In most instances, these are made possible through the use of a multiuser DBMS and ArcSDE to manage updates to the central GIS database using versioning, which you can read more about below.
The following are examples of GIS data compilation workflows that require a version-based transaction model:
- Multiple edit sessions—A single GIS database update may require numerous changes that span multiple edit sessions occurring over a few days or weeks.
- Multiuser editing—Multiple editors often need to concurrently update the same spatially integrated features. Each user needs to work with a personalized database state, viewing individual updates and ignoring updates by other editors. Eventually, each user needs to post and reconcile updates with the other editors to identify and resolve any conflicts.
- Checkout/Check-in transactions—It is often necessary to check out a portion of a database for a particular area or district to a personal computer and update that information in a disconnected session that could last for days or weeks or, increasingly, to take into the field for mobile editing and updating. Those updates must be posted to the main database.
- History—Sometimes it is advantageous to maintain a historical version of each feature in a GIS database, even after that particular version has been updated, to maintain a copy of the retired and changed features in an archive or to track an individual feature's history—for example, parcel lineage or feature update properties in a national mapping database.
- Transfer of change-only updates—Enterprise databases and spatial data infrastructures in which information is shared across a range of organizations are collaborative efforts that require the sharing of updates across the Internet in a well-defined extensible markup language (XML) schema for sharing change-only updates between databases.
- Distributed geographic database replicas—A regional database may be a partial copy of a main corporate GIS database for a particular geographic region. Periodically, the two databases must be synchronized by exchanging updates.
- Loosely coupled replication across DBMSs—Often, GIS data must be synchronized among a series of database copies (replicas), where each site performs its own updates on its local database. Often, the databases are only periodically connected via the Web. On a scheduled basis, the updates must be transferred from each database replica to the others and their contents synchronized. Many times, the DBMSs are different—for example, replicating datasets between Microsoft SQL Server, Oracle, and IBM DB2.
The geodatabase transaction model: Versioning
The geodatabase mechanism for managing these and many other critical GIS workflows is to maintain multiple states in the geodatabase and, most importantly, to do so while ensuring the integrity of the geographic information, rules, and behavior. This ability to manage, work with, and view multiple states is based on versioning. As the name implies, versioning explicitly records versions of individual features and objects as they are modified, added, and retired through various states. Each version explicitly records each state of a feature or object as a row in a table along with important transaction information. Any number of users can simultaneously work with and manage multiple versions.
Versions enable all transactions to be recorded as a series of changes to the database through time. This means that various users can work with multiple views or states of the geodatabase. The goal is open, high-performance, multiuser access. For example, the system must go fast and must productively support the use of datasets containing hundreds of millions of records accessed by thousands of simultaneous users.
The geodatabase transaction model based on versions is relatively simple—updates are recorded in change tables.
Versions explicitly record the object states of a geodatabase in two delta tables:
- The Adds table
- The Deletes table
Simple queries are used to view and work with any desired state of the geodatabase, for example, to view the database state for a point in time or see a particular user's current version with his or her edits.
ArcSDE plays a critical role in versioned geodatabase applications and is used to manage long transactions in each DBMS by leveraging its short transaction framework as well as to work across different DBMSs.
In the version table example, a parcel (number 45) is updated to become parcel number 47. Using versioning, the original parcel is saved in the Deletes table and the new parcel is saved in the Adds table. Other meta tables record version information about the transaction such as the time and sequence of each update, the version name, and the state ID of each update. Each version also has its own security and access privileges.
This enables most users to work with the default version while various editors are simultaneously making updates to their own versions of the database over time.
Numerous updates can be made to each version, and users connect to and work with the update version as they make additional edits to the data. When users are ready to share the updates with the rest of the enterprise, a reconcile and post operation is performed to commit the edits held in the update version to the main (default) version. A resolution process is used to identify and reconcile any potential conflicts during this process.
To learn more about versioning, see Understanding versioning.