Basic Principles

The ArcSDE XML API provides support for storing XML documents in a database and allows for searching data stored in the documents, such as ArcGIS metadata.

ArcSDE XML was introduced in ArcSDE 9.0 for most databases and in Informix in ArcSDE 9.3. Starting with ArcSDE 10.0, support has been added for native XML columns in IBM DB2 (PureXML), Oracle 11g (Oracle XML DB), PostgreSQL, and Microsoft SQL Server databases.

Native XML columns

Most database management systems (DBMS) offer an XML data type of their own. This data type is present in default installations of these databases, but in some cases the necessary XML component may not have been installed when the DBMS was set up. Refer to your DBMS documentation for details on installing and using the database's native XML type.

ArcSDE supports creating native XML columns in geodatabases using the ArcSDE API. In DB2, Oracle, and SQL Server databases, you can also use the ArcSDE API to create and use XML schemas.

The following table lists support for different XML features in each DBMS:

XML features            DB2     Oracle  PostgreSQL  SQL Server
XML operators           Y       Y       N           Y
Full-text indexing      Y       Y       N           Y
Indexing tags           Y (1)   Y       N           Y (2)
SQL/XML queries         Y       Y       Y           Y
XQuery                  Y       Y       N           Y
XML schema validation   Y (3)   Y (4)   N           Y (3)
Namespaces              Y       Y                   Y

(1) Data types: varchar, varchar hashed, double, date, and timestamp.
(2) All tags are indexed for path, property, and value.
(3) Can define multiple schemas per XML column.
(4) One schema per XML column; only available with the binary XML storage type.

To see which DBMS releases are required to use native XML in an ArcSDE geodatabase, see the system requirements page on the ESRI support site.

ArcSDE XML columns

An ArcSDE XML column is a column in an ArcSDE table, similar to a spatial or raster column. It contains references to the associated XML data, which is stored in several internal tables that ArcSDE creates and manages. These tables hold the indexes, tags, and documents. The XML database schema shows an XML column's association with these internal tables.

In the Java API, the XML column is represented by the SeXmlColumn class, while the C API provides several functions (SE_xml*).

XML index and tags

ArcSDE supports two indexing concepts:
  • Index templates—These define all location paths and data types in a particular schema.
  • Index definitions—These define which location paths are to be indexed for a particular XML column.
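
The relationship between the two concepts can be sketched in plain Java. This is a hypothetical model for illustration only, not the ArcSDE API: the class name, the DataType enum, the sample location paths, and the choice to carry the data type over into the definition are all assumptions. It shows a template listing every location path in a schema with its data type, and a definition that selects the subset of those paths to index for one XML column.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; none of these names come from ArcSDE.
class IndexDefinitionSketch {

    // Assumed set of data types a location path can have.
    enum DataType { DOUBLE, STRING, VARCHAR }

    // An index template: every location path in a schema, with its data type.
    static Map<String, DataType> sampleTemplate() {
        Map<String, DataType> template = new LinkedHashMap<>();
        template.put("/metadata/title", DataType.STRING);
        template.put("/metadata/scale", DataType.DOUBLE);
        template.put("/metadata/keyword", DataType.VARCHAR);
        return template;
    }

    // An index definition: the subset of template paths to index for one XML column.
    static Map<String, DataType> define(Map<String, DataType> template, List<String> paths) {
        Map<String, DataType> definition = new LinkedHashMap<>();
        for (String path : paths) {
            DataType type = template.get(path);
            if (type == null) {
                throw new IllegalArgumentException("Path not in template: " + path);
            }
            definition.put(path, type);
        }
        return definition;
    }

    public static void main(String[] args) {
        Map<String, DataType> definition =
            define(sampleTemplate(), Arrays.asList("/metadata/scale", "/metadata/keyword"));
        definition.forEach((path, type) -> System.out.println(path + " -> " + type));
    }
}
```

In this sketch, one template could serve several XML columns, each with its own definition.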

An XML index can have three types of tags:

  • SE_XML_INDEX_DOUBLE_TYPE—Used for all numeric tags
  • SE_XML_INDEX_STRING_TYPE—Used for all string tags
  • SE_XML_INDEX_VARCHAR_TYPE—Used for all text (simple strings) tags

Searches on XML columns can be tag-based (XPath) or full-text. A tag-based search uses the tags to search the XML documents. Both kinds of search rely on an index, and an XML column can have two types of indexes to support them.

  • Text index:
    • A text index associated with the XML document supports full text searches against all content in the XML document. The XML tags in the document are transparent to the text index. In other words, the text index is created in the database on the content with all XML markup removed, thus allowing quick and efficient text searches on the document.
    • A text index associated with an XML tag (SE_XML_INDEX_STRING_TYPE) supports tag-based (XPath) searches in the XML document.

  • BTREE index:
    • Full-text indexes are bigger and slower than BTREE indexes and are updated outside the transaction scope, which makes them inefficient for indexing feature property sets. In addition, full-text indexes are not available on all databases. To overcome these limitations, the tag-indexing model has been extended to include numbers, strings, and text. Text tags (simple strings) are indexed using a BTREE, while string tags use the full-text indexer.
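
To make the idea of a text index built on markup-free content concrete, here is a minimal Java sketch that uses only the standard JAXP DOM parser. It is not how ArcSDE or the underlying database builds its text index; the class name, method names, and sample document are assumptions. It extracts the text of a document with all tags removed and splits it into the terms a full-text indexer might store.

```java
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative sketch only; this is not the ArcSDE or DBMS text indexer.
class MarkupFreeTextSketch {

    // Returns the document's text content with all XML markup removed.
    static String stripMarkup(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            collect(doc.getDocumentElement(), out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML document", e);
        }
    }

    // Appends the text of every text node under the given node, separated by spaces.
    private static void collect(Node node, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim().replaceAll("\\s+", " ");
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    // Splits the markup-free text into lowercase terms.
    static Set<String> terms(String xml) {
        Set<String> result = new TreeSet<>();
        for (String word : stripMarkup(xml).toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<metadata><title>Road network</title><scale>24000</scale></metadata>";
        System.out.println(stripMarkup(xml));
        System.out.println(terms(xml));
    }
}
```

Because element names such as title and scale never reach the term set, a search for "title" would not match this document, which is what it means for the tags to be transparent to the text index.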

XML Document

An XML document is a hierarchically structured text document consisting of labels, called tags, and data. In ArcSDE, the XML Document object is used to store and retrieve such documents.

The ArcSDE API uses the XML Document object (SeXmlDoc in Java, SE_xml_doc* functions in C) to store and retrieve an XML document from ArcSDE. ArcSDE Insert and Query stream objects communicate with the XML Document object, which in turn is associated with the XML column.

XPath searches

XPath is an expression language for addressing the parts of an XML document, and ArcSDE uses it to search XML documents based on an XPath constraint or expression. The ArcSDE XPath implementation follows the XPath 1.0 specification (http://www.w3.org/TR/xpath) and is also compatible with XPath 2.0 (http://www.w3.org/TR/xpath20). XPath 1.0 is based on two main components.

  • Atomic values for three data types
  • Node trees that represent the structure of an XML document

Operations can be performed on these two components.

  • Operations on atomic values: for example [Area > 2500]
  • Operations on node trees to select nodes: for example /RespParty[OrgName = "ACASIAN"]

Operations on node trees are the core of the XPath language because XML documents are structured as node trees.
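
The two kinds of operation can be tried out with the standard javax.xml.xpath package in the JDK. The sketch below is not the ArcSDE search API; the class name, the matches helper, and the sample document are assumptions made for illustration. It evaluates a node-selection expression and a comparison on an atomic value against one small document.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Illustrative sketch using the JDK's XPath 1.0 engine, not the ArcSDE API.
class XPathSearchSketch {

    // Returns true if the XPath expression selects at least one node in the document.
    static boolean matches(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // boolean(...) converts a node set to true when it is non-empty.
            return (Boolean) xpath.evaluate(
                "boolean(" + expression + ")",
                new InputSource(new StringReader(xml)),
                XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath or document: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document.
        String xml = "<RespParty><OrgName>ACASIAN</OrgName><Area>3000</Area></RespParty>";

        // Operation on the node tree: select RespParty nodes by a child's string value.
        System.out.println(matches(xml, "/RespParty[OrgName = \"ACASIAN\"]"));

        // Operation on an atomic value: numeric comparison inside a predicate.
        System.out.println(matches(xml, "/RespParty[Area > 2500]"));
    }
}
```

In the second expression, the text of the Area element is converted to a number before the comparison, so the predicate behaves as a numeric test rather than a string test.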

ArcSDE supports three atomic data types: string, double, and char. It supports a limited number of operations for selecting nodes in the trees.

An index associated with specific XML elements supports searching text or numeric data in those elements. This index is associated with the SDE_XML_INDEXES table. As an example of using an XPath index, the user might search only the content in an XML element containing scale information to find GIS resources that are appropriate for the user’s application.

A tag or XPath index on an XML column is optional. It contains a list of all tags and their data types and lets users search the content of a specific XML element or attribute in each document. It can be created manually, from an ArcSDE analysis of some sample documents, or from a set of templates.
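
As a rough illustration of deriving a tag list from sample documents, the following Java sketch walks each sample with the standard DOM parser and records every leaf element's location path with an inferred data type. It does not reproduce the ArcSDE analysis: the class name, the "double" and "string" labels, and the rule that a tag is numeric only if every sample value parses as a number are assumptions, and attributes are left out for brevity.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative sketch only; the inference rule here is an assumption, not ArcSDE's.
class TagAnalysisSketch {

    // Scans sample documents and returns location path -> inferred data type.
    static Map<String, String> analyze(String... sampleDocuments) {
        Map<String, String> tags = new TreeMap<>();
        for (String xml : sampleDocuments) {
            try {
                Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
                walk(root, "/" + root.getNodeName(), tags);
            } catch (Exception e) {
                throw new IllegalArgumentException("Could not parse sample document", e);
            }
        }
        return tags;
    }

    // Visits child elements recursively and records a type for each leaf element.
    private static void walk(Element element, String path, Map<String, String> tags) {
        boolean hasChildElements = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                hasChildElements = true;
                walk((Element) child, path + "/" + child.getNodeName(), tags);
            }
        }
        if (!hasChildElements) {
            String type = isNumeric(element.getTextContent().trim()) ? "double" : "string";
            // A tag stays "double" only while every sample value seen so far is numeric.
            tags.merge(path, type,
                (old, now) -> old.equals("double") && now.equals("double") ? "double" : "string");
        }
    }

    private static boolean isNumeric(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> tags = analyze(
            "<metadata><title>Roads</title><scale>24000</scale></metadata>",
            "<metadata><title>Rivers</title><scale>100000</scale></metadata>");
        tags.forEach((path, type) -> System.out.println(path + " : " + type));
    }
}
```

A list produced this way would normally be reviewed before use, since a small sample can mislead the inference (for example, a free-text tag whose sample values all happen to be numbers).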

See also

XML API Entities
About XML (external link)
