XPath Support

To perform an XPath-based XML document search, tags and their data must be indexed. In ArcSDE, tagged data can be indexed according to the data type. Three data types are supported:

  • SE_XML_INDEX_STRING_TYPE: Used for text data
  • SE_XML_INDEX_DOUBLE_TYPE: Used for numeric data
  • SE_XML_INDEX_VARCHAR_TYPE: Used for character (char) data

The double data type can be used for numeric data, such as 10, 23.285, or 19950603. The string data type can be defined for text strings greater than or equal to 256 characters in length, while the varchar data type is appropriate for text strings shorter than 256 characters, such as coded values or abbreviations (CDC, UNESCO, or ACASIAN).
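
As a rough illustration of these rules, the following Python sketch suggests an index type for a sample tag value. The helper name choose_index_type is hypothetical and not part of the ArcSDE API; only the three constant names come from the list above. The length rule is a simplification, because the right choice also depends on how the tag will be searched.

```python
def choose_index_type(sample_value: str) -> str:
    """Suggest an ArcSDE XML index type name for a sample tag value.

    Hypothetical helper for illustration only; not part of the ArcSDE API.
    """
    text = sample_value.strip()
    try:
        float(text)  # numeric data such as 10, 23.285, or 19950603
        return "SE_XML_INDEX_DOUBLE_TYPE"
    except ValueError:
        pass
    # Short strings (fewer than 256 characters) suit a B-tree index.
    if len(text) < 256:
        return "SE_XML_INDEX_VARCHAR_TYPE"
    # Longer text is a candidate for a full-text index.
    return "SE_XML_INDEX_STRING_TYPE"


print(choose_index_type("19950603"))  # SE_XML_INDEX_DOUBLE_TYPE
print(choose_index_type("ACASIAN"))   # SE_XML_INDEX_VARCHAR_TYPE
```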

A full-text index is created for the string data type. A B-tree index is created on double and varchar data types. A full-text index is not suitable for coded values (double) or short strings for the following reasons:

  • The full-text indexer is slower than a B-tree indexer.
  • Full-text indexes are large.
  • Full-text indexes are updated outside the transaction scope. This may be acceptable for publishing metadata documents, but can be a problem when used to index feature property sets.

Note: Full-text indexes are not available on all database management systems.

The XPath expression is defined according to the tag’s data type. For example, consider the following XML document:

<?xml version = "1.0" ?>
<metadata>
  <Esri MetaID="1000">
   <Theme lang="Chinese">Monitoring Diseases</Theme>
    <idCitation>
     <Title area = "10000">Migratory Birds and Spread of West Nile Virus in Asia.
     </Title>
     <Date>19950603</Date>
     <RespParty>
       <OrgName>ACASIAN</OrgName>      
     </RespParty>
    </idCitation>
  </Esri>
</metadata>

In this example, you would index three tags:

  • /metadata/Esri/idCitation/Title
  • /metadata/Esri/idCitation/Date
  • /metadata/Esri/idCitation/RespParty/OrgName

Based on the data of these tags, define the first tag as string, the second as double, and the third as the varchar data type.
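
To see what each of the three tags holds, the sample document can be read with Python's standard xml.etree.ElementTree module. This is only a sketch for inspecting the data; it does not use ArcSDE, and the helper name tag_value and the INDEXED_TAGS table are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" ?>
<metadata>
  <Esri MetaID="1000">
    <Theme lang="Chinese">Monitoring Diseases</Theme>
    <idCitation>
      <Title area="10000">Migratory Birds and Spread of West Nile Virus in Asia.
      </Title>
      <Date>19950603</Date>
      <RespParty>
        <OrgName>ACASIAN</OrgName>
      </RespParty>
    </idCitation>
  </Esri>
</metadata>
"""

# Tag path -> index data type chosen in the text above.
INDEXED_TAGS = {
    "/metadata/Esri/idCitation/Title": "string",
    "/metadata/Esri/idCitation/Date": "double",
    "/metadata/Esri/idCitation/RespParty/OrgName": "varchar",
}

root = ET.fromstring(SAMPLE)


def tag_value(root, absolute_path):
    """Return the stripped text of the first node at absolute_path, or None."""
    # ElementTree paths are relative to the root element, so drop "/metadata/".
    prefix = "/" + root.tag + "/"
    if not absolute_path.startswith(prefix):
        return None
    text = root.findtext(absolute_path[len(prefix):])
    return text.strip() if text is not None else None


for path, index_type in INDEXED_TAGS.items():
    print(f"{index_type:8} {path} -> {tag_value(root, path)!r}")
```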

Search the XML document by defining the XPath for these tags:

  • /metadata/Esri/idCitation[contains(Title, "West Nile")]
  • /metadata/Esri/idCitation[Date = 19950603]
  • /metadata/Esri/idCitation/RespParty[OrgName = "ACASIAN"]

This is a simple example. A string or varchar data type could be used for the second tag. However, in that case, you cannot use relational operators (<, >, <=, >=).
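
One way to see why range comparisons belong with the double type is that text values are ordered character by character rather than by magnitude. The snippet below is plain Python rather than ArcSDE behavior, and the threshold 2000000 is an arbitrary value chosen for the illustration.

```python
date_text = "19950603"  # value of the Date tag in the sample document

# Compared as numbers, the ordering follows magnitude.
numeric_result = float(date_text) < 2000000

# Compared as text, characters are compared left to right, so "1..." sorts
# before "2..." regardless of how many digits follow.
string_result = date_text < "2000000"

print(numeric_result)  # False
print(string_result)   # True
```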

Once the data type for a tag is defined, there are many ways to formulate an XPath expression to search XML documents.

Location path

XPath is an expression language. The ArcSDE XML type implementation supports a limited number of XPath expressions. One of the main expressions is a location path. A location path describes how to reach nodes in the document tree and consists of a sequence of steps, each separated by a forward slash (/). Each step defines a relationship, called an axis, between the current node and the nodes to be considered. Every axis is followed by a node test, which indicates which nodes are to be selected.

A location path can be abbreviated or unabbreviated. For example:

Abbreviated                 Unabbreviated
idCitation[Date = 1995]     child::idCitation[child::Date = 1995]
Esri[@MetaID="1000"]        child::Esri[attribute::MetaID="1000"]

Only the abbreviated form of location paths is supported.

In general, a step in a location path consists of the following:

<axis><node_test><zero_or_many_predicates>

Therefore, an XPath expression can be broken into four parts:

  • Axes
  • Node tests
  • Predicates
  • Functions
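
The step structure described above can be sketched with a small Python function that splits an abbreviated location path into its steps and separates each node test from its predicates. The function name split_steps is hypothetical, and the sketch handles only child steps separated by single slashes; it is not how ArcSDE parses expressions.

```python
import re


def split_steps(location_path):
    """Split an abbreviated location path into (node_test, predicates) pairs.

    A slash inside square brackets does not start a new step.
    """
    steps, current, depth = [], "", 0
    for ch in location_path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "/" and depth == 0:
            if current:
                steps.append(current)
            current = ""
        else:
            current += ch
    if current:
        steps.append(current)

    parsed = []
    for step in steps:
        node_test = step.split("[", 1)[0].strip()
        predicates = [p.strip() for p in re.findall(r"\[([^\]]*)\]", step)]
        parsed.append((node_test, predicates))
    return parsed


print(split_steps("/metadata/Esri/idCitation[Date = 1995]"))
# [('metadata', []), ('Esri', []), ('idCitation', ['Date = 1995'])]
```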

Axes

The axis contains a part of the document, defined from the perspective of the current node (also called the context node). It determines what general category of nodes may be considered for the node tests.

The ArcSDE XML API supports the following three XPath axes:

Axis                            Abbreviation   Description
Attribute                       @              Contains all attributes on the context/current node. The axis will be empty unless the context node is an element.
Descendant-or-self              //             Includes the current node and all its children, including all children's children recursively.
Self (within predicates only)   .              Contains only the current node.

XPath axes supported by ArcSDE API
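
The three axes can be approximated with Python's standard xml.etree.ElementTree module. This is a sketch on a reduced copy of the sample document, not ArcSDE code; the self axis is emulated with an ordinary Python comparison on each node's own text.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<metadata><Esri MetaID="1000"><idCitation><RespParty>'
    "<OrgName>ACASIAN</OrgName>"
    "</RespParty></idCitation></Esri></metadata>"
)

# Attribute axis (@): select Esri elements by the value of their MetaID attribute.
by_attribute = root.findall("Esri[@MetaID='1000']")

# Descendant-or-self axis (//): the current node plus everything below it.
descendant_or_self = list(root.iter())

# Self axis (.) within a predicate: test each node's own text.
by_self = [e for e in root.iter() if (e.text or "").strip() == "ACASIAN"]

print([e.tag for e in by_attribute])        # ['Esri']
print([e.tag for e in descendant_or_self])  # ['metadata', 'Esri', 'idCitation', 'RespParty', 'OrgName']
print([e.tag for e in by_self])             # ['OrgName']
```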

Node tests

A node test limits the specific elements or attributes that will be addressed. A node test can be a NameTest or TypeTest. A NameTest selects nodes by name. A TypeTest selects nodes based on the type of node. For example:

NameTest
/metadata/Esri/idCitation/RespParty[OrgName = 'ACASIAN']

TypeTest
/metadata/Esri/idCitation/RespParty/OrgName/node()

The ArcSDE API supports the following two types of NameTests for node tests:

Qualified node name   The name of an element or attribute node in the location path.
*                     Selects all child nodes. This node test is required to search all tags for a specific value.

Note: If the node name is qualified with a namespace or a namespace prefix, that qualifier must be present in the node's tag within the XML document. For example, a search that uses the full namespace will not find documents that use the namespace prefix in the given node's tag. For more information on XML namespaces, see http://www.w3.org/TR/REC-xml-names.

 

Predicates

A predicate filters a node set from an XPath expression, thereby reducing the returned node set from an XPath query.

Generally, predicates are enclosed in square brackets.

    [<node><relational_operator><value>]

<node>                  Can be an attribute of the current node (@attribute_name), or a child node of the current node (child_name).
<relational_operator>   The following relational operators are supported: =, !=, <, >, <=, >=
<value>                 The literal value to which the node’s value is being compared. String literals must be enclosed in double quotes.

Note: The XPath expression is not itself part of an XML document; therefore, there is no need to escape the special "<" sign with "&lt;".

A predicate can consist of multiple expressions that are separated by AND or OR. Parentheses can be used to group expressions for the purpose of changing the evaluation order.
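
The predicate form [<node><relational_operator><value>] can be emulated in Python to show how the pieces fit together. The function predicate_matches and the OPERATORS table are hypothetical names for this sketch, and AND, OR, and parentheses are represented by Python's own and, or, and grouping. ArcSDE evaluates predicates against its indexes, which this sketch does not model.

```python
import operator
import xml.etree.ElementTree as ET

OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def predicate_matches(element, node, op, value):
    """Evaluate [<node><relational_operator><value>] against one element.

    node is "@name" for an attribute or "name" for a child element.
    A numeric value triggers a numeric comparison; otherwise text is compared.
    """
    if node.startswith("@"):
        candidates = [element.get(node[1:])]
    else:
        candidates = [child.text for child in element.findall(node)]
    for text in candidates:
        if text is None:
            continue
        left = text.strip()
        if isinstance(value, (int, float)):
            try:
                left = float(left)
            except ValueError:
                continue  # non-numeric text cannot satisfy a numeric test
        if OPERATORS[op](left, value):
            return True
    return False


citation = ET.fromstring(
    '<idCitation><Title area="10000">Migratory Birds</Title>'
    "<Date>19950603</Date></idCitation>"
)
title = citation.find("Title")

# A AND B
both = (predicate_matches(citation, "Date", "<", 19960101)
        and predicate_matches(title, "@area", "<=", 10000))
# (A OR B)
either = (predicate_matches(citation, "Date", ">", 20000101)
          or predicate_matches(citation, "Title", "=", "Migratory Birds"))
print(both, either)  # True True
```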

Positional predicates are not supported. For example, XPath allows an expression of the form /mynode[5] to return the fifth occurrence of the node "mynode". ArcSDE does not support XPath expressions that test the nth occurrence of a tag for a given value; all occurrences of tags with the given name are searched.

 

Functions

XPath 1.0 provides support for many types of functions.

Function type        Examples
Node Set Functions   last(), position(), count()
String Functions     concat(), substring(), contains()
Boolean Functions    true(), false(), not()
Number Functions     sum(), round(), floor()

The majority of these functions are not supported by the ArcSDE XML type implementation. Only one string function—contains()—is supported.

contains(string1, string2): Returns true if string1 contains string2; otherwise, it returns false. Note that [contains(Title, "Bird")] is not the same as [Title = "Bird"]. The first expression returns true for all Titles containing the word "Bird" or another form of the word, such as "Birds" or "Birding". The second expression returns true only for the Title "Bird".

The ArcSDE API supports the contains() function in the format [contains(string1, string2)]. Other forms, such as [contains(string1, string2)=boolean] or [contains(string1, string2)!=boolean] are not supported.
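
The difference between contains() and the equal (=) operator can be illustrated with the substring semantics that XPath 1.0 defines for contains(). The function xpath_contains and the sample titles are assumptions for this sketch; an ArcSDE full-text index may match word forms differently from a plain substring test.

```python
def xpath_contains(string1: str, string2: str) -> bool:
    """XPath 1.0 contains(): true when string2 occurs anywhere in string1."""
    return string2 in string1


titles = ["Bird", "Birds", "Birding in Asia", "Migratory species"]

contains_matches = [t for t in titles if xpath_contains(t, "Bird")]
equals_matches = [t for t in titles if t == "Bird"]

print(contains_matches)  # ['Bird', 'Birds', 'Birding in Asia']
print(equals_matches)    # ['Bird']
```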

Location path aliases

ArcSDE also supports numeric aliases for location paths. Location path aliases do not have to be unique. For example, the FGDC Z39.50 code 31 refers to a Publication Date attribute, which is present in about half a dozen elements in the XML schema for FGDC metadata documents. Hence, a query on a path alias may actually search multiple location paths.

When specifying a search criterion with an alias, the numerical alias value is specified in a location_alias function. The location_alias function is not part of the XPath standard. It is an ArcSDE extension used to specify a location path using a numeric value.

For instance, for Example 17 in the "Supported XPath expressions" section, the following tag can be defined as alias 1:

/metadata/Esri/idCitation/Date
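
Conceptually, an alias behaves like a lookup from a number to one or more location paths. The sketch below emulates that idea in Python. The LOCATION_ALIASES table, the alias_equals function, alias 2, and the ModDate path are all hypothetical, and nothing here reflects how ArcSDE stores or resolves aliases.

```python
import xml.etree.ElementTree as ET

# Hypothetical alias table: one numeric alias may map to several location paths.
LOCATION_ALIASES = {
    1: ["/metadata/Esri/idCitation/Date"],
    # Alias 2 is invented to show a non-unique alias; ModDate is not in the sample.
    2: ["/metadata/Esri/ModDate", "/metadata/Esri/idCitation/Date"],
}


def alias_equals(root, alias, value):
    """Return True if any path registered for alias has a node equal to value."""
    prefix = "/" + root.tag + "/"
    for path in LOCATION_ALIASES.get(alias, []):
        if not path.startswith(prefix):
            continue
        # ElementTree paths are relative to the root element.
        for node in root.findall(path[len(prefix):]):
            try:
                if float((node.text or "").strip()) == value:
                    return True
            except ValueError:
                continue  # skip nodes whose text is not numeric
    return False


doc = ET.fromstring(
    "<metadata><Esri><idCitation><Date>19950603</Date></idCitation></Esri></metadata>"
)
print(alias_equals(doc, 1, 19950603))  # True
```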

Supported XPath expressions

Some examples of XPath expressions are provided here for illustration. These examples cover three tag data types: SE_XML_INDEX_STRING_TYPE, SE_XML_INDEX_DOUBLE_TYPE, and SE_XML_INDEX_VARCHAR_TYPE. Note that SE_XML_INDEX_VARCHAR_TYPE should be used for simple data or single words because it uses a B-tree index, while SE_XML_INDEX_STRING_TYPE should be used for text because it uses full-text indexing.

Examples 9 through 12 demonstrate how you can construct ‘A AND B’, ‘(A AND B)’, ‘A OR B’, and ‘(A OR B)’ types of expressions, respectively. Example 13 shows how to use an ‘OR within contains’ clause. Example 14 demonstrates the use of an ‘OR within predicate’ clause, while Example 15 shows the use of an ‘AND within contains’ clause.

Other patterns are also supported. These include:

  • (A AND(B))
  • ((A AND(B)) AND C )
  • ((A AND(B)) AND C ) OR D
  • (A OR B) AND (C)
  • A OR (B AND C)
  • A OR B AND C
  • (A OR B) AND C
  • (A AND B[C AND D])
  • (A AND B[C OR D])
  • (A[B OR C]) AND ( D OR E)

Each combination can be tested on different tag data types (string, double, and varchar). For example, the (A AND B[C AND D]) combination can be tested as follows:

  • (Ad AND Bv[Cv AND Dv])
  • (Ad AND Bs[Cs AND Ds])
  • (Av AND Bd[Cd AND Dd])
  • (As AND Bd[Cd AND Dd])

where d, v, and s represent double, varchar, and string data types, respectively. Example 16 shows the (Ad AND Bs[Cs AND Ds]) combination.

Use of the string or varchar data type does not restrict the operations on string or varchar data. For example, the contains() function can be used in XPath for a string or varchar data type. Examples 5 and 7 show the use of contains() or the equal (=) operator for the varchar data type.

If the contains() function is used in an XPath expression that searches a specific tag, as in Example 5, the LIKE operator is used in the SQL statement. If the contains() function is used in a search that does not target a specific tag, as in Example 7, the CONTAINS function is used for a full-text search.

The CONTAINS and LIKE functions used in SQL statements are not the same. For example, given there is a col_name value of 'Bird', these two SQL statements return different results:

SELECT col_name FROM table_name WHERE CONTAINS(col_name, 'Birds');

SELECT col_name FROM table_name WHERE col_name LIKE '%Birds%';

The first statement returns the word 'Bird', whereas the second returns no records.
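
The LIKE half of this comparison can be reproduced with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for the underlying database. CONTAINS is a DBMS-specific full-text predicate that SQLite does not provide, so that half is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (col_name TEXT)")
conn.execute("INSERT INTO table_name VALUES ('Bird')")

# LIKE matches the stored characters: 'Bird' does not contain 'Birds'.
like_plural = conn.execute(
    "SELECT col_name FROM table_name WHERE col_name LIKE '%Birds%'"
).fetchall()

# The shorter pattern does match the stored value.
like_singular = conn.execute(
    "SELECT col_name FROM table_name WHERE col_name LIKE '%Bird%'"
).fetchall()

print(like_plural)    # []
print(like_singular)  # [('Bird',)]
conn.close()
```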

Examples

Example 1

Find all documents with a /metadata/Esri/idCitation/Title element equal to 'Migratory Birds'.

XPath: /metadata/Esri/idCitation[Title = 'Migratory Birds']

The names metadata, Esri, and idCitation are all node tests. The slashes between them specify child axes. In other words, start at metadata, move to its child nodes named 'Esri', and then find all of their child nodes named 'idCitation'.

Example 2

Find all documents with a /metadata/Esri/idCitation/Title element containing the word 'Bird'.

XPath: /metadata/Esri/idCitation[contains(Title, 'Bird')]

Example 3

Find all documents with a /metadata/Esri/idCitation/Title element containing the word 'Bird' or 'Virus'.

XPath: /metadata/Esri/idCitation[contains(Title, 'Bird') OR contains(Title, 'Virus')]

Note that OR is used between the contains() functions inside a single predicate. Similarly, AND can be used to find all documents that contain both words.

Example 4

Find all documents with a /metadata/Esri/idCitation/RespParty/OrgName element equal to the word 'ACASIAN'.

XPath: /metadata/Esri/idCitation/RespParty[OrgName = 'ACASIAN']

Example 5

Find all documents with a /metadata/Esri element with a MetaID attribute equal to 1001 and a /metadata/Esri/idCitation/RespParty/OrgName element equal to the word 'UNESCO'.

XPath: /metadata/Esri[@MetaID = 1001] AND /metadata/Esri/idCitation/RespParty[OrgName = 'UNESCO']

A similar search can combine an attribute comparison with the contains() function:

XPath: /metadata/Esri/idCitation/Title[@area < 10000] AND /metadata/Esri/idCitation/RespParty[contains(OrgName, 'UNESCO')]

Note: In standard XPath, the first expression can also be written as follows, but this type of expression is not supported by ArcSDE XML.

XPath: /metadata/Esri[@MetaID = 1001]/idCitation/RespParty[OrgName = 'UNESCO']
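
The relationship between the split form and the single-path form can be explored with Python's standard xml.etree.ElementTree module. It accepts a predicate in the middle of a path and is used here only as a generic XPath stand-in. The test document is an assumption for the example, and ElementTree compares the attribute as text.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<metadata><Esri MetaID="1001"><idCitation><RespParty>'
    "<OrgName>UNESCO</OrgName>"
    "</RespParty></idCitation></Esri></metadata>"
)

# Split form: each expression is tested against the whole document,
# and the two results are combined with AND.
has_meta_id = bool(doc.findall("Esri[@MetaID='1001']"))
has_org_name = bool(doc.findall("Esri/idCitation/RespParty[OrgName='UNESCO']"))
split_form = has_meta_id and has_org_name

# Single-path form: a predicate in the middle of the location path.
single_path = bool(
    doc.findall("Esri[@MetaID='1001']/idCitation/RespParty[OrgName='UNESCO']")
)

print(split_form, single_path)  # True True
```

In this emulation the two tests of the split form are independent, so a document with several Esri elements could satisfy them through different elements. Whether ArcSDE behaves the same way is not covered by this sketch.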

Example 6

Find all documents with a /metadata/Esri/idCitation/Title element with an area attribute less than or equal to 250,000, a /metadata/Esri/idCitation/Title element containing the word 'America' or 'Africa', and a /metadata/Esri/idCitation/RespParty/OrgName element equal to the word 'ACASIAN'.

XPath: /metadata/Esri/idCitation/Title[@area <= 250000] AND (/metadata/Esri/idCitation[contains(Title,'America') OR contains(Title,'Africa')]) AND /metadata/Esri/idCitation/RespParty[OrgName = 'ACASIAN']

Similarly, other XPaths can be defined as follows:

XPath: /metadata/Esri/idCitation/Title[@area < 250000] AND (/metadata/Esri/idCitation[contains(Title,'America') OR contains(Title,'Africa')]) AND /metadata/Esri/idCitation[contains(Title,'Virus')]

XPath: /metadata/Esri/idCitation/Title[@area < 250000] AND (/metadata/Esri/idCitation[contains(Title,'America') OR contains(Title,'Africa')]) AND /metadata/Esri[@MetaID >= 1001]

Example 7

Find all documents that contain specified values in any tag.

A. Find all documents that contain the word 'Birds' in any tag.

XPath: //*[contains (. , 'Birds')]

B. Find all documents where the value of any tag is equal to 4,200.

XPath: //*[. = 4200]

C. Find all documents that have the word 'ACASIAN' in any tag.

XPath: //*[. = 'ACASIAN']

Or you could use:

XPath: //*[contains(. , 'ACASIAN')]

Example 8

Find all documents where a /metadata/Esri/idCitation/Date node has a value less than 19950101 and the document contains the phrase 'West Nile Virus'.

XPath: /metadata/Esri/idCitation[Date < 19950101] AND //*[contains(.,'West Nile Virus')]

Example 9

Find all documents where a /metadata/Esri/idCitation/Date node has a date equal to 19940112 and any node contains the word 'UNESCO'.

XPath: /metadata/Esri/idCitation[Date = 19940112] AND //*[contains(.,'UNESCO')]

Example 10

Find all documents where the attribute MetaID of a /metadata/Esri node is equal to 1001 and a /metadata/Esri/idCitation/Title node contains the word 'Africa'.

XPath: (/metadata/Esri[@MetaID = 1001] AND /metadata/Esri/idCitation[contains(Title,'Africa')])

Example 11

Find all documents where any node contains the word 'mosquitoes' or a /metadata/Esri/idCitation/Title node contains the word 'Africa'.

XPath: //*[contains(.,'mosquitoes')] OR /metadata/Esri/idCitation[contains(Title,'Africa')]

Example 12

Find all documents where the attribute MetaID of a /metadata/Esri node has a value of 1003 or a /metadata/Esri/idCitation/Date node has a date equal to 19940112.

XPath: (/metadata/Esri[@MetaID = 1003] OR /metadata/Esri/idCitation[Date = 19940112])

Example 13

Find all documents where any node contains the phrase 'Migratory Birds' or the word 'mosquitoes'.

XPath: (//*[contains(.,'Migratory Birds') OR contains(.,'mosquitoes')])

Example 14

Find all documents where a /metadata/Esri/idCitation/RespParty/OrgName node has the word 'ACASIAN' or 'CDC' and any node contains the word 'mosquitoes'.

XPath: (/metadata/Esri/idCitation/RespParty[OrgName = 'ACASIAN' OR OrgName = 'CDC'] AND //*[contains(.,'mosquitoes')])

Example 15

Find all documents where any node contains the phrase 'Migratory Birds' and the word 'Virus' and a /metadata/Esri/idCitation/Date node has a value less than 20050902.

XPath: //*[contains(.,'Migratory Birds') AND contains(.,'Virus')] AND /metadata/Esri/idCitation[Date < 20050902]

Example 16

Find all documents where a /metadata/Esri/@MetaID node has a value greater than or equal to 1000 and a /metadata/Esri/idCitation/Title node contains 'West Nile Virus' and 'Asia'.

XPath: (/metadata/Esri[@MetaID >= 1000] AND /metadata/Esri/idCitation[contains(Title,'West Nile Virus') AND contains(Title,'Asia')])

Example 17

Using aliases, find all documents with Date 19950603.

XPath: location_alias(1) = 19950603

Unsupported XPaths

The following are a few examples of unsupported XPaths. Do not use these with the ArcSDE XML API.

  • Select all documents where the attribute MetaID of a /metadata/Esri element is equal to 1001 and a /metadata/Esri/idCitation/RespParty/OrgName element is equal to the word 'UNESCO'.

    XPath: /metadata/Esri[@MetaID = 1001]/idCitation/RespParty[OrgName = 'UNESCO']

    This type of step is not supported. This XPath should be broken into two expressions. See Example 5 for an example of how this should be rewritten.

     

  • Select all documents where the Title element of a /metadata/Esri/idCitation tag does not contain the word 'Africa'.

    XPath: /metadata/Esri/idCitation[contains(Title,'Africa') != true]

    This form of the contains() function is not supported. See Functions for a list of supported functions.

     

  • Select all text nodes of the /metadata/Esri/idCitation tag.

    XPath: /metadata/Esri/idCitation/text()

    This expression uses a TypeTest, text(), which is not a supported node test.

     

  • Select all child nodes of the /metadata/Esri/idCitation element.

    XPath: /metadata/Esri/idCitation/node()

    This expression uses a TypeTest, node(), which is not a supported node test. See Node tests for more information.

     

  • Select all documents with a /metadata/Esri[1]/idCitation/Theme element for the first occurrence of the node Esri.

    XPath: /metadata/Esri[1]/idCitation[Theme = 'Monitoring']

    Positional predicates are not supported. See Predicates for supported predicates.
