xml index


Home \| Concepts \| API \| Samples

Concepts > XML > API Entities

XML Index and Tags

An index in the ArcSDE API is used to enable searches on XML columns, and a tag represents an element.

An ArcSDE XML column can have two types of indexes to support different types of searches.

Text Index—A text index associated with the XML document supports full-text searches against all content in the XML document. The XML tags in the document are transparent to the text index (in other words, the text index is created in the database on the content with all XML markup removed), thus allowing quick and efficient text searches on the document.

A text index is created by default on all element/attribute values. ArcSDE parses the XML document for values of all elements and/or attributes and stores them in an SDE_XML_DOC table that supports full doc text queries. For example: In the following XML document, a text index will search only for values of elements such as "New Mexico", "4", and attribute values such as "YES" for the LISTED attribute in MARK tag.

<gn>
    <coverage>New Mexico</coverage>
    <suptheme>4</suptheme>
    <MARK NUMBER="1" LISTED="yes" TYPE="natural"/>
</gn>
BTREE index—Full-text indexes, apart from being bigger and slower than BTREE indexes, are updated outside the transaction scope, which is inefficient when used to index feature property sets. Further, full text indexes are not available on all databases. To overcome the limitation of full text indexes, the tag-indexing model has been extended to include Numbers, Strings, and Text. These would now be indexed using a BTREE, while text uses the full-text indexer. Therefore, for coded property sets, the tags would be either numbers of strings and not full text. This implementation supports simple strings or single words using indexing technology that exists in every database.

To implement BTREE indexes, a new column, STRING_TAG, has been added to the existing sde_xml_idx table. Since the value for this column will be small, its size is set to 256. At the time of inserting an XML document, the DOUBLE_TAG, STRING_TAG, and TEXT_TAG columns will be populated with appropriate values based on tag type.

Thus, we have a new index type, SE_XML_INDEX_VARCHAR_TYPE.

By default, only string will be indexed using a BTREE, and a full-text index will be created on text tag values. The full-text index can be activated by calling SE_xmlcolumninfo_set_fulltext_indexing() on text tag values.

In the following XML document, consider the "metadata/idCitation/resTitle" node for full-text index and the "metadata/idCitation/citRespParty/rpOrgName" node for BTREE index.

<?xml version = \"1.0\" ?>
<metadata>
  <Esri MetaID=\"1000\">
    <idCitation>
      <resTitle>
        Migratory Birds and Spread of West Nile Virus in the Asia.
      </resTitle>
      <citRespParty>
        <rpOrgName>ACASIAN</rpOrgName>
      </citRespParty>
    </idCitation>
  </Esri>\
</metadata>

Example 1

Find all documents with a "metadata/idCitation/resTitle" node equal to "West Nile"

Xpath: //metadata/idCitation[resTitle = "West Nile"]

SQL:

SELECT sde_xml_id
FROM sde_xml_idx<xmlcolumn id> x INNER JOIN sde.sde_xml_index_tags t
ON x.tag_id = t.tag_id
WHERE t.tag_name = '//metadata/idCitation/resTitle' AND
CONTAINS (x.text_tag, 'West Nile') > 0;

Example 2

Find all documents with a metadata/idCitation/citRespParty/rpOrgName node equal to "ACASIAN".

Xpath: //metadata/idCitation/citRespParty[rpOrgName = "ACASIAN"]

SQL:

SELECT sde_xml_id
FROM sde_xml_idx<xmlcolumn id> x INNER JOIN sde.sde_xml_index_tags t
ON x.tag_id = t.tag_id
WHERE t.tag_name =
'//metadata/idCitation/citRespParty/rpOrgName' AND
x.string_tag = 'ACASIAN';

Note that different predicates are used for searching text and string. For text, use the CONTAINS function and for string, use usual text operator "=" or "!=".

All indexes associated with an ArcSDE XML column are CONTEXT indexes, which are designed for indexing articles, reports, and similar documents such as metadata describing GIS resources. It is important to manage these indexes appropriately as new documents are published to the XML column to maintain good search performance. This involves synchronizing and optimizing the text indexes.

For information indexed as text, indexing is accomplished using the database’s full text indexing capabilities. The words that are included in the database’s text index will depend on the indexing rules that are used. Each RDBMS may use slightly different rules to index the same text. Typically, words that have meaning, such as river, pollution, and population, will be included in the text index, but articles such as and, the, and in will not be included. Different rules are used to index text written in different languages. Both the database and its text indexing components must be properly configured for the language used to author text in the published documents before they can be indexed and searched correctly.

The following code snippets show how XML index and XML tags are handled using the ArcSDE C and Java APIs.

Creating Index and Tags (+/-)

C API Java API

LONG rc, in_types[5], i;
SE_COLUMN_DEF column_defs[num_columns];
index_name[SE_QUALIFIED_XML_INDEX_LEN], *in_names[5], *config_keyword[10];
SE_XMLINDEXINFO xml_index;
SE_XMLTAGINFO tag;
SE_XMLCOLUMNINFO xmlcolumn;

/* Create an index definition for the column */
rc = SE_xmlindexinfo_create (&xml_index);
sprintf(index_name, "xml_index");

rc = SE_xmlindexinfo_set_name (xml_index, index_name);
rc = SE_xmlindexinfo_set_type (xml_index, SE_XML_INDEX_DEFINITION);

in_names[0] = "/metadata/demo/gn/suptheme";
in_names[1] = "/metadata/demo/gn/coverage";
in_names[2] = "/metadata/demo/gn/featured";
in_names[3] = "/metadata/demo@MetaID";

in_types[0] = SE_XML_INDEX_DOUBLE_TYPE;
in_types[1] = SE_XML_INDEX_STRING_TYPE;
in_types[2] = SE_XML_INDEX_STRING_TYPE;
in_types[3] = SE_XML_INDEX_DOUBLE_TYPE;

rc = SE_xmltaginfo_create (&tag);

for (i = 0; i < 4; i++){
  rc = SE_xmltaginfo_set_name (tag, in_names[i]);
  rc = SE_xmltaginfo_set_data_type (tag, in_types[i]);
  rc = SE_xmlindexinfo_add_tag (xml_index, tag);
}

/* Add Index to Column */
rc = SE_xmlcolumninfo_set_index (xmlcolumn, xml_index);

int[] in_types = new int[5];
String[] in_names = new String[5];

//Create XML Index
SeXmlIndex index = new SeXmlIndex(conn);
index.setName("xml_index");
index.setDescription("index description");
index.setType(SeDefs.SE_XML_INDEX_DEFINITION);

in_names[0] = "/metadata/demo/gn/suptheme";
in_names[1] = "/metadata/demo/gn/coverage";
in_names[2] = "/metadata/demo/gn/featured";
in_names[3] = "/metadata/demo@MetaID";

in_types[0] = SeDefs.SE_XML_INDEX_DOUBLE_TYPE;
in_types[1] = SeDefs.SE_XML_INDEX_STRING_TYPE;
in_types[2] = SeDefs.SE_XML_INDEX_STRING_TYPE;
in_types[3] = SeDefs.SE_XML_INDEX_DOUBLE_TYPE;

//Create tags
SeXmlTag tag = new SeXmlTag();

for (int i = 0; i <4; i++)<br> {
  tag.setName(in_names[i]);
  tag.setDataType(in_types[i]);
  index.addTag(tag);
}

//Add Index to Column
SeXmlColumn xmlCol = ...;
xmlCol.setIndex(index);

Retrieving Index and Tag Information (+/-)

Top

Deleting Indexes and Tags (+/-)

Top

feedback | privacy | legal