An index in the ArcSDE API is used to enable searches on XML columns, and a
tag represents an element.
An
ArcSDE XML column can have two types of indexes to support different
types of searches.
Text IndexA text index associated with the XML
document supports full-text searches against all content in the XML document.
The XML tags in the document are transparent to the text index (in other words, the text
index is created in the database on the content with all XML markup removed), thus allowing
quick and efficient text searches on the document.
A text index is created by default on all element/attribute values. ArcSDE
parses the XML document for values of all elements and/or attributes and
stores them in an SDE_XML_DOC table that supports full doc text queries.
For example: In the following XML document, a text index will search only
for values of elements such as "New Mexico", "4",
and attribute values such as "YES" for the LISTED attribute
in MARK tag.
<gn>
<coverage>New Mexico</coverage>
<suptheme>4</suptheme>
<MARK NUMBER="1" LISTED="yes" TYPE="natural"/>
</gn> |
BTREE indexFull-text indexes, apart from being bigger and slower than BTREE indexes, are updated outside the transaction scope, which is inefficient when
used to index feature property sets. Further, full text indexes are not available on all databases. To overcome the limitation of full text indexes,
the tag-indexing model has been extended to include Numbers, Strings, and Text. These would now be indexed using a BTREE, while text uses the full-text indexer. Therefore, for coded property sets, the tags would be either numbers of strings and not full text. This implementation supports simple strings or
single words using indexing technology that exists in every database.
To implement BTREE indexes, a new column, STRING_TAG, has been added to the existing sde_xml_idx table. Since the value for
this column will be small, its size is set to 256. At the time of inserting an XML document, the DOUBLE_TAG, STRING_TAG, and TEXT_TAG columns will be populated
with appropriate values based on tag type.
Thus, we have a new index type,
SE_XML_INDEX_VARCHAR_TYPE.
By default, only string will be indexed using a BTREE, and a full-text index will be created on text tag values. The full-text index can be activated
by calling SE_xmlcolumninfo_set_fulltext_indexing() on text tag values.
In the following XML document, consider the "metadata/idCitation/resTitle"
node for full-text index and the "metadata/idCitation/citRespParty/rpOrgName" node
for BTREE index.
<?xml version = \"1.0\" ?>
<metadata>
<Esri MetaID=\"1000\">
<idCitation>
<resTitle>
Migratory Birds and Spread of West Nile Virus in the Asia.
</resTitle>
<citRespParty>
<rpOrgName>ACASIAN</rpOrgName>
</citRespParty>
</idCitation>
</Esri>\
</metadata> |
Example 1
Find all documents with a "metadata/idCitation/resTitle" node equal to
"West Nile"
Xpath: //metadata/idCitation[resTitle = "West Nile"]
SQL:
SELECT sde_xml_id
FROM sde_xml_idx<xmlcolumn id> x INNER JOIN sde.sde_xml_index_tags t
ON x.tag_id = t.tag_id
WHERE t.tag_name = '//metadata/idCitation/resTitle' AND
CONTAINS (x.text_tag, 'West Nile') > 0;
Example 2
Find all documents with a metadata/idCitation/citRespParty/rpOrgName node equal
to "ACASIAN".
Xpath: //metadata/idCitation/citRespParty[rpOrgName = "ACASIAN"]
SQL:
SELECT sde_xml_id
FROM sde_xml_idx<xmlcolumn id> x INNER JOIN sde.sde_xml_index_tags t
ON x.tag_id = t.tag_id
WHERE t.tag_name =
'//metadata/idCitation/citRespParty/rpOrgName' AND
x.string_tag = 'ACASIAN';
Note that different predicates are used for searching text and string. For
text, use the CONTAINS function and for string, use usual
text operator "=" or "!=".
All indexes associated with an ArcSDE XML column are CONTEXT indexes, which are designed for indexing articles, reports, and similar documents such as metadata
describing GIS resources. It is important to manage these indexes appropriately as new documents are published to the XML column to maintain good search performance. This
involves synchronizing and optimizing the text indexes.
For information indexed as text, indexing is accomplished using the
databases full text indexing capabilities. The words that are included in the
databases text index will depend on the indexing rules that are used. Each
RDBMS may use slightly different rules to index the same text. Typically, words
that have meaning, such as river, pollution, and population, will be
included in the text index, but articles such as and, the, and in will not be
included. Different rules are used to index text written in different languages.
Both the database and its text indexing components must be properly configured
for the language used to author text in the published documents before they can
be indexed and searched correctly.
The following code snippets show how XML index and XML tags are handled using the ArcSDE C and Java APIs.
LONG rc, in_types[5], i;
SE_COLUMN_DEF column_defs[num_columns];
index_name[SE_QUALIFIED_XML_INDEX_LEN], *in_names[5], *config_keyword[10];
SE_XMLINDEXINFO xml_index;
SE_XMLTAGINFO tag;
SE_XMLCOLUMNINFO xmlcolumn;
/* Create an index definition for the column */
rc = SE_xmlindexinfo_create (&xml_index);
sprintf(index_name, "xml_index");
rc = SE_xmlindexinfo_set_name (xml_index, index_name);
rc = SE_xmlindexinfo_set_type (xml_index, SE_XML_INDEX_DEFINITION);
in_names[0] = "/metadata/demo/gn/suptheme";
in_names[1] = "/metadata/demo/gn/coverage";
in_names[2] = "/metadata/demo/gn/featured";
in_names[3] = "/metadata/demo@MetaID";
in_types[0] = SE_XML_INDEX_DOUBLE_TYPE;
in_types[1] = SE_XML_INDEX_STRING_TYPE;
in_types[2] = SE_XML_INDEX_STRING_TYPE;
in_types[3] = SE_XML_INDEX_DOUBLE_TYPE;
rc = SE_xmltaginfo_create (&tag);
for (i = 0; i < 4; i++){
rc = SE_xmltaginfo_set_name (tag, in_names[i]);
rc = SE_xmltaginfo_set_data_type (tag, in_types[i]);
rc = SE_xmlindexinfo_add_tag (xml_index, tag);
}
/* Add Index to Column */
rc = SE_xmlcolumninfo_set_index (xmlcolumn, xml_index);
int[] in_types = new int[5];
String[] in_names = new String[5];
//Create XML Index
SeXmlIndex index = new SeXmlIndex(conn);
index.setName("xml_index");
index.setDescription("index description");
index.setType(SeDefs.SE_XML_INDEX_DEFINITION);
in_names[0] = "/metadata/demo/gn/suptheme";
in_names[1] = "/metadata/demo/gn/coverage";
in_names[2] = "/metadata/demo/gn/featured";
in_names[3] = "/metadata/demo@MetaID";
in_types[0] = SeDefs.SE_XML_INDEX_DOUBLE_TYPE;
in_types[1] = SeDefs.SE_XML_INDEX_STRING_TYPE;
in_types[2] = SeDefs.SE_XML_INDEX_STRING_TYPE;
in_types[3] = SeDefs.SE_XML_INDEX_DOUBLE_TYPE;
//Create tags
SeXmlTag tag = new SeXmlTag();
for (int i = 0; i <4; i++)<br> {
tag.setName(in_names[i]);
tag.setDataType(in_types[i]);
index.addTag(tag);
}
//Add Index to Column
SeXmlColumn xmlCol = ...;
xmlCol.setIndex(index);
|
LONG rc, temp_long, in_aliases[5], i, num_tags, in_types[5];
SE_XMLINDEXINFO xml_index;
CHAR temp_str[256], *in_descs[5], *in_names[5];
SE_XMLTAGINFO in_tag, *out_tags;
BOOL in_exclusions[5];
rc = SE_xmlindexinfo_create (&xml_index);
rc = SE_xmlindexinfo_get_name (xml_index, temp_str);
rc = SE_xmlindexinfo_get_type (xml_index, &temp_long);
rc = SE_xmlindexinfo_get_description(xml_index, temp_str);
rc = SE_xmlindexinfo_get_tags (xml_index, &num_tags, &out_tags);
for (i = 0; i < num_tags; i++)
{
rc = SE_xmltaginfo_get_name(out_tags[i],
temp_str);
rc = SE_xmltaginfo_get_data_type (out_tags[i], &temp_long);
rc = SE_xmltaginfo_get_alias (out_tags[i], &temp_long);
rc = SE_xmltaginfo_get_description (out_tags[i], temp_str);
if(SE_xmltaginfo_is_excluded (out_tags[i]))
printf ("Tag %d is excluded\n", i);
}
SeXmlColumn col = ...;
if(col.hasIndex())
{
System.out.print(" Index Name: " + index.getName()
+
"\n Type: " + index.getType() +
"\n Tags: ");
SeXmlTag[] tags = index.getTags();
System.out.print(tags.length + "\n");
if(tags != null)
{
for(int i = 0; i < tags.length; i++)
{
String isExcluded = null;
if(tag.isExcluded())
isExcluded = "Yes\n";
else
isExcluded = "No\n";
System.out.println(" Name: " + tag.getName()
+
"\n Description: " + tag.getDescription()
+
"\n Data Type: " + tag.getDataType()
+
"\n Alias: " + tag.getAlias() +
"\n Is Excluded? " + isExcluded);
}
}
}
|
LONG rc;
SE_XMLINDEXINFO xml_index;
rc = SE_xmlindexinfo_create (&xml_index);
rc = SE_xmlindexinfo_delete_tags (xml_index);
rc = SE_xmlindexinfo_free(xml_index);
rc = SE_xmlcolumn_delete(handle, tableName, xmlcolumn);
SeXmlColumn col = ...;
SeIndex index = col.getIndex();
index.deleteTags();
xmlCol.delete(tableName, xmlColumnName);
|
See also
XML Index Templates
|