Publication Components
The primary data store for cataloged metadata documents is a relational database management system. Currently supported databases include Oracle (9i, 10g, and 11g), SQL Server (2005 and 2008), and PostgreSQL (8.3 and 8.4). The relational database contains tables that record resource approval status, publication method, and additional identification attributes. It also contains tables for referenced users (users who own data within the catalog), remote repositories registered for synchronization, and saved searches per user. The Geoportal extension uses the standard Java JDBC API when communicating directly with the relational database. The primary components associated with the publication of documents to the Geoportal extension metadata catalog are depicted in the figure below.
Metadata documents that an administrator classifies as either "Approved" or "Reviewed" are sent to the Apache Lucene index used by the Geoportal extension. Documents stored within the index are discoverable through search. Apache Lucene applies an Analyzer during both indexing and searching. The Analyzer tokenizes text into terms, taking language-based stop words and stemming into account. Additional Analyzers are available through the Apache Lucene contribution community. The website has two pages that expose metadata publication end points:
- An upload page that lets a publisher upload metadata documents from a hard drive or from an HTTP end point.
- An online editor page that lets a publisher create and edit metadata documents. Only documents that were created with the online editor are available for subsequent editing.
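As a rough illustration of the analysis step described above for indexing and searching, the following Java sketch tokenizes text, drops stop words, and applies a very crude stemmer. It does not use Apache Lucene: the class name `ToyAnalyzer`, the stop-word list, and the stemming rule are all assumptions made for this example, and Lucene's real analyzers are considerably more capable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified stand-in for an analyzer; this is not Lucene code.
class ToyAnalyzer {
    // Assumed small English stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "in", "of", "the", "to");

    // Lowercase the text, split on non-alphanumerics, drop stop words, stem the rest.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            terms.add(stem(token));
        }
        return terms;
    }

    // Crude stemming: strip one trailing "s" from longer words (but not "ss").
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        // Prints [river, lake, canada]
        System.out.println(analyze("The Rivers and Lakes of Canada"));
    }
}
```

Because the same analysis is applied to both indexed text and query text, a search for "lakes" and a document containing "Lake" reduce to the same term in this sketch.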
Within the synchronization process, the scheduler assigns registered resources to the queue for synchronization. The watchdog is used only in a load-balanced environment, where it ensures that the synchronization processes are coordinated. Once the queue receives resource information, it acquires the resource connection information through the resource definition element and proceeds through the dedicated synchronization thread. The resource documents are either selected from the target catalog or created based on the resource information available. The output of the selection, iteration, and reading process is an XML document that is sent through the publication request component described below. The dedicated synchronization thread also creates a report that is visible through the Geoportal user interface on the Synchronization Report page of the resource. As an option on the Create or Edit Resource page, you can choose to have the synchronization results sent by e-mail.
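The hand-off from scheduler to queue to synchronization thread can be pictured with the minimal Java sketch below. It is not the Geoportal implementation: the names `SyncSketch` and `Resource`, the in-memory queue, and the placeholder XML are assumptions for illustration, and the watchdog and e-mail notification are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of scheduler -> queue -> synchronization thread.
class SyncSketch {
    // Assumed minimal resource definition: a display name and a connection URL.
    static final class Resource {
        final String name;
        final String url;
        Resource(String name, String url) { this.name = name; this.url = url; }
    }

    // Sentinel object that tells the worker thread to stop.
    private static final Resource STOP = new Resource("", "");

    // Queues the registered resources, processes them on one dedicated thread,
    // and returns the report lines that thread produced.
    static List<String> synchronize(List<Resource> registered) {
        BlockingQueue<Resource> queue = new LinkedBlockingQueue<>();
        List<String> report = new ArrayList<>();

        Thread worker = new Thread(() -> {
            try {
                for (Resource r = queue.take(); r != STOP; r = queue.take()) {
                    // A real worker would connect to r.url and read documents here;
                    // this placeholder stands in for the XML passed on to publication.
                    String xml = "<metadata source=\"" + r.url + "\"/>";
                    report.add(r.name + ": prepared " + xml);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        // Scheduler role: assign each registered resource to the queue.
        for (Resource r : registered) {
            queue.add(r);
        }
        queue.add(STOP);

        try {
            worker.join(); // after join(), the worker's report entries are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for synchronization", e);
        }
        return report;
    }

    public static void main(String[] args) {
        List<Resource> registered = List.of(
                new Resource("catalog-a", "http://example.org/a"),
                new Resource("catalog-b", "http://example.org/b"));
        synchronize(registered).forEach(System.out::println);
    }
}
```

A single worker thread keeps the report in queue order; a deployment with several nodes would need the kind of coordination the watchdog provides.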
The website also exposes a REST API that allows client applications such as ArcCatalog to publish metadata documents. The Geoportal extension Publish Client is a plug-in for ArcCatalog that batch publishes metadata documents (from folders or geodatabases) through this end point.
Each publication request implements a standardized methodology to process an XML metadata document:
- Interrogation: The document will be interrogated to determine its associated metadata standard.
- Evaluation: The document will be evaluated according to the configuration file associated with the standard. Evaluation determines the primary parameters of interest (such as title, abstract, …)
- Validation: The document will be validated according to the configuration file associated with the standard. If the standard has an associated XSD (XML Schema Definition), the document will be validated against the XSD.
- Identification: A determination is made as to whether the document already exists within the catalog. This step is necessary to avoid duplication and depends on the content of the document (some documents have internal identifiers) and on the publication method (some methods can provide a unique URI associated with the source).
- Store Document: The document is sent to the relational database for storage.
- Update Administrative Attributes: Administrative attributes within the relational database are updated through the Java JDBC API. These include the publication method, an internal file identifier if available, and a URI associated with the source if available.
- Index if Required: If the document has previously been Approved or Reviewed by an administrator (or when it is Approved or Reviewed), the document is sent to the Apache Lucene Index. This step makes use of a Geoportal extension class (LuceneIndexAdapter) to communicate with the index through the Apache Lucene Java API.
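The sequence of steps above can be sketched in Java as follows. This is only an illustration under stated assumptions, not Geoportal code: the class `PublicationSketch`, the in-memory maps standing in for the relational database and the Lucene index, the use of the root element name to identify the standard, and the `title` and `fileIdentifier` element names are all hypothetical. Validation against an XSD is reduced here to a simple required-title check.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical sketch of the publication steps; all names are illustrative.
class PublicationSketch {
    final Map<String, String> store = new HashMap<>();   // stands in for the relational database
    final Map<String, String> method = new HashMap<>();  // administrative attribute: publication method
    final Set<String> approved = new HashSet<>();        // ids already Approved or Reviewed
    final Set<String> index = new HashSet<>();           // stands in for the Lucene index

    // Runs the steps in order and returns the catalog id used for the document.
    String publish(String xml, String sourceUri, String publicationMethod) {
        Document doc = parse(xml);
        // Interrogation: assume the root element name identifies the standard.
        String standard = doc.getDocumentElement().getNodeName();
        // Evaluation: extract a primary parameter (assumed to be a <title> element).
        String title = firstText(doc, "title");
        // Validation: assumed rule that a non-empty title is required.
        if (title == null || title.isEmpty()) {
            throw new IllegalArgumentException("invalid " + standard + " document: missing title");
        }
        // Identification: prefer an internal identifier, else fall back to the source URI.
        String fileId = firstText(doc, "fileIdentifier");
        String id = (fileId != null && !fileId.isEmpty()) ? fileId : sourceUri;
        // Store document, then update administrative attributes.
        store.put(id, xml);
        method.put(id, publicationMethod);
        // Index if required: only documents already Approved or Reviewed.
        if (approved.contains(id)) {
            index.add(id);
        }
        return id;
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("document is not well-formed XML", e);
        }
    }

    // Returns the trimmed text of the first element with the given tag, or null if absent.
    private static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        PublicationSketch catalog = new PublicationSketch();
        String xml = "<metadata><fileIdentifier>abc-1</fileIdentifier><title>Rivers</title></metadata>";
        String id = catalog.publish(xml, "http://example.org/doc.xml", "upload");
        System.out.println(id + " indexed: " + catalog.index.contains(id));
    }
}
```

Because identification yields the same id when the same document is published again, the second publication overwrites the stored copy rather than creating a duplicate.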