Once an organization participating in a central Metadata Service has successfully configured the OAI-PMH Connector, the harvester can harvest documents from the provider’s Metadata Service using the OAI-PMH Harvester.
First, you need to install the ArcIMS Metadata options. See ‘Installing ArcIMS Metadata Options’ in the ArcIMS Installation Guide for instructions.
To start the OAI-PMH Harvester, open the operating system’s command window and navigate to <ArcIMS install directory>/Metadata/OAI-PMH/Client. Type “run” and press Enter. The OAI-PMH Harvester dialog box appears.
The OAI-PMH Harvester dialog box is divided into three panels. At the top is the list of harvesting tasks; each task identifies an OAI-PMH Connector from which you want to harvest and the documents you want to retrieve. In the middle is information about your Metadata Service; the harvested documents will be published directly to the specified service. At the bottom you can see the status of your harvesting tasks; initially it will show that the default configuration file for the OAI-PMH Harvester has been successfully loaded.
Use the toolbar to add and remove harvesting tasks, save the OAI-PMH Harvester’s configuration, load a new configuration file, and start the harvesting process.
When the OAI-PMH Harvester opens, the default configuration file, OAIServers.xml, is loaded. It includes a default harvesting task to help you get started. Edit the default harvesting task to identify an OAI-PMH Connector from which you want to harvest, specify the appropriate information to publish the harvested documents to your Metadata Service, and begin harvesting following the instructions below.
In the status panel, messages appear indicating the progress of the harvesting tasks; scroll down to see the latest messages. The status bar also indicates that harvesting is in progress. Click the red dot at the bottom right of the dialog box to stop harvesting at any time.
By default, if a problem occurs when the harvested documents are being published, an error message will appear and harvesting will be stopped. For example, this might happen if you did not click elsewhere in the second panel after specifying the password, or if the user doesn’t have permission to publish to the specified Metadata Service.
Any combination of dates can be specified in the From and Until columns as appropriate. To harvest documents modified in a specific time period, specify a date in both the From and Until columns. To harvest documents modified before a given date, specify a date only in the Until column. Be sure to click elsewhere in the task list after specifying a date so that your changes to the task list take effect.
You can modify the existing task in the task list for this provider and make that task active. Or, uncheck the existing task in the Active column to deactivate it, click Add a harvesting task, then define the new task to retrieve the latest updates from that provider. Only active tasks will be executed during the harvesting process.
In the OAI-PMH Harvester, paste this value into the Set column in the task list. Press Enter or click elsewhere in the task list for your changes to take effect.
Your custom settings will be loaded each time the OAI-PMH Harvester is started. However, they won’t appear in the OAI-PMH Harvester dialog box. If you customize the settings for a harvesting task, remove that task, and save your changes, your custom settings will be lost. If you add a task, you’ll have to modify its settings in a text editor if you want it to use the same custom settings as the other tasks.
When you open the configuration file in a text editor, you can see there is one repository element for each provider. Additional properties can be set for a provider as follows.
Maximum number of documents—To set the maximum number of documents that you want to harvest from a provider, add the maxdocs element into the appropriate repository element and set its value to a desired number. For example, you would add “<maxdocs>50</maxdocs>”.
Log in to access a provider’s Metadata Service—If you must log in to access documents in the provider’s Metadata Service, add username and password elements into the appropriate repository element and set their values appropriately. For example, you would add “<username>harvesterUser</username>” and “<password>harvesterPass</password>”.
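As a rough sketch, a repository element with both of the properties above might look like the following after editing. Only the maxdocs, username, and password elements and their sample values come from the descriptions above. The elements that already exist for the provider are not shown and should be left as they are, and the order of the new elements within the repository element is an assumption.

```xml
<repository>
  <!-- Existing elements for this provider are not shown; leave them unchanged. -->

  <!-- Harvest at most 50 documents from this provider. -->
  <maxdocs>50</maxdocs>

  <!-- Login used to access the provider's Metadata Service. -->
  <username>harvesterUser</username>
  <password>harvesterPass</password>
</repository>
```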
In the configuration file there is one publisher element for the harvester. Additional properties can be set for the harvester as follows.
Copying the provider’s folder structure—The provider’s Metadata Service may organize documents into a hierarchy of folders so they can be easily browsed. By default, folders matching the provider’s folder structure will be created at the root level of the Metadata Service into which the documents are harvested. Instead, you can choose not to copy the provider’s folder structure; with this option, all documents will be placed at the root level of the harvester’s Metadata Service. To not replicate the provider’s folders, add the useSets attribute to the provider element and set it to “false”, for example, “<provider useSets=‘false’>”. If the useSets attribute isn’t present, the value “true” will be assumed.
Publishing documents to a folder on disk—Instead of publishing the harvested documents directly to the specified Metadata Service, they can be stored on disk. You can review or process the documents on disk before publishing them. This might be desirable if your Metadata Service doesn’t use an administrative table that keeps the harvested documents from becoming public immediately after harvesting.
For this option, remove the URL element from the provider element, set the agent attribute on the provider element to “com.esri.aims.mtier.mh.publisher.FolderPublisher”, and add a folder element with a value indicating the full path where the harvested documents will be stored. For example, “<provider agent=‘com.esri.aims.mtier.mh.publisher.FolderPublisher’>” and “<folder>D:\harvest</folder>”. When the configuration file is loaded, the harvester panel will appear empty; the folder information won’t be shown.
The folders used to organize documents in the provider’s Metadata Service won’t be replicated when documents are harvested to disk, even if the useSets attribute is set to “true”.
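Put together, the edited provider element for publishing to disk might look like the sketch below. The agent value, the folder element, and the sample path are taken from the description above. Whether any other existing child elements should remain is not covered here, so they are simply omitted. Note that in the actual file the attribute value must be enclosed in straight single or double quotation marks, not typographic ones, or the XML will not be well formed.

```xml
<!-- The URL element has been removed; documents are written to disk instead. -->
<provider agent="com.esri.aims.mtier.mh.publisher.FolderPublisher">
  <!-- Full path where the harvested documents will be stored. -->
  <folder>D:\harvest</folder>
</provider>
```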
Publishing harvested documents—By default, if any problems are encountered when publishing the harvested documents, the harvesting process will stop. Instead, you can choose to attempt to publish each harvested document; depending on the problem, some documents may be published successfully. To attempt to publish all documents, add the strict attribute to the provider element and set it to false, for example, “<provider strict=‘false’>”. If the strict attribute isn’t present, the value “true” will be assumed.
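For illustration, a provider element that publishes directly to a Metadata Service, does not copy the provider’s folder structure, and attempts to publish every document might look like the sketch below. Setting both attributes on the same element is an assumption made for this example; each attribute can also be added on its own. The existing child elements are not shown and stay as they are, and the attribute values must be enclosed in straight quotation marks.

```xml
<!-- useSets="false": place all documents at the root level of the Metadata Service. -->
<!-- strict="false": keep publishing the remaining documents when one fails. -->
<provider useSets="false" strict="false">
  <!-- Existing child elements, such as URL, are not shown; leave them unchanged. -->
</provider>
```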
Log information can be sent to several locations: standard output, the status panel in the OAI-PMH Harvester, a log file, and a separate debug log file. By default, information is logged only to standard output and the status panel. Usually it’s most convenient to check the status of the harvesting process in the status panel of the dialog box. Use the debug log file to troubleshoot any problems you may encounter. Because the standard output for the OAI-PMH Harvester is sent to the command window from which it was started, you may choose not to send log information to that location.
Four levels of logging record increasing amounts of information: ERROR, WARN, INFO, and DEBUG. The overall level of logging, which applies to all logs, is set on the first line of the log.properties file. By default, it’s set to the INFO level, which records the status of the OAI-PMH Harvester and the harvesting process, warnings that are generated, and any errors that occur. At the ERROR level, only error messages are recorded. At the WARN level, only warnings and error messages are recorded. At the DEBUG level, all messages are recorded.
Each log can record information at a different level. To change a log’s level, set its threshold property in the log.properties file. No log can record more information than the overall level, which is specified in the first line of the file. For example, if the overall level is INFO and the debug log’s threshold is set to DEBUG, the debug log file can’t record more than the INFO level of information. With the overall level set to DEBUG, everything is recorded in the debug log file; the other logs will continue at their specified thresholds.
If you start a log file and a debug log file, files named “access.log” and “debug.log”, respectively, will be created by default in the <ArcIMS install directory>/Metadata/OAI-PMH directory. Settings in the log.properties file let you specify different file names and locations. They also let you include a date in the log file’s name.
The steps for starting and stopping a log, changing a log’s threshold, and changing a log file’s name and date are the same for the OAI-PMH Harvester as for the Z39.50 Connector. See ‘Setting log options’ for detailed instructions about how to complete these tasks.