Managing elevation data, Part 2: Design and data management plan
The audience for this workflow is primarily the image data managers within an organization who are mandated to make elevation data accessible to a range of user communities. This workflow presumes the image data manager is using ArcGIS Desktop to manage the data and ArcGIS Server to distribute the data as one or more image services, but this workflow is also applicable for managing and distributing elevation data in only ArcGIS Desktop.
This workflow is intended to address raster (cell-based) elevation data. In ArcGIS 10.0, elevation data stored as 3D point data (for example, LAS) or as terrain datasets must be converted to raster datasets before it can be used in this workflow. In ArcGIS 10.1, LAS files, LAS datasets, and terrain datasets can be used in this workflow directly.
The general design of the elevation data management workflow consists of the following steps, each of which is discussed below.
- Data storage (size, requirements, and locations).
- Prepare data (may require preprocessing).
- Create a mosaic dataset for each collection (source).
- Create a mosaic dataset that combines the collections (master).
- Create different mosaic datasets for visualization, analysis, user access, and to publish (referenced).
Data storage is not discussed here, but does require some planning depending on your requirements. For a design methodology that outlines a successful geographic information system, refer to Esri's System Design Strategies online document.
One thing to mention is data organization. It is ideal if you can organize your data in folders grouped by their product. For example, keep the SRTM data in one folder and the NED 1/3 arcsec data in another folder. You will see in the steps (Part 3) how this will help when loading data, for quality assurance / quality control (QA/QC), and in longer-term maintenance.
This workflow emphasizes three different modes of using elevation data. Most end users need to see visualizations of the topography to meet their needs, whereas a smaller percentage of users want to see the results of a topographic analysis. The smallest percentage of users, such as engineers, have a requirement to access the actual elevation values to do their own analysis.
It is important to understand the differences and implement the proper usage mode for each of these users, since system efficiency and responsiveness can be greatly impacted. The different usage models that will be referenced repeatedly are viewer usage, analysis results usage, and data values usage.
Usage model 1: viewer usage
Users need to view representations of the elevation data, so the data manager must create appropriate visualization products on the server and then serve those views to the users. This is the largest but least technically demanding user group (in many cases, the general public), which needs easy access to any number of relatively clear products based on elevation data. Examples include:
- A hillshaded or shaded relief image—for inclusion in a topographic map or basemap
- An image representing slope—for urban planning, landslide susceptibility, and so on
- An image representing aspect—for agriculture, wildlife habitat delineation and management, climate modeling, and so on
Of course, users with ArcGIS who are given access to the elevation data can generate these products themselves in the application.
Usage model 2: analysis results usage
Users define parameters and a region of interest for server-side analysis of the elevation data and then retrieve the results. This is a group of users who need access to a variety of analytical products that can be generated on the server. The results are typically maps or discrete features that are served to the user without transmitting the original source data. Examples include:
- Viewshed calculations for visibility and line-of-sight analysis—for determining what can be seen from a location, for siting cell phone towers and microwave communication equipment, or for planning clear cuts
- Uses in disaster management—for evacuation planning, flood mitigation, and a web-based overlay of current flood data for real-time decision making
- Industrial planning—for wind profiles and visibility for wind farm locations, or hydroelectric dam design
- Calculation of cartographic contours—for display on a map
- Calculation of profiles along straight lines or line segments—for engineering pipeline routes and pressure calculations, road planning, or timber harvest road planning and extraction costs
For these applications, the user generally does not need the actual elevation values. Distributing only these products may reduce the bandwidth requirements, since the results are smaller than the source data. For example, the raw elevation values may be 32 bit, whereas a viewshed may be a 1-bit or 8-bit result.
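The size difference can be illustrated with simple arithmetic. The following is a minimal sketch, assuming a single-band, uncompressed raster; the 2,000 by 2,000 cell region is a hypothetical example, and real transfer sizes will also depend on compression and format.

```python
def uncompressed_size_bytes(rows: int, cols: int, bits_per_cell: int) -> int:
    """Return the uncompressed size in bytes of a single-band raster."""
    total_bits = rows * cols * bits_per_cell
    return (total_bits + 7) // 8  # round up to whole bytes


# Hypothetical region of interest: 2,000 x 2,000 cells.
rows, cols = 2000, 2000
elevation_bytes = uncompressed_size_bytes(rows, cols, 32)     # raw elevation values
viewshed_8bit_bytes = uncompressed_size_bytes(rows, cols, 8)  # 8-bit result
viewshed_1bit_bytes = uncompressed_size_bytes(rows, cols, 1)  # 1-bit result

print(elevation_bytes, viewshed_8bit_bytes, viewshed_1bit_bytes)
```

Under these assumptions, the 8-bit result is one quarter of the size of the 32-bit source, and the 1-bit result is one thirty-second of it.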
Usage model 3: data values usage
Users need access to the elevation values. This refers to a user who needs the original elevation data in digital format to support numeric calculations (presumably in a client-side web or desktop application). Examples include:
- As input in a secondary process—for the orthorectification of imagery
- As input in their own data models or processes—for creating contours or completing water flow analysis for use in hydrographic design of irrigation systems or flood modeling
Of the three usage models, this is the most expensive in terms of time and resources on the server since it often requires a much larger amount of data to be accessed and transmitted.
To provide the appropriate data and access to meet the usage models above, consider the requirements listed below. This is not intended to be a complete listing of requirements, but an introduction to some important requirements specific to elevation data. You must then decide if these requirements are applicable in your organization, and make appropriate decisions regarding proper implementation.
- Distribute an image, result, or the data
- Provide access internally, intranet, or Internet
- Provide one or many elevation models or representations
- Provide access to 1, 10, 100, or 1 million users
- Provide access to the source data
- Many data sources, managed as one
- Limited access to source and management
- Need to be able to easily update or replace content
Data download and export
It is important for image data managers and end users to understand that any resampling of elevation data will change the data. For example, if a dataset is viewed at a resolution other than full resolution, in a different projection, or with a different pixel alignment, the data will be resampled. It is not uncommon to oversample an elevation dataset, for example, by zooming in to a scale of 1:1,000 on a viewshed created from a dataset with 5-meter post spacing. This is far too close.
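To see why 1:1,000 is too close for 5-meter data, the display scale can be converted to an approximate ground size per screen pixel. This is a rough sketch only; the 96 dpi screen resolution is an assumption and is not part of this workflow.

```python
INCHES_TO_METERS = 0.0254


def display_pixel_size_m(scale_denominator: float, screen_dpi: float = 96.0) -> float:
    """Approximate ground distance covered by one screen pixel at a map scale."""
    return scale_denominator * INCHES_TO_METERS / screen_dpi


def oversampling_factor(scale_denominator: float, source_cell_size_m: float,
                        screen_dpi: float = 96.0) -> float:
    """Number of screen pixels drawn across one source cell (values > 1 mean oversampling)."""
    return source_cell_size_m / display_pixel_size_m(scale_denominator, screen_dpi)


factor = oversampling_factor(1000, 5.0)
print(round(factor, 1))  # roughly 19 screen pixels per 5-meter source cell
```

Under the 96 dpi assumption, each source cell is stretched across about 19 screen pixels in each direction, so almost everything displayed is interpolated rather than measured.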
For some applications, users may need access to the numeric values within a region of interest (data values usage). There are two methods for providing the numeric data values to a client: export or download.
Export refers to extracting data within a specific extent and spatial reference. Exporting may provide the raw data values, a resampled version of them, or a processed version such as a hillshade or slope image. Download refers to transmitting the original (full resolution, nonresampled) data values, typically within a specified area. It should be clear that download can result in the transfer of a very large volume of data from server to client (especially if the data covers a very large area and contains a large number of datasets). Therefore, appropriate constraints must be implemented to ensure the user and system are prepared for the result—such as setting a limit on the maximum amount of data to transfer or designing a web application with a warning.
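A transfer limit of the kind described above could be sketched as follows. The function names, the 4 bytes per cell (32-bit values), and the 50 MB limit are all hypothetical choices for illustration; they are not ArcGIS Server settings.

```python
import math


def estimate_download_bytes(width_m: float, height_m: float, cell_size_m: float,
                            bytes_per_cell: int = 4, dataset_count: int = 1) -> int:
    """Estimate the uncompressed size of a full-resolution download."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return rows * cols * bytes_per_cell * dataset_count


def check_download(requested_bytes: int, max_bytes: int):
    """Return (allowed, message) so a web application can warn the user."""
    if requested_bytes > max_bytes:
        return False, (f"Request of {requested_bytes} bytes exceeds the "
                       f"limit of {max_bytes} bytes; reduce the area of interest.")
    return True, "OK"


# Hypothetical request: 50 km x 50 km of 10-meter data, one dataset.
size = estimate_download_bytes(50_000, 50_000, 10)
allowed, message = check_download(size, max_bytes=50 * 1024 * 1024)
print(size, allowed)
```

Multiplying by `dataset_count` reflects the point that overlapping datasets in the requested area increase the volume transferred.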
Here’s a list of the example data that will be used in this workflow. This data can range in bit depths, typically 16 bit or float, signed or unsigned.
This workflow assumes the data manager is using in-house, locally stored data.
GTOPO is a global elevation dataset with resolution of 30 arc seconds (approximately 1 km), available for download at http://www1.gsi.go.jp/geowww/globalmap-gsi/gtopo30/gtopo30.html.
The Shuttle Radar Topography Mission (SRTM) is elevation data on a near-global scale, acquired from the Space Shuttle, to generate the most complete high-resolution digital topographic database of Earth.
It is available at http://srtm.usgs.gov/index.php.
NED 10, NED 30
The National Elevation Dataset (NED) was created by the USGS for the USA. NED data are available nationally at resolutions of 1 arc-second (about 30 meters, NED 30) and 1/3 arc-second (about 10 meters, NED 10).
The lidar data may come from a variety of sources. In this particular case the data is provided from the Oregon Metro Regional Land Information System (RLIS) and can be used to provide both a DTM and DSM.
Data management organization and services (products)
One key objective is to ensure that all the data, regardless of extent, is managed and distributed as a single unit. The alternative (which often occurs over time, as an organization grows and individual projects are completed) is to manage data from different geographic areas as separate datasets. However, ArcGIS provides the capability to efficiently manage very large dataset collections, reducing creation and maintenance costs that previously resulted from data duplication and unnecessary management overhead.
Mosaic dataset organization
The mosaic dataset is the optimal data structure to manage, display, and publish your collection of elevation data, because it can manage all the different raster file formats and resolutions, and the files are maintained in their original format on disk. It also has many options for displaying and processing your elevation data, such as the dynamic mosaicking that allows the best resolution to display at appropriate scales, and functions used to process the data to create multiple products without copying the source data.
Functions specific to elevation data are Hillshade, Shaded Relief, Aspect, and Slope.
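To give a sense of what such a function computes on the fly, here is a minimal slope sketch for a single 3 x 3 window of elevation values. It uses Horn's finite-difference method, which is an assumption made for illustration; this document does not state which algorithm the ArcGIS Slope function uses.

```python
import math


def slope_degrees(window, cell_size: float) -> float:
    """Slope of the center cell of a 3x3 elevation window, in degrees.

    window is three rows of three values; elevations and cell_size
    must be in the same linear units.
    """
    (a, b, c), (d, _center, f), (g, h, i) = window
    # Weighted differences east-west and north-south (Horn's method, assumed).
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]  # rises 1 unit per cell toward the east

print(slope_degrees(flat, 1.0), slope_degrees(ramp, 1.0))
```

Because the calculation depends on elevations and cell size sharing the same units, mixed vertical units in the source data will produce incorrect slopes.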
It is advantageous to separate mosaic datasets into two types: those that are primarily used for management and those that provide other data representations (such as hillshade) and are published. This separation can aid in organization. You can manage your imagery within a mosaic dataset, but use another mosaic dataset to share or disseminate (publish) the contents.
Here's an overview of the different types of mosaic datasets and what purpose they may serve:
- Source—Used for managing imagery. It generally contains a collection of similar imagery. You may use a number of these source mosaic datasets to manage different collections, such as SRTM and NED. These can be published directly or (more often) used as the source for other mosaic datasets.
- Master (or derived)—Used to compile multiple sources into a single mosaic dataset. The source of a master mosaic dataset is generally one or more source mosaic datasets but can also include other images or services.
- Referenced—A unique type of mosaic dataset, which is mainly used to share or publish the imagery. It is created using one mosaic dataset as input and does not allow the items in the table to be edited, therefore, keeping the inputs safe from alteration. It's often used to provide differently processed outputs of the source or master mosaic dataset inputs.
The source and master (derived) mosaic datasets are symbolic names used to help convey an understanding of the organizational structure of mosaic datasets, whereas a referenced mosaic dataset is a physically different form of a mosaic dataset.
Your elevation data can be stored in folders organized by you or the data vendor, but it will all be managed and distributed using one or more mosaic datasets and image services. The data contained within a source mosaic dataset is usually determined by having the same number of bands and bit depths. In this case, it's determined by bit depth and product; for example, the lidar-derived data can be organized in one source mosaic dataset and the SRTM in another. This helps in keeping the data organized and allows data with different vertical units to be separated; for example, the lidar-derived data is measured in feet and the SRTM is measured in meters. This also allows you to refine the footprints, if necessary, or control the NoData, which can be unique for each product.
The source mosaic datasets will be combined together using the master mosaic dataset. Some functions may be added to some of the sources to ensure the data represents the same information, such as the conversion from feet to meters or ellipsoidal to orthometric. (For most requirements, it is recommended that a mosaic dataset with orthometric ground height be created and maintained as the master service upon which others are built.)
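The two conversions mentioned above are simple arithmetic. The sketch below assumes the international foot (0.3048 m); data recorded in US survey feet would need a slightly different factor. The geoid undulation value is a hypothetical example, not taken from any geoid model.

```python
FEET_TO_METERS = 0.3048  # international foot (assumption; check the data's units)


def feet_to_meters(value_ft: float) -> float:
    """Convert an elevation from feet to meters."""
    return value_ft * FEET_TO_METERS


def ellipsoidal_to_orthometric(ellipsoidal_height: float, geoid_undulation: float) -> float:
    """Orthometric height H = h - N, where N is the geoid undulation at that location."""
    return ellipsoidal_height - geoid_undulation


lidar_elevation_m = feet_to_meters(1000.0)
orthometric_m = ellipsoidal_to_orthometric(100.0, -30.0)  # hypothetical N = -30 m
print(lidar_elevation_m, orthometric_m)
```

In a mosaic dataset, the same operations would be applied as functions on the relevant source so that all sources in the master represent the same quantity in the same units.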
Final products and services
Various referenced mosaic datasets can be created from the master mosaic dataset to provide the following recommended elevation data services:
- Orthometric ground height
- Orthometric surface height—if surface elevation data (DSM) is available (which will show buildings, tree canopies, bridges, and so on)
- Ellipsoidal ground height
- Shaded Relief
- Aspect (used for both visualization and analysis)
If more than one geoid correction is required by the user community, the data manager may wish to publish geoids as image services, exposing appropriate options to the users.
In all cases, the data manager must decide how to represent the oceans. The proper choice will depend on the applications the data must support. The options include the following:
- Ocean is elevation with a 0 value
- Ocean is NoData
- Oceans are represented with bathymetric data
For most applications, it is acceptable to represent the sea with an elevation of 0. If the sea is defined as NoData, then orthorectification in any NoData areas will fail. A simple method to fill in with zero values is to add a very low-resolution, worldwide dummy image into the mosaic dataset, where all pixel values are 0. Then, when the values in the data, such as the SRTM, are NoData, the 0 value in the dummy image will display.
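The effect of the zero-valued dummy image can be sketched as a "first valid value wins" rule across layers. This is a simplified illustration of the idea, not the mosaic dataset's actual mosaicking logic; `None` stands in for NoData and the sample values are hypothetical.

```python
from typing import List, Optional, Sequence


def mosaic_first_valid(layers: Sequence[Sequence[Optional[float]]]) -> List[Optional[float]]:
    """Combine rows cell by cell, taking the first non-NoData value.

    layers are ordered from highest to lowest priority; None means NoData.
    """
    result = []
    for cell_values in zip(*layers):
        value = next((v for v in cell_values if v is not None), None)
        result.append(value)
    return result


srtm_row = [12.0, None, 3.5, None]  # None marks ocean cells stored as NoData
zero_fill_row = [0.0, 0.0, 0.0, 0.0]  # the low-resolution dummy image

print(mosaic_first_valid([srtm_row, zero_fill_row]))  # → [12.0, 0.0, 3.5, 0.0]
```

Valid land values are untouched, and only the NoData cells fall through to the dummy image.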
If the data manager chooses to include bathymetric data, there will be negative elevation values in the oceans, which will allow for visualizations of the ocean floor. This allows flexibility in terms of how multiple services may render (display) the data; one client application could show a blue fill for water where elevation is less than 0, and in the same area, a different client application could render the subsurface elevation as shaded terrain.
At a basic level, mosaic dataset overviews are like raster dataset pyramids. They are lower-resolution images created to increase display speed and reduce CPU usage since fewer rasters are examined to display the mosaicked image; however, they differ greatly because you can control many of the parameters used to create them. You can create them to cover only a specific area or only at specific resolutions. They are created to allow you to view all the rasters contained in the entire mosaic dataset, not just for each raster. Overviews generally begin where raster pyramids stop, but you can specify a base pixel size at which your overviews will be generated if you prefer not to use all the raster's pyramids.
The data manager should consider the best approach for overviews. Overviews can be created from the project data, but if appropriate lower-resolution datasets are available from alternative sources, such as GTOPO, ETOPO, and GMTED2010, it is recommended that they be used. The remainder of this workflow is based on this approach: building a large-region image service composed of multiple datasets at different spatial resolutions (so there is generally no requirement for overviews).
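The idea of letting lower-resolution datasets stand in for overviews can be sketched as a lookup by requested pixel size. The pixel-size ranges below are hypothetical, and the selection rule is a simplification for illustration rather than a description of how ArcGIS chooses items.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceItem:
    name: str
    min_ps: float  # smallest request pixel size (m) at which the item is used
    max_ps: float  # request pixel sizes at or above this use a coarser item


def visible_items(items: List[SourceItem], request_pixel_size: float) -> List[str]:
    """Return the names of items whose pixel-size range covers the request."""
    return [item.name for item in items
            if item.min_ps <= request_pixel_size < item.max_ps]


# Hypothetical ranges chosen so that each product takes over where the finer one stops.
items = [
    SourceItem("NED 10", 0.0, 30.0),
    SourceItem("NED 30", 30.0, 1000.0),
    SourceItem("GTOPO", 1000.0, float("inf")),
]

print(visible_items(items, 5.0), visible_items(items, 100.0), visible_items(items, 5000.0))
```

With ranges that meet end to end like this, every display scale is served by some dataset, which is why separate overviews are generally not required.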
There are numerous properties that the data manager should verify regarding all elevation data. The data manager must review and decide which components are important to maintain as well as which metadata fields to expose to the data users. The metadata listed below is recommended for the purposes of both quality assurance and system configuration.
Metadata the data manager should verify includes:
- Data source or owner.
- Horizontal coordinate system (projection, datum, and units).
- Vertical datum (specific model, noting if it is ellipsoidal or orthometric) and units (feet or meters).
- Horizontal accuracy (typically measured as CE90 or CE95, but also may be reported as RMS error or RMSE).
- Vertical accuracy (typically measured as LE90).
- Resolution (the sample spacing stored in the data file, which is not the same as the horizontal accuracy of the data).
- Elevation surface type (DEM versus DSM).
- Is NoData defined in this dataset? Check the following:
- Are there areas of NoData?
- If yes, is it represented by a single value?
- Is the NoData limited to the edges of the datasets or are there holes of NoData within the valid data?
- Some products have associated feature classes that define void regions. Look to see if these areas were filled with a value and if it is the NoData value.
- Was the data sampled from another source?
- Acquisition date for the raw data.
- QA/QC completed on what date, by whom, and using what methods and/or standards.
- Is the data releasable or restricted (data may be proprietary, classified, or free for public release)?
Unique metadata fields, such as the horizontal and vertical accuracies, may need to be added manually to the mosaic dataset's attribute table. This way, you can easily query the mosaic dataset for this information.
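The benefit of storing accuracy values as attributes can be sketched with plain records. The field names and values below are hypothetical and only mimic rows of an attribute table; in ArcGIS the equivalent would be a query against the mosaic dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ElevationItem:
    name: str
    vertical_units: str
    le90_m: float  # vertical accuracy (LE90), in meters
    ce90_m: float  # horizontal accuracy (CE90), in meters


def items_meeting_accuracy(items: List[ElevationItem], max_le90_m: float) -> List[str]:
    """Return the names of items whose vertical accuracy is at least as good as required."""
    return [item.name for item in items if item.le90_m <= max_le90_m]


# Hypothetical rows; the accuracy figures are illustrative only.
table = [
    ElevationItem("product_a", "meters", 16.0, 20.0),
    ElevationItem("product_b", "meters", 2.5, 5.0),
    ElevationItem("product_c", "feet", 0.3, 1.0),
]

print(items_meeting_accuracy(table, 3.0))
```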
It is worth creating a list for the products (or subproducts) you will be using because you may need to modify the data within the mosaic dataset, such as using the Arithmetic function to convert from one unit to another.
Format optimization is not always necessary, but it is necessary when the data is in a format that is not optimal or well supported. The following lists some recommendations supporting preprocessing of elevation data when building a single collection and publishing as a service.
- Conversion of point or TIN data to a raster format. There are several geoprocessing tools to choose from depending on the source. See the To Raster toolset or the 3D Analyst Conversion toolset.
For the conversion of lidar data to elevation surfaces in ArcGIS 10.0, see the blog Lidar Solutions in ArcGIS part2: Creating raster DEMs and DSMs from large lidar point.
In ArcGIS 10.1, LAS files and terrain datasets will not need to be converted to raster datasets. They will be supported directly in the mosaic dataset.
- The optimum format for elevation data is 32-bit floating-point values with LZW compression stored in a tiled TIFF format. LZW compression is lossless and fast, the data is automatically tiled within the file, and NoData values don't take up space in the raster format.
Esri's general recommendation is not to convert elevation data from its original format unless the file format is inefficient and the server's performance will suffer. For example, if any of the following are true, the data should be converted before adding it to the mosaic dataset:
- If the elevation data is stored in an inefficient file format, such as ASCII XYZ (inefficient to read).
- If the elevation data files are large (number of rows or columns > 5000) and the data is not tiled or does not have pyramids and server performance is important.
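The two conversion criteria above can be expressed as a small decision helper. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and the set of inefficient formats contains only the one example this document names.

```python
INEFFICIENT_FORMATS = {"ASCII XYZ"}  # only the example named in the text
LARGE_DIMENSION = 5000  # rows or columns above this count as a large file


def should_convert(file_format: str, rows: int, cols: int, tiled: bool,
                   has_pyramids: bool, performance_matters: bool = True) -> bool:
    """Return True if the data should be converted before being added to the mosaic dataset."""
    if file_format.upper() in INEFFICIENT_FORMATS:
        return True
    is_large = rows > LARGE_DIMENSION or cols > LARGE_DIMENSION
    return is_large and (not tiled or not has_pyramids) and performance_matters


print(should_convert("ASCII XYZ", 100, 100, tiled=False, has_pyramids=False))
print(should_convert("TIFF", 12000, 12000, tiled=False, has_pyramids=True))
print(should_convert("TIFF", 3000, 3000, tiled=False, has_pyramids=False))
```

Formats such as JPEG 2000 or Esri Grid are deliberately left out of the helper, since those cases call for a performance check rather than a fixed rule.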
In some cases, you will want to run a performance check before converting your data to determine if they are suitable or can be optimized with conversion. For example:
- If the original is stored as JPEG 2000, be aware that time is required for decompression, which may have a significant impact on performance. For best performance, you may need to convert the data to tiled TIFF.
- If the original data is in the Esri Grid format, it is also better to convert if scalability is important in a multiprocessing environment.
If the files will be converted from one format to another, with no changes needed in the bit depth or other properties of the dataset, use the Raster To Other Format (Multiple) tool. If you have to make changes to some of the properties, use the Copy Raster tool. When applying this tool to many datasets, you can right-click the tool and click Batch or write a script to ingest multiple datasets. In either case, the environments must be set. You can do this at the application level if you'll be applying this to several tools, or on the tool level.
- Access the Environments dialog box.
- From the main menu, click Geoprocessing > Environments.
- On the tool, click the Environments button.
- Expand the Raster Storage section.
- Check Build Pyramids.
- Click the Pyramid Resampling Technique drop-down arrow and click BILINEAR.
- Click the Pyramid Compression Type drop-down arrow and click LZW.
- Check Calculate Statistics.
- Type 1000 for the x and y skip factors.
This value is derived by dividing the number of columns by 1,000.
- Click the Compression drop-down arrow and click LZ77.
- Verify that the Tile Size width and height are both 128.
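The note in the steps above says the skip factor is derived by dividing the number of columns by 1,000. A minimal sketch of that derivation follows; using integer division and a minimum of 1 are assumptions added here so that small rasters still yield a usable value.

```python
def skip_factor(columns: int, divisor: int = 1000) -> int:
    """Derive a statistics skip factor from the raster width.

    Integer division and the floor of 1 are assumptions, not documented behavior.
    """
    return max(1, columns // divisor)


for width in (400, 5000, 1_000_000):
    print(width, skip_factor(width))
```

A larger skip factor samples fewer cells when statistics are calculated, trading some precision for speed on very wide rasters.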
If the data does not include pyramids and the tiles are large (with the number of rows or columns greater than 5,000), it is recommended that pyramids are created. To determine if a file has pyramids, right-click in the Catalog window or table of contents and click Properties, then look under Raster Information for Pyramids.
When building pyramids for multiple datasets, use the Build Pyramids And Statistics tool. Use the same environment settings as above.
Pyramids require some additional disk space and are written to separate files with an .ovr extension.
None of the data used in this workflow has pyramids. This is mainly because the various resolutions will be integrated and used together (thereby negating the need for reduced resolutions of the source data) and because of the tiled structure of the data (there aren't any extremely large individual files).
If the number of data files is greater than 100,000 files and new pyramids will be created, you may prefer to write pyramids into the data files to avoid generating too many extra files. To do this, it's recommended you use the FWTools utility, which is not included with ArcGIS.
You can download FWTools from http://fwtools.maptools.org, then execute the following commands:
gdal_translate.exe -of GTiff -co "TILED=YES" -co "COMPRESS=LZW" Input.xxx Output.tif
gdaladdo.exe -r average --config COMPRESS_OVERVIEW LZW Output.tif 2 4 8 16
For 16 bit imagery, insert -co NBITS=12 before Input.xxx.
If the files created are greater than 4 GB, insert either -co BIGTIFF=YES or -co BIGTIFF=IF_NEEDED before Input.xxx.
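When many files need converting, the gdal_translate command line can be assembled in a script. The sketch below only builds the argument list and does not run GDAL; the function name is hypothetical, and it assumes NBITS and BIGTIFF are passed as -co creation options placed before the input file name.

```python
from typing import List, Optional


def build_translate_args(input_path: str, output_path: str,
                         nbits: Optional[int] = None,
                         bigtiff: Optional[str] = None) -> List[str]:
    """Build a gdal_translate argument list for a tiled, LZW-compressed GeoTIFF."""
    args = ["gdal_translate", "-of", "GTiff",
            "-co", "TILED=YES", "-co", "COMPRESS=LZW"]
    if nbits is not None:
        args += ["-co", f"NBITS={nbits}"]      # for example, 12 for 16-bit imagery
    if bigtiff is not None:
        args += ["-co", f"BIGTIFF={bigtiff}"]  # "YES" or "IF_NEEDED"
    args += [input_path, output_path]
    return args


args = build_translate_args("Input.xxx", "Output.tif", bigtiff="IF_NEEDED")
print(" ".join(args))
```

The resulting list could be passed to `subprocess.run(args, check=True)` on a machine where the GDAL utilities are installed and on the path.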
Statistics are required for a raster dataset or mosaic dataset to perform some geoprocessing operations or certain tasks in ArcGIS Desktop applications, such as applying a contrast stretch or classifying data. In this workflow, there is no need to build statistics for each data source, since none will be displayed or used on their own, nor are any functions or any of the products being created reliant on the statistics from the individual datasets. Statistics will be generated for the mosaic datasets for display purposes.
For more information about statistics, see Raster dataset statistics.