Cache creation in Amazon EC2

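Creating an ArcGIS Server map or globe cache in the Amazon Elastic Compute Cloud (EC2) differs from caching outside the cloud in several ways: you choose a machine size and price, decide how many map service instances to use, scale your deployment differently for large caching jobs, and decide where to place the cache.

This topic discusses these factors in more detail.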

Choosing a machine size and price

Amazon offers a variety of machine sizes and specifications. Each has its own price per hour of usage. The larger Amazon machines, especially those with a lot of memory, can generate tiles very quickly. The smaller machines generate tiles more slowly but have a lower cost.

You can create your cache on an attached Amazon Elastic Block Store (EBS) volume using a powerful machine. When the caching completes, you can detach the EBS volume and attach it to your regular machine (which may be smaller and less expensive). You can then terminate the powerful machine that you used for caching. In this way, you can use the power of the cloud to cache while not committing to a relatively expensive machine for any longer than necessary.

You may need to choose between economy and speed. Using a low-power machine with a low cost per hour is not always the most economical choice, because the total cost of the cache depends on the number of hours spent creating tiles. On the other hand, the most powerful machines may also yield a higher total cost: even though you spend fewer hours caching, you pay a higher price per hour.
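The trade-off between hourly price and caching speed can be sketched as a small calculation. The example below is only an illustration: the instance names, tile rates, prices, and cache size are hypothetical placeholders, and it assumes usage is billed per started hour. Substitute rates measured in your own tests and Amazon's current prices.

```python
import math


def total_cache_cost(total_tiles, tiles_per_minute, price_per_hour):
    """Estimate the cost of building a cache on a single EC2 instance."""
    hours = total_tiles / (tiles_per_minute * 60.0)
    # Assumption: usage is billed per started hour, so partial hours round up.
    billed_hours = math.ceil(hours)
    return billed_hours * price_per_hour


# Hypothetical instance types: (tiles per minute, price per hour in dollars).
# Replace these with rates measured in your own test cache and current prices.
CANDIDATES = {
    "small": (1000, 0.50),
    "medium": (5000, 1.00),
    "large": (15000, 4.00),
}

TOTAL_TILES = 3600000  # hypothetical cache size

costs = {
    name: total_cache_cost(TOTAL_TILES, rate, price)
    for name, (rate, price) in CANDIDATES.items()
}
cheapest = min(costs, key=costs.get)

for name in CANDIDATES:
    print(name, costs[name])
print("cheapest:", cheapest)
```

With these made-up figures, neither the cheapest nor the most powerful machine gives the lowest total cost, which is the situation the paragraph above describes.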

Basic testing by ESRI has found that the Amazon high-memory instances (High-Memory Extra Large, High-Memory Double Extra Large, and High-Memory Quadruple Extra Large) are the most economical for caching. This recommendation is subject to change if Amazon adjusts its pricing schemes or machine specifications.

Using a small test cache (perhaps the size of a medium-sized city) and a custom Amazon Machine Image (AMI), you can perform relatively inexpensive tests of your own on different instance types to find out which is most economical for your cache.

Powerful EC2 instance types are well suited to scheduled cache updates, since many update workflows are time sensitive.

Choosing the number of map service instances to use when caching

Each EC2 instance has a certain number of CPU cores. This number is visible when you choose the instance type in the Launch Instance wizard. The number of cores can help you determine how many map service instances (not EC2 instances) to use when caching. Using too many map service instances will overwork your CPUs, while using too few will leave your CPUs underutilized.

Generally, a two-core machine can use 2 or 3 map service instances, a four-core machine can use 5, and an eight-core machine can handle 10. You may need to adjust these figures slightly for your particular map service, but they provide a good starting point.
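The figures above work out to roughly 1.25 map service instances per core. A minimal sketch of that rule of thumb follows; the function name is hypothetical, and extending the ratio to other core counts is an assumption drawn from the three quoted figures, not a documented rule.

```python
def suggest_map_service_instances(cpu_cores):
    """Return a (low, high) starting range of map service instances.

    Assumption: about 1.25 instances per core, which reproduces the
    figures quoted in the text for 2, 4, and 8 cores.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    low = (5 * cpu_cores) // 4        # 1.25 * cores, rounded down
    high = -((-5 * cpu_cores) // 4)   # 1.25 * cores, rounded up
    return max(1, low), max(1, high)


for cores in (2, 4, 8):
    print(cores, "cores:", suggest_map_service_instances(cores))
```

Treat the result as a starting point only, and adjust it after watching CPU usage during a test cache.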

Scaling ArcGIS Server for caching jobs on the cloud

ArcGIS Server scales differently on the cloud. You do not add stand-alone SOC machines to add more power to your deployment. Instead, you add complete Web Server/SOM/SOC machines and connect them with an Amazon Elastic Load Balancer (ELB). Since tile creation assignments for a map service are distributed by the SOM, the ELB does not work for caching a map service. You need to take a different approach to scaling.

The first thing to determine is whether you need more than one machine in your deployment at all. Outside the cloud, you may be accustomed to caching with two or three connected machines of medium power. Inside the cloud, you may be able to generate tiles at the same rate or better using a single powerful EC2 instance. The High-Memory Quadruple Extra Large instance, for example, has been observed to create about 15,000 tiles per minute for one particular vector street map. Do some testing with small caches to understand how fast you can expect tile creation to be on the most powerful machine.
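Once you have measured a tile rate, a rough estimate shows whether one machine can finish in the time you have. The sketch below assumes that throughput scales linearly when the job is split across machines, which is an idealization. The total tile count and the deadline are hypothetical; only the rate of 15,000 tiles per minute comes from the observation above.

```python
import math


def estimate_minutes(total_tiles, tiles_per_minute):
    """Minutes one machine needs to create total_tiles at a measured rate."""
    return total_tiles / tiles_per_minute


def machines_needed(total_tiles, tiles_per_minute, deadline_minutes):
    """Machines required to finish by the deadline.

    Assumption: the job divides evenly and each machine sustains the
    same rate, so the result is a lower bound rather than a guarantee.
    """
    capacity_per_machine = tiles_per_minute * deadline_minutes
    return max(1, math.ceil(total_tiles / capacity_per_machine))


TOTAL_TILES = 9000000     # hypothetical cache size
RATE = 15000              # tiles per minute, as observed for one street map
DEADLINE_MINUTES = 240    # hypothetical four-hour update window

print("one machine, minutes:", estimate_minutes(TOTAL_TILES, RATE))
print("machines needed:", machines_needed(TOTAL_TILES, RATE, DEADLINE_MINUTES))
```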

If one powerful machine in the cloud is not enough, you need to divide your caching job into geographic sectors and assign each sector to a machine. The process is not automatic. You probably already have a feature class that defines your full cache boundary. You can use the ArcGIS editing tools to cut this feature class into sectors of roughly equal size, then assign each machine a different sector to cache. You can write all tiles to a shared location as they are cached, or you can write them to local locations and import them into a master location later using the Import Map Server Cache tool.
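As a simplified illustration of the sector idea, the sketch below divides a rectangular extent into a grid and deals the sectors out to machines in turn. It is not a substitute for cutting the boundary feature class with the editing tools: it assumes a plain bounding box, the machine names are hypothetical, and sectors of equal area do not necessarily take equal time to cache.

```python
def split_extent(xmin, ymin, xmax, ymax, columns, rows):
    """Split a bounding box into columns * rows equal rectangles.

    Returns a list of (xmin, ymin, xmax, ymax) tuples, row by row.
    """
    width = (xmax - xmin) / columns
    height = (ymax - ymin) / rows
    sectors = []
    for r in range(rows):
        for c in range(columns):
            sectors.append((
                xmin + c * width,
                ymin + r * height,
                xmin + (c + 1) * width,
                ymin + (r + 1) * height,
            ))
    return sectors


def assign_sectors(sectors, machines):
    """Assign sectors to machines in round-robin order."""
    assignments = {machine: [] for machine in machines}
    for index, sector in enumerate(sectors):
        assignments[machines[index % len(machines)]].append(sector)
    return assignments


# Hypothetical extent and machine names, for illustration only.
sectors = split_extent(0, 0, 100, 50, columns=2, rows=2)
plan = assign_sectors(sectors, ["cache-a", "cache-b"])
for machine, assigned in plan.items():
    print(machine, assigned)
```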

Although distributing caching jobs cannot be automated, only the very largest caching jobs will require more than a few machines of the most powerful Amazon instance types.

Deciding where to place the cache

As described in Strategies for data transfer to Amazon, there are several types of locations on Amazon where you can place your data. When you first create the cache, you'll write it to an EBS volume that's attached to your EC2 instance. The ArcGIS Server AMI attaches a 100 GB volume, called GIS Data, by default. This is a good place to put the cache if the volume is large enough. If the volume is too small, you need to create and attach another volume and register a server cache directory on it.

Do not build a cache on the C drive of your EC2 instance. If you terminate the instance, the cache will be lost.

Ultimately, you might want to move the cache, or place a copy of it, onto Amazon Simple Storage Service (Amazon S3). If you're just interested in keeping a backup on Amazon S3, you can create an EBS snapshot. A snapshot effectively backs up your drive on Amazon S3, and you can quickly use the snapshot to create a new EBS volume if your existing volume fails for any reason.

You can also serve the tiles from Amazon S3 and access them as a custom tile layer using a JavaScript, Flex, or Silverlight application. The advantage of this is that your tiles do not depend on a running service, and you can optionally use Amazon CloudFront to speed the delivery of tiles across the Internet to all parts of the world. If you want to move tiles to Amazon S3 for this purpose, you can transfer the tiles from your EBS volume using either the Amazon Web Services APIs or a third-party front-end application for Amazon S3. You could also do this if you created the cache outside the cloud.
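If you script the transfer yourself, one preparatory step is mapping each tile file on the EBS volume to an object key on Amazon S3. The sketch below does only that mapping and does not contact Amazon. The function name, the key prefix, and the folder layout are hypothetical; check the actual layout of your cache directory before relying on it.

```python
import os


def plan_tile_uploads(cache_root, key_prefix=""):
    """Return (local_path, s3_key) pairs for every file under cache_root.

    Keys mirror the folder structure relative to cache_root and always
    use forward slashes, whatever the local path separator is.
    """
    uploads = []
    for folder, _subfolders, files in os.walk(cache_root):
        for name in files:
            local_path = os.path.join(folder, name)
            relative = os.path.relpath(local_path, cache_root)
            key = "/".join(relative.split(os.sep))
            if key_prefix:
                key = key_prefix.rstrip("/") + "/" + key
            uploads.append((local_path, key))
    # Sort by key so the plan is stable from one run to the next.
    return sorted(uploads, key=lambda pair: pair[1])
```

Each pair could then be handed to whichever Amazon S3 client you use. With the boto3 library, for example, the S3 client's `upload_file(Filename, Bucket, Key)` method takes a local path and a key in this form, along with a bucket name of your own.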


1/30/2013