Strategies for data transfer to Amazon
Creating a GIS deployment on Amazon requires you to transfer some or all of your GIS data over the Internet to locations in the cloud. This topic lists options for where you can store your data in the cloud and how you can transfer it. It also discusses factors that affect data transfer time.
Places to store the data
Once you create an EC2 instance running ArcGIS Server, you need to prepare to transfer your data to the cloud. There are several places you can store your data. All the following options incur charges from Amazon that are subject to change and that you should research before making your choice.
EBS volumes—Amazon Elastic Block Store (EBS) volumes are virtual disk drives that you can attach to your EC2 instance to add more storage. The ArcGIS Server AMI comes with an attached 100 GB EBS volume named "GIS Data", which is mounted as drive D:. The ArcGIS Server directories are configured on this drive, so when you publish services with the option to copy data to the server, the data goes onto this EBS volume. You can also create other folders on this volume to hold your data.
Similarly, the Enterprise Geodatabase AMI comes with an attached 100 GB EBS volume to store the PostgreSQL cluster. You can optionally remove these volumes or add other EBS volumes as needed.
Amazon S3—Amazon Simple Storage Service (S3) is an Amazon service designed specifically for data storage in the cloud. This storage option has the lowest potential for data failure or loss. You can use S3 as a place for data backup or as a middle ground for data transfer between your on-premises deployment and your EBS volumes. Also, any snapshots you create of your EBS volumes are stored on S3.
EC2 instance—It's possible to transfer data directly onto your EC2 instance; however, it's preferable to keep all your GIS data and your server directories on an attached EBS volume, allowing you to easily restore your configuration or attach it to a different machine. For this reason, the ArcGIS Server AMI apportions a relatively small amount of space (35 GB) on the C: drive to discourage data storage on this drive. In contrast, attached EBS volumes such as the D: drive are larger, and are a safer option for data storage.
Caution:
Do not store GIS data or map caches on the C: drive of your EC2 instance in a production deployment of ArcGIS Server.
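Before you start a transfer, it can help to confirm that the dataset will fit on the target EBS volume. The following is a minimal Python sketch using only the standard library. The function names and the 10 percent free-space reserve are illustrative assumptions, not part of ArcGIS Server or the AMI.

```python
import os
import shutil


def dataset_size_bytes(folder):
    """Return the total size in bytes of all files under folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_on_volume(folder, volume_path, reserve_fraction=0.10):
    """Return True if folder fits on the volume that holds volume_path.

    reserve_fraction is the share of the volume's total capacity to keep
    free after the copy (an assumed safety margin, not a documented rule).
    """
    usage = shutil.disk_usage(volume_path)
    reserve = usage.total * reserve_fraction
    return dataset_size_bytes(folder) <= usage.free - reserve
```

For example, on the instance you might call `fits_on_volume(r"D:\staging\roads", "D:\\")`, where the staging folder is a hypothetical location, before moving data into the server directories on the D: drive.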
Options for transferring data to the cloud
Transferring data from your on-premises deployment into the cloud takes time and, in some cases, coordination with your IT security staff. Exporting data to a location on the Internet (in other words, the cloud) is often not as fast or secure as the common data transfers that you do within your local network.
There are many strategies you can use to get data onto the cloud, but if you work with sensitive data, coordinate with your IT staff to make sure your method is secure and approved by your organization. Following are some of your options:
Configure ArcGIS Server to copy the data when you publish a service—You can configure ArcGIS Server so that whenever you publish a service, the data for that service is copied to the server. The data is packaged into a service definition (.sd file), then it is transferred into the ArcGIS Server uploads directory, and finally it is unpacked into the ArcGIS Server input directory. Be aware that this can take a long time and result in the transfer of large amounts of data if you do not limit the extents and datasets used in your map or other resource.
Remote Desktop copy and paste—Windows Remote Desktop allows file system redirection wherein your local drives can be mapped to the remote computer. While logged into your EC2 instance through Remote Desktop, you can open Windows Explorer and copy data from your local drives to your EBS volumes.
To enable file system redirection, in the Remote Desktop Connection window, click the Local Resources tab and check the check box to make your drives available. The wording varies depending on which version of Windows you are using. In Windows 7, you have to click the More button to see the option to make drives available.
If you choose to transfer sensitive data using Remote Desktop, you should ensure that additional layers of security are in place. Older versions of Remote Desktop have been shown to contain security vulnerabilities wherein a computer posing as the server can gain access to your data (sometimes known as man-in-the-middle attacks).
Note: Copy and paste can take a while to transfer data. Do not copy any other file or data before the paste procedure is complete. If you do, the paste terminates and you have to start over.
S3 client utilities—Amazon S3 can be used as a middle ground for moving data from your on-premises deployment to your EBS volumes. To get data to S3 using a purely Amazon solution, you need to use their APIs; but if you don't want to write any code, there are many third-party GUI clients that let you transfer data to S3 just by pointing and clicking.
Two examples of these S3 front-end clients are S3Fox Organizer, which is a Firefox plugin, and Bucket Explorer, which is a lightweight desktop application. You can use these applications on your local computer to get the data to S3. Once your data is on S3, you can install and use the same utility on your EC2 instance to transfer data to your attached EBS volumes.
Your own Web server—Any data available on the Web through HTTP is accessible to your EC2 instance. If you have a Web-facing server in your organization, you can place your data on it, then download the data from your EC2 instance. The advantage of this approach is that you can configure security on your Web server to limit who can download the data and to encrypt the transaction through SSL.
FTP—You can enable File Transfer Protocol (FTP) to upload files directly onto your EC2 instance. Be aware that standard FTP does not encrypt information and sends passwords in clear text. To use FTP safely, you need to take additional security measures, such as encrypting your FTP sessions with SSL, limiting which users are allowed to transfer data to your instance through FTP, and disabling FTP after your initial data transfer. Some third-party products are designed to help you set up secure FTP connections.
AWS Import/Export—If you need to transfer an enormous amount of data to Amazon, it may be faster and/or more cost effective to ship the data to Amazon on a portable storage device and pay Amazon to load the data directly into S3. Amazon offers this service as AWS Import/Export.
If you consider using AWS Import/Export, you'll need to decide if it's appropriate for your organization's data sensitivity. Any time you put a device in the mail, you run the risk, however small, of the physical destruction or interception of your data. You can mitigate these risks by backing up and encrypting the data. If you still have concerns about whether AWS Import/Export is an appropriate choice for your data, contact Amazon directly.
Amazon works with many Solution Providers, some of whom provide data transfer, storage, and security solutions. See Find an AWS Solution Provider to understand whether one of these companies can help with your cloud strategy. Esri itself is one of these providers, and offers various project and implementation services for deploying GIS in the Amazon cloud.
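As an illustration of the "Your own Web server" option above, the following Python sketch, run on the EC2 instance, streams a file from an HTTPS URL to an EBS volume and computes a SHA-256 checksum so you can confirm the file arrived intact. The function names are hypothetical, and the sketch assumes the server allows the download without extra authentication. If your Web server requires credentials, you would need to add them.

```python
import hashlib
import urllib.request


def download_with_sha256(url, dest_path, chunk_size=1024 * 1024, timeout=60):
    """Stream url to dest_path and return the SHA-256 hex digest of the bytes written."""
    digest = hashlib.sha256()
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a local file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as source:
        for chunk in iter(lambda: source.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

One way to use this is to run `sha256_of_file` on the original file on premises, then compare that value with the digest returned by `download_with_sha256` on the instance. A mismatch suggests the transfer was interrupted or the file was altered.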
Factors that affect data transfer time
Performance of the above data transfer options can vary based on your physical proximity to the Amazon cloud, the time of day, and the quality of your connection to the Internet.
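For a rough sense of scale, you can estimate transfer time from the dataset size and your upload bandwidth. The Python sketch below is only a back-of-the-envelope calculation. The `efficiency` factor, which stands in for protocol overhead and network contention, is an assumed value that you should replace with what you observe on your own connection.

```python
def estimate_transfer_hours(size_gb, uplink_mbps, efficiency=0.8):
    """Estimate hours needed to upload size_gb gigabytes (decimal GB).

    uplink_mbps is the upload bandwidth in megabits per second.
    efficiency is the assumed fraction of that bandwidth actually achieved.
    """
    if uplink_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("uplink_mbps must be positive and efficiency in (0, 1]")
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    seconds = size_megabits / (uplink_mbps * efficiency)
    return seconds / 3600
```

Under these assumptions, 100 GB over a 20 Mbps uplink at 80 percent efficiency works out to roughly 14 hours, which is the kind of figure that can make zipping the data or a physical shipment worth considering.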
GIS datasets, especially imagery and map caches, can take large amounts of space and may need to be zipped before transfer, either to reduce the size of the file or to reduce the total number of files for more efficient transfer (especially in the case of map caches). Some S3 client utilities may place limits on the size of any one file you can transfer or the number of individual files you can store. Also, some zipping programs have limits on the amount of data that can be zipped. The zipping time and effort should be taken into account when you choose a data transfer option.
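As one way to reduce the number of files before transfer, the following Python sketch packs a folder, such as a map cache directory, into a single ZIP archive using the standard `zipfile` module. The function name is hypothetical. Setting `compress=False` stores files without compression, which may be a reasonable choice for cache tiles in already-compressed formats such as JPEG or PNG, where the goal is fewer files rather than smaller ones.

```python
import os
import zipfile


def zip_folder(folder, archive_path, compress=True):
    """Pack every file under folder into one ZIP archive; return the file count.

    archive_path should be outside folder so the archive does not include itself.
    """
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    count = 0
    # allowZip64=True (the default) permits archives and members over 4 GiB.
    with zipfile.ZipFile(archive_path, "w", compression=method,
                         allowZip64=True) as archive:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to folder so the archive unpacks cleanly.
                archive.write(full_path, arcname=os.path.relpath(full_path, folder))
                count += 1
    return count
```

The returned count can be compared with the number of files unpacked on the EC2 instance as a simple completeness check.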
Finally, if using S3, be aware of the limitations on the number of buckets you can create and other restrictions on S3 buckets. Amazon lists these in Bucket Restrictions and Limitations.
Maintaining the integrity of data paths
Any time you move data to a new location, you need to be aware of any paths referencing the data that may also need to be updated. This is a concern with map documents, which may reference dozens of data layers at different paths.
Registering your Amazon EC2 data location with ArcGIS Server can help reduce the effort of fixing broken data paths after publishing. See Registering your data with ArcGIS Server using ArcGIS Desktop.
Another option is to log in to your instance and use ArcMap to repair the out-of-date paths. ArcGIS Desktop is included on the ArcGIS Server AMI so that you can easily make the repairs. See Repairing broken data links to learn about updating data path information in a map document.
Another way to reduce the need to repair data connections is to use relative paths in your map documents and store your maps and data in a common folder.
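To see why relative paths survive a move, consider the path arithmetic in the Python sketch below. It is an illustration only and does not use any ArcGIS API. The folder names are hypothetical, and `ntpath` (the Windows flavor of `os.path`) is used so that the example behaves the same on any operating system.

```python
import ntpath  # Windows path rules, available on every platform


def relative_data_path(map_document, data_path):
    """Return data_path expressed relative to the folder containing map_document."""
    return ntpath.relpath(data_path, start=ntpath.dirname(map_document))


def resolve_data_path(map_document, relative_path):
    """Return the absolute path that relative_path points to from map_document."""
    return ntpath.normpath(ntpath.join(ntpath.dirname(map_document), relative_path))


# On premises: map and data share the common folder C:\Projects\Roads.
rel = relative_data_path(r"C:\Projects\Roads\maps\roads.mxd",
                         r"C:\Projects\Roads\data\roads.gdb")
# rel is r"..\data\roads.gdb"

# On the EC2 instance: the whole Roads folder was copied to the D: drive.
moved = resolve_data_path(r"D:\arcgisserver\Roads\maps\roads.mxd", rel)
# moved is r"D:\arcgisserver\Roads\data\roads.gdb"
```

Because the stored path depends only on the positions of the map and the data relative to each other, copying the common folder as a unit keeps the reference valid. An absolute path beginning with C:\Projects would break after the same move.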