compressed binary representation of geometry



ArcSDE Compressed Binary

The ArcSDE compressed binary representation stores feature geometry in a compact binary form. This representation requires that an offset and scale be applied to the coordinates of a geometric object. Each resulting integer coordinate is then encoded as the delta from the previous coordinate. Optionally, a CAD or ANNO object is also appended to the geometric object.

Coordinate values

Internally, all ArcSDE coordinates are non-negative 64-bit integers between 0 and 2147483647 (if defined using a 32-bit coordinate reference) or between 0 and 9007199254740990 (if defined using a 64-bit coordinate reference). Note that 64-bit coordinates are actually limited to a 53-bit range so that no information is lost when converting to or from double precision floating point representation. Storing integers provides better data accuracy, data integrity, and processing speed than storing real numbers. Developers never need to work directly with the integer values, but they should be aware of the internal integer representation, because it is possible to attempt to store a coordinate that is too large for a layer. In that case, the ArcSDE software returns the error SE_COORD_OUT_OF_BOUNDS.

Because real-world coordinates are often neither positive nor integer, ArcSDE data requires an offset distance (a false origin) to ensure numbers are positive and a minimum resolution multiplier (called the scale) to convert real numbers to integers. Offset distances are specified in the same units as the data. The scale can be any positive value up to 2147483645 if using a 32-bit coordinate reference, and up to 9007199254740990 if using a 64-bit coordinate reference.
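The offset and scale described above can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, the round-to-nearest rule is an assumption (the text does not say how fractions are handled), and ValueError merely stands in for the SE_COORD_OUT_OF_BOUNDS error that ArcSDE returns.

```python
import math

# Upper limits quoted in the text for the two coordinate reference precisions.
MAX_COORD_32 = 2147483647
MAX_COORD_64 = 9007199254740990


def to_system_units(value, false_origin, scale, max_coord=MAX_COORD_32):
    """Convert one real-world ordinate to an internal integer (hypothetical helper)."""
    # Assumption: round to the nearest integer; the source does not state the rule.
    stored = math.floor((value - false_origin) * scale + 0.5)
    if stored < 0 or stored > max_coord:
        # Stand-in for the SE_COORD_OUT_OF_BOUNDS error mentioned in the text.
        raise ValueError("coordinate out of bounds for this coordinate reference")
    return stored


def from_system_units(stored, false_origin, scale):
    """Convert an internal integer back to a real-world ordinate."""
    return stored / scale + false_origin


# A longitude of -179.5 with a false origin of -180 and a scale of 1000.
print(to_system_units(-179.5, -180.0, 1000))   # 500
print(from_system_units(500, -180.0, 1000))    # -179.5
```

With a scale of 1000 the smallest representable step is 0.001 data units, so a larger scale gives finer resolution but a smaller usable extent before the upper limit is reached.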

Logical representation of ArcSDE feature geometry

Physical representation of ArcSDE feature geometry

This section describes the physical view of how an ArcSDE feature's geometry is stored in a binary stream. There are three issues to present: separators, point compression, and the binary layout.

Part separators

A separator, which delineates the parts of a feature, is physically represented as a point with an x value of negative one (-1) and a y value of zero (0); its z and m values are undefined. Separators do not require any special logic when being compressed.
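Because valid internal coordinates are never negative, a point with x = -1 cannot be confused with real data. The following sketch splits an already decompressed list of integer (x, y) points into parts at each separator; the function name and the tuple representation are assumptions made for illustration.

```python
SEPARATOR = (-1, 0)  # x = -1, y = 0, as described in the text


def split_parts(points):
    """Split a flat list of (x, y) integer points into parts at each separator."""
    parts = []
    current = []
    for point in points:
        if point == SEPARATOR:
            # Close the current part and start a new one.
            parts.append(current)
            current = []
        else:
            current.append(point)
    parts.append(current)
    return parts


points = [(10, 10), (20, 10), (-1, 0), (5, 5), (6, 6)]
print(split_parts(points))  # [[(10, 10), (20, 10)], [(5, 5), (6, 6)]]
```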

Point compression

The compression or decompression of the coordinates stored in the binary stream is a two-step process: the conversion to or from the relative-offset scheme, and the packing or unpacking of bytes. To compress coordinates, the values are converted to relative offsets, then packed into a byte array. To decompress coordinates, the byte array is unpacked, then the relative offsets are converted back to absolute values. Each step is described below.

Relative-offset value calculation

The goal of converting coordinate values to a relative offset scheme is to make the values as small as possible so that they require fewer bits to represent them. In an array of relative-offset values, the first value is an absolute value (stored as a 32-bit integer) while each subsequent value is the offset, or difference, from the previous absolute value. Therefore, given N absolute values, the relative-offset values are calculated by:

relative_value[0] = absolute_value[0]
relative_value[1] = absolute_value[1] - absolute_value[0]
[...]
relative_value[N-2] = absolute_value[N-2] - absolute_value[N-3]
relative_value[N-1] = absolute_value[N-1] - absolute_value[N-2]

Given N relative values, the absolute values are calculated by:

absolute_value[0] = relative_value[0]
absolute_value[1] = absolute_value[0] + relative_value[1]
[...]
absolute_value[N-2] = absolute_value[N-3] + relative_value[N-2]
absolute_value[N-1] = absolute_value[N-2] + relative_value[N-1]

This method is efficient because points within a feature are usually close to neighboring points.
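The two calculations above translate directly into code. The sketch below works on a single sequence of integer values; the function names are hypothetical, and how x, y, z, and m values are ordered within the stream is not covered here.

```python
def to_relative(absolute):
    """Convert absolute values to relative offsets (the first value stays absolute)."""
    relative = []
    previous = 0
    for value in absolute:
        relative.append(value - previous)
        previous = value
    return relative


def to_absolute(relative):
    """Convert relative offsets back to absolute values with a running sum."""
    absolute = []
    total = 0
    for delta in relative:
        total += delta
        absolute.append(total)
    return absolute


xs = [1000, 1003, 1001, 1001]
print(to_relative(xs))               # [1000, 3, -2, 0]
print(to_absolute(to_relative(xs)))  # [1000, 1003, 1001, 1001]
```

Only the first value is large; the remaining offsets are small and can be negative, which is why the packed format described next carries a sign bit.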

Packing integer values

Relative-offset values can generally be represented with fewer bytes than absolute values. The relative-offset values are packed into a series of bytes. The high-order bit of each packed byte acts as a control bit that indicates whether the integer value continues into the next byte. For example, if an integer value is packed into three bytes, the high-order bit of bytes one and two is set (indicating that the integer value continues into the following byte) and the high-order bit of byte three is not set (indicating that it is the last byte of the integer value). The second bit of the first byte acts as a sign bit. The first packed byte therefore contains one control bit, one sign bit, and six data bits. All subsequent packed bytes contain one control bit and seven data bits. Because each packed byte carries only six or seven data bits, up to five packed bytes may be required to represent an integer value. This is the worst case, and it occurs only when the integer value is greater than 134,217,727, the largest value that fits in the 27 data bits of four packed bytes.

The record layout, by byte, for packed integers (bit 0 is the high-order bit of each byte)

Byte Number	Bit(s)	Value
0	0	Control bit (0 = last byte, 1 = integer value continues into the next byte)
0	1	Sign bit (0 = positive integer, 1 = negative integer)
0	2-7	Low-order six bits of the integer
1-4	0	Control bit (0 = last byte, 1 = integer value continues into the next byte)
1-4	1-7	Next low-order seven bits of the integer

Integer values are packed by taking the low-order six or seven bits (by performing a binary AND operation between the value to be packed and the hexadecimal value 3F or 7F, respectively), depending on which packed byte the value is being stored in, and storing them in the packed byte. The original value is then shifted to the right by six or seven bits (i.e., divided by 64 or 128). If the new, shifted value is nonzero, the control bit in the packed byte is set, and the steps are repeated for the next packed byte. This process continues until the shifted value is zero. Unpacking is done in a similar manner, but in the reverse order.
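The packing steps can be sketched as follows. Several details are assumptions rather than statements from the text: the function names are hypothetical, the control bit is taken to be mask 0x80 and the sign bit mask 0x40, and negative values are assumed to be stored as a sign bit plus the absolute value.

```python
CONTROL_BIT = 0x80  # high-order bit: the value continues into the next byte
SIGN_BIT = 0x40     # second bit of the first byte (assumed sign-magnitude)


def pack_int(value):
    """Pack one integer into 1..n bytes, low-order bits first."""
    magnitude = abs(value)
    first = magnitude & 0x3F          # low-order six bits
    if value < 0:
        first |= SIGN_BIT
    magnitude >>= 6
    if magnitude:
        first |= CONTROL_BIT
    packed = bytearray([first])
    while magnitude:
        byte = magnitude & 0x7F       # next low-order seven bits
        magnitude >>= 7
        if magnitude:
            byte |= CONTROL_BIT
        packed.append(byte)
    return bytes(packed)


def unpack_int(data, pos=0):
    """Unpack one integer starting at pos; return (value, next position)."""
    first = data[pos]
    pos += 1
    negative = bool(first & SIGN_BIT)
    magnitude = first & 0x3F
    shift = 6
    more = first & CONTROL_BIT
    while more:
        byte = data[pos]
        pos += 1
        magnitude |= (byte & 0x7F) << shift
        shift += 7
        more = byte & CONTROL_BIT
    return (-magnitude if negative else magnitude), pos


print(pack_int(1000).hex())          # a80f
print(unpack_int(pack_int(-2)))      # (-2, 1)
print(len(pack_int(134217727)))      # 4
print(len(pack_int(134217728)))      # 5
```

The last two lines reproduce the boundary mentioned in the text: 134,217,727 still fits in four packed bytes, and the next larger value needs a fifth.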

Binary layout

In addition to the compressed coordinate values, the byte stream carries information that describes them. The first eight bytes of the byte stream are reserved for this purpose. Currently, two pieces of information are stored there: the size of the compressed point byte stream and the dimension of the stored coordinates. Both values are stored as packed integer values (as described previously). The size of the compressed point byte stream is defined as the total length of the byte stream minus the eight reserved bytes, and it is stored in the first five bytes.

The coordinate dimension indicates whether z and m values are present in the byte stream. The dimension is a one-byte bit vector and is stored in the sixth byte of the byte stream. The first low-order bit of the dimension vector indicates whether z values are present, and the second low-order bit indicates whether measure values are present. If the bit value is turned off (zero), then the corresponding values are not present in the byte stream. If the bit value is turned on (1), then the corresponding values are present. For example, the dimension vector for two-dimensional coordinates has a hexadecimal value of zero (0), for three-dimensional coordinates a value of one (1), for measured two-dimensional coordinates a value of two (2), and for three-dimensional coordinates with measures a value of three (3). The next two bytes of the byte stream are not used currently, but are reserved for future use. The compressed coordinate values are stored in the byte stream following the reserved eight bytes.
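Reading the dimension vector amounts to testing two bits. In the sketch below the constant and function names are hypothetical, and the ordinate count simply adds one for each optional value to the x and y that are always present.

```python
HAS_Z = 0x01  # first low-order bit: z values are present
HAS_M = 0x02  # second low-order bit: measure values are present


def describe_dimension(mask):
    """Interpret the one-byte dimension vector (hypothetical helper)."""
    has_z = bool(mask & HAS_Z)
    has_m = bool(mask & HAS_M)
    return {
        "has_z": has_z,
        "has_m": has_m,
        # x and y are always present; z and m are optional.
        "ordinates_per_point": 2 + int(has_z) + int(has_m),
    }


for mask in (0, 1, 2, 3):
    print(mask, describe_dimension(mask))
```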

The binary representation of a feature's geometry in ArcSDE

Byte Number	Value
0-4	Coordinate stream length, packed integer format (byte stream length minus 8 reserved)
5	Coordinate dimension mask, packed integer format
6	Annotation dimension and entity type bitmask
7	Transmitted shape flags bitmask
8+	Compressed coordinate values, packed relative-offset format
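The layout above suggests how a reader might split a stream into its header fields and coordinate payload. The sketch below is a guess at the details and not a description of the ArcSDE implementation: the function names are hypothetical, the packed length is assumed to start at byte 0 with any unused bytes of the five-byte field ignored, and bytes 6 and 7 are returned raw because the text and the table describe them differently.

```python
RESERVED_BYTES = 8


def unpack_length(field):
    """Decode a non-negative packed integer from the start of field."""
    value = field[0] & 0x3F            # six data bits in the first byte
    shift = 6
    index = 0
    # Follow the control bit (0x80) while more bytes remain in the field.
    while field[index] & 0x80 and index + 1 < len(field):
        index += 1
        value |= (field[index] & 0x7F) << shift
        shift += 7
    return value


def split_stream(stream):
    """Split a geometry byte stream into header fields and coordinate bytes."""
    if len(stream) < RESERVED_BYTES:
        raise ValueError("stream is shorter than the eight reserved bytes")
    coords = stream[RESERVED_BYTES:]
    length = unpack_length(stream[0:5])
    return {
        "coord_length": length,
        "dimension_mask": stream[5],   # masks 0..3 occupy a single packed byte
        "byte_6": stream[6],
        "byte_7": stream[7],
        "coords": coords,
        "length_matches": length == len(coords),
    }


# A made-up stream: length 3, dimension mask 1 (z values present), 3 payload bytes.
stream = bytes([0x03, 0, 0, 0, 0, 0x01, 0, 0]) + b"\x0a\x14\x05"
fields = split_stream(stream)
print(fields["coord_length"], fields["dimension_mask"], fields["length_matches"])
# 3 1 True
```

Comparing the declared length with the number of bytes that actually follow the header is a cheap sanity check before attempting to unpack any coordinates.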
