Unicode Support in ArcSDE

Unicode assigns a unique number, or code point, to every character in the world's major writing systems. A Unicode-enabled application can therefore handle character data in any language on any operating system, so a single implementation of the application is sufficient for worldwide deployment. For more information on the Unicode standard, refer to the Unicode home page.

Developing Unicode Applications

In ArcSDE releases 8.x through 9.1, the type CHAR always resolved to the ANSI character data type. Beginning with ArcSDE 9.2, CHAR can instead resolve to the UTF-16 data type for Unicode. To enable Unicode support, add SDE_UNICODE to the list of preprocessor definitions when compiling. As shown below, if SDE_UNICODE is defined, the C preprocessor resolves CHAR to SE_WCHAR, the UTF-16 character type; if SDE_UNICODE is not defined, CHAR resolves to ACHAR, the ANSI character type. When building custom C API applications with ArcSDE 9.2 or later, you can therefore build either Unicode or ANSI applications.

#ifdef SDE_UNICODE
   #define CHAR SE_WCHAR
#else
   #define CHAR ACHAR
#endif

#define ACHAR char /* ANSI character */
#define SE_WCHAR unsigned short /* UTF16 encoding character */

In addition, when SDE_UNICODE is defined, every character variable and parameter in the ArcSDE C API structures and functions uses the Unicode data type. For example, the column_name member of the SE_COLUMN_DEF structure and the table parameter of the SE_table_delete function both resolve to SE_WCHAR. All character variable and parameter values must then be in the Unicode (UTF-16) encoding, not the ANSI encoding. The section “Character conversion” in this topic explains how to convert data between the Unicode and ANSI encodings.

LONG SE_table_delete(SE_CONNECTION connection, const CHAR *table);

typedef struct
{
   CHAR column_name[SE_MAX_COLUMN_LEN]; /* the column name */
   LONG sde_type; /* the SDE data type */
   LONG size; /* the size of the column values */
   SHORT decimal_digits; /* number of digits after decimal */
   BOOL nulls_allowed; /* allow NULL values ? */
   SHORT row_id_type; /* column's use as table's row id */
} SE_COLUMN_DEF;

Upgrading existing applications

An existing ArcSDE C API application can be recompiled and linked against newer ArcSDE client libraries without any source changes. The rebuilt application is still an ANSI application and cannot handle Unicode data. However, it can work with ArcSDE 9.2 and later releases as well as with ArcSDE server releases earlier than 9.2: although ArcSDE 9.2 and later releases handle data in Unicode, they automatically convert between Unicode and ANSI. To upgrade an existing application to support Unicode character data, convert all character literals in the application to Unicode and recompile with the Unicode preprocessor definition, SDE_UNICODE.

Since the Java language uses the Unicode standard for the representation of character data, no modifications are required for ArcSDE Java API applications.

Character conversion

To convert character data from one character set to another, you can use one of the following options.

  • IBM ICU
    International Components for Unicode (ICU) is a mature, portable set of C/C++ and Java libraries for Unicode support, software internationalization (I18N), and globalization (G11N) that gives applications the same results on all platforms. Refer to the IBM ICU home page for more details.
     
  • Windows Unicode conversion functions
    On Windows operating systems, the functions MultiByteToWideChar and WideCharToMultiByte convert between multibyte (ANSI code page) strings and UTF-16 wide-character strings.
     
  • UNIX iconv
    On UNIX operating systems, the code conversion function iconv can be used to convert character data between different character sets.
