Using the field mapping control

When merging several input datasets into a single output dataset, the field structure and contents are a consideration. Each input dataset will contain fields that also exist in other input datasets, as well as fields that are unique to only that dataset. How these fields are managed determines the field structure and content in the output dataset. The Field Mapping control allows you to define this output dataset field structure.

All input dataset fields will be mapped to the output dataset. When there is field duplication (based solely on name) between all the inputs, the output dataset field will be a combination of each occurrence. All unique input dataset fields (those not found in other input datasets) are also mapped to the output dataset.

It is possible for a field map's subfields to be of varying data types. In this case, the output field's data type is set to the data type of the first input dataset, and all other subfields are cast to this type. For example, the first input field is text, and the second input field (of the same name) is double. The output data type will be text, and the values in the second input field will be converted to this type. There may be instances when conversion is not possible, and errors will be raised during execution that state where the problem is. For instance, a BLOB field can't be converted to any data type other than BLOB. Also, an alphanumeric field would have to be truncated (start point and end point stated) to remove the alphabetic portion of its values if the field is to be converted as numeric type.

Field mapping is displayed graphically in a catalog tree. All root (top-level) entries are the output dataset fields. You will see their names and their default output data types. The default output field data type is the same as the first input's data type.

Each root-level entry is expandable. When expanded, it will display all subfields from which data will be gathered to populate the output field. For each occurrence of a field (by name) in the input dataset, a subfield entry will appear, showing its source and its data type. The first input dataset's occurrence of a field will be the first to appear in the subfield list.

This is the default structure of the output dataset fields. You may, at any time, add or delete input datasets; add, delete, or rename output dataset fields; and add, delete, rename, or custom-format subfields.

Modifying the default field mappings

The default field mapping structure can be modified at any time. Output fields can be added, deleted, or renamed. Subfields can also be added, deleted, or renamed, and in the case of text output fields, they may be formatted. In the field mapping tree view, shortcut menus (accessed by right-clicking a selected field, subfield, or in the white space) can be used to make these modifications.

The shortcut menu for output fields has the following options: Add Input Field, Delete, Rename, and Properties.

The shortcut menu for subfields has the following options: Delete and Format (text fields only).

The shortcut menu of the field mapping pane (white space) has the following options: Add Output Field and Reset.

Merge rules

Each output field allows you to set certain properties, such as name, type, and merge rule. The merge rules allow you to specify how values from two or more input fields (subfields) are merged into a single output value. Null values are excluded from all statistical calculations. There are several merge rules that you can use:

Setting the Field Map parameter in scripting

In scripting, field mapping can be done in one of the following ways:

  1. Entering a string value for a field map parameter—this is recommended only when the number of fields is small and little (if any) alteration of the fields is required.
  2. Create and use a FieldMappings object.