Introduction
FME has a variety of encoder/decoder transformers available: the AttributeEncoder, BinaryEncoder, BinaryDecoder, TextEncoder, and TextDecoder.
While these transformers all modify attribute encoding systems, they differ in their approaches. This article will provide an overview of each FME encoding/decoding transformer to help you select the best one for your workspace needs. This article assumes prior knowledge of character encoding systems. For more information on character encoding, please visit the related Wikipedia page.
AttributeEncoder
The AttributeEncoder changes the character encoding of an input attribute in one of two ways: it either marks the attribute with the new encoding without converting its characters, or it marks the attribute and also converts the characters to the new encoding. The encodings most familiar to users are ASCII-based. ASCII uses 128 character values (codes) to represent the most common English letters, numbers, and symbols. Non-ASCII encodings are also available in this transformer so that non-English characters can be encoded.
This transformer can receive any type of data. It also includes more options for output encoding systems than do the other transformers, allowing you to easily convert between a variety of different language encodings or even Unicode. If incoming attributes are Null, the encoding system of the attribute will still be modified to the specified output system; however, the attribute values will remain Null.
There are two ways that the encoding is handled by the transformer, specified by the Incoming Attribute parameter:
- If Honor Encoding is chosen, the transformer will attempt to convert the input attributes from one encoding system to another. If a character from the input attribute cannot be found in the target encoding system, the transformer will fail with an error.
- If Use Bytes is chosen, the transformer will change the encoding system of the attribute but will make no attempt to convert the characters into those represented by the new encoding system. This option is best when input attributes contain characters that are not found in the target encoding system, as it allows the translation to continue despite missing/unidentified target characters.
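The difference between the two options can be illustrated outside FME. The following is a minimal Python sketch of the general idea only, not of how FME implements it: the sample strings and variable names are assumptions chosen for illustration. Converting changes the stored bytes so the characters stay the same, while relabeling keeps the bytes and changes how they are read.

```python
# Python analogy only; FME's internal handling may differ.
text = "caf\u00e9"  # "café"

# "Honor Encoding"-style: convert the characters into the target encoding.
utf8_bytes = text.encode("utf-8")                         # b'caf\xc3\xa9'
converted = utf8_bytes.decode("utf-8").encode("latin-1")  # b'caf\xe9'

# A character with no equivalent in the target encoding cannot be converted.
try:
    "\u65e5\u672c".encode("latin-1")  # Japanese characters, not in Latin-1
    conversion_failed = False
except UnicodeEncodeError:
    conversion_failed = True

# "Use Bytes"-style: leave the bytes untouched and only change the label.
# The same UTF-8 bytes read as Latin-1 now show different characters.
relabeled = utf8_bytes.decode("latin-1")                  # 'cafÃ©'
```

In the relabeled case no data is lost: encoding `relabeled` back to Latin-1 returns the original UTF-8 bytes, which is why this approach can proceed even when the target encoding lacks some characters.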
Additionally, there are two included encoding systems to be aware of which provide instructions to FME:
- Binary (fme-binary) - this setting labels the attribute as binary data that should not be interpreted as characters. FME will display these attributes with Hex equivalents of the byte values.
- System Default (fme-system) - this setting tells FME to interpret the characters in the default operating system encoding scheme, which can differ between different language versions of Windows.
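As a rough illustration of these two settings, the Python sketch below shows what a hex view of raw bytes looks like and how a program can ask the operating system for its default encoding. This is an analogy under stated assumptions: the exact hex formatting FME uses for display, and how it determines the system encoding, are not described here.

```python
import locale

# Binary data should not be read as characters; a hex view shows the byte values.
raw = bytes([0x89, 0x50, 0x4E, 0x47])   # first four bytes of a PNG file
hex_view = raw.hex().upper()            # '89504E47'

# The operating system's preferred encoding varies between machines and
# language versions, so no specific value is assumed here.
system_encoding = locale.getpreferredencoding(False)
```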
BinaryEncoder and BinaryDecoder
The BinaryEncoder converts binary data into encoded text using either Base64 or Hexadecimal encoding, both of which produce plain ASCII text. This is useful when binary content (such as an image or email attachment) needs to be embedded within a text file (e.g. an HTML document), or when exchanging data with web services, whose protocols often limit the data exchanged to text-based content. The BinaryEncoder can accept any data type and outputs a new attribute containing the encoded text values.
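To make the Base64 and Hexadecimal output concrete, here is a small Python sketch of the same kind of encoding using the standard library. It is an illustration rather than FME's implementation; the eight sample bytes (the PNG file signature) and the `img_tag` string are assumptions for demonstration, and a real embedded image would contain the whole file.

```python
import base64

# Sample binary data: the 8-byte signature that starts every PNG file.
binary_data = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# Base64 and hex both turn arbitrary bytes into plain ASCII text.
b64_text = base64.b64encode(binary_data).decode("ascii")  # 'iVBORw0KGgo='
hex_text = binary_data.hex()                              # '89504e470d0a1a0a'

# Encoded text can be embedded in a text document such as HTML.
img_tag = f'<img src="data:image/png;base64,{b64_text}">'
```

Base64 is the more compact of the two (roughly 4 output characters per 3 input bytes, versus 2 characters per byte for hex), while hex is easier to read byte by byte.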
The BinaryDecoder performs the reverse operation of the BinaryEncoder, decoding Base64 or Hexadecimal encoded text attributes into binary data. This transformer offers output options similar to the AttributeEncoder; however, the input attributes must be encoded in Base64 or Hex for use in the BinaryDecoder.
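The reverse operation can be sketched in Python as follows. Again, this is only an analogy: the sample strings are assumptions, and how the BinaryDecoder itself reports invalid input is not covered here.

```python
import base64
import binascii

b64_text = "SGVsbG8="      # Base64 for the bytes of "Hello"
hex_text = "48656c6c6f"    # Hex for the same bytes

# Decode the encoded text back into raw bytes.
from_b64 = base64.b64decode(b64_text, validate=True)  # b'Hello'
from_hex = bytes.fromhex(hex_text)                    # b'Hello'

# Input that is not valid Base64 cannot be decoded.
try:
    base64.b64decode("not base64!", validate=True)
    rejected = False
except binascii.Error:
    rejected = True
```

The decoded result is raw bytes; choosing an output encoding is what turns those bytes back into readable characters, for example `from_b64.decode("utf-8")`.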
TextEncoder and TextDecoder
Web URLs, XML, and HTML reserve a number of characters that have specific meanings within their text. For example, a ? within a web URL marks the end of the main page address and the beginning of a query. When these characters appear in attribute values as ordinary data, they must be encoded so that they are not misinterpreted as part of the URL or markup syntax. The TextEncoder does just that, encoding text strings so that they are properly interpreted when included in a URL, an HTML or XML document, or the like.
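The kind of encoding involved can be shown with Python's standard library. This is a sketch of the general URL, HTML, and XML escaping rules, not of the TextEncoder's exact output, which may differ in detail (for example, in which characters it chooses to escape). The sample strings are assumptions.

```python
from html import escape as html_escape
from urllib.parse import quote
from xml.sax.saxutils import escape as xml_escape

# URL (percent) encoding: reserved characters become %XX sequences.
url_safe = quote("Fish & Chips?", safe="")   # 'Fish%20%26%20Chips%3F'

# HTML encoding: markup characters become named entities.
html_safe = html_escape('<b>"A & B"</b>')    # '&lt;b&gt;&quot;A &amp; B&quot;&lt;/b&gt;'

# XML encoding: &, < and > are replaced with entities.
xml_safe = xml_escape("a < b & c")           # 'a &lt; b &amp; c'
```

Once encoded, the ? and & in the first value can be placed inside a URL query without being read as the query separator or a parameter delimiter.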
The TextEncoder also offers the same output Base64 and Hex options that the BinaryEncoder offers. However, the TextEncoder will convert input attribute text to UTF-8 first, and then proceed to encode the character bytes as Base64 or Hex encoded text. Since the TextEncoder specializes in working with web-compatible encoded text formats, and with UTF-8 being such a commonly used encoding system for the web, it is advantageous for the TextEncoder to first encode to UTF-8. If this behavior is undesirable, consider instead using the BinaryEncoder for Base64 and Hex encoding. The output of the TextEncoder is a new attribute containing encoded text values.
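The effect of converting to UTF-8 first is easy to see in a short Python sketch. The sample character and the comparison with Latin-1 are assumptions chosen for illustration: the same character produces different Base64 and Hex text depending on which byte representation is encoded.

```python
import base64

text = "\u00e9"  # the character é

# Encode the UTF-8 bytes of the text (the order of operations described above).
via_utf8 = base64.b64encode(text.encode("utf-8")).decode("ascii")      # 'w6k='
hex_via_utf8 = text.encode("utf-8").hex()                              # 'c3a9'

# Encoding the Latin-1 bytes of the same character gives a different result.
via_latin1 = base64.b64encode(text.encode("latin-1")).decode("ascii")  # '6Q=='
```

A receiver must therefore know which character encoding was applied before the Base64 or Hex step in order to recover the original text.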
The TextDecoder transformer performs the reverse operation of the TextEncoder. It decodes a string attribute from encoded text to plain text. This transformer supports a number of encoded text types as input, including URL, Unicode, XML, HTML, Base64, Hex, and Octal. As output, the TextDecoder produces a new attribute containing plain text values.
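For comparison, the decoding direction looks like this in Python. This is a sketch covering only URL, HTML, Base64, and Hex input with assumed sample strings, and it assumes the Base64 and Hex text was produced from UTF-8 bytes; it is not a description of the TextDecoder's internals.

```python
import base64
from html import unescape
from urllib.parse import unquote

# URL (percent) decoding.
url_plain = unquote("Fish%20%26%20Chips%3F")            # 'Fish & Chips?'

# HTML entity decoding.
html_plain = unescape("A &amp; B &lt; C")               # 'A & B < C'

# Base64 and hex decoding, then reading the bytes as UTF-8 text.
b64_plain = base64.b64decode("w6k=").decode("utf-8")    # 'é'
hex_plain = bytes.fromhex("c3a9").decode("utf-8")       # 'é'
```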