Introduction
Sometimes an XML file cannot be processed because it contains an invalid encoding or is marked with the incorrect character encoding scheme. FME may not recognize the file as XML, and when viewing the file in a browser or other tool, an error message may appear, such as "Encoding error" or "invalid character was found in text content."
Fixing this problem means re-encoding the XML data with the proper encoding scheme. To do this, create an FME Workspace that converts the XML file using a Text File Reader/Writer, and set the encoding parameters. This article walks through the steps of building the workspace from scratch and includes a completed version in the files section.
Note that FME's Text File Reader/Writer is used and not the built-in XML Reader/Writer because FME may not recognize a file with an invalid encoding as XML. This is therefore a helpful preliminary step in repairing an XML file before working with it in FME.
Step-by-Step Instructions
To fix an XML file with an invalid encoding, create a basic FME Workspace that contains a Text File Reader and a Text File Writer.
In this example, we are working with an XML file that is incorrectly encoded. When viewing it in Notepad, we can see that it is marked as UTF-8 but contains accented characters in ISO-8859-1:
The file displays an error when we try to view it in a browser or work with it in other tools:
We will use FME to re-encode the file as UTF-8, so that its content matches the encoding it is marked with.
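Before building the workspace, it can help to confirm that the file really contains bytes that are not valid UTF-8. The following is a minimal Python sketch of such a check, run outside FME. The function name find_invalid_utf8 and the sample bytes are hypothetical and only illustrate the situation described above, where an accented character is stored as a single ISO-8859-1 byte in a file marked as UTF-8.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset where decoding failed
        return err.start, data[err.start]
    return None


# Hypothetical sample: declared as UTF-8, but "é" is stored as the ISO-8859-1 byte 0xE9
sample = b'<?xml version="1.0" encoding="UTF-8"?><name>Caf\xe9</name>'

result = find_invalid_utf8(sample)
if result is None:
    print("The data is valid UTF-8")
else:
    offset, value = result
    print(f"Invalid UTF-8 at byte offset {offset}: 0x{value:02X}")
```

To check a real file, read it in binary mode (for example with pathlib.Path(...).read_bytes()) and pass the bytes to the function.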
1. Add a Text File Reader
Open FME Workbench. Click the Reader icon to open the Add Reader dialog and set the following parameters:
- Format: Text File
- Dataset: C:\<Path to file>\XMLEncodingError.xml
- When choosing the dataset, you might need to change the file filter to "All Files", as the default is to search for .txt files only.
Click on the Parameters button.
- File Contents
- Character Encoding: Latin-1 Western European (iso-8859-1)
Click OK to add the Reader to the workspace.
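The Reader's Character Encoding parameter tells FME how to interpret the bytes in the source file. As a rough illustration of the same idea outside FME, the Python sketch below decodes hypothetical sample bytes as ISO-8859-1. The helper name decode_latin1 is an assumption made for this example.

```python
def decode_latin1(raw: bytes) -> str:
    """Interpret each byte as one ISO-8859-1 character."""
    return raw.decode("iso-8859-1")


# Hypothetical sample: 0xE9 and 0xE8 are "é" and "è" in ISO-8859-1
raw = b"Caf\xe9 cr\xe8me"
print(decode_latin1(raw))  # prints: Café crème
```

Note that Python's ISO-8859-1 codec maps every possible byte value to a character, so this decoding never raises an error. If the source file is actually in a different encoding, the result is wrong characters rather than a failure, so it is worth inspecting the output.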
2. Add a Text File Writer
Click the Writer icon to open the Add Writer dialog and set the following parameters:
- Format: Text File
- Dataset: C:\<Path to file>\FixedXMLFile.xml
Click the Parameters button.
- File Contents
- Character Encoding: UTF-8
- Write UTF-8 Byte Order Mark: Yes
Click OK twice to add the writer to the workspace.
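To make the effect of the two Writer parameters concrete, here is a small Python sketch, independent of FME, of what UTF-8 output with a byte order mark looks like at the byte level. It uses Python's utf-8-sig codec, which prepends the three-byte UTF-8 BOM (EF BB BF) when encoding. The helper name encode_utf8_with_bom is hypothetical.

```python
import codecs


def encode_utf8_with_bom(text: str) -> bytes:
    """Encode text as UTF-8, prefixed with the UTF-8 byte order mark."""
    return text.encode("utf-8-sig")


encoded = encode_utf8_with_bom("Café")
print(encoded.startswith(codecs.BOM_UTF8))  # prints: True
print(encoded[len(codecs.BOM_UTF8):])       # prints: b'Caf\xc3\xa9'
```

The accented character that occupied one byte in ISO-8859-1 (0xE9) becomes the two-byte UTF-8 sequence C3 A9.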
3. Connect the Reader and Writer
Drag a connection line from the Reader feature type to the Writer feature type. The finished workspace should look like this:
4. Run the Workspace
Click the Run icon to run the workspace. The XML file will be converted to the UTF-8 encoding set in the Writer parameters.
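For comparison, or to double-check the converted file, the same re-encoding can be sketched in a few lines of Python. This is not how FME implements the conversion; it is a minimal equivalent under the assumption that the source bytes are ISO-8859-1 and the XML declaration already says UTF-8. The function names reencode and is_well_formed and the file paths are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def reencode(src: str, dst: str, source_encoding: str = "iso-8859-1") -> None:
    """Read src using source_encoding and write dst as UTF-8 with a byte order mark."""
    text = Path(src).read_bytes().decode(source_encoding)
    # Binary read/write keeps the original line endings untouched
    Path(dst).write_bytes(text.encode("utf-8-sig"))


def is_well_formed(path: str) -> bool:
    """Return True if the file parses as XML, False if the parser rejects it."""
    try:
        ET.parse(path)
        return True
    except ET.ParseError:
        return False
```

A typical use would be to call reencode("XMLEncodingError.xml", "FixedXMLFile.xml") and then confirm that is_well_formed("FixedXMLFile.xml") returns True. If the XML declaration in the source file names an encoding other than UTF-8, the declaration would also need to be updated, which this sketch does not do.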