How to Fix XML Files with Bad Encoding

Liz Sanderson

FME Version

  • FME 2021.0

Introduction

Sometimes an XML file cannot be processed because it contains bytes that are invalid in its declared encoding, or because it declares the wrong character encoding. FME might not recognize the file as XML, and a browser or other tool might report an error such as "Encoding error" or "invalid character was found in text content."
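To illustrate why such a file is rejected, here is a minimal Python sketch. Python is used only for illustration and is not part of the FME workflow; the sample document and the helper name `is_valid_utf8` are hypothetical. It builds an XML string that claims to be UTF-8 but is stored as ISO-8859-1 bytes, then shows that those bytes cannot be decoded as UTF-8.

```python
# Hypothetical sample: an XML document declared as UTF-8 whose text was
# actually saved as ISO-8859-1, so the accented "e" is the single byte 0xE9.
sample_text = '<?xml version="1.0" encoding="UTF-8"?><name>caf\u00e9</name>'
mislabeled = sample_text.encode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(mislabeled))                   # False
print(is_valid_utf8(sample_text.encode("utf-8")))  # True
```

In UTF-8, the byte 0xE9 must be followed by two continuation bytes, so a lone 0xE9 followed by `<` is an invalid sequence. This is the kind of mismatch that tools report as an encoding error.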

Fixing this problem means re-encoding the XML data with the proper encoding scheme. To do this, create an FME Workspace that converts the XML file using a Text File Reader and a Text File Writer with the appropriate encoding parameters. This article walks through building the workspace from scratch and includes a completed version in the attachments.

Note that FME's Text File Reader/Writer is used and not the built-in XML Reader/Writer because FME may not recognize a file with an invalid encoding as XML. This is therefore a helpful preliminary step in repairing an XML file before working with it in FME.
 

Step-by-Step Instructions

Fixing an XML file with an invalid encoding can be done by creating a basic FME Workspace that has a Text File Reader and a Text File Writer.

In this example, we are working with an XML file that is marked with the wrong encoding. When viewing it in Notepad, we can see that it is declared as UTF-8 but contains accented characters encoded in ISO-8859-1:

sourcefile.PNG

The file displays an error when we try to view it in a browser or work with it in other tools:

error.PNG

We will use FME to read the file as ISO-8859-1 and write it back out as UTF-8.
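For comparison, the same read-as-one-encoding, write-as-another idea can be sketched outside FME in a few lines of Python. This is not how FME implements the conversion; the function name `reencode` and its parameters are hypothetical, and the sketch assumes the whole file really is ISO-8859-1.

```python
from pathlib import Path


def reencode(src: Path, dst: Path, source_encoding: str = "iso-8859-1") -> None:
    """Read src using its real encoding and write dst as UTF-8 with a BOM."""
    # Work on raw bytes so that line endings are preserved exactly.
    text = src.read_bytes().decode(source_encoding)
    # "utf-8-sig" prepends the UTF-8 byte order mark when encoding.
    dst.write_bytes(text.encode("utf-8-sig"))
```

With the file names used in this article, a call would look like `reencode(Path("XMLEncodingError.xml"), Path("FixedXMLFile.xml"))`. Because every byte value is a valid ISO-8859-1 character, the decode step never fails, so choosing the correct source encoding remains the user's responsibility, just as it is in the Reader parameters.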

1. Add a Text File Reader
Open FME Workbench. Click the Reader icon to open the Add Reader dialog and set the following parameters:

  • Format: Text File
  • Dataset: C:\<Path to file>\XMLEncodingError.xml
  • Parameters > File Contents > Character Encoding: Latin-1 Western European (iso-8859-1)

When choosing the dataset, you might need to change the file filter to "All Files", because by default the dialog lists .txt files only.

reader.PNG
readerparams.PNG

Click OK to add the Reader to the workspace.

 

2. Add a Text File Writer
Click the Writer icon to open the Add Writer dialog and set the following parameters:

  • Format: Text File
  • Dataset: C:\<Path to file>\FixedXMLFile.xml
  • Parameters > File Contents > Character Encoding: UTF-8
  • Parameters > File Contents > Write UTF-8 Byte Order Mark: Yes

writer.PNG
writerparams.PNG

Click OK to add the Writer to the workspace.
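As background on the Write UTF-8 Byte Order Mark option, the following Python sketch shows what a UTF-8 byte order mark is at the byte level. It assumes that the Writer emits the standard three-byte mark (EF BB BF) at the start of the file; the sample text is hypothetical.

```python
import codecs

# Hypothetical sample document.
text = '<?xml version="1.0" encoding="UTF-8"?><name>caf\u00e9</name>'

with_bom = text.encode("utf-8-sig")  # UTF-8 preceded by the byte order mark
without_bom = text.encode("utf-8")   # plain UTF-8

print(with_bom[:3] == codecs.BOM_UTF8)  # True
print(with_bom[3:] == without_bom)      # True
```

The mark does not change the document content. It only gives tools such as text editors a hint that the bytes that follow are UTF-8.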

3. Connect the Reader and Writer
Drag a connection line from the Reader feature type to the Writer feature type. The finished workspace should look like this:

Screen Shot 2021-04-05 at 12.31.11 PM.png

4. Run the Workspace
Click the Run icon to run the workspace. The XML file will be converted to the UTF-8 encoding set in the Writer parameters.

Capture.PNG
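To confirm the result outside FME, one option is a quick check that the output decodes as UTF-8 and parses as well-formed XML. The sketch below uses Python's standard `xml.etree.ElementTree` module; the function name `check_fixed_xml` is hypothetical, and the check does not validate the document against any schema.

```python
import xml.etree.ElementTree as ET


def check_fixed_xml(data: bytes) -> bool:
    """Return True if data is valid UTF-8 (BOM allowed) and well-formed XML."""
    try:
        data.decode("utf-8-sig")  # raises if the bytes are not valid UTF-8
        ET.fromstring(data)       # raises if the XML is not well-formed
        return True
    except (UnicodeDecodeError, ET.ParseError):
        return False
```

For the file written in this article, the check could be run as `check_fixed_xml(open("FixedXMLFile.xml", "rb").read())`. Running it on the original, mislabeled file would be expected to return `False`.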
