How to Consume and Produce XML using Application Schemas with the XSD-Driven XML Writer

Files

2b. Using the XSD Driven XML Writer.zip
- 60 KB
- Download
XSDDrivenXMLWriter_FeatureTypes.fmwt
- 70 KB
- Download

Introduction

If you are new to XML, this tutorial series describes how to read, process, and write XML in FME. XML Schema Definition, commonly abbreviated as XSD, was created to standardize the way elements are described in XML documents.

As you may be aware, Extensible Markup Language (XML) is extremely versatile, making it appealing to many applications. However, XML can become complex, especially if documents are highly nested. XSD is one of multiple methods adopted to combat this issue.

XSD was introduced in 2001, after the World Wide Web Consortium (W3C) recognized it as a beneficial way to structure, define, and restrict elements. Uniquely, in FME, XSD-Driven XML Reading/Writing gives you the advantage of interpreting XML elements as FME features.

Where is XSD-Driven XML Used?

XSD-Driven XML is beneficial when an application schema is available, and the user wants to preserve it.

When FME accesses the XSD, XML elements become schema-driven rather than data-driven. Although the XML reader is frequently used to convert XML to GIS, it has no schema support and so cannot handle complex schemas. Similarly, complex XML writing previously required an XMLTemplater, but the XSD-Driven XML writer can now produce output that complies with the respective schema on its own. Note that for GML, it is usually preferable to use the GML reader/writer unless some aspect of the schema is not supported (such as an unsupported element or geometry type), in which case XSD-Driven XML might help.

Format Similarities:
- XSD is written with XML
- XSD and XML are both object-oriented (flat schema potential)
- Both formats conform to W3C recommendations
- Both are human- and machine-readable
- Both store a variety of structured information

Format Differences:
- XSD defines elements and structures, whereas XML describes data
- XSD focuses on XML interpretation
- XSD can be used for XML validation
- XSD can restrict node values
- XSD can be used to create custom data types
- XML generally requires more configuration if an XSD is not present

Step-by-step Instructions

The purpose of this exercise is to demonstrate writing XML using the new XSD-driven XML writer. In this scenario, we are creating a natural disaster alert.

1. Start FME Workbench

Open FME Workbench on your machine and open a blank workspace. Note: if you are short on time, open XSD-DrivenWriter-Begin.fmw and skip to step 4. This starting workspace has the demo data already created.

2. Add a Creator

Add a Creator transformer to the canvas. In the parameters, set Geometry Object to Box. Leave all other parameters as default.

3. Add an AttributeCreator

Now we are going to create information content. Add an AttributeCreator, and connect it to the Creator. Double-click the transformer to access the parameters. In the parameters, add the following attributes and corresponding values:

New Attribute:	Attribute Value:
language	en-US
category{0}	Safety
responseType{0}	Monitor
event	Flood
urgency	Expected
severity	Moderate
description	Test Message: Monitor flood levels on the Fraser River near Fort Langley.
certainty	Observed
contact	lizard@safe.com

4. Add a DateTimeStamper

Next, add a DateTimeStamper and connect it to the AttributeCreator. In the parameters, change the Time Zone to Local and the Output Attribute Name to sent.

5. Convert to ISO Datetime Format

For the XML document to be validated, the datetime value must conform to the ISO 8601 format. Add a DateTimeConverter and connect it to the DateTimeStamper. In the parameters, set the Datetime Attributes to sent. Then, for Input Format select FME, and for Output Format select %Y-%m-%dT%H:%M:%S (ISO datetime).
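The reformatting performed in this step can be sketched outside FME as follows. This is a minimal illustration, not what the DateTimeConverter does internally. It assumes the input is an FME-style datetime string of the form YYYYMMDDhhmmss with no fractional seconds or UTC offset; the function name `fme_to_iso` is hypothetical.

```python
from datetime import datetime

# Assumption: the input looks like "20240115103000" (YYYYMMDDhhmmss),
# with no fractional seconds and no UTC offset.
FME_FORMAT = "%Y%m%d%H%M%S"
# The output pattern selected in this step.
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def fme_to_iso(value: str) -> str:
    """Reformat an FME-style datetime string as an ISO 8601 datetime."""
    return datetime.strptime(value, FME_FORMAT).strftime(ISO_FORMAT)


if __name__ == "__main__":
    print(fme_to_iso("20240115103000"))
```

A value that does not match the assumed input pattern raises a ValueError, which is roughly the kind of problem that would otherwise surface later as a validation error on the sent element.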

6. Add a GeometryExtractor

One limitation with the XSD XML reader/writer is that there is no geometry support out of the box - any geometry handling needs to be managed as XML fragments. To this end, we will start generating geometry XML using the GeometryExtractor. We use GML 2.1.2 since it uses an XML coordinate syntax similar to what is required for this application schema. Add a GeometryExtractor and connect it to the DateTimeConverter. In the parameters, set the Geometry Encoding to GML v2.1.2; there are no other parameters to set.

7. Flatten XML

We are going to use an XMLFlattener to flatten the content of XML elements into feature attributes. Add an XMLFlattener to the canvas and connect it to the GeometryExtractor. In the parameters, set the XML Source Type to Attribute with XML document, then select _geometry as the XML Attribute (this is the Destination Geometry Attribute created by the GeometryExtractor). For Elements to Match, type in coordinates. Expand Expose Attributes, then for the Attribute to Expose type in coordinates.
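To make the idea of steps 6 and 7 concrete, the sketch below pulls the text of the coordinates element out of a GML 2 fragment using Python's standard library. The sample fragment and its coordinate values are hypothetical; the exact XML that the GeometryExtractor emits for the Creator's box may differ.

```python
import xml.etree.ElementTree as ET
from typing import List

GML_NAMESPACE = "http://www.opengis.net/gml"

# Hypothetical GML 2 polygon fragment, standing in for the _geometry attribute.
SAMPLE_GML = (
    '<gml:Polygon xmlns:gml="http://www.opengis.net/gml">'
    "<gml:outerBoundaryIs><gml:LinearRing>"
    "<gml:coordinates>0,0 1,0 1,1 0,1 0,0</gml:coordinates>"
    "</gml:LinearRing></gml:outerBoundaryIs>"
    "</gml:Polygon>"
)


def extract_coordinates(gml_fragment: str) -> List[str]:
    """Return the text of every gml:coordinates element in the fragment."""
    root = ET.fromstring(gml_fragment)
    tag = "{%s}coordinates" % GML_NAMESPACE
    return [el.text.strip() for el in root.iter(tag) if el.text]


if __name__ == "__main__":
    print(extract_coordinates(SAMPLE_GML))
```

The resulting string plays the role of the coordinates attribute that is mapped to the polygon element in the next step.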

8. Add Area Properties

Add another AttributeCreator to the canvas and connect it to the XMLFlattener. We will use this transformer to add area properties to our emergency alert. In the parameters, we will define two new attributes to store area coordinates and description.

New Attribute: area{0}.polygon{0}
- Attribute Value: coordinates
New Attribute: area{0}.areaDesc
- Attribute Value: Fraser River near Fort Langley

9. Add an Aggregator

Next, we will need to generate lists for the document's nested elements. Add an Aggregator to the canvas and connect it to the AttributeCreator_2.

In the parameters, set the Aggregation Mode to Geometry - Assemble One Level. Next, set the Accumulation Mode to Use Attributes From One Feature. Lastly, we will generate a list. Enable Generate List by clicking the checkbox, then set the List Name to info. Click on the ellipsis for Selected Attributes and click Select all, then unselect _creation_instance and _geometry.
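It can help to see how list attribute names correspond to nested structure. The sketch below assumes that, after this step, attributes carry names such as info{0}.event and info{0}.area{0}.polygon{0}; it converts such a flat dictionary into nested Python lists and dictionaries. The function name `nest` and the sample values are hypothetical, and this is only an illustration of the naming convention, not of how the writer itself works.

```python
import re
from typing import Any, Dict

# One path component: a name, optionally followed by a list index such as {0}.
_PART = re.compile(r"^(?P<name>[^{}]+)(?:\{(?P<index>\d+)\})?$")


def nest(attributes: Dict[str, str]) -> Dict[str, Any]:
    """Turn names like 'info{0}.area{0}.areaDesc' into nested dicts and lists."""
    root: Dict[str, Any] = {}
    for full_name, value in attributes.items():
        node = root
        parts = full_name.split(".")
        for position, part in enumerate(parts):
            match = _PART.match(part)
            if match is None:
                raise ValueError("unexpected attribute name: " + full_name)
            name, index = match["name"], match["index"]
            is_last = position == len(parts) - 1
            if index is None:
                if is_last:
                    node[name] = value
                else:
                    node = node.setdefault(name, {})
                continue
            items = node.setdefault(name, [])
            i = int(index)
            # Pad the list so that position i exists.
            while len(items) <= i:
                items.append(None if is_last else {})
            if is_last:
                items[i] = value
            else:
                node = items[i]
    return root


if __name__ == "__main__":
    flat = {
        "info{0}.event": "Flood",
        "info{0}.category{0}": "Safety",
        "info{0}.area{0}.areaDesc": "Fraser River near Fort Langley",
        "info{0}.area{0}.polygon{0}": "0,0 1,0 1,1 0,1 0,0",
    }
    print(nest(flat))
```

Each list index corresponds to one repeated XML element, which is why the nested elements of the alert are modeled as lists.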

10. Define Root Alert Content

Before writing out, add a final AttributeCreator to define the root alert content. Use the table below to add the alert property values (feel free to have fun with the values):

New Attribute: identifier
- Attribute Value: @UUID()
New Attribute: sender
- Attribute Value: Lizard
New Attribute: addresses
- Attribute Value: 9272 Glover Road, Langley, BC
New Attribute: source
- Attribute Value: FME
New Attribute: status
- Attribute Value: Exercise
New Attribute: msgType
- Attribute Value: Alert
New Attribute: scope
- Attribute Value: Public

11. Add an XSD-Driven XML Writer

The data is now ready to be written out. Add an XSD-Driven XML writer to the canvas, and browse to an output folder. Name the dataset Alert_Output.xml, then change the Feature Type Definition to Import from Dataset. Before clicking OK, open the Parameters.

In the parameters, set the Application Schema to …\cap.xsd. Then, for Feature Paths, click on the ellipsis and select cap:alert. This is also a good time to enable validation of the XML output: set Validate Dataset to Yes. Also, set Pretty Print to Yes. Click OK twice.

In the Import Writer Feature Types dialog, browse to the cap.xsd dataset, then click OK. You may need to switch the file type to All Files (*) in your file browser to see the cap.xsd file.

Connect the alert writer feature type to the AttributeCreator_3.

12. Run and Inspect

Run the workspace, then inspect the output using Notepad++ or a similar text editor. Another way to check the output is to read it back in using FME Data Inspector.

Alternative Approach with Feature Types

The XML document created in the preceding sections is a hierarchical structure in the form:

alert
  info
    area

For any given alert, there might be several info objects, and similarly, each info object might consist of several areas. The approach illustrated above used FME list attributes to mimic the XML hierarchy, and generating those list attributes can be confusing. If your source data is more relational in nature, an alternative is to use the XSD-Driven XML writer's ability to represent each XML object as a separate FME feature type and to build the associations (and the XML hierarchy) using the xml_id and xml_parent_id format attributes. The attached workspace (XSDDrivenXMLWriter_FeatureTypes.fmw) uses the same cap.xsd as before. In this case, when adding the XSD-Driven XML writer, use Import Feature Types with cap.xsd as the template, and select ‘alert’, ‘info’, and ‘area’ as the Feature Paths.

This will result in each selected object appearing as a Feature Type in the FME Workspace. You can now directly map the source data to the target XML. If the source data is relational, you might already have parent/child relationship IDs on your input features. The xml_id and xml_parent_id are used by the FME XML writer to construct the XML association and hierarchy.

Note: the xml_id must be unique for the entire dataset. In this example, we add a prefix to the ‘alertIdentifier’ attribute so that the IDs of the info and area objects do not collide.
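The way xml_id and xml_parent_id describe a hierarchy can be illustrated with a short sketch that assembles nested elements from flat records. The feature dictionaries, the ID values, and the function name `build_tree` are hypothetical; this mimics the idea only and is not the FME writer's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical flat features, one per feature type, linked by IDs.
FEATURES: List[Dict[str, Optional[str]]] = [
    {"type": "alert", "xml_id": "alert-1", "xml_parent_id": None},
    {"type": "info", "xml_id": "info-alert-1", "xml_parent_id": "alert-1"},
    {"type": "area", "xml_id": "area-alert-1", "xml_parent_id": "info-alert-1"},
]


def build_tree(features: List[Dict[str, Optional[str]]]) -> ET.Element:
    """Nest each feature's element under the element whose xml_id is its parent."""
    elements: Dict[str, ET.Element] = {}
    for feature in features:
        xml_id = feature["xml_id"]
        if xml_id in elements:
            raise ValueError("xml_id is not unique: %s" % xml_id)
        elements[xml_id] = ET.Element(feature["type"])

    roots = []
    for feature in features:
        element = elements[feature["xml_id"]]
        parent_id = feature["xml_parent_id"]
        if parent_id is None:
            roots.append(element)
        elif parent_id in elements:
            elements[parent_id].append(element)
        else:
            raise ValueError("no feature has xml_id %s" % parent_id)

    if len(roots) != 1:
        raise ValueError("expected exactly one root feature")
    return roots[0]


if __name__ == "__main__":
    print(ET.tostring(build_tree(FEATURES), encoding="unicode"))
```

Because every lookup is keyed on xml_id, a duplicated ID would make the parent of a child ambiguous, which is why the IDs have to be unique across all feature types.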

The final workspace looks like this:

Validation

The purpose of validation is to ensure documents comply with XML syntax and with the XSD. Typically, to validate XML files in FME, you either enable validation in the writer (Validate Dataset: Yes) or include an XMLValidator transformer in your workspace. If the XML is not valid, the log will report errors, including the line and column numbers and a brief description of each error. Common errors include missing IDs, incorrectly formatted date fields, missing elements, and namespace problems. Remember, for both reading and writing XML, element order matters!

Unfamiliar with XML error notation? XML error messages come from the Apache Xerces parsing library (open source). Let’s understand the partial example below:

ERROR | … for content model ‘(identifier,name,phoneNum?,email?,gender+,references?,note*)

Elements in the model list that have no special trailing notation, such as ‘identifier’ and ‘name’, are required and must appear exactly once. Optional items in the content model are followed by a ‘?’, like phoneNum and email. The remaining notations denote repetition: ‘*’ means zero or more occurrences (note), and ‘+’ means one or more occurrences (gender). Restricted-domain (picklist) values, such as Female, Male, or Other for gender, are defined in the XSD itself and do not appear in the content model.

The next examples are specific to XML validation. The error below will appear if you are missing an element.

ERROR |XML Validation: Error in ‘6. EmergencyAlters_CAP1.2 XML\capOut.xml’ on line 26, column 13: ‘element ‘certainty’ is not allowed for content model ‘(language?, category+,event,responseType*,urgency,severity,certainty,audience?,eventCode*,effective?,onset?,expires?,senderName?,headline?,description?,instruction?,web?,contact?,parameter*, resource*,area*)

This message may seem misleading at first. It sounds like the element ‘certainty’ is not allowed, but that is not quite accurate. What it really means is that ‘certainty’ is not allowed at that position, because a required element that precedes it is missing. In this case, the severity property is missing, so when the parser encounters the certainty element, it reports this error.

We see a similar problem below:

ERROR |XML Validation: Error in ‘6. EmergencyAlters_CAP1.2 XML\capOut.xml’ on line 27, column 13: ‘element ‘source’ is not allowed for content model ‘(identifier,sender,status,msgType,source?,scope,restriction?,addresses?,code*,note?,reference?,incidents?,info*)

Here, ‘msgType’ is missing, so when the validator encounters the ‘source’ element, the above error is generated.
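The reasoning behind both errors can be reproduced with a small sketch that walks a list of child element names through a content model. It is a simplified illustration, not the Xerces algorithm: it assumes a flat, comma-separated sequence in which a name with no suffix is required once, ‘?’ is optional, ‘*’ is zero or more, and ‘+’ is one or more, and it does not report required elements missing from the end of the list. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

ALERT_MODEL = (
    "(identifier,sender,status,msgType,source?,scope,restriction?,"
    "addresses?,code*,note?,reference?,incidents?,info*)"
)


def parse_content_model(model: str) -> List[Tuple[str, bool, bool]]:
    """Return (name, required, repeatable) for each item in the model."""
    particles = []
    for raw in model.strip().strip("()").split(","):
        token = raw.strip()
        suffix = token[-1] if token[-1] in "?*+" else ""
        name = token[: len(token) - len(suffix)]
        particles.append((name, suffix in ("", "+"), suffix in ("*", "+")))
    return particles


def first_violation(model: str, children: List[str]) -> Optional[str]:
    """Return the first child that is not allowed at its position, else None."""
    particles = parse_content_model(model)
    position = 0     # index of the model item currently being matched
    matched = False  # whether that item has matched at least once
    for child in children:
        while True:
            if position >= len(particles):
                return child  # nothing left in the model to match against
            name, required, repeatable = particles[position]
            if child == name and (repeatable or not matched):
                matched = True
                break
            if required and not matched:
                return child  # a required item would have to be skipped
            position += 1
            matched = False
    return None


if __name__ == "__main__":
    # msgType is missing, so 'source' is reported, as in the log message above.
    print(first_violation(ALERT_MODEL, ["identifier", "sender", "status", "source"]))
```

The element that gets reported is the first one the walker cannot place, not the one that is actually missing, which is why the log message names ‘source’ rather than ‘msgType’.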

As mentioned earlier, XML can be highly nested, which means both parent and child elements need to be included to access information throughout the document. Often this is modeled by the presence of parent and child XML IDs, such as xml_parent_id and xml_id. It is important to check the structure of your output to make sure it is correct. You may have orphaned features, and depending on the schema, this may or may not generate a warning. However, when you look at the XML output, you may notice features at the root level that really belong within a parent.
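A quick way to spot orphaned features before writing is to compare the two ID attributes across all features. The sketch below is a minimal check under the assumption that each feature is available as a dictionary with xml_id and xml_parent_id keys; the sample records and the function name `find_orphans` are hypothetical.

```python
from typing import Dict, List, Optional


def find_orphans(features: List[Dict[str, Optional[str]]]) -> List[str]:
    """Return the xml_id of every feature whose xml_parent_id matches no xml_id."""
    known_ids = {feature["xml_id"] for feature in features}
    return [
        feature["xml_id"]
        for feature in features
        if feature.get("xml_parent_id") is not None
        and feature["xml_parent_id"] not in known_ids
    ]


if __name__ == "__main__":
    sample = [
        {"xml_id": "alert-1", "xml_parent_id": None},
        {"xml_id": "info-1", "xml_parent_id": "alert-1"},
        # The parent ID below does not exist, so this feature is an orphan.
        {"xml_id": "area-1", "xml_parent_id": "info-9"},
    ]
    print(find_orphans(sample))
```

Features reported by a check like this are the ones likely to end up at the root level of the output instead of inside their intended parent.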

One good way to understand this better for a given schema is to find a valid sample XML dataset, read it with FME, and log some features with a Logger. The resulting features will show the parent/child IDs that result from FME serializing the nested structure into FME features. Once your XSD-Driven XML document is successfully validated, turning off validation can improve writing performance.

Considerations

After completing this tutorial, you will have successfully written a disaster alert using CAP1.1 XSD. As you just witnessed, no XMLTemplater or XMLValidator was required in our workspace. The XSD-Driven XML format reader/writer can be used to optimize workflows and replace a more manual approach to working with XML schemas.
