Easier approaches to reading XML are now available in FME. See XML Reader Configuration or Reading Complex XML or GML using the XMLFlattener.
Introduction
Many users have problems reading complex XML or GML. The way to do this is with FME’s XML reader, either using Feature Paths, which query the XML at a given node with the option of flattening, or using an xfMap, which gives you a wide range of options for both querying the XML and building features. The basic idea with xfMaps is that you specify which node within the XML structure you want to make into a feature type in the feature mapping section. Then you specify what each of these features contains in the feature content map section.
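As a rough orientation, a minimal xfMap has one section for each of those two steps. The skeleton below is a sketch only: it assumes the features are <Feature> elements (as in the sample data later in this article) and leaves the content section empty until attribute and geometry mappings are added.

```xml
<?xml version="1.0"?>
<xfMap>
  <!-- feature-map: which elements become features, and their feature type.
       "Feature" is the element name used in this article's sample data. -->
  <feature-map>
    <mapping match="Feature">
      <feature-type>
        <!-- name the feature type after the matched element's local name -->
        <matched expr="local-name"/>
      </feature-type>
    </mapping>
  </feature-map>
  <!-- feature-content-map: what each feature contains
       (attribute, geometry and trait mappings are added here) -->
  <feature-content-map>
  </feature-content-map>
</xfMap>
```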
xfMap Sequenced Attributes
However, this can be difficult when your source XML has no fixed schema. In some cases the schema is instead embedded within the data itself.
Consider the following xml:
<Feature>
  <property typeName="attribute1" type="string">John</property>
  <property typeName="attribute2" type="string">Vancouver</property>
  <property typeName="activeDate_from" type="string">11-22-99</property>
  <property typeName="activeDate_to" type="string">12-11-09</property>
</Feature>
We might choose the Feature element as the node to map to our feature type. However, mapping every property element to a single attribute called property would not be very useful, as we might get repeating columns or a list such as:
property1 = John
property2 = Vancouver
etc.
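For contrast, a conventional (non-sequenced) attribute mapping gives every property element the same fixed attribute name. The fragment below is a hedged illustration of that naive approach, not a recommended configuration; the literal name "property" is simply borrowed from the element name.

```xml
<!-- Naive approach, for contrast only: every <property> element is
     written to an attribute with the same fixed name, "property",
     so the values compete for one name instead of forming a schema -->
<mapping match="property">
  <attributes>
    <attribute>
      <name>
        <literal expr="property"/>
      </name>
      <value>
        <extract expr="."/>
      </value>
    </attribute>
  </attributes>
</mapping>
```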
A better approach is to build the schema dynamically from the name-value pairs, so that each name becomes an attribute name and each value becomes that attribute's value. We can do this with the following 'sequenced' xfMap mapping:
<mapping match="property">
  <attributes>
    <attribute type="sequenced">
      <name>
        <extract expr="@typeName"/>
      </name>
      <value>
        <extract expr="."/>
      </value>
    </attribute>
  </attributes>
</mapping>
The mapping match="property" rule selects each property element. Inside it, the name expression extract @typeName takes the attribute's name from the element's typeName attribute, and the value expression extract "." takes the text content of that same property element as the value.
So for
<property typeName="attribute1" type="string">John</property>
the xfMap creates an attribute called attribute1 and stores the element's text, 'John', in it.
Applied to the XML above, this xfMap generates the following feature:
attribute1 = John
attribute2 = Vancouver
activeDate_from = 11-22-99
activeDate_to = 12-11-09
Considerations
If your features have optional or multiple geometries, you will need to enable aggregates. I will explain how to do this in a separate example.
Source XML example
Suppose we want to read the source XML below. The dynamic schema approach above lets us read all of the property attributes under each <Feature> element.
<?xml version="1.0" encoding="UTF-8"?>
<FeatureCollection>
  <Feature identifier="1001">
    <Coordinate_BOX identifier="101">
      <coords>-123.1,49.25 -122.9,49.15</coords>
    </Coordinate_BOX>
    <property typeName="attribute1" type="string">John</property>
    <property typeName="attribute2" type="string">Vancouver</property>
    <property typeName="activeDate_from" type="string">11-22-99</property>
    <property typeName="activeDate_to" type="string">12-11-09</property>
  </Feature>
  <Feature identifier="1002">
    <Coordinate_BOX identifier="102">
      <coords>-122.8,49.12 -122.5,49.0</coords>
    </Coordinate_BOX>
    <property typeName="attribute1" type="string">June</property>
    <property typeName="attribute2" type="string">Surrey</property>
    <property typeName="activeDate_from" type="string">02-25-05</property>
    <property typeName="activeDate_to" type="string">9-15-10</property>
  </Feature>
</FeatureCollection>
Adding Geometry to Dynamic Schema XFMaps
How do we add geometry? In most of the dynamic schema cases I have seen, the geometry itself is not completely dynamic. There is usually a known set of geometries for that type of XML. Each feature may or may not contain each geometry, but there has to be some predefined way that the geometries are stored, or the data would be too difficult to work with. Of course, if each feature simply had coordinate values embedded in its attributes, FME could convert those to points in Workbench using a 2DPointReplacer.
For this example we just have a bounding box, similar to the one in the basic example. We do not need to define an exception: the property mapping matches only property elements, not Coordinate_BOX, so all we have to do is add a mapping that explicitly matches Coordinate_BOX and builds the geometry the same way as in the 'Basic' example:
<mapping match="Coordinate_BOX">
  <geometry activate="xml-box">
    <data name="data-string">
      <extract expr="./coords"/>
    </data>
  </geometry>
</mapping>
Reference Maps to add Geometry Traits
Before we wrap up, how about adding some geometry traits? We can see from the source data above that the geometries have unique identifiers associated with them:
<Coordinate_BOX identifier="101">
This can become particularly important to preserve when you have more than one geometry per feature and you need to be able to identify them individually as is the case in GML 3.2.1.
We can capture this value by adding a references section to the Feature mapping in the feature map, as follows:
<feature-map>
  <mapping match="Feature">
    <feature-type>
      <matched expr="local-name"/>
    </feature-type>
    <references>
      <reference>
        <name>
          <literal expr="identifier"/>
        </name>
        <value>
          <extract expr="@identifier"/>
        </value>
      </reference>
    </references>
  </mapping>
</feature-map>
This tells FME to create a reference called identifier whenever a Feature element is matched, holding the value of that element's identifier attribute. Whenever we later retrieve identifier, we get the most recently stored value (LIFO). This allows a child element to retrieve a value set while scanning its parent element.
So then, we can modify the geometry section to include traits as follows:
<mapping match="Coordinate_BOX">
  <geometry activate="xml-box">
    <trait>
      <name>
        <literal expr="identifier"/>
      </name>
      <value>
        <refexpr expr="identifier"/>
      </value>
    </trait>
    <data name="data-string">
      <extract expr="./coords"/>
    </data>
  </geometry>
</mapping>
Instead of defining an attribute, we define a trait, and instead of an extract expression, we use a refexpr (reference expression) to retrieve the value stored by the reference above. Because the reference is declared in the Feature mapping, @identifier is evaluated on the Feature element, so the trait receives the Feature's identifier (1001 for the first sample feature) rather than the box's own identifier (101). We could have defined a separate reference map section instead of placing the reference in the feature map, but this is only needed when the value must be captured from elements other than Feature.
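To have the trait carry the box's own identifier (101 in the sample) instead, one simpler variant, not the approach used in this article, is to extract the value directly inside the Coordinate_BOX mapping, where @identifier is evaluated on the Coordinate_BOX element itself. This is a sketch that assumes extract expressions are accepted in a trait value just as they are in attribute values; check it against the xfMap documentation for your FME version.

```xml
<mapping match="Coordinate_BOX">
  <geometry activate="xml-box">
    <trait>
      <name>
        <literal expr="identifier"/>
      </name>
      <value>
        <!-- evaluated on Coordinate_BOX, so this should yield 101 and 102
             for the sample data rather than the Feature identifiers -->
        <extract expr="@identifier"/>
      </value>
    </trait>
    <data name="data-string">
      <extract expr="./coords"/>
    </data>
  </geometry>
</mapping>
```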
Putting it all together, we now have an xfMap that creates a feature type for each <Feature> element, turns each <property> element into an attribute named by its typeName, and adds the box geometry with an identifier trait.
Completed 'Dynamic Schema Example' xfMap
<?xml version="1.0"?>
<xfMap xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- each <Feature> element becomes a feature and stores its identifier as a reference -->
  <feature-map>
    <mapping match="Feature">
      <feature-type>
        <matched expr="local-name"/>
      </feature-type>
      <references>
        <reference>
          <name>
            <literal expr="identifier"/>
          </name>
          <value>
            <extract expr="@identifier"/>
          </value>
        </reference>
      </references>
    </mapping>
  </feature-map>
  <feature-content-map>
    <!-- dynamic schema: each property becomes an attribute named by its typeName -->
    <mapping match="property">
      <attributes>
        <attribute type="sequenced">
          <name>
            <extract expr="@typeName"/>
          </name>
          <value>
            <extract expr="."/>
          </value>
        </attribute>
      </attributes>
    </mapping>
    <!-- box geometry with an identifier trait read from the reference -->
    <mapping match="Coordinate_BOX">
      <geometry activate="xml-box">
        <trait>
          <name>
            <literal expr="identifier"/>
          </name>
          <value>
            <refexpr expr="identifier"/>
          </value>
        </trait>
        <data name="data-string">
          <extract expr="./coords"/>
        </data>
      </geometry>
    </mapping>
  </feature-content-map>
</xfMap>