Easier approaches to reading XML are now available in FME. See XML Reader Configuration or Reading Complex XML or GML using the XMLFlattener
Introduction
For some recent work in FME, we added the ability to call the GEOS isValid and isSimple geometry validation tests on FME features (via the GeometryValidator transformer and the @OGCGeometry function). We decided it would be nice to use the existing GEOS test data for these cases to ensure that our implementation correctly reflected the GEOS one. Now, GEOS is a C++ port of the Java Topology Suite (JTS), and shares with it an xmlTester program that reads tests expressed in XML. So the idea was to set something up in FME that would read the JTS/GEOS test files used for testing isSimple/isValid, so that we could easily reuse those tests and take advantage of any future tests they may add.
So we set out to "eat our own dogfood" and use the generic XML reader in FME to read the JTS/GEOS test files, which look like this:
<run>
  <precisionModel scale="1.0" offsetx="0.0" offsety="0.0"/>
  <case>
    <desc>L - linear-ring bowtie</desc>
    <a>LINEARRING(0 0, 100 100, 100 0, 0 100, 0 0)</a>
    <test>
      <op name="isValid" arg1="A">false</op>
    </test>
  </case>
  <case>
    <desc>L - linestring bowtie</desc>
    <a>LINESTRING(0 0, 100 100, 100 0, 0 100, 0 0)</a>
    <test>
      <op name="isValid" arg1="A">true</op>
    </test>
  </case>
  ...
</run>
Where to Start?
Naively, the starting point was to see whether the XML (Generic) reader in FME could swallow this without any adjustments or configuration. So we fired up Workbench and tried to add a source dataset with XML (Generic) as its format, with no xfMap or other settings specified.
No go.
So we had to make an xfMap.
Having never done this, and without access to any of those in the know at Safe due to it being the night before Christmas, we rolled up the sleeves and opened up the documentation. In the XML reader section there is a big subsection with lots of topics on the xfMap, with some examples in them. (You have to click the "show" button once you get into that documentation in order to see the subsections and browse through them.) But there wasn't a particularly simple example to work through or look at. And so the idea was born to write this very article if (spoiler alert: and when) success was reached.
So to further the game, a visit to the FME_HOME/xfmap directory was in order. There's nothing like a good example to sink your teeth into, and there are plenty of quite complex examples in that directory. Remembering that GPX was a relatively simple format, we looked at that xfMap first. It too was more complex than what we needed here, but with that and the doc, some progress could be made.
First Attempts
Bravely then we made our first xfMap. We put it in the same directory as the data (spoiler alert: bad idea), and soon it looked like this:
<?xml version="1.0"?>
<!DOCTYPE xfMap SYSTEM "xfMap.dtd">
<xfMap>
  <feature-map>
    <use-mappings/>
  </feature-map>
</xfMap>
Then we once again went to Workbench, tried to add a source dataset of XML (Generic), and this time pointed our settings box to the xfMap file we'd just made (and saved as jts_tests.xmp).
No go.
Somehow it couldn't find the xfMap.dtd, so rather than figure that out, we decided to just put the xfMap into the FME_HOME/xfmap directory with its cousins (and xfMap.dtd). And then we tried again to add the source dataset, pointing at the newly relocated xfMap.
Bingo.
Describing Schema
Sort of.
We at least didn't get an error, but we also got an oddly named feature type as a source schema, with no attributes.
A bit of reading and poking around in the xfmap directory revealed that it is possible to specify the schema of the feature you'd like returned by the XML reader right in the xfMap. In our case, if we remember again what we'd like returned, we basically want one feature per case in the XML file, and each one would have a description, a geometry string (which is in OGC Well Known Text (WKT)), a test type (isValid or isSimple), and a test result. These features would not themselves come out with any geometry, so we would indicate to FME's schema processor that they had no geometry (xml_no_geom). And the feature type might as well be TestCase (we could pick anything for it, but that sounds good). So, with this in mind, here's what the schema portion of the xfMap looks like:
<?xml version="1.0"?>
<!DOCTYPE xfMap SYSTEM "xfMap.dtd">
<xfMap>
  <schema-type>
    <inline>
      <schema-feature type="TestCase">
        <schema-attribute name="fme_geometry{0}" type="xml_no_geom"/>
        <schema-attribute name='description' type='xml_buffer' />
        <schema-attribute name='geometry_wkt' type='xml_buffer' />
        <schema-attribute name='test_type' type='xml_char(10)' />
        <schema-attribute name='test_result' type='xml_char(10)' />
      </schema-feature>
    </inline>
  </schema-type>
  <feature-map>
    <use-mappings/>
  </feature-map>
</xfMap>
The attribute types came from the vocabulary allowed by the FME_HOME/metafile/xml_common_attr_map.fmi. xml_buffer just means an arbitrarily long hunk of stuff, in this case text.
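To picture the feature this schema describes, here is a small Python sketch of the record each case should turn into. It is only an illustration: the CaseRecord class is hypothetical and has nothing to do with FME's own API, and its field names simply mirror the attribute names chosen in the schema above.

```python
from dataclasses import dataclass, fields


@dataclass
class CaseRecord:
    """Hypothetical stand-in for a TestCase feature (not an FME class)."""
    description: str   # free text taken from <desc>
    geometry_wkt: str  # geometry as OGC Well Known Text
    test_type: str     # "isValid" or "isSimple"
    test_result: str   # "true" or "false", kept as text


example = CaseRecord(
    description="L - linear-ring bowtie",
    geometry_wkt="LINEARRING(0 0, 100 100, 100 0, 0 100, 0 0)",
    test_type="isValid",
    test_result="false",
)

# The field names line up with the schema-attribute names in the xfMap.
print([f.name for f in fields(example)])
```

Note that there is no geometry field as such: the WKT travels as plain text, which matches the xml_no_geom choice.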
So once again, we go back to Workbench to add the source dataset, pointing at the newly revised xfMap.
Now we're talking...
This time, our schema came through just right. So we slapped in a Logger right after the newly read source feature type definition, and hit the Run button.
Reading Features
Oops. There's no data being read. Why not? Well, we didn't yet specify how we are going to move data from the case XML objects into FME features.
And that's where the meat of the xfMap comes to play.
We have to build our feature-map. Now, this is a very simple situation because we really have only one kind of feature we're going to be reading (in many cases, life would not be so simple). Giving thanks, we dive in and find that there are some things that you can specify in the xfMap that do just what is needed.
One Feature Per Case
The first thing noticed is that you tell the xfMap to build yourself a feature by using the mapping match directive. So if we want to build one feature per case we bump into, we set it up as so:
<mapping match="case">
and if we want its feature type to be TestCase, we say that like this:
<feature-type>
  <literal expr="TestCase"/>
</feature-type>
So we now have this as our xfMap:
<?xml version="1.0"?>
<!DOCTYPE xfMap SYSTEM "xfMap.dtd">
<xfMap>
  <schema-type>
    <inline>
      <schema-feature type="TestCase">
        <schema-attribute name="fme_geometry{0}" type="xml_no_geom"/>
        <schema-attribute name='description' type='xml_buffer' />
        <schema-attribute name='geometry_wkt' type='xml_buffer' />
        <schema-attribute name='test_type' type='xml_char(10)' />
        <schema-attribute name='test_result' type='xml_char(10)' />
      </schema-feature>
    </inline>
  </schema-type>
  <feature-map>
    <mapping match="case">
      <feature-type>
        <literal expr="TestCase"/>
      </feature-type>
      <use-mappings/>
    </mapping>
  </feature-map>
</xfMap>
(Notice how we even added a comment like those cool XML guys do, using the <!-- ... --> syntax.)
We save it and run the workspace again, and now we are getting features out and logged. Wow!
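For readers who think better in code, the effect of the mapping at this stage can be sketched with Python's standard xml.etree.ElementTree module. This is an analogy under stated assumptions, not how FME works internally: the SAMPLE string reproduces the two cases shown earlier, and the dictionary key feature_type is a made-up name for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<run>
  <precisionModel scale="1.0" offsetx="0.0" offsety="0.0"/>
  <case>
    <desc>L - linear-ring bowtie</desc>
    <a>LINEARRING(0 0, 100 100, 100 0, 0 100, 0 0)</a>
    <test><op name="isValid" arg1="A">false</op></test>
  </case>
  <case>
    <desc>L - linestring bowtie</desc>
    <a>LINESTRING(0 0, 100 100, 100 0, 0 100, 0 0)</a>
    <test><op name="isValid" arg1="A">true</op></test>
  </case>
</run>"""

root = ET.fromstring(SAMPLE)

# One record per <case> element, each tagged with a fixed feature type.
# Nothing else is extracted yet, so the records carry no attributes.
features = [{"feature_type": "TestCase"} for _ in root.iter("case")]

print(len(features))  # 2
```

The precisionModel element is simply ignored, because nothing matches it.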
Fishing Out Attributes
All is looking good, but our features have no attributes.
No problem. The GPX xfMap dealt with this situation, and we can set up the extraction of xml chunks into FME attributes. Recall that the individual "case" elements looked like this:
<case>
  <desc>L - linear-ring bowtie</desc>
  <a>LINEARRING(0 0, 100 100, 100 0, 0 100, 0 0)</a>
  <test>
    <op name="isValid" arg1="A">false</op>
  </test>
</case>
It's easy enough to see that the desc and a elements (a holds the geometry in all these examples) will be easy to fish out -- here's the xfMap syntax for doing that:
<attribute required='false'>
  <name> <literal expr='description'/> </name>
  <value> <extract expr='./desc'/> </value>
</attribute>
<attribute required='false'>
  <name> <literal expr='geometry_wkt'/> </name>
  <value> <extract expr='./a'/> </value>
</attribute>
These say that we will create FME attributes with the specified <name>, taking their values from the contents of the desc and a child elements.
But how to extract the stuff from the test subtree -- there we'd like to pull out the op name as well as the value (which is false in the above example).
Getting the value is easy -- it's the same as we did for the desc, only now we have an additional level to specify in the extract expression:
<attribute required='false'>
  <name> <literal expr='test_result'/> </name>
  <value> <extract expr='./test/op'/> </value>
</attribute>
Note the test/op in the extract expr -- this grabs the false out.
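The three extractions so far all pull the text content of a child element, at one or two levels of nesting. As a rough analogy only, the same lookups can be sketched in Python with the standard library's ElementTree. The dictionary keys mirror the FME attribute names; nothing here is FME API.

```python
import xml.etree.ElementTree as ET

case = ET.fromstring("""<case>
  <desc>L - linear-ring bowtie</desc>
  <a>LINEARRING(0 0, 100 100, 100 0, 0 100, 0 0)</a>
  <test><op name="isValid" arg1="A">false</op></test>
</case>""")

record = {
    # Direct children of <case>
    "description": case.findtext("./desc"),
    "geometry_wkt": case.findtext("./a"),
    # One level deeper: <case>/<test>/<op>
    "test_result": case.findtext("./test/op"),
}

for key, value in record.items():
    print(key, "=", value)
```

findtext returns None when the path matches nothing, so a case without a desc would still produce a record, loosely in the spirit of required='false'.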
All great. Last trick: how do we get the value of the op name, which sits inside the tag itself as an XML attribute? The documentation for xfMaps indicates that the way is to use the @ to refer to an XML attribute by name. After some fighting with the syntax, this was found to be right:
<attribute required='false'>
  <name> <literal expr='test_type'/> </name>
  <value> <extract expr='./test/op[@name]'/> </value>
</attribute>
Note that the [@ ] enclose the identifier of the tag attribute.
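If you come to xfMaps from other XML tooling, it is worth knowing that the bracket syntax can mean something different elsewhere. The sketch below uses Python's ElementTree purely for comparison (it says nothing about how FME evaluates the expression): there, the attribute value is read with get(), while [@name] acts as a filter that returns the element itself.

```python
import xml.etree.ElementTree as ET

case = ET.fromstring(
    '<case><test><op name="isValid" arg1="A">false</op></test></case>'
)

op = case.find("./test/op")
test_type = op.get("name")  # value of the name attribute
test_result = op.text       # text content of the element

# In ElementTree's path dialect, [@name] selects <op> elements that HAVE a
# name attribute; the result is the element, not the attribute value.
filtered = case.find("./test/op[@name]")

print(test_type, test_result, filtered.tag)
```

So the xfMap expression and the ElementTree path look alike but are not interchangeable, which may explain some of the fighting with the syntax.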
NB: to get values from attributes on the matched element itself (e.g. case xxxx="1234"), simply use:
<value> <extract expr='@xxxx'/> </value>
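As a comparison sketch in Python's ElementTree (again only an analogy, with xxxx being the placeholder attribute from the note above rather than anything found in the real test files), reading an attribute of the matched element needs no path at all:

```python
import xml.etree.ElementTree as ET

# xxxx="1234" is the placeholder from the note, not a real JTS/GEOS attribute.
case = ET.fromstring('<case xxxx="1234"><desc>example</desc></case>')

value = case.get("xxxx")    # attribute on the <case> element itself
missing = case.get("yyyy")  # absent attributes come back as None

print(value, missing)
```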
Putting it all together, we end up with this as our final, working, wonderful xfMap:
<?xml version="1.0"?>
<!DOCTYPE xfMap SYSTEM "xfMap.dtd">
<xfMap>
  <schema-type>
    <inline>
      <schema-feature type="TestCase">
        <schema-attribute name="fme_geometry{0}" type="xml_no_geom"/>
        <schema-attribute name='description' type='xml_buffer' />
        <schema-attribute name='geometry_wkt' type='xml_buffer' />
        <schema-attribute name='test_type' type='xml_char(10)' />
        <schema-attribute name='test_result' type='xml_char(10)' />
      </schema-feature>
    </inline>
  </schema-type>
  <feature-map>
    <mapping match="case">
      <feature-type>
        <literal expr="TestCase"/>
      </feature-type>
      <attributes>
        <attribute required='false'>
          <name> <literal expr='geometry_wkt'/> </name>
          <value> <extract expr='./a'/> </value>
        </attribute>
        <attribute required='false'>
          <name> <literal expr='description'/> </name>
          <value> <extract expr='./desc'/> </value>
        </attribute>
        <attribute required='false'>
          <name> <literal expr='test_result'/> </name>
          <value> <extract expr='./test/op'/> </value>
        </attribute>
        <attribute required='false'>
          <name> <literal expr='test_type'/> </name>
          <value> <extract expr='./test/op[@name]'/> </value>
        </attribute>
      </attributes>
      <use-mappings/>
    </mapping>
  </feature-map>
</xfMap>
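One way to sanity-check what this xfMap should produce is to compute the same four values independently. The following is a minimal sketch in Python using only the standard library. The function name read_test_cases and the SAMPLE string are made up for illustration, and taking the first op of each case is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<run>
  <precisionModel scale="1.0" offsetx="0.0" offsety="0.0"/>
  <case>
    <desc>L - linear-ring bowtie</desc>
    <a>LINEARRING(0 0, 100 100, 100 0, 0 100, 0 0)</a>
    <test><op name="isValid" arg1="A">false</op></test>
  </case>
  <case>
    <desc>L - linestring bowtie</desc>
    <a>LINESTRING(0 0, 100 100, 100 0, 0 100, 0 0)</a>
    <test><op name="isValid" arg1="A">true</op></test>
  </case>
</run>"""


def read_test_cases(xml_text):
    """Return one dict per <case>, keyed like the TestCase schema attributes."""
    records = []
    for case in ET.fromstring(xml_text).iter("case"):
        op = case.find("./test/op")  # first <op> of the case, or None
        records.append({
            "description": case.findtext("./desc"),
            "geometry_wkt": case.findtext("./a"),
            "test_type": op.get("name") if op is not None else None,
            "test_result": op.text if op is not None else None,
        })
    return records


for record in read_test_cases(SAMPLE):
    print(record["test_type"], record["test_result"], record["description"])
```

Comparing these records against the features logged by the workspace is a quick way to confirm that the xfMap picks up every case and every attribute.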
Wrapping it up in a Custom Format
It being the night before Christmas and all, the last step is to 'wrap all this up' in a Custom Format to make it the most convenient to use. What this nicely does is package the xfMap together with the format, so that an end user doesn't have to pick the xfMap when they try to read the format.
The details of doing this are left as an exercise to whichever readers have made it this far.