Using the XQueryExtractor Transformer to Extract XML Text Using XQuery Expressions

Liz Sanderson

Introduction

This transformer extracts portions of XML text using XQuery expressions. The 'XQuery Input' parameter identifies how the XQuery to be executed is supplied: as an attribute on a feature ("Attribute specifying an XQuery"), as a path to a file ("XQuery file"), or entered directly ("XQuery expression").

The 'XML Input' parameter identifies either an attribute that contains an XML document or a file that contains an XML document. It is optional, since an XQuery can itself refer directly to a file path. If the parameter is set, the context document for the query is set to its value (as a file or as a string, as appropriate).

The 'Write XML Header' parameter specifies whether the XML header should be written into the results of the XQuery. Note that for Unicode files the BOM is not written; if it is required, it should be added by an additional process.

The 'Result Attribute' parameter determines which attribute the XQuery results will be written to. If the 'Return Value' parameter is set to "Separated Values" then the results will be written out as a delimited string, with the separator determined by the value set for 'Separator Character(s)'. Setting the 'Return Value' parameter to "Single value" simply concatenates the results.
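
The difference between the two 'Return Value' settings can be pictured with a few lines of Python. This is only a sketch of the behaviour described above, not FME code: `format_results` and its arguments are hypothetical names chosen to echo the parameter labels, and the sample results are made up.

```python
def format_results(results, return_value="Separated Values", separator=","):
    """Combine individual query results into one attribute value.

    Hypothetical helper: the argument names echo the transformer's
    'Return Value' and 'Separator Character(s)' settings, nothing more.
    """
    if return_value == "Separated Values":
        # Delimited string: the separator goes between consecutive results
        return separator.join(results)
    # "Single value": plain concatenation, no separator inserted
    return "".join(results)


results = ["<a>1</a>", "<a>2</a>"]               # made-up query results
print(format_results(results))                   # separated values
print(format_results(results, "Single value"))   # one concatenated string
```

With "Separated Values" the individual results can be split apart again later, which is what the example workspace below relies on.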

FME Factory Used: XQueryFactory
 

Example

The attached workspace shows an example use of the XQueryExtractor transformer.

This example demonstrates a very simple use of this transformer: extracting information from an arbitrary GML file.

 

Scenario

As an FME user I have been given an XML dataset which represents traffic events in the City of Austin. It has this structure:

 <fme:Entry gml:id="id799feed0-ab58-429e-8bca-0e7ec9d4aaed">
   <fme:Title>Accident on E 45TH ST at CASWELL AVE</fme:Title>
   <fme:Content>CRASH</fme:Content>
   <gml:pointProperty>
     <gml:Point srsName="EPSG:4326" srsDimension="2">
       <gml:pos>30.30712 -97.725121</gml:pos>
     </gml:Point>
   </gml:pointProperty>
 </fme:Entry>


Without a schema document to go with it, I would need to use the Textfile reader and StringSearcher transformers to extract information, which would involve juggling global variables and feature order to make sure I attach the right attributes to the right geometry.

So I decide to use XQuery.
 

Technique

The key information I want from this is the event title and the event location. First I'll deal with location by placing an XQueryExtractor transformer and sending a null feature into it as a trigger.

From the XML I see that I want the <gml:pos> tag and that the "gml" namespace is defined by xmlns:gml="http://www.opengis.net/gml", so I set up the XQueryExtractor transformer to read the XML file and retrieve that information using this simple script:

 declare namespace gml="http://www.opengis.net/gml"; //gml:pos

It doesn't get much easier than that. This returns an attribute containing a comma-separated list of event coordinates.
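
As a rough check outside FME, the same lookup can be sketched in Python with the standard library's xml.etree.ElementTree. Note the assumptions: ElementTree implements a limited XPath subset rather than XQuery, so the namespace declaration becomes a prefix map, and the `fme:Feed` root element is hypothetical, added only so the single entry shown earlier has somewhere to declare its namespaces.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as given in the article; the prefixes are local shorthand.
NS = {
    "gml": "http://www.opengis.net/gml",
    "fme": "http://www.safe.com/gml/fme",
}

# Hypothetical wrapper: the article shows only one fme:Entry, so a root
# element carrying the namespace declarations is assumed here.
SAMPLE = """<fme:Feed xmlns:fme="http://www.safe.com/gml/fme"
                      xmlns:gml="http://www.opengis.net/gml">
  <fme:Entry gml:id="id799feed0-ab58-429e-8bca-0e7ec9d4aaed">
    <fme:Title>Accident on E 45TH ST at CASWELL AVE</fme:Title>
    <fme:Content>CRASH</fme:Content>
    <gml:pointProperty>
      <gml:Point srsName="EPSG:4326" srsDimension="2">
        <gml:pos>30.30712 -97.725121</gml:pos>
      </gml:Point>
    </gml:pointProperty>
  </fme:Entry>
</fme:Feed>"""


def find_positions(xml_text):
    """Return the text of every gml:pos element, whatever its depth."""
    root = ET.fromstring(xml_text)
    # ".//gml:pos" plays the role of "//gml:pos" in the XQuery above
    return [pos.text for pos in root.findall(".//gml:pos", NS)]


positions = find_positions(SAMPLE)
print(positions)
```

Unlike the transformer, this sketch returns only the element text, not the surrounding tags.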

I assume I could find the event title in the same script, but we're keeping things simple here, so I just place another XQueryExtractor and use it to find the tag <fme:Title> in the namespace xmlns:fme="http://www.safe.com/gml/fme":

 declare namespace fme="http://www.safe.com/gml/fme"; //fme:Title

Incidentally, by using two slashes before fme:Title I am saying that I don't care what level of the XML hierarchy the element is on: just find all fme:Title elements and return them.
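
The effect of the double slash can be seen by comparing a child-only lookup with an any-depth lookup. The sketch below again uses Python's xml.etree.ElementTree (a limited XPath subset, not XQuery); the `fme:Feed` root and the second entry are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"fme": "http://www.safe.com/gml/fme"}

# Hypothetical document: each title sits inside an entry, one level below the root.
DOC = """<fme:Feed xmlns:fme="http://www.safe.com/gml/fme">
  <fme:Entry><fme:Title>Accident on E 45TH ST at CASWELL AVE</fme:Title></fme:Entry>
  <fme:Entry><fme:Title>Second event (made up)</fme:Title></fme:Entry>
</fme:Feed>"""

root = ET.fromstring(DOC)

# Child step only: looks at the direct children of the root element
direct_children = root.findall("fme:Title", NS)

# Descendant step: looks at every level beneath the root element
any_depth = root.findall(".//fme:Title", NS)

print(len(direct_children), len(any_depth))
print([title.text for title in any_depth])
```

The child-only lookup finds nothing because the titles are nested inside entries, while the any-depth lookup finds both.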

And that's all I need to do to get the info into Workbench. Now I can start to process that data.

 

Workspace

  • StringReplacer: Strips out all XML tags from the returned information
  • AttributeSplitters: Split each of the two results into an FME list attribute
  • ListExploder: Explode the first list to create one feature per record
  • ListIndexer: Retrieve the correct information from the second list for each feature
  • AttributeSplitter: Split the coordinate attribute into separate X/Y attributes
  • 2DPointReplacer: Turn the X/Y attributes into a real spatial feature
  • AttributeKeeper: Clean up unwanted attributes before outputting to the Visualizer
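
For readers who think more easily in code, the chain of transformers above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not what the workspace actually runs: `strip_tags` and `to_features` are hypothetical helpers, the raw results are assumed to be comma-separated strings that still contain their XML tags, and the second record is made up.

```python
import re


def strip_tags(text):
    """Remove anything that looks like an XML tag (the StringReplacer step)."""
    return re.sub(r"<[^>]+>", "", text)


def to_features(raw_positions, raw_titles, separator=","):
    """Pair each coordinate string with the title at the same list index."""
    positions = strip_tags(raw_positions).split(separator)  # AttributeSplitter
    titles = strip_tags(raw_titles).split(separator)        # AttributeSplitter
    features = []
    for index, pos in enumerate(positions):                 # ListExploder
        # Split "a b" into two numbers; which one is X and which is Y
        # depends on the axis order of the data and is not decided here.
        first, second = (float(value) for value in pos.split())
        features.append({
            "title": titles[index],                         # ListIndexer
            "coords": (first, second),
        })
    return features


raw_positions = "<gml:pos>30.30712 -97.725121</gml:pos>,<gml:pos>30.25 -97.75</gml:pos>"
raw_titles = ("<fme:Title>Accident on E 45TH ST at CASWELL AVE</fme:Title>,"
              "<fme:Title>Made-up event</fme:Title>")

for feature in to_features(raw_positions, raw_titles):
    print(feature)
```

One weakness this makes visible: pairing by list index only works if both results come back in the same order, and a title that itself contains the separator would be split in the wrong place, so a separator that cannot occur in the data is the safer choice.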

 

Summary

It took me about an hour (maybe 90 minutes) to create this workspace. This is hardly a complex example, nor one that uses anywhere near the full capacity of XQuery. However, it does match the most likely scenario for an FME user: you can get the information out of an arbitrary XML/GML file with minimal fuss and start processing the data using your existing FME skills.
