How to Expose Feature Attributes from KML Tags or HTML Tables

Files

ExposeAttributeFeatures.zip (20 KB)

Introduction

FME can read Google Earth KML files, but a few additional steps are required to expose the attributes contained within the description balloons displayed in Google Earth. This tutorial demonstrates two different methods for exposing those attributes; which one suits you depends on how your data is structured.

Source Data

Viewing the source data in Google Earth, you can see that the data is contained within a table for each point feature.

If we then open the file in a text editor, we can see that the attribute table is embedded within a <description> tag and that the table itself is written in HTML.
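To make the structure concrete, here is a minimal sketch in Python (standard library only, outside FME) that pulls the raw balloon text out of a KML placemark. The sample placemark, the value Concrete, and the helper name read_descriptions are assumptions made up for illustration; the tutorial's doc.kml will contain more markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down KML; the HTML table sits inside <description>.
KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sample point</name>
      <description><![CDATA[<html><body><table>
        <tr><td>CH_LINING</td><td>Concrete</td></tr>
      </table></body></html>]]></description>
    </Placemark>
  </Document>
</kml>"""

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def read_descriptions(kml_text):
    """Return the raw text of every <description> found under a Placemark."""
    root = ET.fromstring(kml_text)
    path = ".//kml:Placemark/kml:description"
    return [node.text for node in root.iterfind(path, KML_NS)]


descriptions = read_descriptions(KML)
print(descriptions[0])  # the HTML table, still one unparsed string
```

The point of the sketch is that the attributes arrive as a single block of HTML text per feature, which is why they do not appear as ordinary attributes when the KML is read.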

Step-by-step Instructions

Part 1: Convert to XHTML

There are two methods you can try to expose your attributes. Both methods involve converting the HTML contained within the KML balloon to XHTML.

1. Add an OGC/Google KML Reader

Open FME Workbench and start a blank workspace. Add an OGC/Google KML reader to the canvas and browse to the doc.kml dataset that can be downloaded from the Files section of this article. Open the reader parameters.

In the reader parameters, expand the Schema Attributes section, then click the ellipsis next to Additional Attribute to Expose. The list of attributes is long, so type kml_description into the Filter. Select kml_description, then click OK three times to add the reader.

The kml_description attribute contains all of the feature attributes, which we will parse out.

In the Select Feature Type dialog, deselect all feature types except Placemark, and then click OK.

2. Convert HTML to XHTML

Next, add an HTMLToXHTMLConverter transformer to the canvas and connect it to the Placemark reader feature type. This transformer ensures that the description contains valid XHTML and that its elements are properly nested. In the parameters, set the Attributes with HTML Text to kml_description, then change the Input Encoding to Unicode 8-bit (utf-8). Next, set the Output Attribute to kml_description so that the output overwrites the input.
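The reason for this conversion can be seen with a small Python sketch (standard library only, not part of the workspace): XML tooling rejects markup that HTML tolerates. Both fragments and the helper is_well_formed are hypothetical examples, and the XHTML version was written by hand rather than produced by the transformer.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup as-is."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Balloon-style HTML: <br> is never closed, which browsers accept.
html_fragment = (
    "<html><body><table><tr><td>CH_LINING<br></td></tr></table></body></html>"
)
# Hand-written XHTML equivalent: every element is closed and properly nested.
xhtml_fragment = (
    "<html><body><table><tr><td>CH_LINING<br/></td></tr></table></body></html>"
)

print(is_well_formed(html_fragment))   # False
print(is_well_formed(xhtml_fragment))  # True
```

Only the second form can be handed to XML-based processing, which is what both extraction methods in Part 2 rely on.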

Part 2: Extract Attribute Information

Once the attribute table has been converted to XHTML, you can continue processing the XHTML to extract attribute information using one of the two methods listed below.

Method 1: XQuery

1. Use an XQuery Expression

Add an XMLXQueryExtractor to the canvas and connect it to the HTMLToXHTMLConverter passed output port. In the parameters, set the XQuery Expression to:

declare default element namespace "http://www.w3.org/1999/xhtml";
for $x in /html/body/table/tr/td/table/tr
return fme:set-attribute($x/td[1]/text(),$x/td[2]/text())

Next, set the XML Input to Attribute Specifying XML, then select kml_description for XML Attribute. Finally, expand the Expose Attributes section and type CH_LINING as an Attribute to Expose. To learn more about XQuery, please see the w3schools tutorial.
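For readers less familiar with XQuery, the logic of the expression above can be sketched in Python (standard library only, outside FME): walk the fixed path to the inner table rows and use the first cell as the attribute name and the second as its value. The sample XHTML, the CH_WIDTH attribute, the values, and the function name extract_attributes are assumptions for illustration, and a plain dictionary stands in for the feature that fme:set-attribute would write to.

```python
import xml.etree.ElementTree as ET

XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical converted description: an outer table wrapping the attribute table.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<table><tr><td>
  <table>
    <tr><td>CH_LINING</td><td>Concrete</td></tr>
    <tr><td>CH_WIDTH</td><td>12</td></tr>
  </table>
</td></tr></table>
</body></html>"""


def extract_attributes(xhtml_text):
    """Map the first cell of each inner row to the second cell."""
    root = ET.fromstring(xhtml_text)  # root is the <html> element
    attributes = {}
    # Same fixed path as the XQuery: /html/body/table/tr/td/table/tr
    for row in root.iterfind("x:body/x:table/x:tr/x:td/x:table/x:tr", XHTML_NS):
        cells = row.findall("x:td", XHTML_NS)
        if len(cells) >= 2:  # guard added in this sketch; skip incomplete rows
            attributes[cells[0].text] = cells[1].text
    return attributes


print(extract_attributes(XHTML))
```

Because the path is fixed, it only matches documents nested exactly this way, which is the limitation the alternative expression further on works around.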

If the above XQuery Expression does not work for you, you can try:

declare default element namespace "http://www.w3.org/1999/xhtml";
for $x in //tr where (exists($x/td[1]) and compare($x/td[2]/text(),"&lt;Null&gt;"))
return fme:set-attribute($x/td[1]/text(),$x/td[2]/text())

This was suggested by user Marcp with the following explanation:

“The //tr extracts rows, regardless of what comes before, which is very handy so that you don't really need to figure out the structure. The where(exists()) part is necessary for unknown reasons. Anything I tried without a WHERE clause returned no results. The compare clause removed NULL parameters, which were formatted as <Null> in my data.”
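The behaviour described in this explanation can be approximated in Python (standard library only, outside FME): visit every row at any depth, skip rows without a name and value cell, and drop values equal to the null marker. The sample XHTML, the CH_NOTES attribute, and the function name extract_non_null are assumptions for illustration; skipping empty cells is this sketch's reading of how the where clause behaves, not a confirmed description of FME.

```python
import xml.etree.ElementTree as ET

XHTML_TR = "{http://www.w3.org/1999/xhtml}tr"
XHTML_TD = "{http://www.w3.org/1999/xhtml}td"

# Hypothetical converted description with one null-valued row.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml"><body><table>
<tr><td>CH_LINING</td><td>Concrete</td></tr>
<tr><td>CH_NOTES</td><td>&lt;Null&gt;</td></tr>
</table></body></html>"""


def extract_non_null(xhtml_text, null_marker="<Null>"):
    """Collect name/value pairs from every row, ignoring null markers."""
    root = ET.fromstring(xhtml_text)
    attributes = {}
    for row in root.iter(XHTML_TR):  # like //tr: any depth, any nesting
        cells = row.findall(XHTML_TD)
        if len(cells) < 2:
            continue  # no name/value pair in this row
        name, value = cells[0].text, cells[1].text
        if value is None or value == null_marker:
            continue  # empty cell or explicit null marker
        attributes[name] = value
    return attributes


print(extract_non_null(XHTML))
```

Matching rows at any depth trades precision for convenience: it also picks up rows from unrelated tables in the balloon, so it suits descriptions that contain a single attribute table.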

2. Run Workspace and Inspect Output

Either run the workspace with feature caching enabled or add an Inspector to the QueryResults output port on XMLXQueryExtractor, then run the workspace. View the output data in Data Preview (formerly Visual Preview).

The CH_LINING attribute shows that an attribute from the table was properly exposed, but if you click on a single feature and open the Feature Information window, you will see all of the available attributes.

If you would like to use your attributes later in the workspace or write them out, you will need to expose them by either adding them to the Attributes to Expose field in the XMLXQueryExtractor or by using an AttributeExposer transformer.

Method 2: XML

Depending on how familiar you are with XQuery and how your data is structured, it might be easier to use an XMLFragmenter instead of the XMLXQueryExtractor transformer.

1. Create a Feature ID

First, we need to create a feature ID, which we will use to aggregate features at the end. Add a Counter to the canvas and connect it to the HTMLToXHTMLConverter passed output port. We can accept the defaults.

2. Fragment XML

Next, add an XMLFragmenter to the canvas and connect it to the Counter. In the parameters, set the XML Source Type to Attribute with XML Document, then select kml_description as the XML Attribute. For Elements to Match, type in the following: html/body/table/tr

Modify this structure if your document is structured differently.

Now set Merge Attributes from Input Feature to Yes, then click on the Flattening Options button, then enable Flattening.

3. Expose Attributes

Now add an AttributeExposer to the canvas and expose the following attribute: td.table.tr{}.td{}

This attribute can also be exposed in the XMLFragmenter.

4. Explode List

With the td.table.tr{}.td{} attribute exposed, we need to explode the list into its individual elements, so that each table row becomes its own feature. Add a ListExploder to the canvas and connect it to the AttributeExposer. In the parameters, set the List Attribute to td.table.tr{}.
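As a rough mental model of this step, the following Python sketch (outside FME) turns one feature holding a list of rows into one feature per row. Representing the FME list as a nested Python list under a hypothetical rows key is an assumption of this sketch, as are the sample values and the function name explode_rows; the _element_index attribute mirrors the index attribute the ListExploder is assumed to add by default.

```python
def explode_rows(feature, list_key="rows"):
    """Return one new feature per row, carrying over the other attributes."""
    exploded = []
    for index, cells in enumerate(feature[list_key]):
        # Copy everything except the list itself onto the new feature.
        row_feature = {k: v for k, v in feature.items() if k != list_key}
        # The cells of the row become td{0}, td{1}, ...
        for cell_index, cell in enumerate(cells):
            row_feature["td{%d}" % cell_index] = cell
        row_feature["_element_index"] = index
        exploded.append(row_feature)
    return exploded


feature = {"_count": 0, "rows": [["CH_LINING", "Concrete"], ["CH_WIDTH", "12"]]}
for row in explode_rows(feature):
    print(row)
```

Note that every exploded feature keeps the _count value from the Counter, which is what allows the rows to be regrouped into one feature later.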

5. Create an Attribute with the List{0} Item

If we inspect the data, we will notice that every feature has a different td{0} value; these values are the table headers (attribute names). Let’s create attributes named after these values.

Add an AttributeCreator to the canvas and create a new attribute called @Value(td{0}) with the Attribute Value of td{1}.

Be sure to type '@Value' before 'td{0}' when creating the new attribute, as this names the new attribute after the value stored in the list item instead of after the list item itself.

New Attribute: @Value(td{0})
Attribute Value: td{1}
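The effect of naming an attribute with @Value(td{0}) can be pictured with a short Python sketch (outside FME), where a dictionary stands in for a feature. The sample feature and the function name create_named_attribute are assumptions for illustration.

```python
def create_named_attribute(feature):
    """Copy the feature and add an attribute named after td{0}, valued td{1}."""
    result = dict(feature)
    # The attribute NAME comes from the content of td{0}, not the text "td{0}".
    result[feature["td{0}"]] = feature["td{1}"]
    return result


row_feature = {"_count": 0, "td{0}": "CH_LINING", "td{1}": "Concrete"}
print(create_named_attribute(row_feature))
```

Without the @Value wrapper, the new attribute would take a fixed literal name, so every row would write to the same attribute rather than to one named after its own header.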

6. Aggregate Features

Now that we have each attribute name and its corresponding value, we can rebuild the original feature. Add an Aggregator to the canvas and connect it to the AttributeCreator. In the parameters, enable Group Processing, then select _count as the Group By. Then set the Accumulation Mode to Merge Incoming Attributes.
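The grouping performed here can be sketched in Python (outside FME) as merging every row feature that shares a _count value back into a single record. The sample rows, the CH_WIDTH attribute, the values, and the function name aggregate_by_count are assumptions for illustration, and geometry handling is left out entirely.

```python
from collections import defaultdict


def aggregate_by_count(row_features):
    """Merge the attributes of all features that share the same _count."""
    merged = defaultdict(dict)
    for feature in row_features:
        merged[feature["_count"]].update(feature)
    # Return one merged feature per group, ordered by _count.
    return [merged[key] for key in sorted(merged)]


rows = [
    {"_count": 0, "CH_LINING": "Concrete"},
    {"_count": 0, "CH_WIDTH": "12"},
    {"_count": 1, "CH_LINING": "Earth"},
]
for feature in aggregate_by_count(rows):
    print(feature)
```

This is why the Counter was added before fragmenting: without a per-feature ID there would be nothing to tell the rows of one placemark apart from those of another.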

7. Expose Additional Attributes

At this point, each feature should have its corresponding attributes and values. Add another AttributeExposer and attach it to the Aggregator. Expose the CH_LINING attribute.

8. Run Workspace and Inspect Output

Either run the workspace with feature caching enabled or add an Inspector to the AttributeExposer_2, then run the workspace. View the output data in Data Preview (formerly Visual Preview).

Additional Resources

How to Use XQuery Expressions to Query XML Data Within FME
Using the XQueryExtractor Transformer to Extract XML Text Using XQuery Expressions
Tutorial: Getting Started with XML