FME Version
Introduction
This tutorial illustrates how to quickly translate XML data into a series of Microsoft Excel spreadsheets by first converting the XML return of an RSS feed into distinct FME features. The source XML is an RSS feed from DriveBC, which provides up-to-date information about all of the major and non-major events occurring on the highways of British Columbia, Canada.
Step-by-step Instructions
These instructions are specific to the DriveBC RSS feed but can be readily adjusted to accommodate any input XML dataset(s).
1. Read the XML RSS Feed Into the Workspace
Open FME Workbench, and start a new workspace. Type “XML (Extensible Markup Language)” into the blank canvas, and double-click on the reader format that pops up. In the Add Reader dialog that opens, type or paste the following DriveBC RSS feed URL into the Dataset parameter input box:
http://www.drivebc.ca/api/events/district/mainland?format=rss
This URL accesses the DriveBC RSS feed for the Lower Mainland region of British Columbia. If you are interested in accessing other DriveBC RSS feeds, simply change the URL to one of the other available DriveBC RSS feeds.
Now select Parameters to open the reader parameters dialog. When you add an XML reader to FME Workbench, you need to tell the reader how to interpret the XML elements (tags) as individual features with attributes. This is done using the Elements to Match reader parameter.
In the Elements to Match input box, type or paste in the following:
rss/channel/item
This element path tells the reader to interpret each <item> element as a new feature, and all of the elements between each <item> … </item> pair will be flattened into attributes. In other words, each <item> becomes a row in your Excel file, and the elements within each <item> …</item> pair become the Excel columns.
The screenshot below shows one <item> … </item> pair, followed by the next <item> … </item> pair of a DriveBC RSS feed return. These elements will become two features (rows) in the Excel output. The attributes of each feature will consist of the names and values of the 12 elements between each <item> … </item> pair.
Make sure the workspace has feature caching enabled by selecting the drop-down arrow to the right of the Run option. Now click on the XML reader, and select the Run Just This icon that pops up.
Once the translation runs successfully, click on the green magnifying glass at the right of the reader to view the results in the Visual Preview window. Make sure the Table View is toggled on to view how the XML data was translated into individual features.
2. Rename and Remove Attributes
Add an AttributeManager transformer to the canvas, and connect it to the output port of the XML reader. The AttributeManager will be used to remove extraneous attributes and rename the remaining attributes to more descriptive names. Click on the gear icon of the AttributeManager to open its parameters dialog.
Set the following attributes’ Action values to Remove by clicking the Action value and then selecting Remove from the drop-down list:
- guid
- title
- Xml_fragment
- Xml_id
- Xml_matched_element
- Xml_parent_child_pos
- xml_parent_id
Next, rename the following attributes as follows, by selecting the Output Attribute cell adjacent to the Input Attribute name and typing the new attribute name:
- category → EventCategory
- description → Description
- district → District
- link →Description.hyperlink
- pubDate → PublicationDate
- route → Route
- severity → Severity
- type → Type
- lat → Latitude
- long → Longitude
One attribute adjustment to take note of is the new Description.hyperlink attribute. This attribute, formerly the link attribute, has a URL value that points to a detailed description of each highway event returned by the RSS feed. In order for the Microsoft Excel writer to write this URL value into a spreadsheet as an active hyperlink, the .hyperlink suffix is added to this attribute’s new name. More information about formatting data for the Microsoft Excel writer can be found here.
Once all of the attribute adjustments are complete, click OK on the AttributeManager Parameters dialog. Now click on the Run to This option from the icons above the AttributeManager in the canvas.
Once the translation succeeds, click on the green magnifying glass at the right of the AttributeManager transformer, and examine the results in the Visual Preview window with the Table View toggled on. Notice how the attribute names now differ from those in the output of the reader and are much more descriptive of the values they contain.
3. Write the XML Data to Individual Excel Spreadsheets by an XML Element
The final object needed in this workflow is a Microsoft Excel writer. Type “Microsoft Excel” into the FME Workbench canvas, and double-click the Microsoft Excel writer option that appears. In the Add Writer dialog that opens, specify where you would like to save your Excel output in the Dataset parameter input box, and then set the Sheet Definition parameter to Automatic. This Automatic value for Sheet Definition tells the writer to define written attributes based on all of the attribute manipulations made throughout the translation. In this case, the manipulations made by the AttributeManager will be captured in the written Excel document. If the Sheet Definition is set to any of the other options, the attribute manipulations done throughout this translation will not be applied to the written output.
Click OK on the Add Writer dialog. A FeatureType dialog will open next. For now, click OK to accept the default settings. You’ll notice a new object will appear on your canvas called Sheet1. This is the Excel spreadsheet that will be created by the Excel writer. Connect the writer to the output port of the AttributeManager, and then select the writer’s gear icon to open its parameters. In the dialog that opens, select the User Attributes tab.
Notice how the attributes listed here reflect the attribute manipulations performed by the AttributeManger. If further adjustments to attribute definitions are needed, the Attribute Definition parameter just above the definition table can be changed from Automatic to Manual. Switching to Manual will change the attributes from being grayed out to active, and you can then make finer adjustments to each attribute definition as needed.
Now select the Parameters tab, to the left of User Attributes. Right now, the Sheet Name parameter is set to an arbitrary value. However, if you adjust this value to an attribute of the input dataset, then separate Excel sheets will be written for each value of this attribute.
Select the drop-down arrow next to the Sheet Name input box, then select Attribute Value, and finally click on EventCategory. Your Sheet Name value should now look like the following:
The written Excel document will contain all of the returned DriveBC highway events sorted into separate Microsoft Excel sheets by the category of the event.
Select the writer on the canvas, and click the Run to This icon that pops up. In the dialog that opens, verify the Destination Microsoft Excel File location and file name, and select Run. Once the translation runs successfully, select the writer on the canvas and click the View Written Data icon that pops up. The written dataset will appear in Visual Preview. Ensure Table View is toggled on to view the written Excel spreadsheets.
Visual Preview now shows two separate tables for the written Excel dataset. One table contains the DriveBC highway events categorized as Construction, while the second table contains those events categorized as Incidents. These two tables appear as two separate spreadsheets in the output .xlsx file.
Comments
0 comments
Please sign in to leave a comment.