Converting multiline records into single features

Liz Sanderson
Liz Sanderson
  • Updated

FME Version

Introduction

Sometimes, a text-based data file might have values from the same record spread across multiple lines. Multiline records can be a problem when we are processing the data and each line is treated as a unique feature.

In this scenario, we will use FME Workbench to convert multiline records into a single feature.
We are given a CSV file that has X/Y columns, but the file is structured like this:
 

X,Y
!MAJOR, AIRPORT BLVD
3125519.412,10078088.000
3125502.250,10078139.000
3125396.750,10078703.000
!MAJOR, ED BLUESTEIN BLVD
3139057.500,10088845.000
3139034.750,10088914.000
3138822.750,10089350.000
etc.


Note how the file is broken into sections that include the road category (e.g. MAJOR), road name (e.g. AIRPORT BLVD), and several coordinates representing points in a line. The problem is that rather than having one row representing one data feature, we have several rows representing one data feature. We want to clean up this data so that one row represents one feature.
 

Method Comparison

There are a few ways to convert multiline records into single features, and the best method depends on the nature of the source data.


Global Variables

Global variables store information about the current feature as the workspace runs. FME can set a global variable during runtime and make it available to the entire workspace, which lets the workspace pull information from past records as it processes a dataset.

You can set a global variable using the VariableSetter transformer and retrieve it using the VariableRetriever. This pair of transformers is useful for passing information between features while a workspace is running.

This method can be preferable if the structure of the data is predictable—for instance, we know that the CSV file in this example will start with a header, then the road name and category, then several lines of X/Y points, and then another road name and category, and so on. Where this method might not be a good choice is if the order of rows is unpredictable and we can’t reliably tell FME which variables to store.


Adjacent Feature Attributes

On the other hand, using adjacent feature attributes can be preferable if the data is less structured. We can simply tell FME to remember the values from the previous rows it has processed. You can achieve this with the AttributeManager transformer, where there is an Enable Adjacent Feature Attributes parameter. Normally, when an FME workspace runs, each feature is processed separately—i.e. there is a stream of data and each feature has its own attributes. By setting this parameter, we can let FME access attributes in adjacent features in the stream.

The drawback of this method is that FME will use more memory to remember the previous features while the workspace runs, so if records span many rows, this method is not a good choice.

To see an example of using adjacent feature attributes on a structured text file, please see the blog post FME Adjacent Feature Attributes: An Example of Reading Structured Text Files.


Aggregator and “Group By”

An Aggregator transformer is useful if the rows have an ID field—for example, a location or number to specify which record a row belongs to, like this:
 

ID,X,Y
0,!MAJOR, AIRPORT BLVD
0,3125519.412,10078088.000
0,3125502.250,10078139.000
0,3125396.750,10078703.000
1,!MAJOR, ED BLUESTEIN BLVD
1,3139057.500,10088845.000
1,3139034.750,10088914.000
1,3138822.750,10089350.000


In this situation, you could use an Aggregator and enable Group By on the ID field. The transformer would then group rows by the ID field.

This method only works if the dataset has an ID field to group by.
 

Step-by-Step Instructions

In this scenario, the CSV file has an indeterminate number of rows per feature—sometimes more than a dozen—and the data is predictably structured, starting with a header, then road information, and then coordinates. We will therefore use global variables to store information about the current feature while FME processes the file.

1. Open a new FME Workspace
Download the zip file attached to this tutorial. Open FME Workbench and start with a new workspace, or open multilineRoadsToFeatures.fmwt and follow along in the finished workspace.

2. Add a CSV reader
Click Add Reader. Add a CSV reader and set the following parameters:

  • Format: CSV (Comma Separated Value)
  • Dataset: <Tutorial Download>/Roads.csv

01.png
3. Add a StringSearcher
The first step is to check whether the current row is one that contains road category and name values. From looking at the data, we know that the rows containing the road category and names start with “!”, so we will search for this in the current feature using a StringSearcher transformer.

Add a StringSearcher to the canvas and connect it to the CSV feature type. Open the parameters and set the following:

  • Search In: select the down arrow on the right, then choose Attribute Value > X
  • Contains Regular Expression: ^!

02.png
Click OK. Rows that start with “!” will pass through the Matched output port, and those that don’t will pass through NotMatched.

The workspace should look like this:

02b.png

4. Add a Counter
When the workspace encounters a row that starts with “!”, this indicates the start of a new record. This row contains a road category and type, and the following rows contain coordinates. Therefore, when we encounter a “!” row, we will tell the workspace that this is the start of a new road record. We will do this by creating a unique Road ID attribute.

Add a Counter transformer to the canvas and connect it to the Matched port of the StringSearcher. Open the parameters and set them as follows:

  • Advanced > Count Scope: Global
  • Count: RoadID

03.png

Click OK. Now the workspace is configured to give each unique road record its own ID.

03b.png

5. Add VariableSetters
Next, we will use VariableSetter transformers to create global variables. We will do this three times: one variable to store the road category, one to store the road name, and one for a new value that will be the unique road ID.

Add a VariableSetter and connect it to the Counter. Set the following parameters:

  • Variable Name: RoadCategory
  • Value: select the down arrow, then Attribute Value > X
  • Variable Scope: Global


04.png


Click OK.

Add another VariableSetter and connect it to the first one. Set the following parameters:

  • Variable Name: RoadName
  • Value: Attribute Value > Y
  • Variable Scope: Global


05.png

Finally, add and connect a third VariableSetter. Set the following parameters:

  • Variable Name: RoadID
  • Value: Attribute Value > RoadID
  • Variable Scope: Global


06.png

The workspace should look like this:

07.png

6. Add VariableRetrievers
Now that we have set global variables to store the road category, name, and ID of the current record, we can apply these values to subsequent rows that pass through the workspace.

Add a VariableRetriever and connect it to the NotMatched port of the StringSearcher. When we retrieve the global variables, we want to do this on rows that don’t start with “!”, i.e. rows that are coordinates.

Open the VariableRetriever parameters and set them as follows:

  • Variable Name: RoadCategory
  • Variable Scope: Global
  • Attribute Receiving Value: RoadCategory


08.png

Click OK.

Add another VariableRetriever and connect it to the first one. Set the following parameters:

  • Variable Name: RoadName
  • Variable Scope: Global
  • Attribute Receiving Value: RoadName


09.png

Finally, add and connect a third VariableRetriever. Set the following parameters:

  • Variable Name: RoadID
  • Variable Scope: Global
  • Attribute Receiving Value: RoadID


10.png

The workspace should look like this:

11.png

7. Attach an Inspector and run the workspace
Attach an Inspector transformer to VariableRetriever_3. Run the workspace and view the output in the Visual Preview pane.

12.png

Now every row in the dataset has coordinates and attribute values, along with a unique identifier that we can use to process the road data. You could also write the output to CSV to see the difference in the CSV files.

At this point, we have successfully transformed the multiline records into individual features. Every row of coordinates has the road category and name appended to it as attribute values, rather than having those values be in rows of their own.

8. Optional: transform and output the data
We have now cleaned up the CSV data and turned multiline records into single features. Download the attached workspace, multilineRoadsToFeatures.fmwt, to see an example of using this cleaned-up data to generate a MapInfo TAB file. The workspace turns each feature into a point geometry, transforms the points into lines based on the unique road ID, and generates an output dataset that has lines and labels for each road.

13.png
 

Additional Resources

Pivot Tables and FME

FME Training Manual: Adjacent Feature Attributes example

FME Documentation: CSV (Comma-Separated Value) Reader/Writer

FME Documentation: Text File Reader/Writer

 

Data Attribution

The data used here originates from data made available by the Government of Austin, Texas. It contains information available to the public domain.

Was this article helpful?

Comments

0 comments

Please sign in to leave a comment.