Converting multiline records into single features

Files

multilineRoadsToFeatures.fmwt (50 KB)
Roads.csv (10 KB)

Introduction

Sometimes, a text-based data file has values from the same record spread across multiple lines. Multiline records are a problem when we process the data, because each line is read as a separate feature.

In this scenario, we will use FME Workbench to convert multiline records into a single feature.

We are given a CSV file that has X/Y columns, but the file is structured like this:

X,Y
!MAJOR, AIRPORT BLVD
3125519.412,10078088.000
3125502.250,10078139.000
3125396.750,10078703.000
!MAJOR, ED BLUESTEIN BLVD
3139057.500,10088845.000
3139034.750,10088914.000
3138822.750,10089350.000
etc.

Note how the file is broken into sections that include the road category (e.g., MAJOR), road name (e.g., AIRPORT BLVD), and several coordinates representing points in a line. The problem is that, rather than having one row per data feature, we have several rows per feature. We want to clean up this data so that each row represents a single feature.

Method Comparison

There are a few ways to convert multiline records into single features, and the best method depends on the nature of the source data.

Global Variables

Global variables store values outside of any single feature while the workspace runs. FME can set a global variable at runtime and make it available to the entire workspace, so a later feature can read values that an earlier feature stored.

You can set a global variable using the VariableSetter transformer and retrieve it using the VariableRetriever. This pair of transformers is useful for passing information between features while a workspace is running.

This method works well when the data structure is predictable. For instance, we know that the CSV file in this example starts with a header, followed by a road category and name, then several lines of X/Y points, then another road category and name, and so on. It is a poor choice when the row order is unpredictable, because we cannot reliably tell FME which values to store.
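The set-then-retrieve idea can be sketched outside FME in a few lines of Python. This is only an illustration of the pattern, not how FME implements it: the `variables` dictionary and the `set_variable` and `get_variable` helpers are hypothetical stand-ins for the global variable store, the VariableSetter, and the VariableRetriever. The header values are kept exactly as they appear in the file, including the leading "!".

```python
# Hypothetical stand-in for FME's global variable store.
variables = {}


def set_variable(name, value):
    """Rough analogue of a VariableSetter: remember a value for later rows."""
    variables[name] = value


def get_variable(name):
    """Rough analogue of a VariableRetriever: read the last value stored."""
    return variables.get(name)


# A shortened copy of the sample rows, already split into X and Y.
rows = [
    ("!MAJOR", "AIRPORT BLVD"),
    ("3125519.412", "10078088.000"),
    ("3125502.250", "10078139.000"),
    ("!MAJOR", "ED BLUESTEIN BLVD"),
    ("3139057.500", "10088845.000"),
]

flattened = []
for x, y in rows:
    if x.startswith("!"):
        # Header row: store its values and emit nothing.
        set_variable("RoadCategory", x)
        set_variable("RoadName", y)
    else:
        # Coordinate row: attach the most recently stored values.
        flattened.append(
            (get_variable("RoadCategory"), get_variable("RoadName"), x, y)
        )

for row in flattened:
    print(row)
```

The sketch only works because the rows arrive in file order, which is the same reason the method depends on predictably structured data.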

Adjacent Feature Attributes

On the other hand, using adjacent feature attributes can be preferable if the data is less structured. We can simply tell FME to remember the values from the previous rows it has processed. You can achieve this with the AttributeManager transformer, which includes an Enable Adjacent Feature Attributes parameter. Normally, when an FME workspace runs, each feature is processed separately; that is, the data moves through as a stream and each feature carries only its own attributes. By setting this parameter, we allow FME to access attributes of adjacent features in the stream.

The drawback of this method is that FME will use more memory to remember the previous features while the workspace runs, so if records span many rows, this method is not a good choice.

To see an example of using adjacent feature attributes on a structured text file, please see the blog post FME Adjacent Feature Attributes: An Example of Reading Structured Text Files.
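As a rough illustration of the adjacent-feature idea, and of why its memory use grows with the number of rows it must look back over, here is a minimal Python sketch. The `with_previous` helper and its `lookback` argument are hypothetical names chosen for this example and do not correspond to FME parameters.

```python
from collections import deque


def with_previous(rows, lookback=1):
    """Yield (row, previous_rows); previous_rows holds up to `lookback` earlier rows."""
    window = deque(maxlen=lookback)  # memory held grows with the lookback size
    for row in rows:
        yield row, list(window)  # earlier rows, oldest first
        window.append(row)


rows = [
    ("!MAJOR", "AIRPORT BLVD"),
    ("3125519.412", "10078088.000"),
    ("3125502.250", "10078139.000"),
]

# Pair each row's X value with the X value of the row just before it.
result = []
for row, previous in with_previous(rows, lookback=1):
    previous_x = previous[-1][0] if previous else None
    result.append((row[0], previous_x))

for pair in result:
    print(pair)
```

With a lookback of one row, only the first coordinate row can see the header directly. Reaching the header from later rows would need a larger window, which is the memory cost described above.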

Aggregator and “Group By”

An Aggregator transformer is useful if the rows have an ID field—for example, a location or number to specify which record a row belongs to, like this:

ID,X,Y
0,!MAJOR, AIRPORT BLVD
0,3125519.412,10078088.000
0,3125502.250,10078139.000
0,3125396.750,10078703.000
1,!MAJOR, ED BLUESTEIN BLVD
1,3139057.500,10088845.000
1,3139034.750,10088914.000
1,3138822.750,10089350.000

In this situation, you could use an Aggregator and enable Group By on the ID field. The transformer would then group rows by the ID field.

This method only works if the dataset has an ID field to group by.
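For comparison, grouping by an ID field can be sketched in Python with `itertools.groupby`. This is an analogy to the Aggregator's Group By behaviour rather than a description of its internals, and it assumes that the first row of every group is the header row and that rows with the same ID are consecutive.

```python
from itertools import groupby

# A shortened copy of the sample rows, already split into ID, X, and Y.
rows = [
    ("0", "!MAJOR", "AIRPORT BLVD"),
    ("0", "3125519.412", "10078088.000"),
    ("0", "3125502.250", "10078139.000"),
    ("1", "!MAJOR", "ED BLUESTEIN BLVD"),
    ("1", "3139057.500", "10088845.000"),
]

records = []
# groupby only merges consecutive rows, so rows must already be ordered by ID.
for record_id, group in groupby(rows, key=lambda row: row[0]):
    group = list(group)
    header = group[0]  # assumption: the header row comes first in each group
    coords = [(float(x), float(y)) for _, x, y in group[1:]]
    records.append(
        {
            "ID": record_id,
            "category": header[1],
            "name": header[2],
            "coords": coords,
        }
    )

for record in records:
    print(record["ID"], record["name"], len(record["coords"]))
```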

Step-by-Step Instructions

In this scenario, the CSV file has an indeterminate number of rows per feature—sometimes more than a dozen—and the data is predictably structured: a header, then road information, then coordinates. We will therefore use global variables to store information about the current feature while FME processes the file.

1. Open a new FME Workspace

Download the zip file attached to this tutorial. Open FME Workbench and start with a new workspace, or open multilineRoadsToFeatures.fmwt and follow along in the finished workspace.

2. Add a CSV reader

Click Add Reader. Add a CSV reader and set the following parameters:

Format: CSV (Comma Separated Value)
Dataset: <Tutorial Download>/Roads.csv

3. Add a StringSearcher

The first step is to check whether the current row contains the road category and name values. From looking at the data, we know that the rows containing the road category and name start with “!”, so we will search for this in the current feature using a StringSearcher transformer.

Add a StringSearcher to the canvas and connect it to the CSV feature type. Open the parameters and set the following:

Search In: select the down arrow on the right, then choose Attribute Value > X
Contains Regular Expression: ^!

Click OK. Rows that start with “!” will pass through the Matched output port, and those that don’t will pass through NotMatched.
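To check what the expression ^! selects before running the workspace, the same pattern can be tried in Python's `re` module. The `matched` and `not_matched` lists below are illustrative names that mirror the two output ports; the sample values are taken from the X column.

```python
import re

# Same expression as in the StringSearcher: "!" anchored to the start of the value.
HEADER_PATTERN = re.compile(r"^!")

x_values = ["!MAJOR", "3125519.412", "3125502.250", "!MAJOR", "3139057.500"]

matched = [value for value in x_values if HEADER_PATTERN.search(value)]
not_matched = [value for value in x_values if not HEADER_PATTERN.search(value)]

print(matched)
print(not_matched)
```

Because of the ^ anchor, a value that merely contains "!" somewhere after its first character would not match.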

At this point, the workspace consists of the CSV reader feature type connected to the StringSearcher.

4. Add a Counter

When the workspace encounters a row that starts with “!”, this indicates the start of a new record. This row contains a road category and name, and the following rows contain coordinates. Therefore, when we encounter a “!” row, we will tell the workspace that this is the start of a new road record. We will do this by creating a unique Road ID attribute.

Add a Counter transformer to the canvas and connect it to the StringSearcher's Matched port. Open the parameters and set them as follows:

Advanced
- Count Scope: Global
Count: RoadID

Click OK. Now the workspace is configured to give each unique road record its own ID.
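Conceptually, the Counter hands out the next integer to every feature that reaches it. A minimal Python analogue, assuming the count starts at 0 (an assumption for this sketch; check the Count Start value in your own Counter), looks like this:

```python
from itertools import count

# Assumption: numbering starts at 0.
road_ids = count(0)

# Only the header rows reach the Counter, via the Matched port.
header_rows = [
    ("!MAJOR", "AIRPORT BLVD"),
    ("!MAJOR", "ED BLUESTEIN BLVD"),
]

numbered = [
    {"X": x, "Y": y, "RoadID": next(road_ids)}
    for x, y in header_rows
]

for row in numbered:
    print(row)
```

Only header rows are numbered here, so two roads with the same category still receive different IDs.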

5. Add VariableSetters

Next, we will use VariableSetter transformers to create global variables. We will do this three times: one variable to store the road category, one to store the road name, and one for a new value that will be the unique road ID.

Add a VariableSetter and connect it to the Counter. Set the following parameters:

Variable Name: RoadCategory
Value: select the down arrow, then Attribute Value > X
Variable Scope: Global

Click OK.

Add another VariableSetter and connect it to the first one. Set the following parameters:

Variable Name: RoadName
Value: Attribute Value > Y
Variable Scope: Global

Finally, add and connect a third VariableSetter. Set the following parameters:

Variable Name: RoadID
Value: Attribute Value > RoadID
Variable Scope: Global

At this point, the StringSearcher's Matched port feeds the Counter, which is followed by the three VariableSetters connected in series.

6. Add VariableRetrievers

Now that we have set global variables to store the current record's road category, name, and ID, we can apply these values to subsequent rows that pass through the workspace.

Add a VariableRetriever and connect it to the NotMatched port of the StringSearcher. When we retrieve the global variables, we want to do this on rows that don’t start with “!”, i.e., rows that are coordinates.

Open the VariableRetriever parameters and set them as follows:

Variable Name: RoadCategory
Variable Scope: Global
Attribute Receiving Value: RoadCategory

Click OK.

Add another VariableRetriever and connect it to the first one. Set the following parameters:

Variable Name: RoadName
Variable Scope: Global
Attribute Receiving Value: RoadName

Finally, add and connect a third VariableRetriever. Set the following parameters:

Variable Name: RoadID
Variable Scope: Global
Attribute Receiving Value: RoadID

At this point, the StringSearcher's NotMatched port feeds the three VariableRetrievers connected in series.

7. Attach an Inspector and Run the Workspace

Attach an Inspector transformer to VariableRetriever_3. Run the workspace and view the output in the Visual Preview pane.

Now, every row in the output contains coordinates and attribute values, along with a unique identifier we can use to process the road data. You could also write the output to CSV and compare it with the original file.

At this point, we have successfully transformed the multiline records into individual features. Every row of coordinates includes the road category and name as attribute values, rather than having those values in rows of their own.
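To make the expected result concrete, the sketch below reproduces the whole flattening step in plain Python and prints the rows as CSV. It is an approximation of the workspace, not an export of it: the `flatten` and `to_csv` functions are hypothetical, IDs are assumed to start at 0, and `skipinitialspace=True` is used to drop the space after the comma in the header rows.

```python
import csv
import io

# A shortened copy of Roads.csv.
SOURCE = """X,Y
!MAJOR, AIRPORT BLVD
3125519.412,10078088.000
3125502.250,10078139.000
!MAJOR, ED BLUESTEIN BLVD
3139057.500,10088845.000
"""

FIELDS = ["X", "Y", "RoadCategory", "RoadName", "RoadID"]


def flatten(text):
    """Copy each header row's values onto the coordinate rows that follow it."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    current = {}
    road_id = -1  # assumption: the first road receives ID 0
    flattened_rows = []
    for row in reader:
        if row["X"].startswith("!"):
            road_id += 1
            current = {
                "RoadCategory": row["X"],
                "RoadName": row["Y"],
                "RoadID": road_id,
            }
        else:
            flattened_rows.append({"X": row["X"], "Y": row["Y"], **current})
    return flattened_rows


def to_csv(rows):
    """Serialize the flattened rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


flattened = flatten(SOURCE)
print(to_csv(flattened))
```

Each output row now carries X, Y, RoadCategory, RoadName, and RoadID, and the header rows no longer appear as rows of their own.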

8. Optional: Transform and Output the Data

We have now cleaned up the CSV data and turned multiline records into single features. Download the attached workspace, multilineRoadsToFeatures.fmwt, to see an example of using this cleaned-up data to generate a MapInfo TAB file. The workspace converts each feature to point geometry, transforms the points into lines based on the unique road ID, and generates an output dataset containing lines and labels for each road.
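The step of turning points into lines based on the road ID can be pictured as collecting the coordinates that share an ID, in the order they were read. The Python sketch below shows only that grouping logic; `build_lines` is a hypothetical helper, and the attached workspace may build its geometry differently.

```python
from collections import defaultdict

# Flattened rows as produced by the earlier steps (shortened sample).
flattened = [
    {"RoadID": 0, "RoadName": "AIRPORT BLVD", "X": "3125519.412", "Y": "10078088.000"},
    {"RoadID": 0, "RoadName": "AIRPORT BLVD", "X": "3125502.250", "Y": "10078139.000"},
    {"RoadID": 0, "RoadName": "AIRPORT BLVD", "X": "3125396.750", "Y": "10078703.000"},
    {"RoadID": 1, "RoadName": "ED BLUESTEIN BLVD", "X": "3139057.500", "Y": "10088845.000"},
    {"RoadID": 1, "RoadName": "ED BLUESTEIN BLVD", "X": "3139034.750", "Y": "10088914.000"},
]


def build_lines(rows):
    """Collect (x, y) vertices per RoadID, preserving the order of the rows."""
    lines = defaultdict(list)
    for row in rows:
        lines[row["RoadID"]].append((float(row["X"]), float(row["Y"])))
    return dict(lines)


lines = build_lines(flattened)
for road_id, vertices in lines.items():
    print(road_id, len(vertices), "vertices")
```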

Additional Resources

Pivot Tables and FME

FME Documentation: CSV (Comma-Separated Value) Reader/Writer

FME Documentation: Text File Reader/Writer

Data Attribution

The data used in this article originates from open data made available by the City of Austin, Texas. It contains data licensed under the Public Domain Dedication License, as provided by the City of Austin.
