Writers and Performance

Files

coreconcepts6generalworkspace.fmwt (1 MB)

Introduction

This example is intended to show how data is dealt with by FME writers, how it affects performance, and how you can control it.

As an analogy, think of an airport departure lounge. Just as a workspace can have multiple writers, an airport can have multiple flights; but because FME is not multi-threaded, our airport has only a single departure gate. We therefore need specific techniques to keep the departure lounge from becoming overcrowded.
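To make the analogy concrete, here is a toy Python model. It assumes, as the analogy suggests, that features for the writer at the gate are written as they arrive, while features for every other writer wait in the lounge. The writer names come from the exercise workspace, but the feature counts are invented for illustration and `waiting_features` is a hypothetical name; this is a sketch of the idea, not a description of FME internals.

```python
def waiting_features(feature_counts, writer_order):
    """Count the features that must wait while the first writer is served.

    Toy model (assumption): only features for the first writer in
    writer_order are written immediately; all others are held back.
    """
    first = writer_order[0]
    return sum(n for name, n in feature_counts.items() if name != first)


# Invented feature counts, for illustration only.
counts = {"Landuse": 2000, "Buildings": 9000, "Highways": 5000, "Environment": 1000}

default_order = ["Landuse", "Buildings", "Highways", "Environment"]
largest_first = ["Buildings", "Highways", "Landuse", "Environment"]

print(waiting_features(counts, default_order))  # 15000 features wait
print(waiting_features(counts, largest_first))  # 8000 features wait
```

In this model the lounge is least crowded when the busiest flight boards first, which is the effect the steps below try to measure.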

Step-by-step Instructions

This exercise starts with a single reader of OpenStreetMap data. The data is divided into groups and each group is assigned to a specific writer.

Currently the writers are not ordered in any particular way. The task is to assess how much data each group contains and to reorder the writers accordingly.

Note the Cloner transformers and the published parameter that controls them. These transformers multiply the source features to slow the workspace down and make differences caused by writer order easier to see. With only a small amount of data, any change in performance might just be random fluctuation.

1. Start FME Workbench and open the workspace (or template)

Run the workspace. Inspect the log window. Make a note of the time taken to run the workspace and the maximum memory used. On my computer I get:

FME Session Duration: 29.0 seconds. (CPU: 24.1s user, 0.6s system)

END - ProcessID: 7568, peak process memory usage: 220316 kB
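If you repeat the runs several times, it can be easier to pull these two figures out of a saved log than to read them by eye. The following is a minimal Python sketch, assuming the log contains lines worded exactly like the two above; `summarize_log` is a hypothetical helper name, not part of FME.

```python
import re

# These patterns match the two log lines quoted in this exercise.
# Assumption: other FME versions may word these lines differently.
DURATION_RE = re.compile(r"FME Session Duration: ([\d.]+) seconds")
MEMORY_RE = re.compile(r"peak process memory usage: (\d+) kB")


def summarize_log(log_text):
    """Return (seconds, peak_kb) from the log text; None for a missing value."""
    duration = DURATION_RE.search(log_text)
    memory = MEMORY_RE.search(log_text)
    seconds = float(duration.group(1)) if duration else None
    peak_kb = int(memory.group(1)) if memory else None
    return seconds, peak_kb


sample = (
    "FME Session Duration: 29.0 seconds. (CPU: 24.1s user, 0.6s system)\n"
    "END - ProcessID: 7568, peak process memory usage: 220316 kB\n"
)
print(summarize_log(sample))  # (29.0, 220316)
```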

By default, writers are executed in the order they appear in the Navigator window, so the original order is Landuse, Buildings, Highways, Environment. You can change the order by dragging one writer above another in the Navigator window.

2. Assess data by feature types

Check how many feature types exist for each group of data. Reorder the writers so that the group with the most feature types is the first writer, followed by the remaining writers in descending order.

Run and recheck the log. On my computer the result is:

FME Session Duration: 29.6 seconds. (CPU: 23.8s user, 0.5s system)

END - ProcessID: 4468, peak process memory usage: 219976 kB

Did this reduce the time or resources on your computer? Do you think the number of feature types is a good assessment of data amounts for our purposes?

3. Assess data by features

In the workspace that you just ran, make a note of the feature counts (i.e. how many features were sent to each writer). Reorder the writers so that the group with the most features is the first writer, followed by the remaining writers in descending order.
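Once you have noted the counts, working out the new order is a simple descending sort. The sketch below shows it in Python; the counts are made up for illustration (use the ones from your own run), and `order_by_count` is a hypothetical helper name.

```python
def order_by_count(feature_counts):
    """Return writer names sorted from most features to fewest."""
    return sorted(feature_counts, key=feature_counts.get, reverse=True)


# Invented counts, for illustration only; replace with your own figures.
noted_counts = {
    "Landuse": 2000,
    "Buildings": 9000,
    "Highways": 5000,
    "Environment": 1000,
}

print(order_by_count(noted_counts))
# ['Buildings', 'Highways', 'Landuse', 'Environment']
```

The resulting list is the top-to-bottom order to aim for when dragging writers in the Navigator window.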

Run and recheck the log. On my computer the result is:

FME Session Duration: 25.2 seconds. (CPU: 23.5s user, 0.5s system)

END - ProcessID: 4513, peak process memory usage: 219236 kB
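To judge whether a difference between runs is meaningful, it helps to express it as a percentage. This Python sketch compares the first run quoted in this exercise (29.0 seconds, 220316 kB) with the run above; `percent_reduction` is a hypothetical helper name, and your own figures will differ.

```python
def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100


# Figures quoted in this exercise: first run versus most-features-first run.
time_saved = percent_reduction(29.0, 25.2)
memory_saved = percent_reduction(220316, 219236)

print(f"Session duration reduced by {time_saved:.1f}%")  # 13.1%
print(f"Peak memory reduced by {memory_saved:.1f}%")     # 0.5%
```

A single pair of runs is a small sample, so repeat each configuration a few times before drawing conclusions.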

Did this reduce the time or resources on your computer? Do you think the number of features is a good assessment of data amounts for our purposes?

4. Assess data by file size

Browse to the output data files and check the file size for each. Reorder the writers so that the group with the largest file size is the first writer, followed by the remaining writers in descending order.
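If you prefer not to check sizes by hand, a short script can total them. This Python sketch assumes each writer's output is either a single file or a folder directly inside one output directory; that layout, the directory you pass in, and the helper names `dataset_sizes` and `largest_first` are all assumptions for illustration.

```python
from pathlib import Path


def dataset_sizes(output_dir):
    """Map each immediate child of output_dir to its total size in bytes."""
    sizes = {}
    for child in Path(output_dir).iterdir():
        if child.is_dir():
            # Sum every file below the folder (e.g. a multi-file dataset).
            sizes[child.name] = sum(
                f.stat().st_size for f in child.rglob("*") if f.is_file()
            )
        else:
            sizes[child.name] = child.stat().st_size
    return sizes


def largest_first(sizes):
    """Return dataset names ordered from largest to smallest."""
    return sorted(sizes, key=sizes.get, reverse=True)
```

For example, `largest_first(dataset_sizes("path/to/output"))` would list the datasets in the order to try for the writers, where the path is a placeholder for your own output folder.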

Is there any difference in writer order? If so, run and recheck the log.

5. Assess full writer order

In all of the above, does the full order of writers matter, or is it just important to get the largest dataset first? Try adjusting the writer order to be largest, smallest, 3rd largest, 2nd largest.

Rerun the workspace and check the log. Is there a difference? Why do you think this is/isn't the case?

6. Check first feature setting

In the Navigator window, locate a workspace setting called Order Writers.

Notice how you can change this parameter to allow writer order to be set by the order in which features arrive (i.e. in the airport analogy, the first passenger to arrive determines which flight is boarded first).

How might you control the order of features in a workspace? Remember from what we have learned that blocking transformers could affect the order of data and so the order of writing.

Why might you use this method?

Data Attribution

The data used here originates from open data made available by OpenStreetMap and its contributors. It contains information licensed under the Open Data Commons Open Database License (ODbL) by the OpenStreetMap Foundation (OSMF).