Introduction
While authoring and designing your workspace there are various tools and patterns that help improve performance. For other performance tuning tips & tricks, see Performance Tuning FME.
Workspace Authoring
FME provides a few tools that help a workspace run faster while it is being authored and debugged, which speeds up the authoring process.
Feature Caching
Best Practice for authoring (and debugging) a workspace is to make small changes and run the workspace often to test the results. This way it's easier to pinpoint a problem when it occurs, rather than having a large number of untested transformers to check.
Frequently running a workspace can be a time-consuming process, but this is mitigated by Feature Caching.
Feature Caching stores the results of a workspace at every step of the translation. Having this data at hand helps you to inspect the results at each stage. It also means that re-running the translation does not require running the entire workspace. Instead, cached results can be used in place of running prior parts of the workspace.
This technique saves time in re-running a workspace, although it does consume disk space and resources when writing the cache. Additionally, it can give a false impression of performance when a workspace is put into production without caching.
Feature Caches are not stored within the workspace; Feature Caching is instead a setting in FME Workbench. If you are using FME Workbench to run your workspaces once they are authored, turning off Feature Caching will improve performance. Performance tools, such as Parallel Processing, also won’t work if Feature Caching is enabled. For optimal performance, it is recommended to run the workspace in either the FME Quick Translator or via the Command-Line. Feature Caches are not published to FME Flow.
Feature Counts
Feature Counts are the numbers that appear on connections when you run a workspace in FME Workbench. The real-time animation of these counts often gives an indication of which transformers are blocking your data or slowing the workspace.
Using Feature Caching and Feature Counts together both shows the feature counts and caches the data at each link in the workspace, which can be particularly useful. In newer versions of FME, Feature Counts can be turned off, which might improve performance.
Workspace Design Patterns
It's important to design your workspace for maximum performance as you create it. There are various patterns of design that promote better performance.
Attribute Cleaning
During a translation, FME holds your data either in memory (physical or virtual) or cached to a disk. Obviously, performance is improved by reducing the amount of unnecessary data being held, both features and the components of those features.
One particular aspect is an excess of attributes. Often not all attributes on the source data are required for processing or output. In this scenario, it's better to avoid reading those attributes or to use a transformer to remove those attributes as soon as possible in your translation.
Design Pattern:
- Read Data
- Remove Excess Attributes
- Process Data
In other words, only carry through the translation any geometry and attributes you intend to be available on the output. An AttributeManager, AttributeRemover, AttributeKeeper, or GeometryRemover transformer can be used to remove excess components and should be used as early in the translation as possible.
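The pattern above can be sketched outside of FME. The following is a conceptual illustration only: plain Python dictionaries stand in for FME features, and the attribute names and the keep_attributes helper are hypothetical, not part of any FME API.

```python
# Conceptual sketch: dictionaries stand in for FME features.
# All attribute names here are hypothetical.

def keep_attributes(feature, wanted):
    """Return a copy of the feature holding only the wanted attributes."""
    return {name: value for name, value in feature.items() if name in wanted}

# Read Data: each source feature carries attributes the output never needs.
source_features = [
    {"id": 1, "name": "Park A", "area": 120.5, "notes": "x" * 1000, "legacy_code": "ZZ9"},
    {"id": 2, "name": "Park B", "area": 80.0, "notes": "y" * 1000, "legacy_code": "QQ1"},
]

# Remove Excess Attributes: done once, before any processing.
wanted = {"id", "name", "area"}
slimmed = [keep_attributes(f, wanted) for f in source_features]

# Process Data: later steps only carry the slimmed features.
processed = [dict(f, area_ha=f["area"] / 10000) for f in slimmed]
```

The bulky notes and legacy_code values are dropped before the processing step, so nothing downstream has to carry them.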
Note that the AttributeKeeper transformer has the option to Create Bulk Features, which will significantly speed up downstream processing. Using this option is highly recommended when more than two transformers follow the AttributeKeeper.
In particular, lists (attributes with multiple values) and geometry stored as attribute values can take up resources, because they tend to carry larger amounts of data. For example, a feature joined to 1,000 records, storing those records as a list, is now the equivalent of 1,000 features!
Another method of reducing attributes is to not read them at all. See the section on databases (below) for more information.
Data Filtering
Similarly to excess attributes, excess features use valuable system resources and should be removed as early in the translation as possible.
Design Pattern:
- Filter Data
- Process Data
In other words, filter unwanted data so that it does not incur unnecessary processing.
In this example, the author measures the area of every feature and only then filters the data.
This is entirely the wrong way around. The workspace wastes time measuring features that are later discarded (Tester:Failed).
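To make the cost concrete, here is a minimal sketch that counts how often the measuring step runs in each ordering. It assumes the filter does not depend on the measured area. The feature attributes and the measure_area function are hypothetical stand-ins, not FME transformers.

```python
# Conceptual sketch: counts how many features get measured in each ordering.

def measure_area(feature, counter):
    """Stand-in for an area calculation; records each call in the counter."""
    counter["calls"] += 1
    return dict(feature, area=feature["width"] * feature["height"])

features = [
    {"id": i, "type": "park" if i % 4 == 0 else "other", "width": 2, "height": 3}
    for i in range(100)
]

# Anti-pattern: measure every feature, then discard most of them.
late = {"calls": 0}
measured = [measure_area(f, late) for f in features]
result_late = [f for f in measured if f["type"] == "park"]

# Preferred: filter first, then measure only what passes.
early = {"calls": 0}
parks = [f for f in features if f["type"] == "park"]
result_early = [measure_area(f, early) for f in parks]

print(late["calls"], early["calls"])
```

Both orderings produce the same output features, but the filter-first version measures only the features that are kept.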
Reducing Duplication
A common bad design pattern (sometimes called an Anti-Pattern) is a chain of duplicate transformers. This is often not as efficient as processing data entirely in a single transformer.
For example, chaining together ExpressionEvaluator transformers in this way, where each carries out a different step of one overall calculation, is not the most efficient way of processing data. It would be far better to condense the actions into a single AttributeManager transformer.
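As a rough analogy (it does not measure FME itself), the sketch below performs one calculation first as three chained passes, each writing an intermediate attribute, and then as a single pass. The attribute names and the formula are made up for illustration.

```python
# Conceptual sketch: one calculation done as a chain versus a single step.

features = [{"x": float(i)} for i in range(5)]

# Chained: three passes over the data, each adding an intermediate attribute.
step1 = [dict(f, a=f["x"] * 2) for f in features]
step2 = [dict(f, b=f["a"] + 10) for f in step1]
chained = [dict(f, result=f["b"] / 4) for f in step2]

# Condensed: a single pass with the whole expression, no intermediates.
condensed = [dict(f, result=(f["x"] * 2 + 10) / 4) for f in features]
```

The results match, but the condensed version touches each feature once and leaves no intermediate attributes (a, b) to carry or clean up later.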
Tip: A chain of duplicate transformers is a sign of an anti-pattern, and you should investigate whether there is a better way to produce your goal.
Memory Reduction Strategies
The FME engine passes features through in a mixture of ways in order to maximize performance.
Some transformers in FME Workbench operate on one feature at a time. These are known as feature-based transformers. They can operate on one feature at a time because the process they carry out does not need different features to interact.
Other transformers work on groups of features. These are known as Group-Based Transformers. Group-based transformers are the ones that process multiple features simultaneously; for example, intersecting many line features to produce a topological network.
Creating bulk features ahead of a group-based transformer with an AttributeKeeper will improve performance because while the data is waiting to be processed, it will occupy less memory.
Obviously, any transformer that works on a group of features must hold them all in memory at the same time to do so, which incurs a memory and processing cost. This is known as Feature Holding, and should ideally be considered when designing an overall FME strategy.
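The difference between the two kinds of transformer can be sketched with Python generators. This is an analogy only: the functions below are hypothetical, and sorting stands in for any operation that needs the whole group before it can emit a result.

```python
# Conceptual sketch: streaming versus feature-holding processing.

def feature_based(features):
    """Handles one feature at a time, so nothing needs to be held."""
    for f in features:
        yield dict(f, length_km=f["length_m"] / 1000)

def group_based(features):
    """Needs every feature before emitting any (here: a sort by length)."""
    held = list(features)  # feature holding: the whole group sits in memory
    held.sort(key=lambda f: f["length_m"])
    yield from held

source = ({"id": i, "length_m": m} for i, m in enumerate([300, 100, 200]))
result = list(group_based(feature_based(source)))
```

The feature-based step passes each feature along as soon as it is done with it, whereas the group-based step cannot release anything until the last feature has arrived.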
Tip: The FME transformers documentation contains information about whether a transformer is Group or Feature-Based, and whether it is a Feature Holding transformer.
Fortunately, many group-based transformers can reduce their memory footprint under certain conditions by holding on to less data. However, it is up to the workspace author to set a parameter to confirm that these conditions are being met.
For example, the DuplicateFilter transformer separates features containing a duplicate attribute key. The transformer can perform more efficiently if it knows that the data is already sorted into key order, but it is up to the workspace author to set the Input is Ordered parameter to confirm that this is the case.
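Why ordered input helps can be shown with a small sketch. This is not the DuplicateFilter's actual implementation, only an illustration of the general idea: with unordered input every key seen so far must be remembered, while with input sorted by key only the previous key is needed. Function and attribute names are hypothetical.

```python
# Conceptual sketch: duplicate detection with and without ordered input.

def duplicate_filter_unordered(features, key):
    """Works on any input order, but remembers every key seen so far."""
    seen = set()
    unique, duplicates = [], []
    for f in features:
        if f[key] in seen:
            duplicates.append(f)
        else:
            seen.add(f[key])
            unique.append(f)
    return unique, duplicates, len(seen)

def duplicate_filter_ordered(features, key):
    """Assumes input is sorted by key; remembers only the previous key."""
    unique, duplicates = [], []
    first = True
    previous = None
    for f in features:
        if not first and f[key] == previous:
            duplicates.append(f)
        else:
            unique.append(f)
        previous = f[key]
        first = False
    return unique, duplicates

features = [{"key": k} for k in [1, 1, 2, 3, 3, 3]]  # already in key order
u1, d1, keys_remembered = duplicate_filter_unordered(features, "key")
u2, d2 = duplicate_filter_ordered(features, "key")
```

Both functions give the same answer on sorted input, but the ordered version would silently miss duplicates if the data were not actually sorted, which is why the author has to confirm the condition explicitly.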
Tip: Many feature-holding transformers have parameters that can improve performance under the right conditions. Group-By Mode is one such parameter and is common to many transformers. Clippers First is a parameter that only applies to one transformer (the Clipper). Inspect the documentation closely to look for ways to reduce the load on group-based transformers.
Writer Order
Multiple writers in a workspace have a set order in which they are executed. The first writer in the list is the first to write its data, while subsequent writers cache data until it is their turn.
Performance is hindered by caching data. Therefore it makes sense to promote the writer handling the most data to the top of the list so that its data is not cached.
Writers can be ordered by dragging them up and down the list in the Navigator window.
Additionally, the advanced Workspace Parameter Order Writers By allows you to set the order in which writers are executed (either the order in the Navigator or the order in which features arrive).
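A deliberately simplified model shows why the order matters. It assumes, as described above, that only the first writer writes immediately and every other writer caches all of its features until its turn. The writer names and feature counts are invented for illustration.

```python
# Simplified model: only the first writer avoids caching its features.

def cached_feature_count(writer_order, features_per_writer):
    """Total features cached by every writer except the first in the list."""
    return sum(features_per_writer[name] for name in writer_order[1:])

# Hypothetical writers and the number of features each receives.
features_per_writer = {"roads": 1_000_000, "parks": 5_000, "labels": 200}

small_first = cached_feature_count(["labels", "parks", "roads"], features_per_writer)
large_first = cached_feature_count(["roads", "parks", "labels"], features_per_writer)

print(small_first, large_first)
```

Under this model, promoting the largest writer to the top of the list reduces the cached features from 1,005,000 to 5,200.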
Tip: When you have multiple writers in a workspace, always ensure the one receiving the largest amount of data is the first writer in the list. It is also worth using the Create Bulk Features option of the AttributeKeeper ahead of every writer except the first, to reduce the storage footprint.
Tip: For peak performance, tiles output from the RasterTiler or WebMapTiler should be written in the order they are output from these transformers.
Database Design Patterns
Reading from and writing to databases provide unique opportunities for performance tuning. In general, processing can be quicker in the database itself, rather than the FME Engine. See the tutorial series Let the Database Do the Work for step-by-step instructions.
When reading data from a database, the ideal pattern is to filter the data as it is read, rather than in FME:
Design Pattern:
- Filter Data
- Read Data
- Process Data
Or...
Design Pattern:
- Read Data (with a Filter)
- Process Data
A "filter" can be applied by using the SQL Statements and SQL WHERE clauses available in most FME database readers.
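The effect of filtering at read time can be illustrated with a small, self-contained sketch using Python's built-in sqlite3 module in place of a production database. The parcels table and its columns are hypothetical. In FME the equivalent WHERE clause would be entered in the reader's parameters rather than in code.

```python
import sqlite3

# Hypothetical in-memory table standing in for a real database source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parcels (id INTEGER PRIMARY KEY, zone TEXT, area REAL, notes TEXT)"
)
conn.executemany(
    "INSERT INTO parcels (zone, area, notes) VALUES (?, ?, ?)",
    [
        ("residential", 450.0, "n1"),
        ("industrial", 2000.0, "n2"),
        ("residential", 600.0, "n3"),
        ("commercial", 900.0, "n4"),
    ],
)

# Read everything, then filter on the client side.
all_rows = conn.execute("SELECT * FROM parcels ORDER BY id").fetchall()
filtered_in_client = [(r[0], r[2]) for r in all_rows if r[1] == "residential"]

# Filter in the database and request only the columns that are needed.
filtered_in_db = conn.execute(
    "SELECT id, area FROM parcels WHERE zone = ? ORDER BY id",
    ("residential",),
).fetchall()
```

Both approaches return the same records, but the second transfers two rows of two columns instead of four rows of four columns.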
Another way to "filter" data is to not read unnecessary attributes from the database table. These can be turned off in FME feature types by unchecking the attribute in the User Attributes tab.
When joining data, SQL Joins can be quicker than an FME FeatureJoiner transformer. Create a database materialized view for even better performance and to simplify your workspace (although sometimes Database Administrators won't allow you to do this).
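As an illustration of letting the database perform the join, the sketch below uses Python's built-in sqlite3 module with two hypothetical tables. In a workspace, the same SELECT statement would typically be supplied to a database reader or a SQL-capable transformer instead of being run from Python.

```python
import sqlite3

# Hypothetical tables standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, owner_name TEXT);
    CREATE TABLE parcels (parcel_id INTEGER PRIMARY KEY, owner_id INTEGER);
    INSERT INTO owners VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO parcels VALUES (10, 1), (11, 2), (12, 1);
    """
)

# The database resolves the join, so only joined rows reach the client.
joined = conn.execute(
    """
    SELECT p.parcel_id, o.owner_name
    FROM parcels AS p
    JOIN owners AS o ON o.owner_id = p.owner_id
    ORDER BY p.parcel_id
    """
).fetchall()
```

Only the already-joined rows are read, so neither source table has to be loaded and matched on the client side.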
For ArcSDE, SQL Statements are only supported for non-spatial tables. For spatial tables use: sdetable -o create_view to create a view that contains a spatial column in the join.
You can create complex table joins using a combination of sdetable -o create_view followed by SQL ALTER VIEW.
Tip: Performance improves in some cases by handing off processing to a database.
Maximize Use of System Resources
It may seem strange to suggest using as many resources as possible, but this is acceptable as long as they are doing useful work. For example, if your computer has an 8-core CPU, it makes sense to split your work into eight parts, so that each core can do its share. Restricting the process to a single core is not making the best use of system resources.
For FME Flow, the recommendation is to have one engine per core, although this can vary depending on the type and size of data being processed.
In terms of memory, it can make sense to read the entire contents of a database table in one query - regardless of whether every feature is required or not - in order to have the required records immediately available to FME. Reading individual records on demand, though it incurs lower network traffic, may overall be a slower strategy.
Partitioning and Load Balancing
An extremely large amount of data can be difficult to process in a single task. You may want to consider partitioning the data into groups (maybe on the basis of a geographic region) and processing each separately.
For example, you could divide the entirety of a Canada-wide dataset into a separate group per province, using a WHERE clause to select the required data. This way, the data is divided into ten different groups.
At this point, either each group is processed consecutively (so that approximately only 10% of the data is processed at any one time by the sole FME engine) or each group is processed concurrently over a number of engines.
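The two options can be sketched in plain Python. This is an analogy only: a thread pool stands in for multiple FME engines, and the province codes, the value attribute, and the process_group function are hypothetical placeholders for the real per-group work.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition_by(features, key):
    """Split features into groups that share the same value for key."""
    groups = defaultdict(list)
    for f in features:
        groups[f[key]].append(f)
    return dict(groups)

def process_group(group):
    """Placeholder for the real per-group work; sums a hypothetical attribute."""
    return sum(f["value"] for f in group)

features = [
    {"province": p, "value": v}
    for p, v in [("BC", 1), ("AB", 2), ("BC", 3), ("ON", 4), ("AB", 5)]
]
groups = partition_by(features, "province")

# Option 1: process each group consecutively in a single worker.
consecutive_totals = {name: process_group(g) for name, g in groups.items()}

# Option 2: process the groups concurrently across several workers.
with ThreadPoolExecutor(max_workers=3) as pool:
    concurrent_totals = dict(zip(groups, pool.map(process_group, groups.values())))
```

Either way only one partition is in a given worker at a time, and because the groups are independent the results are identical whichever option is used.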
Miscellaneous Design Tips
Tip: For maximum performance, in an arithmetic editor, use the functions @add(), @multi(), and @div() instead of using the equivalent operators (+ * and /). These functions work at a lower - much faster - level of processing. Plus, they have the added bonus of handling nulls better.