Performance Tuning FME: Database Performance Tuning

Introduction

Databases are an important component of many datasets, and the log file will help us determine how well our database performance is and how well FME interacts with the database. See the Let the Database Do the Work tutorial series for step-by-step instructions. For other performance tuning tips & tricks, see Performance Tuning FME.

Database Queries and Indexing

Here is one example of a query sent to a database:

2020-05-14 17:18:52| 476.1| 0.0|INFORM|Started SQL cache prefetch
2020-05-14 17:25:10| 476.2| 0.1|INFORM|Finished SQL cache

Note the difference in actual time on the left; you can see that the time between issuing the SQL prefetch and its completion is roughly 7 minutes. However, FME logs only 0.1 seconds of CPU time. From this, we can say that the remaining time was spent by the database in running the query it was given.

To improve performance, the user needs to review the database structure and the query. Perhaps the field being searched on isn’t indexed? Maybe the query supplied isn’t as efficient as it could be?

Check the log carefully to find out how much database-related time is spent outside of FME and see if you need to improve your database efficiency.

Database Indexing

Indexing is important not only for reading data but also for writing data. Writing to an unindexed table is quick because the database has no overhead work to do. Writing data to an indexed table takes much longer because, for each new record, the database must immediately index it.

As above, the reported CPU time doesn't change because it is the database server - and not FME - that is doing the indexing work.

Where possible, drop indexes before doing a bulk load into a table, then recreate them after the load is complete. It is often quicker than leaving the index in place during the data load.

Bulk Reads

Each query sent to a database incurs significant time and performance costs. This overhead cannot be emphasized enough. The fewer queries to a database, the faster a workspace will be. If necessary, read a larger-than-required set of data in a single query, rather than carrying out multiple, individual queries on single features.

Truncating and Dropping Tables

Related to indexing is the difference between truncating a table and dropping it. FME has settings to do either, but when you truncate a table, the index remains, and subsequent data loading is slower. When you drop a table, all indexes are also dropped; hence, data can be written faster because no indexing is taking place.

Tip: Consider using the Drop and Create table option rather than Truncate Existing to improve performance during a bulk data load.

Chunk Sizes and Transaction Intervals

Data written to a database is held in two places. FME holds the data until it reaches a set amount (the Chunk Size). Then it passes the data to the database. The database holds data until it reaches a set amount (the Transaction Interval). Then it commits the data.

A higher chunk size (also known as Features per Bulk Write) means that FME holds more data, but makes fewer requests to the database. Each database request incurs a performance cost, so it is best avoided.

A higher transaction interval means the database holds more data but performs fewer commit operations. Committing transactions is an expensive operation that it is best to avoid.

At their extreme low ends, smaller chunk sizes and transaction intervals mean you could be passing records to the database and committing them one at a time! This would be extremely costly.

But higher values can use more system resources and run the risk that a failed record means an entire set of data

Chunk size and transaction intervals are parameters that can - with some experimentation - be optimized to pass and commit data in the most efficient manner.

Search