Introduction
Databases are an important component of many datasets, and the log file will help us determine how good our database performance is, plus how well FME is interacting with the database. See the tutorial series Let the Database Do the Work for step-by-step instructions. For other performance tuning tips & tricks, see Performance Tuning FME.
Database Queries and Indexing
Here is one example of a query sent to a database:
2020-05-14 17:18:52| 476.1| 0.0|INFORM|Started SQL cache prefetch 2020-05-14 17:25:10| 476.2| 0.1|INFORM|Finished SQL cache
Note the difference in actual time on the left; you can see that the time between when we issue the SQL prefetch and until it’s done is roughly seven minutes. However, FME logs only 0.1 seconds of CPU time. From this, we can say that the remaining time was spent by the database in running the query it was given.
To improve performance, the user needs to look at how the database is structured and how the query is written. Perhaps the field being searched on isn’t indexed? Maybe the query supplied isn’t as efficient as it could be?
Tip: Check the log carefully to find out how much database-related time is spent outside of FME and see if you need to improve your database efficiency.
Database Indexing
Indexing is not only important when reading data, but it has a vast effect on writing data too.
Writing to an unindexed table is quick because the database has no overhead work to do.
Writing data to a table that is indexed takes a lot longer because - for each new record - the database has to index the data immediately.
As above, the reported CPU time doesn't change because it is the database server - and not FME - that is doing the indexing work.
Tip: Where possible, drop indexes before doing a bulk load into a table, then recreate them after the load is complete. It is often quicker than leaving the index in place during the data load.
Bulk Reads
Each query sent to a database incurs considerable expense in terms of time and performance. This overhead cannot be emphasized enough. The fewer the number of queries to a database, the faster a workspace will be. If necessary, read a larger-than-required set of data in a single query, rather than carrying out multiple, individual queries, on single features.
Truncating and Dropping Tables
Related to indexing is the difference between truncating a table and dropping it. FME has settings to do either, but when you truncate a table, the index remains, and subsequent data loading is slower. When you drop a table first, a side effect is that all indexes are also dropped; hence data can be written faster because no indexing is taking place.
Tip: Consider using the option to Drop and Create a table, rather than Truncate Existing, in order to get better performance from a bulk data load.
Chunk Sizes and Transaction Intervals
Data written to a database is held in two places. FME holds the data until it has a set amount of data (the Chunk Size). Then it passes the data to the database. The database holds the data until it has a set amount of data (the Transaction Interval). Then it commits the data.
A higher chunk size (also known as Features per Bulk Write) means that FME holds more data, but makes fewer requests to the database. Each request to the database comes at a performance cost, so is best avoided.
A higher transaction interval means that the database holds more data, but has to carry out fewer commit actions. Committing transactions is an expensive operation that it is best to avoid.
At their extreme low ends, smaller chunk sizes and transaction intervals mean you could be passing records to the database and committing them one at a time! This would be extremely costly.
But higher values can use more system resources and run the risk that a failed record means an entire set of data uploads need to be rolled back.
Tip: Chunk size and transaction intervals are parameters that can - with some experimentation - be optimized to pass and commit data in the most efficient manner.
Comments
0 comments
Please sign in to leave a comment.