Data Loading, Updating and Deleting

Files

googlesheetsdataloading.fmwt
- 1 MB
- Download

Introduction

One of the easiest ways of loading data is to bulk drop and load. However, this may not be the best method.
If you have datasets that need to be updated regularly or as a result or real-time data change, are there more efficient ways of propagating that change to datasets?

If you are dropping and loading data how long is it taking to process, and are their impacts on other services or users that need that data?

FME has better ways of handling changes to data.

FME Feature Operations

Some formats in FME allow you to control the feature operation of data being written. These operations may be: INSERT, UPDATE, DELETE, or UPSERT. These can be controlled at a feature type level or on a feature-by-feature basis.

To set the operation at a feature type level, open the feature type parameters and select the desired operation.

To set the operation on a feature-by-feature basis, create a new attribute called fme_db_operation and set the value to the correct, supported operation.

Any operation that relies on existing features will need to have a column with a unique value selected, so that the writer can match the features with the existing rows.

These options can also be used in conjunction with table or worksheet handling options. These may include: Use Existing, Create if Needed, Drop and Create or Truncate Existing.

What operation or handling option should I choose?

This will depend on:

Are you concerned about the speed at which your data is written?
Are you worried about transaction speed/volume?
Is your data important?

Do you need to overwrite your data?

Feature Type Handling Options

Use Existing:

Use if you have an existing table set up and wish to insert new rows, update, or delete existing rows.

Create if Needed:

Use if you are inserting data and are not sure if the table exists already. If it does, FME will write to it. If it doesn’t, FME can create it.

Drop and Create:

Use if you need to remove any existing data and remove the table structure, possibly to add or remove columns, change data types, etc.

Truncate Existing:

Use it if you want to remove all existing data while retaining the table structure.

Real-time implications:

If you are dropping or truncating tables, this option will be slower than inserting/updating/deleting select features, as extra time is needed to remove all features and write again. With few rows of data, this time may be insignificant, but for larger datasets, this will be more noticeable.

Demonstration Using Google Sheets

You will need a Google account to run through this exercise.

An easy way to demonstrate these different functions is using Google Sheets, which will allow you to view the sheet as it’s being written.

Try it:

Download googlesheetsdataloading.fmwt

You will need to replace the Google Sheets writer so that it connects using your credentials. Inside the Google Sheets parameters, you will need to set the Refresh Token (You will be prompted to sign in to your Google account) and the Spreadsheet Name or URL (the URL of an existing Google Sheet).

The workspace should look like this:

Ensure only the Sampler is enabled, and that you can see the Google Sheet whilst the workspace is running.

You should be able to see the Google Sheet as FME writes out cells. As the Google Sheets writer writes on a cell basis, rather than feature/row (like databases), the more attributes and features to write, the longer it will take.

If you disable the sampler and send features from the PostAddress Reader Feature Type directly, you should see that it takes a lot longer to write out. You will need to set the Worksheet Handling Option to ‘Truncate Existing’.
If something caused the workspace to fail writing, you may be left with a missing or incomplete dataset. You also may find that other systems or services trying to access data stored there may produce incorrect results if the data is unavailable for a period of time during the truncate and write process.

With Google Sheets, the option to update or delete existing rows is based on row number.
Enable the Sampler and turn on random sampling. Add a Counter before the Sampler. In the Google Sheets writer feature type, set the Feature Operation to ‘fme_db_operation’ and the Row Number Attribute to the ‘_count’ attribute. You should see that the Worksheet Handling option changes to ‘Use Existing’.

After the Sampler output Sampled port add an AttributeCreator and create the fme_db_operation attribute with a value of DELETE. For the NotSampled output port, add an AttributeCreate and create the fme_db_operation attribute with a value of UPDATE. Send the outputs of both AttributeCreators to the Google Sheets writer feature type.

If you watch the Google Sheet whilst FME runs, you should see some rows disappear, and some row values change as FME writes.