Extracting a Schema Subset for Dynamic Schemas

Files

ExtractingSchemaSubset.zip (40 KB)

Introduction

Using FME, a schema can be extracted and then a subset repurposed in other workflows. This tutorial walks you through creating a custom format and then uses Python to extract a schema list. This is only an example, and the PythonCaller can easily be replaced with list transformers. Please see Tutorial: Dynamic Workflows and Tutorial: Getting Started with List Attributes for additional ideas and information.

Step-by-step Instructions

Part 1: Create a New Schema Subset

1. Create Custom Format Framework

Open a new workspace in FME Workbench, then click on the Add Reader button. In the Add Reader dialog, click on the drop-down for Format and select More Formats.

In the Reader Gallery dialog, click on New under Custom Formats. This will open the Custom Format Wizard.

In the Custom Format Wizard, click Next to progress to the Select Format section. Set the Format to Schema (Any Format).

Next, specify a source dataset: select the Parks.tab file, which can be downloaded from the Files section of this article. Open the Parameters and expand Schema Attributes. Click the ellipsis next to Additional Attributes to Expose and select the following:

attribute{}.fme_data_type
attribute{}.name
attribute{}.native_data_type
fme_basename
fme_format_long_name
fme_format_short_name

Then click OK twice, and then click Next to proceed through the wizard.
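
The attribute{} entries exposed above are list attributes: each schema feature carries one indexed element per source attribute. As a rough illustration, the plain-Python sketch below (no FME required) shows how a list of columns maps to indexed attribute names. The function name flatten_schema and the type strings are placeholders assumed for illustration; they are not part of the FME API and were not read from Parks.tab.

```python
def flatten_schema(columns):
    """Expand (name, native_type, fme_type) tuples into FME-style
    indexed list attribute names such as 'attribute{0}.name'."""
    flat = {}
    for index, (name, native_type, fme_type) in enumerate(columns):
        prefix = "attribute{%d}" % index
        flat[prefix + ".name"] = name
        flat[prefix + ".native_data_type"] = native_type
        flat[prefix + ".fme_data_type"] = fme_type
    return flat


# Placeholder type strings, for illustration only.
example = flatten_schema([
    ("ParkId", "native_type_a", "fme_type_a"),
    ("ParkName", "native_type_b", "fme_type_b"),
])
for key, value in example.items():
    print(key, "=", value)
```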

The next section of the Custom Format Wizard is for additional parameters to expose. Since we want to create a dynamic schema reader, the Input Format parameter needs to be exposed. Select Input Format, then click Next.

Now enter the Short Name of the custom format. This is the name that will appear in the Navigator pane. For our example, type in DYNAMICSUBSET. The Description is the long name that appears in the Formats section of a Reader/Writer; for that, type in Dynamic Subset Schema.

Finally, click Finish. A new instance of FME Workbench will open with an editable version of the newly created custom format. There will be a Schema reader and a Custom Format Output writer in the Navigator pane.

2. Inspect Schema

We only want to use a subset of the schema attributes, so let’s inspect the schema to figure out which ones we want to keep. Click on the Schema reader feature type to open the mini toolbar, then click View Source Data to view the data in Visual Preview.

In Data Preview (formerly Visual Preview), note the attributes (or fields) you want to use as the schema. For this example, we’ll use ParkId, ParkName, and NeighborhoodName.

3. Create Published Parameters

Now that we know which schema attributes we would like to keep as a subset schema, let’s create a published parameter to capture those. Right-click on User Parameters in the Navigator pane and select Manage User Parameters.

In the User Parameters dialog, click on the green plus sign (+) and select Text as the parameter type.

On the right-hand side of the dialog, fill in the following parameter properties:

Parameter Identifier: to_keep
Prompt: Fields to keep (comma separated)
Published: Enabled
Required: Enabled
Disable Attribute Assignment: Disabled
Editor Syntax: Plain Text (Uniline)
Trim Whitespace: Enabled
Default Value: ParkId, ParkName, NeighborhoodName

Click OK to close the User Parameter Manager.
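
Because the default value separates field names with a comma and a space, any code that consumes the parameter should strip each name after splitting. A plain split(',') would leave ' ParkName' with a leading space. A minimal sketch, assuming the value arrives as an ordinary string; parse_keep_list is a hypothetical helper name, not an FME function:

```python
def parse_keep_list(raw):
    """Split a comma-separated parameter value into clean field names."""
    if not raw:
        return []
    # Strip whitespace around each name and drop empty entries.
    return [part.strip() for part in raw.split(",") if part.strip()]


print(parse_keep_list("ParkId, ParkName, NeighborhoodName"))
# ['ParkId', 'ParkName', 'NeighborhoodName']
```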

One final parameter to create is for the input dataset. Expand the Parks [SCHEMA] reader in the Navigator pane. Right-click on the Source Dataset, then select Create User Parameter.

In the Add/Edit User Parameter dialog, click OK to accept the defaults.

4. Fetch Parameter

Add a ParameterFetcher to the canvas and connect it between the Schema reader and writer feature types.

In the parameters, select the $(to_keep) parameter, then set the Target Attribute to _to_keep.

5. Python to Select Attributes

Next, we’ll use a PythonCaller to extract the attributes selected in the to_keep User Parameter. Add a PythonCaller to the canvas and connect it between the ParameterFetcher and the Schema writer feature type. In the parameters, copy and paste the following Python script:

import fme
import fmeobjects


class FeatureProcessor(object):
    def __init__(self):
        pass

    def input(self, feature):
        att_name_list = feature.getAttribute('attribute{}.name')
        att_ntype_list = feature.getAttribute('attribute{}.native_data_type')
        att_ftype_list = feature.getAttribute('attribute{}.fme_data_type')

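        # Split the comma-separated parameter and strip stray spaces so that
        # 'ParkId, ParkName' yields 'ParkName' rather than ' ParkName'.
        keep_values = feature.getAttribute('_to_keep') or ''
        keep_list = [name.strip() for name in keep_values.split(',')]

        if att_name_list is not None: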
            feature.removeAttribute('attribute{}.name')
            feature.removeAttribute('attribute{}.native_data_type')
            feature.removeAttribute('attribute{}.fme_data_type')
        
            # Re-index the kept attributes from 0 so the new list has no gaps.
            count = 0
            for i in range(len(att_name_list)):
                if att_name_list[i] in keep_list:
                    prefix = 'attribute{' + str(count) + '}'
                    feature.setAttribute(prefix + '.name', att_name_list[i])
                    feature.setAttribute(prefix + '.native_data_type', att_ntype_list[i])
                    feature.setAttribute(prefix + '.fme_data_type', att_ftype_list[i])
                    count = count + 1

                if (att_name_list[i] == 'fme_geometry{0}'):
                    feature.setAttribute('fme_geometry{0}',att_ftype_list[i])
        self.pyoutput(feature)

    def close(self):
        pass

    def process_group(self):
        pass

    def has_support_for(self, support_type):
        if support_type == fmeobjects.FME_SUPPORT_FEATURE_TABLE_SHIM:
            return False
 
        return False

Next, set the Attributes to Hide to _to_keep, and click OK.
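
To check the filtering logic outside FME, the core of the script can be approximated with plain Python lists. The sketch below is a simplified stand-in, not the PythonCaller itself: filter_schema is a hypothetical name, the type strings are placeholders, and the fme_geometry handling is omitted.

```python
def filter_schema(names, native_types, fme_types, keep):
    """Return re-indexed list attributes for the names found in keep."""
    keep = set(keep)
    kept = {}
    count = 0
    for name, native_type, fme_type in zip(names, native_types, fme_types):
        if name in keep:
            # Kept entries are renumbered from 0 so the list has no gaps.
            prefix = "attribute{%d}" % count
            kept[prefix + ".name"] = name
            kept[prefix + ".native_data_type"] = native_type
            kept[prefix + ".fme_data_type"] = fme_type
            count += 1
    return kept


# 'SomeOtherField' and the type strings are placeholders for illustration.
result = filter_schema(
    ["ParkId", "SomeOtherField", "ParkName"],
    ["native_a", "native_b", "native_c"],
    ["fme_a", "fme_b", "fme_c"],
    ["ParkId", "ParkName"],
)
for key, value in result.items():
    print(key, "=", value)
```

Note that ParkName moves from index 2 in the source list to index 1 in the output, which mirrors what the count variable does in the PythonCaller script.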

6. Inspect Python Output

Run the workspace with Feature Caching enabled for the PythonCaller, or connect an Inspector to the PythonCaller. In Visual Preview, click on the feature in the Table View and then open the Feature Information window. You should only see the attributes that were listed in the to_keep parameter.

7. Save the Custom Format

Since we used the Custom Format Wizard, the workspace/custom format is already pointing to the correct location for FME to find it: \Documents\FME\Formats. Save the workspace, then close FME.

Part 2: Use the Custom Format

1. Make DYNAMICSUBSET Accessible For FME

Before we can use the new custom format DYNAMICSUBSET in FME, we need to close all open FME instances, including FME Data Inspector.

If you are adding a custom format that was not created on your computer, move its .fds file into \Documents\FME\Formats.
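
If the file needs to be put in place on several machines, this step could be scripted. The following is a minimal sketch using only the Python standard library; install_custom_format is a hypothetical helper, and the formats folder is passed in by the caller because its full location depends on the user profile. It copies rather than moves the file, so the original stays where it was.

```python
import shutil
from pathlib import Path


def install_custom_format(fds_file, formats_dir):
    """Copy a custom format (.fds) file into the given FME Formats folder."""
    fds_file = Path(fds_file)
    formats_dir = Path(formats_dir)
    if fds_file.suffix.lower() != ".fds":
        raise ValueError("expected a .fds file, got: %s" % fds_file.name)
    # Create the folder if it does not exist yet.
    formats_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(fds_file, formats_dir / fds_file.name))
```

Call it with the path to the received .fds file and the path to your own Formats folder, with all FME instances closed.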

2. Use DYNAMICSUBSET

Open FME Workbench, and start a new workspace. Click the Add Reader button, then, for Format, use the Description entered when creating the custom format; for this example, it is Dynamic Subset Schema. Then browse to a dataset.

If you inspect the DYNAMICSUBSET schema reader, only the selected attributes will be available. You can now use this schema in your workflow.