About this Article Series
In this two-part series, learn to build an agentic schema mapping workflow in FME. You will create a workflow that automatically maps datasets with varying structures into a standardized schema for easier ingestion.
- Part 1 (this article): Create an agentic workspace that uses AI to generate a field-matching lookup table between source and destination schemas.
- Part 2: Use the generated mapping to transform your dataset and write it to your desired output format.
Requirements
- FME Workbench 2025.0 or later.
- An OpenAI API key, or access to another AI service such as Google Gemini, Amazon Bedrock, or Azure AI Foundry.
- Familiarity with AI concepts in FME. If you haven't already, consider reading through Tutorial: Getting Started with Any AI in FME.
Introduction
The Challenge: Variable Schemas
Standard schema mapping relies on pre-defined lookup tables to translate data. This works well when sources are consistent. However, organizations often aggregate data from multiple sources with unpredictable schemas.
Example: A provincial government collecting address data from various municipalities will receive different field names and formats from each town. Creating a lookup table manually for every source is time-consuming and inefficient.
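To make the limitation concrete, here is a minimal Python sketch of lookup-table mapping outside FME. The lookup contents, the record values, and the apply_lookup helper are hypothetical illustrations, not part of the workspace; the second record shows how a table built for one source fails on another.

```python
# Hypothetical lookup table: source field name -> destination field name.
LOOKUP = {
    "Street Address": "ST_NAME",
    "City": "CITY",
    "State": "STATE",
    "Postal Code": "ZIP",
}


def apply_lookup(record: dict, lookup: dict) -> dict:
    """Rename the fields of one record; fields missing from the lookup are dropped."""
    return {lookup[name]: value for name, value in record.items() if name in lookup}


# Two municipalities describing the same address with different field names.
town_a = {"Street Address": "125 Powell St", "City": "San Francisco",
          "State": "CA", "Postal Code": "94108"}
town_b = {"addr": "125 Powell St", "municipality": "San Francisco"}

mapped_a = apply_lookup(town_a, LOOKUP)
mapped_b = apply_lookup(town_b, LOOKUP)
print(mapped_a)
print(mapped_b)  # nothing matches: this source would need its own lookup table
```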
The Solution: Agentic AI
In this tutorial, you will build an Agentic AI Schema Mapping workspace. This workflow dynamically uses AI to generate the necessary lookup tables for the SchemaMapper transformer. By automating the mapping logic, you eliminate the need to manually configure schemas for every new dataset.
This workflow uses an 'Any-AI' approach: you can use any of the AI connectors available on the FME Hub to achieve the same results. This article uses the OpenAIConnector, but you are not required to. If you'd like to learn more, check out Getting Started with Any AI in FME.
Step-by-Step Instructions
Data Preparation
Data must be prepared by sampling and aggregating records before sending them to the AI service. In this first section, you will use sampling to prepare a representative set of rows to determine the schema. Records will then be aggregated into a single input, ensuring the AI Connector is initiated only once.
Sample datasets can be located in the SampleData.zip file attached to this article.
1. Add a Generic Reader
Open FME Workbench and create a new workspace. The first step in the workflow is to read in the source data using a dynamic reader. If the input data will always be in the same format, you can use a hard-coded format for your reader. In this example, we will use a Generic Reader, which will allow the workspace to accept any input file.
Add a Reader to the workspace and set the parameters as follows:
- Format: Generic (Any Format)
- Workflow Options: Single Merged Feature Type
Click OK to add the Generic Reader to the workspace.
2. Add a Sampler
Add a Sampler to the workspace and connect it to the Generic Reader. The Sampler will control the number of sample records sent to the AI connector.
Open the Sampler parameters and set the following:
- Sampling Rate (N): 5
- Sampling Type: First N Features
- Randomize Sampling: No
- Tip: Increasing the sample size could improve the AI's accuracy, but it consumes more tokens. Balance the rate based on your budget and the complexity of the data.
Click OK to accept the new parameters.
3. Add a Counter
Add a Counter to the workspace and connect it to the Sampler. Open the parameters and set the following:
- Count: tid
Click OK to accept the new parameters.
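For readers who think in code, the Sampler and Counter steps amount to taking the first five records and numbering them. The Python sketch below is only an illustration: the function names and sample records are hypothetical, and starting the count at 0 is an assumption about the Counter's default.

```python
from itertools import islice


def first_n(features, n=5):
    """Return the first n features, similar to Sampling Type: First N Features."""
    return list(islice(features, n))


def add_counter(features, attr="tid", start=0):
    """Attach a sequential id to each feature (the start value is an assumption)."""
    return [{**feature, attr: index} for index, feature in enumerate(features, start)]


# Hypothetical source records; in the workspace they come from the Generic Reader.
source = ({"City": f"Town {i}"} for i in range(100))
sampled = add_counter(first_n(source))
print(len(sampled), [feature["tid"] for feature in sampled])
```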
4. Add an AttributeExploder
Add an AttributeExploder to the workspace. Connect it to the Counter. Open the AttributeExploder parameters and set it as follows:
- Keep Attributes: Yes
- Ignore Attributes Containing: tid|^multi_|^fme_|^shapefile|^csv|^json
- This parameter takes a regular expression that identifies attributes to exclude from the output. In this case, we want to ignore format attributes that may be present in the dataset the user uploads.
Click OK to accept the new parameters.
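As a rough illustration of what this step produces, the Python sketch below turns one feature into name/value rows and skips names matching the same regular expression. It assumes the pattern is applied as an unanchored search, and it keeps only tid on each row rather than every original attribute; the explode function and the sample feature are hypothetical.

```python
import re

# Same pattern as the Ignore Attributes Containing parameter.
IGNORE = re.compile(r"tid|^multi_|^fme_|^shapefile|^csv|^json")


def explode(feature: dict) -> list:
    """Emit one row per attribute, skipping names that match the ignore pattern."""
    rows = []
    for name, value in feature.items():
        if IGNORE.search(name):
            continue  # format attribute or the counter itself
        rows.append({"tid": feature.get("tid"), "_attr_name": name, "_attr_value": value})
    return rows


# Hypothetical sampled feature carrying a few format attributes.
feature = {"tid": 0, "fme_feature_type": "addresses", "json_type": "object",
           "City": "San Francisco", "State": "CA"}
rows = explode(feature)
print(rows)
```

In this sketch the unanchored tid alternative also skips any attribute whose name merely contains "tid" (for example postid), which is worth keeping in mind if a source uses such names.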
5. Add a StringConcatenator
Add a StringConcatenator to the workspace. Connect it to the AttributeExploder. Open the parameters and set it as follows:
- Expression Results: Create New Attribute
- New Attribute: SampleValues
- String Parts:
  - String Type: Attribute Value; String Value: _attr_name
  - String Type: Constant; String Value: : (a colon followed by a space)
  - String Type: Attribute Value; String Value: _attr_value
The Concatenated Results will show as: @Value(_attr_name): @Value(_attr_value)
Click OK to accept the new parameters.
6. Add an Aggregator
Before you send your data to the AI service, you must aggregate your sampled data into a single record.
The OpenAIConnector uses an HTTPCaller to send an API request for each record that enters (and triggers) it. Aggregating the data into a single record ensures the connector is triggered only once, preventing multiple API calls and reducing the number of tokens consumed.
Add an Aggregator to the workspace and connect it to the StringConcatenator. Open the parameters and set them as follows:
- Group By: tid
- Accumulation Mode: Merge Incoming Attributes
- Attributes to Concatenate: SampleValues
- Separator Character: click the drop-down arrow to open the Text Editor. Remove the existing comma and press Enter once. This will create a newline-separated list.
Click OK to accept the new parameters.
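The effect of the StringConcatenator and this first Aggregator can be sketched in Python as follows: each name/value row becomes a "name: value" string, and the strings sharing a tid are joined with newlines. The aggregate_by_tid helper and the sample rows are hypothetical and only approximate what the transformers do.

```python
from collections import defaultdict


def aggregate_by_tid(rows):
    """Group 'name: value' strings by tid and join each group with newlines."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["tid"]].append(f'{row["_attr_name"]}: {row["_attr_value"]}')
    return [{"tid": tid, "SampleValues": "\n".join(parts)}
            for tid, parts in grouped.items()]


# Hypothetical exploded rows from two sampled features.
rows = [
    {"tid": 0, "_attr_name": "City", "_attr_value": "San Francisco"},
    {"tid": 0, "_attr_name": "State", "_attr_value": "CA"},
    {"tid": 1, "_attr_name": "City", "_attr_value": "Oakland"},
]
aggregated = aggregate_by_tid(rows)
for record in aggregated:
    print(record)
```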
7. Add an AttributeManager
Add an AttributeManager to the workspace and connect it to the Aggregator. Open the parameters and set the following:
- Update Attribute:
  - Input Attribute: SampleValues
  - Value: --Start Sample Feature @Count()-- @Value(SampleValues) --End Sample Feature--
- Remove Attribute:
  - Input Attribute: tid; Action: Remove
  - Input Attribute: fme_feature_type; Action: Remove
  - Input Attribute: _attr_name; Action: Remove
  - Input Attribute: _attr_value; Action: Remove
Click OK to accept the parameters.
8. Add an Aggregator
Add another Aggregator and connect it to the AttributeManager. Open the parameters and set them as follows:
- Accumulation Mode: Merge Incoming Attributes
- Attributes to Concatenate: SampleValues
- Separator Character: click the drop-down arrow to open the Text Editor. Remove the existing comma and press Enter twice. This will create a double-newline-separated list.
Your workspace should now look like this:
With Data Caching enabled, run the workspace with the SpatialExample1.json file.
After finishing, inspect the output of Aggregator_2 in the Visual Preview window. You should see the following result:
You have successfully transformed the raw, individual features into a single aggregated record. The data is now ready to be passed to the AI service.
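The last two steps of this section (wrapping each sample in start/end markers, then merging everything into one record) can be approximated in Python as below. This is a sketch under assumptions: the helper names are hypothetical, and numbering the samples from 0 is a guess at how @Count() behaves by default.

```python
def wrap(sample_values: str, count: int) -> str:
    """Surround one feature's values with the start/end markers used in the workspace."""
    return f"--Start Sample Feature {count}-- {sample_values} --End Sample Feature--"


def build_sample_block(per_feature):
    """Join the wrapped samples with a blank line, like the double-newline separator."""
    return "\n\n".join(wrap(text, index) for index, text in enumerate(per_feature))


# Hypothetical per-feature strings produced by the first Aggregator.
per_feature = ["City: San Francisco\nState: CA", "City: Oakland\nState: CA"]
sample_values = build_sample_block(per_feature)
print(sample_values)
```

The single string in sample_values plays the role of the SampleValues attribute on the one aggregated record.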
Configuring the AI Connector
In this next section, you will set up the AI connector to ingest the sample data we’ve prepared and output the resulting lookup table as structured JSON.
In this article, we will use the OpenAIConnector. However, you’re free to use any AI connector you have access to. The general steps still apply here, regardless of the AI connector used.
1. Add an OpenAIConnector
Add an OpenAIConnector to your workspace. Open the parameters and set them as follows:
- API Key: enter your OpenAI API Key
- Action: Text Generation
- Model: gpt-5
- Prompt:
You are a bot that specializes in transforming address data into usable information. The submitted data may arrive in many forms including a single address attribute, multiple attributes, etc. Your task is to determine the attribute mapping required based on the attribute name and sample values to fit the new schema based on the description and example values as described below:
FIELD,DESCRIPTION,EXAMPLE
ST_NUM,"Street Number (i.e. house number, address number, etc)",125
APT_NUM,Apartment Number,#101
BLDGNAME,Building Name,BLDG D
ST_NAME,"Street Name and Type (Street, Avenue, etc., can be abbreviated but expansion of abbreviations is preferred)",Powell St
NEIGHBH,Neighborhood Name,Union Square
CITY,City Name,San Francisco
STATE,State (Two Letter Abbreviation),CA
ZIP,5-digit zip code,94108
CNT_NAME,County Name,San Francisco
CNT_FIPS,County FIPS 6-4 code (refer to Information Technology Laboratory),06075
ID,A unique id for the feature,"B1, B2, B3, etc"
-- Source Data--
@Value(SampleValues)
The response JSON should provide the attribute name that corresponds to the description above, not the values from the sample features provided above. Only output a single mapping, don't give me multiple arrays.
- Structured Output: Enabled
- JSON Schema:
{
  "additionalProperties": false,
  "properties": {
    "Mapping": {
      "items": {
        "additionalProperties": false,
        "properties": {
          "APT_NUM": { "type": "string" },
          "BLDGNAME": { "type": "string" },
          "CITY": { "type": "string" },
          "CNT_FIPS": { "type": "string" },
          "CNT_NAME": { "type": "string" },
          "ID": { "type": "string" },
          "NEIGHBH": { "type": "string" },
          "STATE": { "type": "string" },
          "ST_NAME": { "type": "string" },
          "ST_NUM": { "type": "string" },
          "ZIP": { "type": "string" }
        },
        "required": [
          "APT_NUM",
          "BLDGNAME",
          "CITY",
          "CNT_FIPS",
          "CNT_NAME",
          "ID",
          "NEIGHBH",
          "STATE",
          "ST_NAME",
          "ST_NUM",
          "ZIP"
        ],
        "type": "object"
      },
      "type": "array"
    }
  },
  "required": [
    "Mapping"
  ],
  "type": "object"
}
Click OK to accept the new parameters.
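If your destination schema differs from the address example, typing the JSON Schema by hand becomes tedious. One possible shortcut, sketched here in Python, is to generate it from a list of field names. The build_schema function is a hypothetical helper, not something FME or the connector provides; it reproduces the structure of the schema used in this step.

```python
import json

# Destination fields from the prompt table in this article.
TARGET_FIELDS = ["ST_NUM", "APT_NUM", "BLDGNAME", "ST_NAME", "NEIGHBH", "CITY",
                 "STATE", "ZIP", "CNT_NAME", "CNT_FIPS", "ID"]


def build_schema(fields):
    """Build a schema with one required string property per destination field."""
    names = sorted(fields)
    return {
        "type": "object",
        "additionalProperties": False,
        "required": ["Mapping"],
        "properties": {
            "Mapping": {
                "type": "array",
                "items": {
                    "type": "object",
                    "additionalProperties": False,
                    "required": names,
                    "properties": {name: {"type": "string"} for name in names},
                },
            }
        },
    }


schema = build_schema(TARGET_FIELDS)
# Paste the printed text into the JSON Schema parameter.
print(json.dumps(schema, indent=2, sort_keys=True))
```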
The most critical step in configuring the OpenAIConnector is crafting an effective prompt. There is no perfect way to prompt AI, but there are techniques that can improve the response you get from the AI service.
We recommend that users take an iterative approach of testing and adjusting their prompt based on the response that the AI provides.
If you’re interested in learning more, check out this article on prompt engineering: Getting Started with AI in FME: Prompt Engineering With Text Generation and Structured Outputs
With Data Caching enabled, run the workspace and inspect the output from the OpenAIConnector. You should see a structured JSON output like the following:
{
  "Mapping": [
    {
      "APT_NUM": "",
      "BLDGNAME": "",
      "CITY": "City",
      "CNT_FIPS": "",
      "CNT_NAME": "",
      "ID": "",
      "NEIGHBH": "",
      "STATE": "State",
      "ST_NAME": "Street Address",
      "ST_NUM": "",
      "ZIP": "Postal Code"
    }
  ]
}
Creating the Lookup Table
1. Add a JSONFragmenter
This final section creates the lookup table by extracting mappings from the AI's JSON response. Add a JSONFragmenter to the workspace and connect it to the OpenAIConnector. Open the parameters and set them as follows:
- Input Source: JSON Attribute
- JSON Attribute: Response
- JSON Query: json["Mapping"][*][*]
- Flatten Query Results into Attributes: Yes
- Recursively Flatten Objects/Arrays: Yes
Click OK to accept the new parameters.
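To see what the query json["Mapping"][*][*] selects, the Python sketch below walks the same response and emits one row per key in each mapping object. It assumes, as the next step implies, that each fragment carries the key in json_index and the value in Response; the fragment function itself is a hypothetical stand-in for the transformer.

```python
import json

# The structured response shown earlier in this article.
response = """{"Mapping": [{"APT_NUM": "", "BLDGNAME": "", "CITY": "City",
"CNT_FIPS": "", "CNT_NAME": "", "ID": "", "NEIGHBH": "", "STATE": "State",
"ST_NAME": "Street Address", "ST_NUM": "", "ZIP": "Postal Code"}]}"""


def fragment(response_text: str) -> list:
    """Emit one row per key/value pair found under Mapping[*][*]."""
    document = json.loads(response_text)
    return [
        {"json_index": key, "Response": value}
        for mapping in document["Mapping"]
        for key, value in mapping.items()
    ]


fragments = fragment(response)
for row in fragments:
    print(row)
```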
2. Add an AttributeManager
Add an AttributeManager to the workspace and connect it to the JSONFragmenter. Open the parameters and set them as follows:
- Update Attribute:
  - Input Attribute: Response
  - Output Attribute: SOURCE
- Update Attribute:
  - Input Attribute: json_index
  - Output Attribute: DESTINATION
- Remove Attribute:
  - Input Attribute: SampleValues
- Remove Attribute:
  - Input Attribute: json_type
Click OK to accept the new parameters.
Your workspace should now resemble this.
3. Add a FeatureWriter
Add a FeatureWriter to the workspace and connect it to the AttributeManager. Open the FeatureWriter parameters and set them as follows:
- Format: CSV (Comma Separated Value)
- Dataset: select the drop-down arrow > User Parameter > Create User Parameter. Set the parameters as follows:
  - Parameter Identifier: LUT_Output
  - Prompt: Lookup Table Output Folder
  Click OK to add the user parameter.
- CSV File Name: lookuptable
Click on the User Attributes tab and set the Attribute Definition to Manual.
We only want the SOURCE and DESTINATION fields in our output dataset, so remove any others that may be present.
Make sure that the Type for both attributes is set to varchar(200).
Click OK to accept the new parameters.
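The renaming and CSV-writing steps can be summarized with a short Python sketch using the standard csv module. It is an approximation only: write_lookup and the sample fragments are hypothetical, and the temporary folder stands in for the Lookup Table Output Folder parameter.

```python
import csv
import tempfile
from pathlib import Path


def write_lookup(fragments, folder):
    """Write SOURCE/DESTINATION rows to lookuptable.csv in the given folder."""
    path = Path(folder) / "lookuptable.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["SOURCE", "DESTINATION"])
        writer.writeheader()
        for row in fragments:
            # Response holds the source field name, json_index the destination field.
            writer.writerow({"SOURCE": row["Response"], "DESTINATION": row["json_index"]})
    return path


# Hypothetical fragments, shaped like the JSONFragmenter output.
fragments = [
    {"json_index": "CITY", "Response": "City"},
    {"json_index": "ST_NUM", "Response": ""},
    {"json_index": "ZIP", "Response": "Postal Code"},
]
csv_path = write_lookup(fragments, tempfile.mkdtemp())
lines = csv_path.read_text(encoding="utf-8").splitlines()
print(lines)
```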
The complete workspace should look something like this.
4. Configure the User Parameters
Before running the workspace, the user parameters need to be cleaned up so that users are only prompted for the Source Dataset and the Lookup Table Output Folder.
- In the navigator window, right-click on User Parameters and select Manage User Parameters.
- Select the Feature Types to Read parameter and delete it.
- You should be left with just the two parameters mentioned previously: Source Dataset and Lookup Table Output Folder.
Click OK to accept the parameters. Save your workspace.
5. Test the workspace
The final step is to test the workspace to ensure that the OpenAIConnector generates the correct format for the lookup table and that the rest of the workspace can extract data from the JSON output and write it to a CSV file.
Click Run. When prompted, specify a Source Dataset and a Lookup Table Output Folder. For the source dataset, you can use one of the sample datasets available to download in this article.
Once the workspace has finished running, locate the output folder and open the CSV file in the FME Data Inspector. Depending on the dataset you selected, you may get different results.
If a matching field is not found in the source dataset, then it will be left as <null> in the output.
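When reviewing results across several datasets, it can help to list which destination fields received no match. The Python sketch below does this for a lookup table in CSV form; it assumes unmatched fields appear as an empty SOURCE cell in the file, and both the helper name and the sample text are hypothetical.

```python
import csv
import io


def unmapped_destinations(csv_text: str) -> list:
    """Return the DESTINATION fields whose SOURCE cell is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["DESTINATION"] for row in reader if not row["SOURCE"].strip()]


# Hypothetical contents of lookuptable.csv.
csv_text = "SOURCE,DESTINATION\nCity,CITY\n,ST_NUM\nState,STATE\n,ID\n"
missing = unmapped_destinations(csv_text)
print(missing)
```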
Conclusion
In this article, you learned how to build a flexible FME workspace that can take any input dataset using the Generic Reader and send it to an AI service (in this case, OpenAI). This approach automates what would typically be a time-consuming manual task: creating a lookup table to map your dataset to a target schema.
By letting AI handle the heavy lifting, you can skip the manual field-by-field matching process, saving both time and effort while improving consistency.
In Part 2 of this series, we’ll take things a step further by automating the schema mapping process using the lookup table generated by the workspace you created here.