Using Data Virtualization With Generative AI

Files

Weather_Stations_API.zip (10 MB)

FME's Data Virtualization lets users create and expose AI-ready APIs that serve data dynamically by connecting to FME workspaces. Many generative AI models now offer native API integration, so the model can submit a request to an endpoint on your behalf. Data Virtualization supports the OpenAPI standard, which makes the structure and required parameters of your API easy for an AI model to interpret.

Why would you use Data Virtualization with generative AI?

Interrogate a database without knowing its data structure or a specialized query language like SQL.
Allow non-technical users to generate self-serve insights without relying on developers or data engineers.
Quickly identify trends and generate visuals from your data.

Requirements

This tutorial will use ChatGPT Actions, which requires a ChatGPT Plus subscription. However, the core concepts should be similar across platforms.
2025.1+ FME Flow (b25562)
2025.1+ FME Form (b25562)

Data Virtualization is currently in technical preview within the 2025.1 beta and should not be used for production. Note that documentation may change rapidly and may not reflect the current build. This article was written with FME 2025.1 b25562. FME Form and FME Flow must be on the same build.

Step-by-Step Instructions

In this scenario, you'll interact with two Data Virtualization endpoints from the Weather Stations API: GET /stations, which provides general information about a set of fictional weather stations in British Columbia, and GET /data, which lets users query a station for its records from the year 2021.

Part 1: Import the API to FME Flow

In this section, we will bring the sample API into FME Flow and test that its endpoints are functioning properly. For more details on how the responses are generated, you can explore the workspaces and data files provided in the sample data folder.

1. Bring the Sample API into Flow

Import the sample project attached to this article into FME Flow. Once imported, you’ll see the Weather Stations API listed under the Data Virtualization menu.

Open the API and navigate to the API details page. Click on the View Documentation button.

A Swagger documentation page will open, displaying two endpoints.

2. Test the Endpoints

Open the GET /stations endpoint and click Try it out. This will reveal an Execute button that you can use to test the endpoint.

Executing the endpoint returns a list of weather stations, including attributes that describe each station's location and surrounding environment.

Next, open the GET /data endpoint. This endpoint requires a station name and a date; it will return hourly weather records for the specified date. Enter LYCHEE as the station name and 20210504 as the date, then click Execute.

24 records are returned, one for each hour of the day.
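The same two requests can also be made outside Swagger. The following is a minimal Python sketch, not an official FME client. The base URL is a placeholder (copy the real one from your API's documentation page), the query parameter names station_name and date are the ones used in the GPT instructions later in this article, and the sketch assumes the endpoints need no authentication and return JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder: replace with the base URL shown on your API's documentation page.
BASE_URL = "https://fme.example.com/weather-stations-api"


def build_url(path, **params):
    """Join the base URL, an endpoint path, and optional query parameters."""
    url = BASE_URL.rstrip("/") + "/" + path.lstrip("/")
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(url):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


stations_url = build_url("/stations")
data_url = build_url("/data", station_name="LYCHEE", date="20210504")
print(stations_url)
print(data_url)

# Uncomment once BASE_URL points at your FME Flow instance.
# Assumes the response body is a JSON array of hourly records.
# records = fetch_json(data_url)
# print(len(records))
```

Building the URL separately from sending the request makes it easy to compare the generated URL with the request URL that Swagger displays after you click Execute.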

The underlying WeatherObservations.sqlite database file only contains data for the year 2021. Additionally, some weather stations may have missing entries for certain dates.
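Because only 2021 is covered, it can help to check a date before sending it. The sketch below is an illustrative client-side check, not part of the sample API: it assumes the yyyymmdd date format that the /data endpoint uses in this tutorial, and the function name is hypothetical. It cannot detect stations with missing entries; only the endpoint can tell you that.

```python
from datetime import datetime


def is_queryable_date(value):
    """Return True if value is a valid yyyymmdd date that falls in 2021."""
    if len(value) != 8 or not value.isdigit():
        return False
    try:
        parsed = datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False  # for example, month 13 or day 32
    return parsed.year == 2021


print(is_queryable_date("20210504"))    # True
print(is_queryable_date("20220504"))    # False: outside the covered year
print(is_queryable_date("2021-05-04"))  # False: wrong format
```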

3. Download the OpenAPI Specification

Navigate back to the Data Virtualization page. Click the checkbox next to the Weather Stations API and select Actions > Export OpenAPI Specification. This will download the API’s OpenAPI specification file to your local machine.

Once downloaded, open the OpenAPI Specification JSON file in a text editor and observe its contents.

This file provides information about our API, including:

API Metadata (title, description, version, etc.)
Paths
Operations
API Components (schemas, responses, parameters, request bodies, headers, examples)
Tags

AI models can use this information to understand how to structure our API calls and gain context about the purpose of each endpoint.
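To see roughly what a model extracts from the file, you can list each operation and its parameters yourself. This is a minimal Python sketch that relies only on the standard OpenAPI layout (a top-level paths object keyed by path, then by HTTP method). The embedded document is a trimmed, hand-written stand-in, not the real export, and it ignores parameters declared at the path level or through $ref.

```python
import json

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def list_operations(spec):
    """Return (METHOD, path, parameter names) for each operation in an OpenAPI document."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method, operation in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            names = [p.get("name") for p in operation.get("parameters", [])]
            operations.append((method.upper(), path, names))
    return operations


# Trimmed, hand-written stand-in for the exported file (not the real export).
sample_spec = json.loads("""
{
  "openapi": "3.0.3",
  "info": {"title": "Weather Stations API", "version": "1.0.0"},
  "paths": {
    "/stations": {"get": {"parameters": []}},
    "/data": {"get": {"parameters": [
      {"name": "station_name", "in": "query", "required": true},
      {"name": "date", "in": "query", "required": true}
    ]}}
  }
}
""")

for method, path, names in list_operations(sample_spec):
    print(method, path, names)
```

To run it against your own export, load the downloaded file with json.load and pass the result to list_operations in place of sample_spec.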

Part 2: Connect to ChatGPT

Now that we've tested the API and confirmed it's working, we can connect our endpoint to ChatGPT and begin evaluating its capabilities. A custom GPT is a version of ChatGPT that you can personalize with specific instructions, knowledge, or actions. By default, you cannot change the model that a custom GPT uses; typically, the platform automatically assigns the most recently released model. In this article, the custom GPT uses GPT-4.1.

1. Configure a Custom GPT

Open ChatGPT in a web browser. On the sidebar, select GPTs and then Create.

Your page may look different depending on your organization’s settings.

A new draft GPT is created. Select Configure and fill in the following fields:

Name

FME Data Virtualization Connector

Description

Connects to the FME Flow Weather Stations API to analyze and visualize weather data.

Instructions

This GPT acts as a data operations assistant that allows users to upload OpenAPI specifications. It will interpret the OpenAPI specification to make GET requests, enabling it to read from databases through RESTful APIs. The assistant is responsible for identifying patterns in the data, providing insights, and extracting analytical summaries. It ensures that all requests conform to the provided API specification and interacts safely with data endpoints. When querying the internet, it gathers relevant external information to contextualize database contents or support decision-making. If any part of the API schema or required endpoint usage is ambiguous, the assistant will ask the user for clarification or make reasonable assumptions to proceed.

The assistant must never fabricate data or endpoints and should always treat the OpenAPI specification as the source of truth for available actions and their proper execution. If a request falls outside the defined capabilities of the spec, the assistant will inform the user accordingly.

Responses should be accurate, technical when necessary, and clearly presented, including summaries of actions taken and results found. When suitable, visualizations or concise tables should be used to highlight data patterns.

When interacting with the /data endpoint, the GPT must adhere to the following rules:

station_name parameter: Use a valid station name exactly as it appears in the response from the /stations endpoint. This ensures the query targets a recognized weather station.
date parameter: Format the date as yyyymmdd (e.g., April 19, 2021 becomes 20210419). This format must be followed exactly for the endpoint to return accurate data.

Example:
If the user asks, “What was the weather at lychee on April 19th, 2021?”, GPT should query the /data endpoint with:
station_name=lychee
date=20210419

The assistant communicates professionally, focuses on task execution, and engages in clarifying questions only when necessary to perform a correct operation.

The model should be able to understand how to use your endpoint based solely on the OpenAPI specification. However, providing an example can boost the model's accuracy and reduce the likelihood of errors. While adding examples for every function would be impractical for large APIs, we encourage you to test endpoints both with and without them to see the performance difference.
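The two /data rules in the instructions amount to a small transformation, sketched here in Python for illustration. The function name is hypothetical. The station name is passed through unchanged because the instructions require it to match the /stations response exactly; LYCHEE is the value used in the Swagger test earlier.

```python
from datetime import date


def data_query_params(station_name, day):
    """Build /data query parameters with the date in yyyymmdd form."""
    return {"station_name": station_name, "date": day.strftime("%Y%m%d")}


print(data_query_params("LYCHEE", date(2021, 4, 19)))
# {'station_name': 'LYCHEE', 'date': '20210419'}
```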

Next, enable Code Interpreter & Data Analysis. This isn’t necessary to retrieve data from the endpoint, but it will allow the model to generate visuals from the data.
Once finished, select Create new action.

Leave Authentication set to None. If you would like to generate a token for your endpoint, see: Secure Data Virtualization Endpoints with Authentication.

Paste the contents of your OpenAPI specification file into the Schema section.

The endpoints you saw earlier in the Swagger documentation should now appear under Available actions.

2. Interrogate the Endpoint

Under Available actions, click the Test button beside the /stations path. The preview window will ask for permission to send requests to the server. Click Always Allow to grant access.

ChatGPT will return a summary of the response. Because model output varies, your result may be worded differently each time.

Try asking your GPT some questions about the data. For example, which station has the lowest elevation?
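To cross-check an answer like this, you can run the same comparison on the raw /stations response. The Python sketch below uses made-up records: the station names, the elevation values, and the field names station_name and elevation are all hypothetical, so substitute the attributes your endpoint actually returns.

```python
# Hypothetical records: the real /stations field names and values may differ.
stations = [
    {"station_name": "ALPHA", "elevation": 412.0},
    {"station_name": "BRAVO", "elevation": 35.5},
    {"station_name": "CHARLIE", "elevation": 1280.0},
]


def lowest_station(records, key="elevation"):
    """Return the record with the smallest numeric value under `key`."""
    usable = [r for r in records if isinstance(r.get(key), (int, float))]
    if not usable:
        raise ValueError(f"no record has a numeric '{key}' value")
    return min(usable, key=lambda r: r[key])


print(lowest_station(stations)["station_name"])  # BRAVO
```

Records without a numeric value are skipped rather than treated as zero, so a missing elevation cannot be reported as the lowest.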

Next, ask the GPT to create a visualization of the data from the Lychee weather station for July 21, 2021.

If Code Interpreter & Data Analysis is not enabled, the model may generate an image of a graph using fabricated data.

In this tutorial, we used FME's Data Virtualization and generative AI to turn a standard database into a conversational interface. With the OpenAPI specification acting as a shared description of the API, a generative AI model can work out how to call your endpoints, so any user can ask complex questions in plain language.