Getting Started with AI in FME: Image and Visual AI Tools

Files

Introduction

This article is for FME users who want to automate image analysis using AI vision models. It covers how to configure the GoogleGeminiConnector to extract structured data from images, validate AI responses using regex, and improve accuracy by cropping images to a region of interest using bounding boxes. No prior AI experience is required, but a working Google AI OAuth Web Connection is needed before getting started.

Modern data workflows increasingly rely on extracting information from images, whether for classification, object detection, or text recognition (OCR). FME makes it easy to integrate visual recognition capabilities into your pipelines by connecting directly to industry-leading AI services such as OpenAI Vision, Google Gemini, and Azure Computer Vision. These integrations allow users to transform unstructured image data into structured, actionable insights with minimal effort.

Using transformers such as the OpenAIConnector, GoogleGeminiVisionConnector, and AzureAIVisionConnector, FME can process images stored locally or referenced by URL. This includes dynamic workflows, such as listing objects in an S3 bucket using the S3Connector and passing those image URLs to an AI vision model for analysis.

These workflows help automate manual data interpretation tasks, reduce errors, and accelerate decision-making across departments.

Common use cases include:

Classifying field-collected photos
Detecting assets in aerial imagery
Extracting text from inspection photos (for example, reading water or gas meter values submitted through Survey123)

This article introduces not only the transformers involved but also best practices for enforcing structured output, validating AI responses, and improving accuracy in production workflows.

Scenario

A practical example of this capability comes from a user story shared by our partner Avineon Tensing, starting at 25:45. In their workflow, field crews collect photos of electrical meters during inspections, and FME uses Google Gemini to automatically extract and validate meter readings.

In the original implementation, the meter photo was submitted through Survey123 and delivered to FME via a webhook. For details on configuring Survey123 to push images in real time, refer to this article on pushing Survey123 data via webhooks.

For the purpose of this article, we will focus on a scenario where the photo is already stored and accessible within your system, allowing you to begin directly at the image‑processing stage.

Requirements

FME Workbench 2025.1.0.0 (Build 25606) or higher. Download the newest version of FME here.
A working Google AI OAuth Web Connection. This requires a Google Cloud project with a configured OAuth client (Client ID and Client Secret) and a Google AI OAuth Web Connection set up in FME. See our AI Service Authentication article for step-by-step instructions.

AI model availability and naming may evolve over time. Confirm the supported models here.

Section 1: Getting Started with AI and Images

What You'll Build: A workspace that takes an image of a water meter, sends it to Google Gemini for analysis, extracts a structured JSON response containing the meter reading and image clarity, and validates the result using regex.

Step-by-step Instructions

This example uses an AI-generated image of a water meter, but the same steps work with any image.

1. Open FME Workbench

2. Add a Creator

3. Select a Gemini Model
Visit the Google Gemini page, choose the model you want to use, and note its model code.

For testing purposes, use a lightweight model such as: gemini-2.5-flash-lite

4. Add a GoogleGeminiConnector
Back in FME Workbench, attach a GoogleGeminiConnector to the Creator.

To use this transformer, you’ll need to configure the Google AI OAuth Web Connection. If you have not used this connection before, please set it up using the instructions in the OAuth Authentication section of this article.

Authentication:
- Region: <the region of your data>
- Project ID: <the project ID from the app created>
- Google AI OAuth: <Google Web Connection>
General:
- Model: gemini-2.5-flash-lite
- Prompt:

Analyze the provided image of a water meter. Extract the exact numerical meter reading and assess the image clarity. Provide your response as a valid JSON object with the keys 'reading' (for the number) and 'clarity' (for the assessment: Clear, Blurry, or Obscured).

- Path to File / URL: <path to the image downloaded from the Files section of this article>
- MIME Type: image/png
- Structured Output: Selected
- JSON Schema:

{
  "properties": {
    "clarity": {
      "description": "An assessment of the image's legibility regarding the meter digits.",
      "enum": [
        "Clear",
        "Blurry",
        "Obscured"
      ],
      "type": "STRING"
    },
    "notes": {
      "description": "Optional brief context if there are anomalies (e.g., 'Glass reflection makes the last digit difficult to read').",
      "type": "STRING"
    },
    "reading": {
      "description": "The exact numerical reading displayed on the water meter. If digits are partially obscured, provide the best estimate. Do not include units (e.g., cubic meters or gallons).",
      "type": "STRING"
    }
  },
  "required": [
    "reading",
    "clarity"
  ],
  "type": "OBJECT"
}

5. Run the Workspace

Run the workspace with feature caching enabled.

To inspect the response, click on the GoogleGeminiConnector's output port cache and look for the Response attribute.

{
  "reading": "00124578",
  "clarity": "Clear"
}
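Outside of FME, the same response can be checked against the rules in the schema before it is trusted. The following is a minimal Python sketch, not part of the FME workspace: the function name check_meter_response is hypothetical, and the required keys and allowed clarity values are copied from the JSON Schema in step 4.

```python
import json

# Values copied from the JSON Schema in step 4.
ALLOWED_CLARITY = ("Clear", "Blurry", "Obscured")
REQUIRED_KEYS = ("reading", "clarity")


def check_meter_response(response_text):
    """Return a list of problems found in the response; an empty list means it passed."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for key in REQUIRED_KEYS:
        if key not in data:
            problems.append(f"missing required key: {key}")
    if "clarity" in data and data["clarity"] not in ALLOWED_CLARITY:
        problems.append(f"unexpected clarity value: {data['clarity']!r}")
    return problems


sample = '{"reading": "00124578", "clarity": "Clear"}'
print(check_meter_response(sample))  # prints []
```

A check like this only confirms the shape of the response. It says nothing about whether the digits were read correctly.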

In this case, the reading was quite accurate. However, providing additional context in the prompt can reduce the chance of misread or invented values.

For example, you could incorporate historical customer information from a database and have the model evaluate whether the meter values fall within an expected range based on prior usage patterns, seasonal trends, or known operational thresholds. This context allows the AI to reason comparatively rather than treating each reading in isolation.

You could also include metadata such as meter type, location, recent maintenance events, or known anomalies (for example, outages or extreme weather) to further ground the evaluation. By constraining the model with domain-specific expectations and reference data, the AI can flag readings that are statistically unusual, request human review when confidence is low, or provide a clearer justification for why a reading is considered valid.

Overall, enriching the prompt with historical and operational context shifts the AI from simple pattern recognition toward informed validation, improving reliability and trust in the results.
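As a rough illustration of this idea, the Python sketch below appends historical context to a base prompt and applies the same expected range as a plain rule-based check. Everything in it is an assumption made for illustration: the function names, the previous reading of 124100, and the maximum expected increase of 1000 are hypothetical, and in FME the equivalent values would come from a database lookup rather than hard-coded numbers.

```python
DIGITS = "0123456789"


def build_context_prompt(base_prompt, previous_reading, max_expected_increase):
    """Append historical context so the model can compare the new reading against it."""
    upper = previous_reading + max_expected_increase
    return (
        f"{base_prompt}\n\n"
        f"Context: the previous recorded reading for this meter was {previous_reading}. "
        f"A plausible new reading lies between {previous_reading} and {upper}. "
        "If the value you read falls outside that range, still report it, "
        "but explain the discrepancy in 'notes'."
    )


def is_plausible(reading, previous_reading, max_expected_increase):
    """Rule-based cross-check of the AI reading against the same expected range."""
    digits = "".join(ch for ch in reading if ch in DIGITS)
    if not digits:
        return False
    value = int(digits)
    return previous_reading <= value <= previous_reading + max_expected_increase


# Hypothetical values for illustration only.
prompt = build_context_prompt("Analyze the provided image of a water meter.", 124100, 1000)
print(is_plausible("00124578", 124100, 1000))  # prints True
```

Keeping the range check outside the model as well means an out-of-range value is caught even if the model ignores the added context.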

6. Add a JSONFlattener
The AI response is returned as a raw JSON string. The JSONFlattener converts this into individual FME attributes so the values can be used downstream.

Add a JSONFlattener to the canvas and attach it to the GoogleGeminiConnector.

JSON Document: Response
Recursively Flatten Objects/Arrays: Yes
Attributes to Expose: clarity reading

Now we have the readings, which can be written out to any format.
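Conceptually, flattening turns a nested JSON document into a flat set of name/value pairs. The Python sketch below illustrates the idea only. It is not how the JSONFlattener is implemented, and the naming used for nested keys and list items (a dot and an index in braces) is an assumption chosen for the example.

```python
import json


def flatten(value, prefix=""):
    """Recursively flatten nested JSON objects and arrays into a single-level dict."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, name))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Assumed naming for list items: name{0}, name{1}, ...
            flat.update(flatten(child, f"{prefix}{{{index}}}"))
    else:
        flat[prefix] = value
    return flat


response = '{"reading": "00124578", "clarity": "Clear"}'
attributes = flatten(json.loads(response))
print(attributes)  # prints {'reading': '00124578', 'clarity': 'Clear'}
```

For the flat response used in this article, the result is simply one attribute per key, which matches the two attributes exposed above.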

7. Add a Tester
Ideally, every AI-driven workflow includes rules and manual checks to confirm the data is accurate, or at least a mechanism that alerts you when the AI returns incorrect values.

In our fictional water meter image, the display shows 8 digits, each from 0 to 9. The AI reading may include spaces between the digits, which is acceptable in this workflow.

We can use some regex in a Tester to filter out any value that does not make sense.

Add a Tester to the canvas.

Connect the Output port of the JSONFlattener to the Input port of the Tester.

Configure the Tester as follows:

Left Value	Operator	Right Value
reading	Contains Regex	^(?:\d\s*){8}$

This regex checks that the reading contains exactly 8 digits (0–9), each optionally followed by whitespace. Any other character, or a different number of digits, causes the test to fail.
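To see which values the pattern accepts, it can be tried outside FME. The Python sketch below uses the same expression with Python's re module. The helper names are hypothetical, and the re.ASCII flag is an assumption added so that \d matches only 0 to 9; regex engines can differ in small details, so treat this as an approximation of the Tester's behaviour.

```python
import re

# Same pattern as in the Tester; re.ASCII limits \d to the digits 0-9.
METER_PATTERN = re.compile(r"^(?:\d\s*){8}$", re.ASCII)


def is_valid_reading(reading):
    """True when the reading is exactly 8 digits, each optionally followed by whitespace."""
    return METER_PATTERN.search(reading) is not None


def normalize_reading(reading):
    """Remove whitespace so that '0012 4578' and '00124578' compare as equal."""
    return re.sub(r"\s+", "", reading)


for candidate in ["00124578", "0012 4578", "0012457", "001245789", "0012457A"]:
    print(repr(candidate), is_valid_reading(candidate))
```

The first two candidates pass; the remaining three fail because they have too few digits, too many digits, or a non-digit character. Stripping the spaces with normalize_reading before writing the value out keeps downstream data consistent.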

8. Attach a Terminator to the Failed Port
Attach a Terminator to the Failed port. Any reading that does not match the regex reaches the Terminator, which stops the workspace with a failure.

The Terminator does not need any custom configuration. However, you could potentially include a custom message stating the reason for the failure.

Section 2: Improving Accuracy with Bounding Boxes or Reducing Image Area

What You'll Build: A workspace that reads a multi-page floodplain map PDF, tiles and samples a specific section, exports it as a PNG, and sends it to Google Gemini to extract structured metadata from the title block, including project title, drawing number, scale, and issue date.

Bounding boxes are a key technique for improving the accuracy of image-based AI workflows. They define a specific region of interest within an image, allowing AI models to focus only on the relevant area rather than processing the entire image.

This is especially important when working with generative vision models, which analyze images holistically. Background elements such as glare, labels, shadows, or surrounding equipment can introduce noise and increase the risk of incorrect readings.

Cropping also helps with images that contain large amounts of text. The more text an image contains, the more likely the model is to confuse or misattribute values.

If the entire image is the region of interest (for example, a photo taken specifically of a single asset with no surrounding context), bounding boxes may not be necessary, and passing the full image directly to the model is sufficient.

For this next example, we’ll be using floodplain maps from the province of British Columbia: https://www2.gov.bc.ca/gov/content/environment/air-land-water/water/drought-flooding-dikes-dams/integrated-flood-hazard-management/governance/flood-hazard-land-use-management/floodplain-mapping/floodplain-maps-by-region

In this example, the floodplain map has been standardized, so we clip it to a fixed region defined by a standardized layout. However, if the area is not standardized, the GoogleGeminiVisionBoundingBoxCreator can be used. This transformer passes images to Google Gemini's multi-modal models, which return bounding boxes and vector geometry identifying the region of interest. This bounding box can be used to clip a raster downstream.
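To make the clipping step concrete, the Python sketch below converts a normalized bounding box into a pixel crop rectangle. It assumes the convention described in Google's Gemini documentation for raw model output, where a box is given as [ymin, xmin, ymax, xmax] on a 0 to 1000 scale. Whether the GoogleGeminiVisionBoundingBoxCreator exposes values in this form is not stated here, since the transformer returns geometry directly, so the function name, the sample box, and the image size are all hypothetical.

```python
def box_to_pixels(box_2d, image_width, image_height, scale=1000):
    """Convert [ymin, xmin, ymax, xmax] on a 0..scale range to (left, top, right, bottom) pixels."""
    ymin, xmin, ymax, xmax = box_2d
    if not (0 <= ymin < ymax <= scale and 0 <= xmin < xmax <= scale):
        raise ValueError(f"invalid bounding box: {box_2d}")
    # Integer arithmetic avoids floating-point rounding surprises.
    left = xmin * image_width // scale
    top = ymin * image_height // scale
    right = xmax * image_width // scale
    bottom = ymax * image_height // scale
    return left, top, right, bottom


# Hypothetical box covering the bottom-right corner of a 3000 x 2000 pixel image.
crop = box_to_pixels([800, 600, 1000, 1000], image_width=3000, image_height=2000)
print(crop)  # prints (1800, 1600, 3000, 2000)
```

Rejecting boxes with inverted or out-of-range coordinates is a cheap safeguard, because a malformed box from the model would otherwise produce an empty or misplaced crop.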

Step-by-step Instructions

1. Download a Floodplain Map
Download the following map: https://www.env.gov.bc.ca/wsd/data_searches/fpm/reports/bc-floodplain-maps/SeymourR@NVancouver/3-93-5.pdf

2. Open FME Workbench
Open FME Workbench and select New to create a blank workspace.

3. Add a PDF Reader
Start typing PDF on the canvas and add an Adobe Geospatial PDF [Reader].

Select the PDF dataset downloaded in step 1. Click OK to add the reader to the canvas.

4. Add a RasterTiler
Add a RasterTiler to the canvas. Connect the output port of the reader to the Input port of the RasterTiler.

Open the RasterTiler, set:

Columns: 1
Rows: 6
Force Equal Size: Yes

Click OK.

This divides the PDF page into 6 horizontal strips of equal height, isolating the title block in the bottom section.
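The arithmetic behind this split can be sketched in a few lines of Python. The page height of 6600 pixels is a hypothetical value, and the sketch only handles heights that divide evenly by the number of rows; how the RasterTiler treats a remainder is not covered here.

```python
def horizontal_strips(image_height, rows):
    """Return (top, bottom) pixel rows for each equal-height strip, ordered top to bottom."""
    if rows <= 0 or image_height % rows != 0:
        raise ValueError("this sketch only handles heights that divide evenly by rows")
    strip_height = image_height // rows
    return [(row * strip_height, (row + 1) * strip_height) for row in range(rows)]


# Hypothetical page height of 6600 pixels split into 6 strips.
strips = horizontal_strips(6600, 6)
print(strips[-1])  # prints (5500, 6600), the bottom strip containing the title block
```

Only one sixth of the page reaches the model, which removes most of the map content that is irrelevant to the title block.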

5. Add a Sampler
Add a Sampler to the canvas.

Attach the Tiles output port to the Input port of the Sampler.

Open the Sampler to configure the parameters.

Sampling Rate (N): 1
Sampling Type: Last N Features

Click OK to continue.

Since the title block is in the last (bottom) tile, selecting Last N Features with N=1 ensures only that section is passed to Gemini.

6. Add a TempPathnameCreator
Add a TempPathnameCreator and connect the Sampled port of the Sampler to the Input port of the TempPathnameCreator.

This adds a temporary file or folder path to each feature. The path is deleted automatically when the translation finishes.
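The pattern is similar to a temporary directory in general-purpose code: a path is created for intermediate files and cleaned up afterwards. The Python sketch below shows that pattern with the standard tempfile module. It is an analogy rather than a description of how the TempPathnameCreator works internally, and the placeholder bytes stand in for a real PNG.

```python
import os
import tempfile

# The directory and everything inside it is removed when the with-block exits.
with tempfile.TemporaryDirectory() as temp_dir:
    png_path = os.path.join(temp_dir, "Output.png")
    with open(png_path, "wb") as handle:
        handle.write(b"placeholder bytes, not a real PNG")
    existed_during_run = os.path.exists(png_path)

exists_after_run = os.path.exists(png_path)
print(existed_during_run, exists_after_run)  # prints True False
```

Because the file disappears once processing ends, anything that needs the image, such as the call to the AI service, has to run before the cleanup happens.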

7. Add a FeatureWriter
Add a FeatureWriter and connect the output port of the TempPathnameCreator to the input port of the FeatureWriter.

Configure the following parameters:

Format: PNG (Portable Network Graphics)
Dataset: _pathname (attribute)
World File Generation: No

Click OK.

8. Add a GoogleGeminiConnector
Add a GoogleGeminiConnector and connect the Summary port of the FeatureWriter to the Input port of the GoogleGeminiConnector.

Authentication:
- Region: <the region of your data>
- Project ID: <the project ID from the app created>
- Google AI OAuth: <Google Web Connection>
General:
- Model: gemini-2.5-flash-lite
- Prompt:

You are analyzing a scanned floodplain mapping sheet.

Carefully inspect the title block and all lower margin information. Extract the following fields exactly as written on the drawing:

- Project Title
- Drawing Number
- Scale
- Date Issued

Instructions:

- Only extract information that is clearly visible in the image.
- Do not infer or guess missing values.
- If a field is not legible or not present, return: Not visible in provided image.
- Preserve capitalization, punctuation, and formatting exactly as shown.

Look specifically in:

- Bottom-right title block
- Lower margin revision table
- Approval / signature section
- Notes area if date appears there

- Path to File / URL: @Value(_dataset)/Output.png
- MIME Type: image/png
- Structured Output: Selected
- JSON Schema:

{
  "properties": {
    "clarity": {
      "description": "An assessment of the legibility of the title block and metadata fields.",
      "enum": [
        "Clear",
        "Partially Legible",
        "Illegible"
      ],
      "type": "STRING"
    },
    "date_issued": {
      "description": "The issue date as written. If not legible or not present, return 'Not visible in provided image.'",
      "type": "STRING"
    },
    "drawing_number": {
      "description": "The unique drawing identifier listed in the title block.",
      "type": "STRING"
    },
    "notes": {
      "description": "Optional contextual notes regarding scan quality, missing fields, or ambiguity in interpretation.",
      "type": "STRING"
    },
    "project_title": {
      "description": "The official project title as written in the title block of the engineering drawing.",
      "type": "STRING"
    },
    "scale": {
      "description": "The map scale exactly as written on the drawing.",
      "type": "STRING"
    }
  },
  "required": [
    "project_title",
    "drawing_number",
    "scale",
    "date_issued",
    "clarity"
  ],
  "type": "OBJECT"
}

@Value(_dataset) resolves to the temporary path generated by the TempPathnameCreator, which the FeatureWriter used as its dataset. Appending /Output.png points to the PNG exported by the FeatureWriter.

9. Add a JSONFlattener
Add a JSONFlattener to the canvas and attach it to the GoogleGeminiConnector.

JSON Document: Response
Recursively Flatten Objects/Arrays: Yes
Attributes to Expose: clarity date_issued drawing_number notes project_title scale
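Because the prompt tells the model to return "Not visible in provided image." for missing fields, downstream steps may want to treat that text as a missing value rather than real data. The Python sketch below shows one way to do that outside FME. The function name is hypothetical, and the sample response contains placeholder values, not values read from the actual map.

```python
import json

NOT_VISIBLE = "Not visible in provided image."
FIELDS = ("project_title", "drawing_number", "scale", "date_issued")


def clean_title_block(response_text):
    """Map empty strings and the 'not visible' sentinel to None for each expected field."""
    data = json.loads(response_text)
    cleaned = {}
    for field in FIELDS:
        value = data.get(field)
        text = value.strip() if isinstance(value, str) else ""
        cleaned[field] = None if text in ("", NOT_VISIBLE) else text
    return cleaned


# Placeholder response for illustration only.
sample = json.dumps({
    "project_title": "EXAMPLE PROJECT TITLE",
    "drawing_number": "EXAMPLE-001",
    "scale": "Not visible in provided image.",
    "date_issued": "",
    "clarity": "Partially Legible",
})
print(clean_title_block(sample))
```

In the workspace itself, a similar rule could be applied with a Tester or an attribute-handling transformer after the JSONFlattener, so that sentinel text is never written out as if it were a real value.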

In this article, you've seen how FME can connect to Google Gemini to extract structured data from images, from reading utility meters to parsing engineering drawing title blocks. By combining AI vision models with FME's transformation and validation tools, you can automate workflows that would otherwise require manual interpretation.

Once the data has been extracted and validated, FME can write the results to virtually any destination. Common targets include:

Microsoft Excel — for sharing results with non-technical stakeholders
Esri ArcGIS Online Feature Service — for publishing spatial results to a web map
Snowflake — for loading results into a cloud data warehouse for reporting and analytics
PostgreSQL (or any other database) — for storing structured results in a database

For a full list of supported destinations, see the FME Readers and Writers documentation.

As a next step, consider which images in your own workflows currently require manual interpretation (inspection photos, scanned documents, field survey images) and how structured AI output could feed directly into your existing pipelines.
