Getting Started with AI in FME: Extracting Insights from Unstructured Documents

Sienna Emery

Introduction

Organizations often deal with unstructured documents—like PDFs, reports, and scanned forms—that contain valuable insights but are difficult to process using traditional tools. With FME and the OpenAIConnector transformer (or any of our other AI Transformers), you can apply the power of large language models to extract structured insights from this content automatically.

In this tutorial, you’ll learn how to analyze a PDF file and return key information such as summaries, insights, and structured metrics. We’ll walk through a real-world example using Nike Inc.’s annual financial report, but the same approach can be applied to a wide range of document types—whether you’re summarizing policy documents, extracting key metrics from reports, or flagging important content for further review.

AI Disclaimer
The results generated by the OpenAIConnector are based on predictions from a large language model and may contain inaccuracies, misinterpretations, or omissions. Always review AI-generated outputs before relying on them for decision-making or reporting. For critical use cases, validate insights against source data or consult a subject matter expert.

Requirements

Step-by-Step Instructions

In this section, you’ll build an FME workspace that reads a PDF document and uses the OpenAIConnector to extract meaningful insights in a structured format. From there, FME transforms the output into a clean Excel report, making it easy to review and share.

The workflow will:

  • Read a single PDF document
  • Send the document to OpenAI for analysis using a structured prompt
  • Extract a summary, top insights, and key metrics
  • Clean and restructure the response using standard FME transformers
  • Output the results to a formatted Excel spreadsheet

 

1. Get an OpenAI API Key

If you haven’t already, sign up for the OpenAI API, then generate an API key. You’ll need this key to authenticate requests made by the OpenAIConnector in FME.

 

2. Download the Example PDF

Click this link to download the PDF and save it in an easily accessible location.

 

3. Create a New FME Workspace

Open FME Workbench and create a blank workspace.

 

4. Add a Creator

Click a blank area of the canvas and start typing Creator to bring up the Quick Add dialog. Double-click the Creator transformer to add it. The Creator emits a single feature that starts the workflow.

 

5. Add an OpenAIConnector

Tips for prompt generation:

• Be clear and specific in your prompt. The more context you provide, the better the model performs.
• Role-based prompts (e.g., “You are a document classification assistant…”) help set expectations for the model.
• Use structured output whenever possible. This makes downstream processing much easier, especially in FME, where structured JSON can be parsed into attributes using the JSONFlattener.
• Enumerate options. If you want consistent classification (e.g., fixed categories), provide a defined list the model can choose from.
• Ask for justification. Including fields like explanation or confidence_score can help with auditing and quality assurance.
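
If you generate prompts from attributes or parameters, the tips above can be applied in one place. The following is a minimal Python sketch, not part of the OpenAIConnector; the function name `build_prompt`, the categories, and the field names are all hypothetical and chosen only for illustration.

```python
def build_prompt(role: str, task: str, categories: list[str], fields: list[str]) -> str:
    """Assemble a prompt that applies the tips above (all names are illustrative)."""
    lines = [
        f"You are {role}.",                      # role-based framing
        task,                                    # clear, specific task
        "Choose exactly one category from this list:",
    ]
    lines += [f"- {c}" for c in categories]      # enumerated options
    lines.append("Respond in JSON with these fields:")
    lines += [f"- {f}" for f in fields]          # structured output, incl. justification
    return "\n".join(lines)


prompt = build_prompt(
    role="a document classification assistant",
    task="Classify the attached document by its primary purpose.",
    categories=["policy", "financial report", "inspection form"],
    fields=["category", "explanation", "confidence_score"],
)
print(prompt)
```

Keeping the role, the option list, and the output fields as separate inputs makes it easier to reuse the same prompt structure across document types.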

Add an OpenAIConnector and connect it to the Creator's output port. Double-click the OpenAIConnector to edit the following parameters:

  • API Key: Your OpenAI API Key
  • Action: File Search
  • File to Upload: The path to your PDF file downloaded in step 2
  • User Prompt:
You are a financial analyst assistant. Review the financial performance of Nike, Inc. for fiscal year 2024 based on the structured data provided.

Summarize Nike’s financial health in plain language, focusing on revenue growth, profitability, margins, returns to shareholders, and liquidity.

Provide a JSON response that includes:

- a concise summary

- top 3 insights

- key metrics with short commentary
  • Structured Output: Checked

Use the following JSON schema in the "Structured Output Schema" field:

{
  "additionalProperties": false,
  "properties": {
    "insights": {
      "items": { "type": "string" },
      "type": "array"
    },
    "key_metrics": {
      "additionalProperties": false,
      "properties": {
        "cash_end_of_year": { "type": "string" },
        "dividends_paid": { "type": "string" },
        "eps_diluted": { "type": "string" },
        "gross_margin": { "type": "string" },
        "net_income": { "type": "string" },
        "revenue": { "type": "string" },
        "roic": { "type": "string" },
        "share_repurchases": { "type": "string" }
      },
      "required": [
        "revenue", "net_income", "gross_margin", "eps_diluted",
        "roic", "cash_end_of_year", "dividends_paid", "share_repurchases"
      ],
      "type": "object"
    },
    "summary": { "type": "string" }
  },
  "required": ["summary", "insights", "key_metrics"],
  "type": "object"
}
  • Advanced > Connection Timeout (seconds): 360

Click OK.

The transformer should look like this.
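
Because model output can contain omissions, it may be worth confirming that a response has the shape the schema asks for before using it. The sketch below is one way to do that with Python's standard library only. `check_response` is a hypothetical helper, and the sample values are placeholders rather than figures from the report.

```python
import json

REQUIRED_TOP = {"summary": str, "insights": list, "key_metrics": dict}
REQUIRED_METRICS = [
    "revenue", "net_income", "gross_margin", "eps_diluted",
    "roic", "cash_end_of_year", "dividends_paid", "share_repurchases",
]


def check_response(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the shape is as expected."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        return ["response is not a JSON object"]
    problems = []
    for key, expected in REQUIRED_TOP.items():
        if not isinstance(data.get(key), expected):
            problems.append(f"{key}: missing or not {expected.__name__}")
    metrics = data.get("key_metrics")
    if isinstance(metrics, dict):
        for name in REQUIRED_METRICS:
            if not isinstance(metrics.get(name), str):
                problems.append(f"key_metrics.{name}: missing or not a string")
    return problems


# Placeholder response used only to exercise the check.
sample = json.dumps({
    "summary": "placeholder summary",
    "insights": ["placeholder insight"],
    "key_metrics": {name: "placeholder" for name in REQUIRED_METRICS},
})
print(check_response(sample))
```

This only checks structure. It does not confirm that the values match the source document, so the review advice in the AI Disclaimer still applies.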

 

 

6. Add a JSONFlattener

Add a JSONFlattener and attach it to the OpenAIConnector output port. The OpenAIConnector returns its structured response in a single JSON-formatted attribute named Response. To work with individual fields (like summary), you need to parse that JSON into FME attributes, which is what flattening does.

Double-click the JSONFlattener and edit the following parameters:

  • JSON Attribute to Flatten: Response
  • Recursively Flatten Objects/Arrays: No
  • Attributes to Expose: summary, insights, key_metrics

You can select Response from the drop-down menu beside the parameter field. If you don't see it listed, try running the workspace with feature caching enabled once to populate available attributes, then return to the transformer settings to select it.

Click OK.
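
To picture what a non-recursive flatten does, here is a rough Python sketch. It assumes each top-level key becomes an attribute and that nested arrays and objects are kept as JSON text. The exact serialization FME produces may differ, and `flatten_top_level` is a hypothetical name.

```python
import json


def flatten_top_level(response: str) -> dict[str, str]:
    """Turn each top-level key into an attribute; nested values stay as JSON text."""
    attributes = {}
    for key, value in json.loads(response).items():
        if isinstance(value, str):
            attributes[key] = value
        else:
            # Arrays and objects are re-serialized compactly rather than expanded.
            attributes[key] = json.dumps(value, separators=(",", ":"))
    return attributes


# Placeholder response, not real model output.
response = '{"summary": "s", "insights": ["a", "b"], "key_metrics": {"revenue": "r"}}'
attrs = flatten_top_level(response)
print(attrs["summary"])
print(attrs["insights"])
print(attrs["key_metrics"])
```

This is why later steps are still needed: insights and key_metrics are attributes at this point, but their values are still JSON.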

 

7. Add a Second JSONFlattener
Add another JSONFlattener and attach it to the output port of the first JSONFlattener. This one flattens the key_metrics value into one attribute per metric.

Double-click the JSONFlattener and edit the following parameters:

  • JSON Attribute to Flatten: key_metrics
  • Recursively Flatten Objects/Arrays: Yes
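
As a rough illustration of recursive flattening, the Python sketch below walks a JSON value and emits one entry per leaf. The naming used for nested keys (a dot between object levels, `{n}` for array positions) is an assumption made for this example, and `flatten_recursive` is a hypothetical helper. Since key_metrics is a flat object, each metric simply becomes its own attribute.

```python
import json


def flatten_recursive(value, prefix=""):
    """Flatten nested JSON into a single-level dict of attribute names to values."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            flat.update(flatten_recursive(child, name))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Assumed naming for array items: parent name plus {index}.
            flat.update(flatten_recursive(child, f"{prefix}{{{index}}}"))
    else:
        flat[prefix] = value
    return flat


# Placeholder metric values, not figures from the report.
key_metrics = '{"revenue": "placeholder A", "net_income": "placeholder B"}'
metrics = flatten_recursive(json.loads(key_metrics))
print(metrics)
```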

 

8. Run with Feature Caching Enabled

Before proceeding, run the workspace with feature caching enabled. This is required in order to use the Import > From Feature Cache option in the next step. On the top toolbar (ribbon), click the green Run button.

9. Add an AttributeExposer

Add an AttributeExposer and attach it to the Output port of the second JSONFlattener. Expose hidden attributes using Import > From Feature Cache.

Select:

  • cash_end_of_year
  • dividends_paid
  • eps_diluted
  • gross_margin
  • insights
  • net_income
  • revenue 
  • roic
  • share_repurchases 

Click OK.

10. Add a SubstringExtractor

Next, we’ll clean up the insights attribute returned by the OpenAIConnector. This attribute holds an array of text items, so its value still includes the array’s square brackets, separating commas, and quotation marks, which isn’t ideal for display or export.

We’ll use a SubstringExtractor followed by two StringReplacer transformers to remove the square brackets, commas, and quotation marks, leaving a clean, readable string of insights.

This will make the summary easier to read in the final Excel report.


Add a SubstringExtractor and attach it to the Output port of the AttributeExposer. Double-click it and edit the following parameters.

  • Source String: insights
  • Start Index: 2
  • End Index: -2
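
The effect of these two indices can be sketched in Python. This sketch assumes indices are zero-based, that the end index is inclusive, and that -1 refers to the last character. Check the SubstringExtractor documentation for your FME version if the result looks different. The `substring` helper and the sample text are hypothetical.

```python
def substring(text: str, start: int, end: int) -> str:
    """Extract characters from start through end, both inclusive.

    Negative indices count from the end of the string (-1 is the last character).
    """
    length = len(text)
    if start < 0:
        start += length
    if end < 0:
        end += length
    return text[start:end + 1]


# Placeholder array text in the compact form assumed for the insights attribute.
insights = '["First insight","Second insight"]'
result = substring(insights, 2, -2)
print(result)
```

Under these assumptions the leading bracket and quotation mark and the trailing bracket are dropped, and a trailing quotation mark remains. The StringReplacer in step 12 removes it.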

11. Add a StringReplacer 

Add a StringReplacer and connect it to the output port of the SubstringExtractor.

Double-click the transformer and configure the following parameters:

  • Attribute to Modify: _substring
  • Mode: Replace Text
  • Text to Replace: ,
  • Replacement Text: (leave blank or use a space/newline depending on preference)

Click OK.

This step removes the commas that separate the individual insights. Because Replace Text affects every match, commas inside an insight’s own text are removed as well.

 

12. Add a Second StringReplacer 

Add another StringReplacer and connect it to the output port of the first StringReplacer.

Double-click the transformer and configure the following parameters:

  • Attribute to Modify: _substring
  • Mode: Replace Text
  • Text to Replace: "
  • Replacement Text: (leave blank)

Click OK.

This removes any remaining quotation marks from the insights string, leaving a clean block of text ready for output.
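
Steps 10 through 12 can be approximated in a few lines of Python to preview the result. The sketch assumes an inclusive end index for the substring step and uses a space as the comma replacement. The sample insights are placeholders. The second half shows an alternative that is not the tutorial's method: parsing the array and joining its items, which keeps commas that belong to the text.

```python
import json

# Placeholder array text, not real model output.
raw = '["Revenue grew, margins held","Cash position is strong"]'

# Steps 10-12 as described: trim the ends, then replace commas and delete quotes.
trimmed = raw[2:-1]                       # drops the leading [" and the trailing ]
cleaned = trimmed.replace(",", " ").replace('"', "")
print(cleaned)

# Alternative (not the tutorial's method): parse the array, then join the items.
joined = "\n".join(json.loads(raw))
print(joined)
```

In the first result the comma inside the first insight is lost and the two insights run together, while the parsed version keeps one insight per line.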

13. Add an AttributeManager to Clean and Rename Attributes

Add an AttributeManager and connect it to the output port of the second StringReplacer.

Double-click the transformer and configure the following:

Remove the following attributes:

  • Response
  • _creation_instance
  • summary
  • insights
  • key_metrics

Rename the following attributes:

  • cash_end_of_year → Cash at End of Year
  • dividends_paid → Dividends Paid
  • eps_diluted → Earnings Per Share (Diluted)
  • gross_margin → Gross Margin
  • net_income → Net Income
  • revenue → Revenue
  • roic → Return on Invested Capital (ROIC)
  • share_repurchases → Share Repurchases
  • _substring → Key Summary

Click OK to save your changes.

The transformer should look like this.

This step prepares your data for final reporting by removing temporary fields and applying clear, human-readable names to key metrics.
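
The remove and rename operations amount to a filter followed by a key mapping. A minimal Python sketch, treating a feature as a plain dictionary (an assumption for illustration; `manage_attributes` is a hypothetical name and the values are placeholders):

```python
REMOVE = ["Response", "_creation_instance", "summary", "insights", "key_metrics"]
RENAME = {
    "cash_end_of_year": "Cash at End of Year",
    "dividends_paid": "Dividends Paid",
    "eps_diluted": "Earnings Per Share (Diluted)",
    "gross_margin": "Gross Margin",
    "net_income": "Net Income",
    "revenue": "Revenue",
    "roic": "Return on Invested Capital (ROIC)",
    "share_repurchases": "Share Repurchases",
    "_substring": "Key Summary",
}


def manage_attributes(feature: dict) -> dict:
    """Drop the temporary attributes, then apply the display names."""
    kept = {k: v for k, v in feature.items() if k not in REMOVE}
    return {RENAME.get(k, k): v for k, v in kept.items()}


# Placeholder feature with a few of the attributes present at this point.
feature = {
    "Response": "{...}",
    "summary": "text",
    "revenue": "placeholder",
    "_substring": "insight text",
}
managed = manage_attributes(feature)
print(managed)
```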

14. Add an AttributeKeeper to Prepare for Exploding Attributes

Add an AttributeKeeper and connect it to the output port of the AttributeManager.

Double-click the transformer and configure it to keep only the following attributes (including any unexposed attributes):

  • Cash at End of Year
  • Dividends Paid
  • Earnings Per Share (Diluted)
  • Gross Margin
  • Net Income
  • Return on Invested Capital (ROIC)
  • Revenue
  • Share Repurchases

Click OK to apply.

This step filters out all remaining unnecessary attributes, leaving only the metrics you want to output.

 

15. Add an AttributeExploder

Add an AttributeExploder and connect it to the output port of the AttributeKeeper.

Double-click the transformer and leave the default settings as-is.

 

Click OK to apply.

This transformer converts the selected attributes into individual records—one per metric—so that each key metric appears in its own row. This is useful for transposing the data into a format suitable for reporting and export to Excel.
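
Conceptually, exploding a feature turns one wide record into many narrow ones. The Python sketch below models this with dictionaries, using the _attr_name and _attr_value attribute names referenced in the following steps. `explode_attributes` is a hypothetical helper and the values are placeholders.

```python
def explode_attributes(feature: dict) -> list[dict]:
    """Emit one record per attribute, holding its name and its value."""
    return [
        {"_attr_name": name, "_attr_value": value}
        for name, value in feature.items()
    ]


# Placeholder values, not figures from the report.
feature = {"Revenue": "placeholder A", "Net Income": "placeholder B"}
rows = explode_attributes(feature)
for row in rows:
    print(row)
```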

 

16. Add a Tester 

Add a Tester and connect it to the output port of the AttributeExploder.

Double-click the transformer and configure the following test clause:

  • Left Value: _attr_name
  • Operator: Contains
  • Right Value: fme

Click OK.

This test matches any feature whose _attr_name value contains fme, such as internal attributes prefixed with fme_, and sends it to the Passed port. Your custom metrics do not match, so they leave through the Failed port.

 

17. Add an AttributeManager 

Add an AttributeManager and connect it to the Failed port of the Tester.

Double-click the transformer and configure the following renaming operations:

  • _attr_name → Metric
  • _attr_value → Value

Click OK to apply.

 

This step prepares the remaining features for export by giving their attributes clear, user-friendly names. Because the Tester passes features whose attribute name contains fme, the Failed port carries only the features that did not match: your custom metrics.
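
The Tester and the rename can be pictured together in Python. The sketch assumes a case-sensitive "contains" comparison, and the sample rows, including the internal attribute shown, are placeholders for illustration.

```python
# Placeholder rows as they might arrive from the AttributeExploder.
rows = [
    {"_attr_name": "Revenue", "_attr_value": "placeholder A"},
    {"_attr_name": "fme_feature_type", "_attr_value": "placeholder"},
    {"_attr_name": "Net Income", "_attr_value": "placeholder B"},
]

# Tester: "Contains fme" sends matches to Passed and everything else to Failed.
passed = [r for r in rows if "fme" in r["_attr_name"]]
failed = [r for r in rows if "fme" not in r["_attr_name"]]

# AttributeManager on the Failed port: rename to report-friendly column names.
report = [{"Metric": r["_attr_name"], "Value": r["_attr_value"]} for r in failed]
print(report)
```

Only the rows on the Failed side continue to the writer, which is why the AttributeManager is connected there.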

 

18. Add an Excel Writer 

Add an Excel Writer and connect it to the output port of the AttributeManager.

In the Add Writer dialog, configure the following:

  • Format: Excel
  • Dataset: Choose a location and filename for your Excel output (e.g., Nike_Financial_Summary.xlsx)

Click OK, then configure the Feature Type Parameters:

  • Sheet Name: Summary (or another name of your choice)
  • Set Column Order: Yes
  • Columns: Metric, Value

Click OK to finish setup.
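
To preview the two-column layout without opening Excel, the same rows can be written as delimited text. This Python sketch uses the standard csv module as a stand-in for the Excel writer, so it shows only the column order and not any Excel formatting. The rows are placeholders.

```python
import csv
import io

# Placeholder rows in the Metric/Value shape produced by the previous step.
rows = [
    {"Metric": "Revenue", "Value": "placeholder A"},
    {"Metric": "Net Income", "Value": "placeholder B"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Metric", "Value"])  # fixed column order
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```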

 

Once complete, running the workspace will generate a clean, structured Excel report with your extracted metrics and insights—ready for review, sharing, or further analysis.

Want to style your Excel output using a predefined format? You can use a template Excel file to control fonts, colors, and more. Check out this guide: Using a Template File when Writing Excel Data

19. Run the Workspace

When you run the completed workspace, FME will generate an Excel file that includes your metrics.

This output can be easily shared with stakeholders or integrated into broader reporting workflows.
