Introduction
File Search is an OpenAI tool for parsing documents (such as PDFs and Word files) and extracting relevant information based on a prompt. Similar capabilities are also available for Google Gemini through FME's GoogleGeminiConnector. In FME, you can apply File Search to unstructured documents to automate information extraction. This approach is well suited to pulling structured answers out of unstructured sources such as quarterly earnings reports, legal documents, or internal documentation.
In this tutorial, you will learn how to use File Search to summarize a financial report PDF and extract specific values such as quarterly revenue. The OpenAIConnector transformer can also return the AI's response in a structured format, which keeps the output consistent and easy to process.
Requirements
- FME Workbench 2025.0 or later
- Access to the OpenAI API with File Search enabled
- OpenAI API Key
- A sample document (e.g., quarterly earnings PDF from Nike)
Step-by-Step Instructions
In this section, we will build an FME workspace that sends a PDF into the OpenAIConnector to perform a file search and outputs the response as attributes for use in our workspace.
Part 1: Reading the File and Sending to OpenAI
This article assumes you already have one or more PDFs available locally. If needed, files can be downloaded using other transformers, such as the OneDriveConnector or HTTPCaller.
For the sake of simplicity, this article will demonstrate using a PDF that is already available on your machine.
1. Open a New Workspace
Open FME Workbench and create a new blank workspace.
Add a Creator transformer to the workspace.
2. Add an OpenAIConnector Transformer
Add an OpenAIConnector transformer to the workspace. Connect the Creator transformer to the input port. Your workspace should look like this:
Double-click the OpenAIConnector transformer and set the following parameters:
- API Key: <Provide your own OpenAI API Key>
- Action: File Search
- File to Upload: <select the document to parse on your local machine>
- User Prompt:
  Role: You are a financial analyst assistant specialized in parsing corporate earnings reports. Your role is to extract structured key performance indicators (KPIs) and financial highlights from a Nike quarterly earnings PDF.
- Structured Output: Enabled
  {
    "additionalProperties": false,
    "properties": {
      "full_year_dividends": { "type": "string" },
      "full_year_revenue": { "type": "string" },
      "gross_margin_percent": { "type": "string" },
      "q4_diluted_eps": { "type": "string" },
      "q4_dividends": { "type": "string" },
      "q4_revenue": { "type": "string" },
      "wholesale_revenues": { "type": "string" }
    },
    "required": [
      "full_year_revenue",
      "q4_revenue",
      "gross_margin_percent",
      "full_year_dividends",
      "q4_dividends",
      "q4_diluted_eps",
      "wholesale_revenues"
    ],
    "type": "object"
  }
Save and Run your workspace. Ensure that your OpenAIConnector successfully outputs a response.
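You may want to double-check the structured output outside FME. The snippet below is a minimal Python sketch that assumes the OpenAIConnector's response is the JSON text described by the schema above. It checks only the parts of JSON Schema used here: the object type, required keys, `additionalProperties: false`, and string values. It is not how FME or OpenAI validate responses, and `check_response` is a hypothetical helper name.

```python
import json

# Schema copied from the Structured Output parameter in this tutorial.
SCHEMA = {
    "additionalProperties": False,
    "properties": {
        "full_year_dividends": {"type": "string"},
        "full_year_revenue": {"type": "string"},
        "gross_margin_percent": {"type": "string"},
        "q4_diluted_eps": {"type": "string"},
        "q4_dividends": {"type": "string"},
        "q4_revenue": {"type": "string"},
        "wholesale_revenues": {"type": "string"},
    },
    "required": [
        "full_year_revenue", "q4_revenue", "gross_margin_percent",
        "full_year_dividends", "q4_dividends", "q4_diluted_eps",
        "wholesale_revenues",
    ],
    "type": "object",
}


def check_response(response_text: str, schema: dict = SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the response fits.

    Hypothetical helper: only the schema features used above are checked
    (object type, required keys, additionalProperties=false, string types).
    """
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Every required key must be present.
    for key in schema["required"]:
        if key not in data:
            problems.append(f"missing required key: {key}")
    # With additionalProperties=false, no keys outside "properties" are allowed.
    if schema.get("additionalProperties") is False:
        for key in data:
            if key not in schema["properties"]:
                problems.append(f"unexpected key: {key}")
    # Each declared property here is typed as a string.
    for key, spec in schema["properties"].items():
        if key in data and spec.get("type") == "string" and not isinstance(data[key], str):
            problems.append(f"{key} should be a string")
    return problems
```

For example, `check_response(json.dumps({k: "placeholder" for k in SCHEMA["required"]}))` returns an empty list. A response missing keys or carrying numbers instead of strings returns one message per problem.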
Tip: multiple files can also be passed to the OpenAIConnector. For example, Step 3 of the article Using the Directory and File Pathnames Reader (listed under Additional Resources) shows how to select all PDFs in a folder with the Directory and File Pathnames Reader.
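Outside FME, selecting every PDF in a folder amounts to a directory glob. The Python sketch below is a rough standard-library analogue of that step. It is not the Directory and File Pathnames Reader's implementation, and `find_pdfs` is a hypothetical name.

```python
from pathlib import Path


def find_pdfs(folder: str, recursive: bool = False) -> list[Path]:
    """Return all .pdf files in folder, matching the extension case-insensitively.

    Hypothetical helper: with recursive=True, subfolders are searched too.
    """
    root = Path(folder)
    candidates = root.rglob("*") if recursive else root.glob("*")
    # Keep regular files whose extension is .pdf in any letter case.
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")
```

For example, `find_pdfs("reports")` (a hypothetical folder) returns the PDFs directly inside `reports`. Each resulting path would then be one file to send to the OpenAIConnector.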
Part 2: Extract Attributes from the JSON Response
Now that we have the response from OpenAI, let’s extract specific fields and prepare the data for output.
1. Flatten JSON Results
Because we requested a structured JSON response from the OpenAI API, we use a JSONFlattener to turn the JSON into usable FME attributes. This creates an individual attribute for each field in the schema (e.g., q4_revenue, gross_margin_percent).
Add a JSONFlattener transformer and connect it to the output port of the OpenAIConnector. Set the following parameters:
- JSON Document: Response
- Attributes to Expose:
  - full_year_revenue
  - q4_revenue
  - gross_margin_percent
  - full_year_dividends
  - q4_dividends
  - q4_diluted_eps
  - wholesale_revenues
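Conceptually, this step parses the JSON text in the Response attribute and copies each top-level key onto the feature as its own attribute. The Python sketch below illustrates that idea by modeling a feature as a dictionary. It is an analogue for a flat schema like the one in this tutorial, not JSONFlattener's actual behavior: it does not flatten nested objects or arrays. `flatten_response` is a hypothetical name.

```python
import json


def flatten_response(feature: dict, source_attr: str = "Response") -> dict:
    """Return a copy of feature with each top-level JSON key added as an attribute.

    Hypothetical helper: nested values are copied as-is, not flattened further.
    """
    parsed = json.loads(feature[source_attr])
    if not isinstance(parsed, dict):
        raise ValueError(f"{source_attr} does not contain a JSON object")
    flattened = dict(feature)  # leave the input feature unchanged
    flattened.update(parsed)   # one attribute per top-level key
    return flattened
```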
2. Add an AttributeManager Transformer
We can easily delete attributes that we don’t need with an AttributeManager transformer. Add one on the canvas and connect it after the JSONFlattener.
Remove the following attributes:
- Response
- _creation_instance
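If a feature is modeled as a Python dictionary, removing attributes amounts to filtering keys. The hypothetical `remove_attributes` helper below only illustrates the AttributeManager step above. It is not FME code.

```python
def remove_attributes(feature: dict, names=("Response", "_creation_instance")) -> dict:
    """Return a copy of feature without the named attributes.

    Names that are not present on the feature are simply ignored.
    """
    return {key: value for key, value in feature.items() if key not in names}
```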
3. Run the Workspace
With feature caching enabled, run the workspace.
Inspect the feature cache on the AttributeManager. The information extracted from the financial report now appears as individual attributes with values.
4. Add a JSON Writer
Finally, we need to write out the data. The right output format depends on how the attributes will be used. In this example, we write them to JSON again, this time keeping only the necessary attributes.
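As a rough illustration of the output, the sketch below writes features, modeled as dictionaries, to a JSON file as an array of objects with one object per feature. This layout is an assumption for illustration only. The FME JSON writer has its own options and may structure the file differently. `write_features_json` is a hypothetical name.

```python
import json
from pathlib import Path


def write_features_json(features: list[dict], path: str) -> None:
    """Write features to path as a pretty-printed JSON array, one object per feature.

    Hypothetical helper; the real FME JSON writer may lay out the file differently.
    """
    Path(path).write_text(json.dumps(features, indent=2), encoding="utf-8")
```

For example, `write_features_json([{"q4_revenue": "placeholder"}], "kpis.json")` writes a one-element array to the hypothetical file `kpis.json`.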
Congratulations, you have successfully built a workspace to extract relevant information with OpenAI’s File Search in FME! Your finished workspace should look like this:
Additional Resources
Cognitive Systems (AI/ML): check out the landing page for Safe’s AI articles!
Getting Started with AI in FME: Web Searching: learn more about another OpenAI tool, web searching!
Using the Directory and File Pathnames Reader: learn more about the Directory and File Pathnames Reader!
Data Attribution
Example data used in this tutorial is publicly available from Nike’s investor relations site: https://s1.q4cdn.com/806093406/files/doc_financials/2025/q4/Q4-FY25_Press-Release_FINAL.pdf
Data used is for demonstration purposes only.