Offset-Based Pagination

Evie Lapalme
Evie Lapalme
  • Updated

FME Version

Introduction

In FME, there are many ways to connect and utilize APIs. We can make API calls using a FeatureReader, a JSONExtractor, or an HTTPCaller. In this tutorial, we will be covering how to make an FME Server API call and Yelp Fusion API call with the HTTPCaller where pagination is required because the amount of records exceeds the limit. 

 

What is a REST API?

API (Application Programming Interface) is a software intermediary that allows two applications to talk to each other over the web. The RESTful API allows the use of HTTP requests to access and use data such as read, update, or even create by calling a GET, a PUT, or a POST. You can create your own custom REST API workflows using the HTTPCaller with authorization with a token or web connection!

 

What is Pagination?

Pagination is the process of separating print or digital content into discrete pages. With an API call, most responses are in JSON where pagination refers to displaying a portion of data for a large dataset. This could mean the first 20 items, thousand items, or whatever the page limit is set to. 

 

When to Loop with FME?

A loop is often used to carry out iteration; where a process repeats to gradually narrow the process to the desired result. A loop is linked to a condition; i.e. the action continues until a certain condition is met. This is the most common use of a loop in FME because iterations are built into processing features automatically. You can only create loops with FME in custom transformers. For more information on loops see To Loop or Not to Loop webinar.

 

Requirements

 

Step-By-Step Instructions

Example 1: Cloner with Offset-Based Pagination 

This example uses the FME Server REST API. Since it returns the total count, there is no need to loop. The following example shows how to use a Cloner to get all results from the API

1. Open FME Workbench
Open FME Workbench and click New to create a new workspace.
Add a Creator to the canvas.

2. Add an HTTPCaller
Add an HTTPCaller to the canvas and connect it to the Creator. Click on the cog wheel to configure the parameters, they should look like this:

  • Request URL: http://localhost/fmerest/v3/transformations/jobs/completed
  • HTTP Method: GET
  • Query String Parameters:
    • limit: 1000
    • offset: 0
  • Headers:
    • Accept: application/json

Use a web connection to your server by selecting the checkbox “Use Authentication” and Authentication Method as Web Connection. Add a web connection.

image.png

Fill out your credentials for your FME Server and this will provide access to the FME Server API via a Token Authentication.

image.png
image.png

3. Extract the Total
Add a JSONExtractor to the canvas and connect it to the output port of the HTTPCaller. Open the parameters and set the Target Attribute to “totalCount” and the JSON Query to: 

json["totalCount"]


image.png

4. Test the totalCount
The response has a maximum limit of 1000, therefore if there are responses less than 1000 we only need one call and no pagination and can parse the JSON right away. If there are more than 1000 responses then we need pagination. Add a Tester to the canvas and connect it to the JSONExtractor. In the parameters, enter in this logic:

  • Left Value: totalCount
  • Operator: >
  • Right Value: 1000


image.png

5. Pagination through Cloning
Add an AttributeManager to the canvas and connect it to the “Passed” port of the Tester. Manage the attributes by removing any that are undesirable, keep totalCount, and create a new attribute called “cloner”. Set the value of cloner attribute to:

 @add(@int(@div(@Value(totalCount),1000)),1)

This ensures that the value has no decimals (int) after being divided, and includes the original plus the pagination (add, 1).

image.png

Add a Cloner to the canvas. The number of copies will equal the attribute totalCount.

image.png

Add another AttributeManager to the canvas and connect it to the Cloner output port. Set the value of _copynum to

@mult(@Value(_copynum), 1000)

which will multiply the value by a thousand. This will be the offset for the HTTPCaller.

6. Final Call
Duplicate the HTTPCaller, and connect it to the last AttributeManager. Parameters stay the same except change the offset to the attribute _copynum.

image.png

7. Add a JSONFlattener
Add a JSONFlattener to the canvas and connect the input port to the output port the second HTTPCaller and the failed output port of the Tester. This will parse your data into a list attribute. For more information on lists see the Getting Started with List Attributes article 

8. Run your workspace
Run your workspace and it should look like the image below. 

image.png

 

Example 2: Loop with Offset-Based Pagination 

The Yelp Fusion API does not provide a total number of objects or provide a page number; therefore, we do not know how many calls to make. We have to use a loop in FME to get all of the data from the API. In this example, the loop will run until we do not get a valid response from the API. 

1. Sign up for the Yelp Fusion App API
You can create a free Yelp App to obtain the Fusion API key. Once created, be sure to keep your API key in a place that you can use again. 

2. Open FME Workbench
Open FME Workbench and click New to create a new workspace.

3. Get Grocery Stores from Yelp API
Add a Creator to the canvas then connect  an HTTPCaller. Set your HTTPCaller parameters as follows:

  • Request URL: https://api.yelp.com/v3/businesses/search
  • HTTP Method: GET
  • Query String Parameters
    • location: Coquitlam
    • categories: grocery
  • Headers
    • Authorization: Bearer [YOUR API KEY]
  • Save Response Body To: Attribute 

image.png

Concurrent Requests should be set to 1.

image.png

For more information on using the HTTPCaller and using HTTP Requests, see the HTTP Requests with the HTTPCaller article. 

4. Create a Custom Transformer
Right-click on the HTTPCaller or click on the HTTPCaller and press CTRL+T to create a new Custom Transformer with the HTTPCaller inside. Name the transformer what you would like to call it.

image.png

5. Inputs and Outputs
First, rename your input and output ports to “Input” and “Output” ports to simplify your workspace. Right-click anywhere on the canvas (inside the custom transformer) and create a new input, name it “loop_input” and make sure to have it unpublished by unchecking the checkbox “Publish”. This is so that it does not appear in the main canvas as an input port.

image.png

Right-click again on the canvas and select “Insert Transformer Loop”, then select “loop_input” from the dropdown of “Loop to”. 

image.png

6. Attribute Management
Add an AttributeCreator to the canvas and put this between the input port and the HTTPCaller. Create an attribute called _copynum and set the value as 0. This will be used to count the pagination. 

Add an AttributeManager to the canvas and connect the input port to the “loop_input” and connect the input port to the HTTPCaller. Expose the attribute that you created and open the Text Editor to change the value of this attribute. Input this into the Text Editor: 

@add(@CurrentAttribute(),1)

You can find math functions in this documentation where it explains what each of them do.
Your workspace should look like this:

image.png

7. HTTPCaller Pagination
Open the HTTPCaller and add “offset” to the name and the value would be:

@mult(@Value(_copynum),20)

This is because each page of the Yelp Fusion API call is 20 results long and we want the next 20 results after each call. 

image.png

8. Add a Tester
Add a Tester to the canvas, connect it to the HTTPCaller, and open it to change the logic:

  • Logic: NOT
  • Left Value: _response_body
  • Operator: Contains
  • Right Value: “alias”: 
  • Comparison Mode: Case Insensitive

The right value can be anything in the response body because the last response will return with only the total entries.

image.png

Connect the “Failed” output port to the “Loop to loop_input” and connect both “Passed” and “Failed” output ports to the “Output”. 

Final custom transformer would look like this:
image.png

 

Data Attribution

The data used here originates from data made available by Yelp Fusion App and Safe Software’s FME Server. Yelp Fusion is an API App that is available to use without cost under 5,000 requests.

 

Was this article helpful?

Comments

0 comments

Please sign in to leave a comment.