Cursor-Based API Pagination

Matt Meeboer
Matt Meeboer
  • Updated

FME Version

Introduction

The HTTPCaller in FME Workbench lets you make REST API calls to get data from external applications. Often, API developers will limit the number of records returned in a response to make it easier to handle. A user must then make subsequent requests to get all records. This is called pagination. There are several different kinds of pagination, but this article will discuss cursor-based pagination.

APIs that use cursor-based pagination will return an ID in the response that points to the next item in the dataset, or ‘next cursor’. A user provides the next cursor in the subsequent request to get the next batch of records until no next cursor is returned by the API, meaning there are no further records to fetch. A user can also provide the number of records, or page size, they want to retrieve with each fetch. The page size is limited by the API developer to prevent the response from being too large and defeating the purpose of pagination.

Cursor pagination flow


Cursor-based pagination is more scalable than offset pagination, making it ideal for larger datasets. It simply uses the next cursor ID to query and get the next set of records. Offset pagination returns the total records, total number of pages, and current page with each request. While this information can be valuable, it requires far more processing.
In the following tutorial, we’ll look at an example of calling an API that uses cursor-based pagination, Slack’s conversation history API, using FME Workbench to recursively get all messages. This logic we’ll walk through has been published in the SlackMessageReader custom transformer to the FME Hub. You can use the transformer as a guide to create your own API calls that use pagination. It also contains more parameters and functionality than we will delve into in this guide.
 

Requirements

  • A Slack bot app that has been added to the Slack workplace and channel from which you will fetch messages
  • The Slack bot has been granted the necessary permissions and scopes to use the API
  • The bot token from your Slack app

 

Step-by-Step Instructions

Part 1: Create the API Call

1. Create Custom Transformer

In FME Workbench, create a new workspace. Add a Creator. 

Pagination is one of the few use cases that requires looping in FME, which can only be done within a custom transformer. Right-click the canvas and select Create Custom Transformer….
Create custom transformer

Give the custom transformer a name. We’ll use SlackMessageReaderExample here, because there is already a SlackMessageReader in the FME Hub.
SlackMessageReader

There are now two tabs in FME Workbench: Main for our main workspace and SlackMessageReaderExample for our custom transformer.
Workspace tabs


2. Add User Parameters

Let’s add some parameters to the customer transformer that will be passed from the Main workspace. From the Navigator, Right-click User Parameters and then select Manage User Parameters….
User parameter configuration

Click the ‘+’ button and add a text parameter.
Add user parameter

Enter token as the Parameter Identifier and Token as the Prompt.
Token parameter

Add a second text parameter. Enter channelID as the Parameter Identify and Slack Channel ID as the Prompt. Click OK to save and close the User Parameters.  
Parameters


3. Configure Slack API Call

Add an HTTPCaller to the SlackMessageReaderExample canvas and connect it to the input. Configure the HTTPCaller as follows:

  • Request URL: https://slack.com/api/conversations.history
  • HTTP Method: GET
  • Query String Parameters:
    • channel: channelID parameter
    • limit: 200 (the maximum page size recommended by Slack)
    • cursor: do not enter a value, we will set this later
  • Headers:
    • Authorization: Type the word Bearer, then a space, and then the token parameter $(token)
  • Response Boyd Attribute: _response_body
  • Maximum Number of Concurrent HTTP Requests: 1 (must be 1 when looping)

HTTP Caller

Click OK

Note: there are additional parameters that could be added to give more control over the results returned from the Slack API, but for the purpose of this example, we will just use these. For a more detailed example, see the SlackMessageReader in the FME Hub.


4. Test the API Call

Temporarily connect the HTTPCaller to the custom transformer output.
Workspace image

Back on the Main workspace, connect the SlackMessageReaderExample transformer to the creator and configure its parameters as follows:

  • Token: the bot token (starting with xoxb-) from your Slack app configuration
  • Slack Channel ID: the channel ID of the Slack channel that your bot has been added to

Custom Parameters

Run the workspace and inspect the SlackMessageReaderExample output in Visual Preview. There is one feature with a _response_body attribute containing the first 200 Slack messages in JSON format (since we set the API limit to 200). 
Visual Preview

Let’s inspect the structure of the JSON response a little further. Click the ellipses [...] next to the _response_body value to view the full response. 

Format as JSON.
View JSON format

We see that most of the JSON response consists of an array of the first 200 messages in the Slack channel.
JSON Preview

Scroll to the bottom of the response. There is another object called ‘response_metadata’ that contains the next_cursor ID. This is the value we need to pass to the API to get the next 200 messages.
Next_cursor JSON
Now, we need to create a loop to get all messages from Slack using cursor-based pagination.
 

Part 2: Create a Cursor-Based Pagination Loop

1. Separate the JSON Keys in the Response

Go back to the tab for the SlackMessageReaderExample custom transformer tab in FME Workbench.

Add a JSONFragmeter to the canvas and connect it to the HTTPCaller output. Configure the parameters:

  • JSON Attribute: _response_body
  • Flatten Query Results into Attributes: Yes
  • Recursively Flatten Objects/Arrays: No

Other fields can remain with their defaults.
JSON Fragmenter

Click OK. Run the workspace. Inspect the results in Visual Preview. There are seven JSON keys that have been output, but we only care about two: messages and response_metadata
Fragmented JSON keys


2. Filter the JSON Keys  

Add a TestFilter to the canvas and connect it to the JSONFragmenter’s output. Add two conditions to route the JSON keys to different outputs:

  • If @Value(json_index) = messages
    • Output Port: messages
  • If @Value(json_index) = response_metadata
    • Output Port: response_metadata

Test filter

Click OK. Your workspace should look like the image below. Run the workspace.
Workspace

Inspect the TestFilter’s output. There is one feature now for all the messages and one for the response_metadata that contains the next_cursor. It is this second feature we will use to drive the pagination loop. The remaining features are not needed and are sent to the unfiltered port.
Test filter

Let’s clean up the workspace. Add a bookmark around the HTTPCaller, JSONFragment and TestFilter and call it ‘Slack API call’.
Slack API call bookmark

3. Parse the Response Metadata JSON

Connect a JSONFlattener to the response_metadata output port. Set the JSON Document to _response_body and, in the Attributes to Expose, type next_cursor. Other fields can remain with their defaults. Click OK.
JSON flattener

Run the workspace again and inspect the output of the JSONFragmenter. We now have a next_cursor attribute on the response_metadata feature to pass to the API.
Parsed next cursor


4. Create the Pagination Loop 

Right-click the canvas above the HTTPCaller and select Insert Transformer Input.
Insert transformer input

Name the new transformer input Pagination and uncheck Publish. Click OK. Connect the Pagination input transformer to the HTTPCaller input.
Pagination input

Right-click the canvas again and insert a Transformer Loop. From the Loop to drop-down list, choose the Pagination input that was just created.
Pagination

Connect the Loop to Pagination transformer loop to the JSONFlattener output. The workspace should now look like this. 
Workspace

Now that the loop has been setup and the feature with the cached next_cursor attribute is connected to the HTTPCaller, open the HTTPCaller parameters again and set the cursor parameter to the next_cursor attribute that was exposed.
Cursor parameter

5. Test the Loop

We’re now ready to test the pagination loop! 

The Slack API will return the response_metadata JSON key containing the next_cursor as long as there are additional messages to fetch. Once there are no more, no response_metadata key will be returned from Slack, and thus no feature will be output from the TestFilter’s response_metadata port, ending the loop. Each time the API is called, we will get 200 messages (except on the last call) and output them all as a single feature, for now.

Run the workspace. Depending on how many messages are in your Slack channel, it may take a couple minutes to run. Once the workspace has finished running, inspect the SlackMessgeReaderExample output. Each feature contains a JSON array of 200 messages (except the last call). Take a look at the _response_body for some of the features; they should all contain a different set of messages. The count of features output is the total number of times that the Slack API was called. Our cursor-based pagination loop is working!
Message output

 

Part 3: Parse the JSON into Features

1. Parse the Message JSON

Go back to the SlackMessageReaderExample canvas. Add a JSONFragmenter and connect it to the messages output port of the TestFilter. Choose _response_body as the JSON Attribute. Set Flatten Query Result into Attribute to Yes and Recursively Flatten Objects/Arrays to No
JSON Fragmenter

Note: the reason we aren’t recursively flattening, in this case, is because Slack sends back a lot of data for each message, like attachments and reactions. We’ll leave those as JSON objects for now, and just extract the key message attributes.

Run the workspace. Each message in each of the API calls is now output as a feature. Our 56 features have become over 11,000.
Testing the reader

2. Clean up Attributes and Workspace

Add an AttributeManager to the canvas and connect it to the JSONFragmenter output. Remove all currently exposed attributes and expose the message text, user and team instead. 
Clean up attributes

Finally, let’s clean up the workspace a bit. Add a bookmark around the JSONFlattener and name it ‘Get the next cursor’. Add another bookmark around the second JSONFragmenter and AttributeManager and name it ‘Fragment Slack Messages’. The workspace should now look like this.
Cleanup workspace

Run the workspace. Inspect the SlackMessageReaderExample output in Visual Preview. The Slack message text, user and team are all displayed. Additional attributes can be exposed as needed.
Final output

This approach can be used for other API calls you need to make that require cursor-based pagination.
 

Additional Resources

Was this article helpful?

Comments

0 comments

Please sign in to leave a comment.