Automating Amazon S3 Workflows with FME Flow

Files

s3-automation-partb.fmw
- 90 KB
- Download
vancouverneighborhoods.kml
- 300 KB
- Download
S3-Automation-Part2.fmw
- 100 KB
- Download

Introduction

This article is separated into two parts. In the first part, we will set up an automation to monitor a folder on FME Flow (formerly FME Server) and upload any data that is created or modified in that folder onto an Amazon S3 bucket. The second part will set up an automation that will monitor an Amazon S3 bucket for when a KML file is created or modified and run a workspace to download any KML(s) onto a specified output folder on FME Flow.

Requirements

Amazon Web Services web console access
AWS Access Key ID
AWS Secret Access Key

Step-by-step Instructions

Part 1: Upload Files to Amazon S3

1. Create a New Automation
Log into FME Flow. Once logged in, expand Automations on the left-side menu, then click Create Automation. You will be presented with a canvas containing one unconfigured Trigger.

2. Configure the Trigger to Monitor a Directory
On the Automations canvas, double-click the Trigger component to open the parameters. Under Trigger, select Resource or Network Directory (updated). Click the ellipsis button to the right of Directory to Watch and create a new folder in the Data folder called Amazon S3, then inside that folder create another folder called toUpload. The final path should be Resources/Data/Amazon S3/toUpload.

Since we are monitoring for any incoming data and making a copy available in an S3 bucket, select Yes for both the Watch Subdirectories and Watch Folders parameters. For Events to Watch, select CREATE and MODIFY but not DELETE: we want the S3 bucket to hold the most up-to-date data products, but it does not need to mirror the contents of the monitored resource exactly. For ease of testing the automation, set Poll Interval to 1 minute.

Click Validate and (if valid) Apply. If the Trigger fails to validate, it is likely the path specified under Directory to Watch is incorrect; re-enter the parameter using the ellipsis button.
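To make the event selection concrete, here is a minimal Python sketch of polling-based change detection. It is not how FME Flow implements the Trigger; it only illustrates the idea of comparing two directory snapshots and reporting CREATE and MODIFY while ignoring deletions. The function names snapshot and diff_events are hypothetical.

```python
import os
import tempfile


def snapshot(root):
    """Map each file and folder below root to its last-modified time (ns)."""
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            state[path] = os.stat(path).st_mtime_ns
    return state


def diff_events(previous, current):
    """Compare two snapshots and report CREATE and MODIFY events only."""
    events = []
    for path in sorted(current):
        if path not in previous:
            events.append(("CREATE", path))
        elif previous[path] != current[path]:
            events.append(("MODIFY", path))
    # Paths present in `previous` but missing from `current` were deleted;
    # they are skipped on purpose, matching the tutorial's event selection.
    return events


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as watched:
        before = snapshot(watched)
        new_file = os.path.join(watched, "example.kml")
        with open(new_file, "w") as handle:
            handle.write("<kml/>")
        after = snapshot(watched)
        print(diff_events(before, after))
```

In a real poller, the two snapshots would be taken one poll interval apart, which is why a change can take up to a full interval to be noticed.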

3. Add an External Action
Next, add an External Action and connect it to the Resource or Network Directory Trigger component.

In the parameters, set the Action to Amazon S3 Bucket (upload to). To upload files to Amazon S3 from FME Flow, you need an AWS Access Key ID and AWS Secret Access Key issued by your AWS account administrator. Enter the name of the S3 bucket you wish to upload to in the Bucket parameter, then paste your AWS Access Key ID and AWS Secret Access Key into their respective fields. Leave the Region, Encryption, and Permissions parameters at default (Select Choice, None, and Private, respectively).

There are more options for this External Action, so use the scroll bar to expose them. Leave Enable File Versioning set to Default. This will cause files uploaded to the S3 bucket from FME Flow to behave the same way as any other objects in the bucket.
Under Source Path, click the drop-down arrow and choose Directory > File Path to indicate that the file that originally triggered the automation is the one to be uploaded to S3.

If you wish to upload the file to the root of the S3 bucket, leave Destination Path (optional) blank. For this tutorial though, we need to create a folder. In the Destination Path, enter: /AutomationsTutorial/fromFMEFlow/
Amazon S3 supports the creation of virtual folders in a bucket, and you may use this parameter to upload files under a specific folder.
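S3 "folders" are virtual because a bucket stores flat object keys, and a folder is simply a shared key prefix. The sketch below shows how a destination path and a file name might combine into one key. It is an illustration only: build_object_key is a hypothetical helper, and the assumption that surrounding slashes are trimmed is ours, not documented FME Flow behavior.

```python
import posixpath


def build_object_key(destination_path, source_file_path):
    """Join a destination 'folder' and a file name into one S3 object key."""
    # Normalize Windows separators so basename works for either path style.
    file_name = posixpath.basename(source_file_path.replace("\\", "/"))
    # Assumption: leading and trailing slashes in the destination are trimmed.
    prefix = destination_path.strip("/")
    return f"{prefix}/{file_name}" if prefix else file_name


if __name__ == "__main__":
    print(build_object_key(
        "/AutomationsTutorial/fromFMEFlow/",
        "Resources/Data/Amazon S3/toUpload/vancouverneighborhoods.kml",
    ))
```

With an empty destination path the key is just the file name, which corresponds to uploading to the root of the bucket.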

Click Validate and (if valid) Apply. If the Action fails to validate, check your bucket name, paths, and Access Keys carefully.

4. Start the Automation
Click Menu above the Automations canvas, then Save As. Choose a name and add some descriptive tags (optional), then click OK. Then click Start Automation in the upper right corner.

5. Test the Automation
Under Resources, navigate to the folder your automation is monitoring (Resources/Data/Amazon S3/toUpload), then click Upload > Files and select any file you wish to upload. If you need sample data, vancouverneighborhoods.kml is available for download from the Files section of this article.

Wait for at least one polling interval (1 minute), then go to Automations > Manage Automations, check the box next to the automation you just built, and select Actions > View Log File.

As no workspaces were run by this automation, there will be no triggered jobs, but the automation log file will include details of what files and file actions triggered a workflow, and where they were stored in the S3 bucket.

The file upload was logged by FME Flow, but we can always verify its success by logging into Amazon S3 and checking the folder we specified as the Destination Path (/AutomationsTutorial/fromFMEFlow/). The file uploaded to FME Flow earlier should now appear there.
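If you prefer to check from a script rather than the web console, something like the following could work. It is a sketch under assumptions: it expects an S3 client object such as the one returned by boto3.client("s3"), list_keys is a hypothetical helper, and the bucket name in the usage comment is a placeholder. A single list_objects_v2 call returns at most 1,000 keys, which is sufficient for this tutorial folder.

```python
def list_keys(s3_client, bucket, prefix):
    """Return the object keys stored under prefix (first page of results only)."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is absent from the response when nothing matches the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]


# Example usage (requires boto3 and valid AWS credentials):
#   import boto3
#   client = boto3.client("s3")
#   print(list_keys(client, "my-tutorial-bucket", "AutomationsTutorial/fromFMEFlow/"))
```

Passing the client in as an argument keeps the helper independent of how credentials are configured.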

Well done! Your automation is now prepared to upload files to the Amazon S3 bucket you specified whenever a file is created or modified in the folder FME Flow is monitoring. Next, we’ll monitor an S3 bucket and trigger a workflow in FME Flow when files arrive in that bucket.

Part 2: Run a Workspace When Data Arrives in an Amazon S3 Bucket

In this exercise, we will imagine that users are uploading KML files of features digitized in Google Earth to an Amazon S3 bucket, and that we want to monitor that bucket for new KML uploads, convert the KMLs to GML, and reproject them for use with other project data.

1. Configure and Publish a Workspace to FME Flow
Download S3-Automation-Part2.fmw from the Files section at the top of this article. This workspace contains the KML to GML workflow. Open the workspace in FME Workbench. It may take a moment to load as FME will need to download the Amazon S3Connector package from FME Hub. Once it is open, you will need to populate the S3Connector transformer with your S3 connection details.

In the Navigator, under User Parameters, set the Amazon S3 Web Connection to your pre-configured connection (if you have one) or define a new AWS S3 connection using the same credentials used in the FME Flow automation. Leave the other User Parameters as they are. These will be set in the automation using information that will be parsed as keys from the S3 Bucket that is being watched.

When the workspace is configured, publish it to FME Flow under the Automations Exercises repository (create this if it doesn’t exist). Upload your S3 connection information with the workspace, and register it with the Job Submitter service. You may also be prompted to upload a package with the S3Connector transformer; do so if prompted.

Save the workspace, then move back to FME Flow.

2. Create a New Automation to Monitor S3 Bucket
Back in FME Flow, navigate to Automations > Create Automation to create a new automation.

On the Automations canvas, double-click the Trigger component. Under Trigger, select Amazon S3 Bucket (updated). Specify your bucket name, then paste in your AWS Access Key ID and AWS Secret Access Key. Under Path to Watch, enter AutomationsTutorial/fromFMEFlow, which was the folder structure created in Part 1. Choose No under Watch Subdirectories, and remove DELETE from Events to Watch for. Set a Poll Interval of 1 minute.

Click Validate and (if valid) Apply. If the Trigger fails to validate, check the S3 folder path you are watching and try re-pasting your Access Keys, being careful not to highlight any leading or trailing whitespace.

3. Configure a Filter
Add an Action after the Amazon S3 Bucket Trigger, and select Filter messages from the list. Click the drop-down arrow for Key and select File Path. This will read in the file path whenever the Trigger registers a file CREATE or MODIFY event in the S3 bucket. Under Contains String, enter .kml (do not use wildcards). We are monitoring this S3 bucket folder for features digitized from Google Earth, and we will need to do some further processing on them to make them compatible with the rest of our project.
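The difference between a "contains" test and a wildcard or extension test is worth keeping in mind. The Python sketch below contrasts the two. It assumes a plain, case-sensitive substring match, which the article does not confirm for FME Flow, and both function names are hypothetical.

```python
def contains_filter(file_path, needle=".kml"):
    """Pass when the needle appears anywhere in the path (substring match)."""
    return needle in file_path


def suffix_filter(file_path, suffix=".kml"):
    """Stricter alternative: pass only when the path ends with the suffix."""
    return file_path.lower().endswith(suffix)


if __name__ == "__main__":
    for path in (
        "AutomationsTutorial/fromFMEFlow/vancouverneighborhoods.kml",
        "AutomationsTutorial/fromFMEFlow/notes.kml.bak",
        "AutomationsTutorial/fromFMEFlow/readme.txt",
    ):
        print(path, contains_filter(path), suffix_filter(path))
```

A substring match would also pass a name such as notes.kml.bak, so keep the watched folder free of such files or have the workspace handle them.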

4. Configure a Run Workspace Action
Add an Action downstream of the success port on the Filter. Choose Run a Workspace as the Action, and select the S3-Automation-Part2.fmw workspace you published to FME Flow earlier.
Populate the published parameters. Click on the drop-down arrow beside S3 Bucket and select Amazon S3 > Bucket. Next, click on the drop-down arrow beside Path to Download and select Amazon S3 > File Path.

Click the ellipsis button under Folder for Output GML and navigate to (or create) the Resource folder on FME Flow you wish to house the output from this workspace. The workspace will automatically generate output file names based on input name; simply specify the directory: $(FME_SHAREDRESOURCE_DATA)/Amazon S3/GML From S3

5. Start the Automation
Click Menu above the Automations canvas, then Save As. Choose a name and add some descriptive tags (optional), then click OK. Finally, click Start Automation in the upper right.

6. Test the Automation
In another browser tab, open the Amazon S3 web interface, navigate to the folder you are monitoring (AutomationsTutorial/fromFMEFlow) in your bucket, and upload vancouverneighborhoods.kml from the Files section of this article (or another KML of your choice).

Wait a minute for the automation to trigger, then navigate back to your running automation. Select Menu > View Log File. You’ll see some lines related to the automation initializing, then “sending CREATE event for path: <S3 filepath>”, then details about your Filter and Run Workspace jobs.
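When a log grows long, a small script can pull out just the event lines. The sketch below searches for the "sending CREATE event for path:" phrase quoted above. Everything else is an assumption: the MODIFY variant, the text before the phrase on each line, and the helper name find_events are all hypothetical.

```python
import re

# Only the CREATE wording comes from the article; MODIFY is an assumed analog.
EVENT_PATTERN = re.compile(r"sending (CREATE|MODIFY) event for path: (\S.*)$")


def find_events(lines):
    """Return (event, path) pairs for every log line that reports an event."""
    events = []
    for line in lines:
        match = EVENT_PATTERN.search(line.rstrip())
        if match:
            events.append((match.group(1), match.group(2)))
    return events


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    sample = [
        "automation initializing",
        "sending CREATE event for path: AutomationsTutorial/fromFMEFlow/vancouverneighborhoods.kml",
    ]
    print(find_events(sample))
```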

To jump right into the job logs themselves and omit the automation’s coordinating activities, instead choose Menu > View Triggered Jobs from this automation’s canvas.

To confirm the workspace behaved as expected, you can also navigate to the Resource folder you specified under Folder for Output GML and make sure both the GML and XSD files are there, with the same filename as the original KML that was uploaded to the S3 bucket.
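As a quick way to work out which file names to look for, the following sketch derives the expected output paths from an S3 key. It assumes the workspace keeps the input file's base name and only swaps the extension, as described above. The helper name expected_outputs is hypothetical.

```python
import posixpath


def expected_outputs(output_folder, s3_key):
    """Return the GML and XSD paths expected for one uploaded KML key."""
    # Drop the S3 'folder' prefix and the .kml extension to get the base name.
    stem = posixpath.splitext(posixpath.basename(s3_key))[0]
    return [posixpath.join(output_folder, stem + ext) for ext in (".gml", ".xsd")]


if __name__ == "__main__":
    for path in expected_outputs(
        "$(FME_SHAREDRESOURCE_DATA)/Amazon S3/GML From S3",
        "AutomationsTutorial/fromFMEFlow/vancouverneighborhoods.kml",
    ):
        print(path)
```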

Conclusion

Nicely done! You are now prepared to automate workflows that upload files to a folder in an Amazon S3 bucket or monitor and respond to changes in an S3 bucket.

Data Attribution

The data used here originates from open data made available by the City of Vancouver, British Columbia. It contains information licensed under the Open Government License - Vancouver.
