FME Version
Files
Introduction
In this tutorial, we'll process a single CSV file and perform data validation and cleanup before loading it to Tableau.
Video
Step-by-Step Instructions
Part 1: Simple Translation
We will start by creating a simple translation, look at the results in Tableau, then come back to FME to do the data validation and cleanup.
1. Create new workspace.
Start FME Workbench and select the New option under Create Workspace.
2. Set up CSV Reader.
The first step is to read the CSV file. Drag the business license CSV file from the file explorer onto the blank canvas. Notice, that FME has already filled in the reader format and dataset. Alternatively, click Add Reader.
Reader Format: | CSV (Comma Separated Value) |
Reader Dataset: | ...\business_licenses.csv |
Coord. System: | LL84 |
Add a CSV Reader, add the business_licenses.csv, set the coord system, then click on Parameters...
Click on the Parameters button. The Database Parameter allows us to choose different naming schemes for the layers or feature types that end up on the canvas. Make sure it is set to "Feature Type from File Names".
If you are using FME 2020 or newer, confirm that the Attribute Definition has x and y, set to x_coord and y_coord, respectively. By default, FME will map X/Y or Latitude/Longitude attributes to the x_coord and y_coord data types. If you are using FME 2019.2 or older, set the attribute definition to Manual and manually set the data type accordingly. This will tell FME to display each row in the CSV as a point - for more information, see Using the Reader parameters in the Converting from CSV Data article.
If necessary, change the Attribute Definition to Manual and update the Latitude and Longitude
Click Ok, and then click Ok again to add the Reader to the canvas. This tells FME to create a point for each record with a latitude and longitude value.
When the CSV source data is added to the canvas, click on the arrow to see the full attribute list.
3. Confirm points are created as CSV file is read
Once the feature type has been added to the canvas, select the business_licenses feature type and click the run to this button. This will run the translation to the selected object and generate a feature cache which can be inspected in the visual preview window or the Data Inspector by clicking on the green feature cache icon. We can see that we do, in fact, have points.
Business Licenses with point data in the Visual Preview Window
4. Add Tableau writer
From the Writers menu, select Add Writer or using the Quick Add Menu, start typing Tableau, and select the Tableau Hyper format. For Dataset, specify a directory in which to write the .hyper file. Click OK to add the Tableau writer to the workspace. Then connect the Reader to the Writer.
5. Modify writer feature type properties
Open the properties of the Writer. We can now specify the name of the table we would like to write to. Call it BusinessLicenses.
Change the Writer Feature Type Properties by changing the Table Name
6. Run workspace
You may have noticed a number of blue warnings go by in the log file. These are related to problems with the CSV data, which we will soon fix with FME.
7. (Optional) Examine output Hyper file in Tableau
In Tableau, we can see that all of the columns correspond to the columns we read in from the CSV file with FME. Notice that the data types of the columns have all been set automatically by FME based on the best guess at what kind of data is inside each one. We can see that the LicenseRSN has correctly been set to a Number type and BusinessName is a String.
View the data in Tableau to ensure it has set the correct data types
Part 2: Data Validation and Cleanup
Let's take a closer look at the business license data.
Here is a list of tasks we will accomplish with FME. We have already completed the first one in our current workspace. We will modify the workspace to perform the rest of the tasks.
- Read Excel file and create points
- Filter out records that don't have latitude/longitude values
- Ensure PostalCode has a value; extract first 3 characters
- Set up conditional value to handle empty FeePaid values
- Create new BusinessDisplayName field; populate from existing fields
1. Filter out records that don't have latitude/longitude values
You may recall that some of the records in the CSV file did not have latitude and longitude values. Since FME would not have been able to create points for them, we want to filter them out; a GeometryFilter will help us accomplish this.
Click on the CSV Reader Feature Type and then start typing "GeometryFilter" click enter. Click enter again to enter the Transformer Parameters. Select "Point" for the "Geometry Types to Filter". Click Ok.
2. Run workspace with feature caching enabled
Run the workspace to generate feature caches on the GeometryFilter. If it's not already enabled, you can turn feature caching on in the Run submenu > Enable Feature Caching
We can confirm that 731, out of our 10,000 records did not have latitude or longitude values.
Workspace with feature caching turned on to view <Unfiltered> count
3. Write records with no latitude/longitude to "Data Cleanup" Excel file
Before we continue processing the points, let's write these records out to a "Data Cleanup" file, so that they may be fixed. We will write them out to Excel.
From the Writers menu, add an Excel writer. Write it to the output folder and call the file DataCleanup.xlsx . Connect the new writer feature type up to the unfiltered port of the GeometryFilter, then open up the Writer's properties and change the sheet name to "Missing Latitude Longitude".
Change the name of the sheet to identify what we are recording
Now that we have dealt with the missing values, let's continue processing the points.
4. Ensure PostalCode has values: AttributeValidator
Place an AttributeValidator on the canvas and connect it to the GeometryFilter. Open its properties and select PostalCode for the Attribute to Validate. The validation rule is that PostalCode MUST have a value. Also, validate that some of the string fields have string values and that the numeric fields have numeric values.
Set the AttributeValidator to validate if PostalCode has a value and other attributes are the proper type
5. Write records that fail validation to new sheet in "Data Cleanup" Excel file
Again, before continuing to process the valid data, write the data that fails validation out to a different sheet in the "Data Cleanup" excel file already set up. Right-click on the canvas and select "Insert Writer Feature Type". Call the new sheet "Failed Validation". Connect it to the Failed port of the AttributeValidator.
6. Extract first 3 characters of PostalCode: SubStringExtractor
Now that we know that the "records output" from the AttributeValidator all have a value for Postal Code, let's extract the first 3 characters. We do this because Tableau uses the first 3 characters of the postal code to automatically map the areas.
Place a SubstringExtractor on the canvas, connect it to the Passed port of the AttributeValidator, and configure it to extract the first 3 characters from the PostalCodeAttribute. We will call the resulting attribute PostCodeTrimmed.
FME indexes the first value as 0 then counts from there. As you can see in the table below, if we only want the first 3 characters in the postal code, we would go from 0 to 2. So in the SubstringExtractor our Start Index = 0 and our End Index = 2.
V | 3 | W | <Space> | 1 | J | 8 |
0 | 1 | 2 | 3 | 4 | 5 | 6 |
Setup the SubstringExtractor to extract the first 3 values from PostalCode
7. Set up conditional attribute values for FeePaid: AttributeManager
The AttributeManager is a transformer that allows us to do many attribute manipulations, including setting up conditional values. Add the AttributeManager after the SubstringExtractor
Create a Conditional Value in the AttributeManager for FeePaid
If the FeePaid attribute is empty, set the value to 0, otherwise, leave it as is.
Create a new attribute called BusinessDisplayName. Set its value to BusinessTradeName, but only if BusinessTradeName has a value, otherwise set it to BusinessName.
If BusinessTradeName has an attribute Value, keep it BusinessTradeName otherwise name it BusinessName
There is also an attribute we don't need in our final output, let's remove _fme_validation_message.
In the AttributeManager, create 2 conditional values for FeePaid and BusinessDisplayName, also remove _fme_validation_message
Connect the AttributeManager output to the Writer feature type.
Your workspace should now look like the image above.
8. Modify writer feature type properties
The attributes we removed just now are still on the output schema and have turned red. The original attribute schema was a copy of the source schema. It has changed, as a result of our data transformation, but we may update it to reflect what we have done.
Open the properties of the destination feature type, go to the User Attributes tab, and click on Automatic for Attribute Definition. The attribute schema reflects the changes we have made.
Since we want to overwrite the Tableau file we initially wrote, open the properties again and change the Table Handling to Drop and Create.
9. Run workspace and confirm .hyper file exists
Let's run our final workspace! Click on the Run button. Confirm the .hyper file and DataCleanup.xlsx files were created using Windows Explorer.
10. (Optional) Examine .hyper file in Tableau
In the "Data Source" view, notice all of the records that we have imported from the CSV file. We can also see that the PostCodeTrimmed attribute, that we created within our workspace, is present along with the other fields imported from the CSV file.
Now that our data is imported into Tableau, we can begin to create Data Views to explore this data.
For example, we can create a simple map view to see the data points overlaid on a map. Create a new sheet then double-click on geometry to see the individual points. Finally, color them by status to get an overall view of which businesses currently have active licenses.
We could also make use of the postal code field to show our data, create a new sheet. Add PostalCodeTrimmed, from "Dimensions" and color them by the unique count of business licenses in the area.
In the next exercise, we will modify the workspace we just built to process multiple CSV files.
Comments
0 comments
Please sign in to leave a comment.