Reading Simple PDF and Map Content

Liz Sanderson

FME Version

  • FME 2023.0

Introduction

This article covers three PDF reading topics:

  • Part 1, Reading Simple PDF: how to read a simple PDF that contains the kind of content commonly seen in reports.
  • Part 2, Reading PDF Map Frame Content: how to inspect and extract the content of PDF map frames.
  • Part 3, Improvements for Reading the Map Frame: how to handle features within a frame that can be described with both page points and geospatial coordinates, and how to read them correctly.
 

Step-by-step Instructions

Part 1: Reading Simple PDF

The Adobe Geospatial PDF Reader allows us to inspect and manipulate data from PDF files. Let’s take a look at a simple PDF that contains a title, two maps, some text, and a table.

1. Add a Reader to Read the Adobe Geospatial PDF
Open FME Data Inspector and open a new dataset. Set the Format to Adobe Geospatial PDF, and the Dataset to WallaWalla.pdf, which can be downloaded from the Files section of this article.
 

2. Set the Adobe Geospatial PDF Reader Parameters
As this PDF contains map frames - the main map and the inset map - we can retrieve geospatial coordinates and coordinate system information for the map features. However, for simplicity, let’s read everything in Page Points.
Open the Adobe Geospatial PDF reader Parameters and set Coordinate Units to Page points. Click OK twice to finish adding the reader.


3. View the WallaWalla Dataset
With Coordinate Units set to Page points, all features are read in page coordinates, and FME Data Inspector shows feature types based on the source PDF layers:

Viewing the PDF document in FME Data Inspector with Page points
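To get a feel for what page point coordinates mean physically, the short sketch below converts them to inches and millimetres. It assumes the PDF uses the default PDF user-space unit of 1/72 inch per point; the helper names and the sample coordinate are hypothetical and are not part of FME.

```python
# Assumption: the PDF uses the default user-space unit (1 point = 1/72 inch).
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def points_to_inches(points: float) -> float:
    """Convert a length or coordinate in page points to inches."""
    return points / POINTS_PER_INCH


def points_to_mm(points: float) -> float:
    """Convert a length or coordinate in page points to millimetres."""
    return points_to_inches(points) * MM_PER_INCH


# Hypothetical coordinate of a feature read with Coordinate Units = Page points.
x_pt, y_pt = 36.0, 432.0
print(points_to_inches(x_pt), points_to_inches(y_pt))  # 0.5 6.0
```

A feature at (36, 432) page points therefore sits half an inch from the left edge and six inches from the bottom edge of the page, since PDF page coordinates normally start at the lower-left corner.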

 

Part 2: Reading PDF Map Frame Content

If the PDF contains map frames, chances are that geospatial coordinates and coordinate system information can be retrieved for map features. To achieve this, all spatial information should be read with the Geospatial (if possible) Coordinate Units setting.
In the same instance of FME Data Inspector, open a new Adobe Geospatial PDF dataset, reading in the same PDF as Part 1 (WallaWalla.pdf). Open Parameters and set Coordinate Units to Geospatial (if possible).


View the WallaWalla Dataset
In this example, the result in FME Data Inspector might look surprising at first, as only the legend and other minor features are shown:

Viewing the PDF document in FME Data Inspector with Geospatial (if possible) 

The map feature coordinates are in coordinate system LL-83, in the range of (-120, -117) degrees for longitude and (45, 48) degrees for latitude, while the page coordinates of the other, non-map features are in the range of (35, 740) for X and (430, 600) for Y. Because both sets are displayed in the same FME Data Inspector view, the map features are located far away from the rest of the PDF page content.
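The separation between the two groups can be illustrated with a small sketch that sorts a coordinate into one group or the other using the ranges quoted above. The `Range2D` class and the `guess_units` and `horizontal_gap` helpers are hypothetical names made up for this illustration; they are not part of FME, and the ranges apply to this sample PDF only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range2D:
    """Axis-aligned coordinate range (illustrative helper, not an FME type)."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


# Ranges quoted in this article for WallaWalla.pdf.
MAP_RANGE_LL83 = Range2D(-120.0, -117.0, 45.0, 48.0)    # X = longitude, Y = latitude, in degrees
PAGE_RANGE_POINTS = Range2D(35.0, 740.0, 430.0, 600.0)  # X and Y in page points


def guess_units(x: float, y: float) -> str:
    """Guess which group a coordinate belongs to, based only on its value."""
    if MAP_RANGE_LL83.contains(x, y):
        return "geospatial"
    if PAGE_RANGE_POINTS.contains(x, y):
        return "page points"
    return "unknown"


def horizontal_gap() -> float:
    """Gap along X, in raw viewer units, between the map group and the page group."""
    return PAGE_RANGE_POINTS.x_min - MAP_RANGE_LL83.x_max


print(guess_units(-118.3, 46.1))  # geospatial
print(guess_units(100.0, 500.0))  # page points
print(horizontal_gap())           # 152.0
```

The gap mixes degrees and page points, so it has no physical meaning. It only shows why a viewer that draws both groups on the same axes places them far apart.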
However, not all map features are necessarily read in geospatial coordinates. Sometimes some of them are read in page points and displayed together with the non-spatial features, even though the Adobe Geospatial PDF Coordinate Units parameter is set to Geospatial (if possible).

What’s wrong with the PDF reader? In the option Geospatial (if possible), "if possible" is the key! The reader will attempt to read everything within the map frame with geospatial coordinates. However, if the map frame is not defined properly, some map features might fall outside of the frame and will therefore be read in page points coordinates.
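The effect of a badly defined frame can be pictured with a minimal sketch. It assumes a rectangular frame expressed in page points and assumes that a feature is read geospatially only when all of its vertices lie inside the frame; the exact rule the reader applies is not documented here, and the `MapFrame` class, the frame rectangles, and the sample feature are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in page points


@dataclass(frozen=True)
class MapFrame:
    """Rectangular map frame in page points (illustrative, not an FME type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, point: Point) -> bool:
        x, y = point
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def expected_units(frame: MapFrame, vertices: List[Point]) -> str:
    """Assumed rule: geospatial only if every vertex falls inside the frame."""
    inside = all(frame.contains(v) for v in vertices)
    return "geospatial" if inside else "page points"


# Hypothetical frames: one covering the whole map, one that is too narrow.
correct_frame = MapFrame(50.0, 100.0, 500.0, 550.0)
too_small_frame = MapFrame(50.0, 100.0, 300.0, 550.0)

# Hypothetical road segment drawn in the right-hand part of the map.
road = [(320.0, 200.0), (450.0, 260.0)]

print(expected_units(correct_frame, road))    # geospatial
print(expected_units(too_small_frame, road))  # page points
```

The same road ends up in different units depending only on how the frame is defined, which is the situation the next example reproduces.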
Let’s take a look at an example. This time, instead of reading WallaWalla.pdf, we will be reading WallaWalla_wrongWorldRectangle.pdf.

 

Part 3: Improvements for Reading the Map Frame

1. Add an Adobe Geospatial PDF Reader
Open a new Adobe Geospatial PDF dataset in FME Data Inspector. Set the Dataset to WallaWalla_wrongWorldRectangle.pdf.
 
2. Set the Adobe Geospatial PDF Reader Parameters
Open the Adobe Geospatial PDF Parameters and set Coordinate Units to Geospatial (if possible).
 
3. View the WallaWalla_wrongWorldRectangle Dataset
With the Adobe Geospatial PDF Coordinate Units parameter set to Geospatial (if possible), FME Data Inspector shows some of the map features together with the non-map features, because those map features were read in page points coordinates.

Data view in FME Data Inspector reading in WallaWalla_wrongWorldRectangle.pdf with Geospatial (if possible) 

4. Set Adobe Geospatial PDF parameters to Read Map Frames
Let’s read everything in page points coordinates and add the Map Frames metadata objects so that we can inspect them. Open a new dataset, reading WallaWalla_wrongWorldRectangle.pdf. This time, in the Adobe Geospatial PDF parameters, set Coordinate Units to Page points and Metadata Objects To Read to Map Frames. Click OK twice to finish adding the data.


5. View the WallaWalla_wrongWorldRectangle Dataset with Map Frames
Selecting Map Frames for Metadata Objects To Read will result in adding a new pdf_frame_metadata feature type which contains the actual map frame:

Viewing the PDF document in FME Data Inspector with map frame added

When reading with Coordinate Units set to Geospatial (if possible), everything within the actual map frame is read in geospatial coordinates, while the rest of the PDF content is read in page points. If the frame does not cover the whole map, the map features that fall outside it show up as “disappeared” or “misplaced” features.
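Once the frame is visible as a pdf_frame_metadata feature, one way to confirm that it is too small is to compare its rectangle with the overall extent of the map features, both in page points. The sketch below is only an illustration of that comparison: the coordinates are made up, would have to be copied by hand from FME Data Inspector, and the helper names are hypothetical rather than FME functions.

```python
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in page points


def bounding_box(points: Iterable[Point]) -> Box:
    """Smallest axis-aligned box that contains all points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def uncovered_margins(frame: Box, extent: Box) -> Dict[str, float]:
    """How far the map extent sticks out of the frame on each side (0 if covered)."""
    fx0, fy0, fx1, fy1 = frame
    ex0, ey0, ex1, ey1 = extent
    return {
        "left": max(0.0, fx0 - ex0),
        "bottom": max(0.0, fy0 - ey0),
        "right": max(0.0, ex1 - fx1),
        "top": max(0.0, ey1 - fy1),
    }


# Hypothetical values: frame rectangle and a few map feature vertices.
frame = (50.0, 100.0, 300.0, 550.0)
map_vertices = [(60.0, 120.0), (450.0, 260.0), (480.0, 540.0)]

margins = uncovered_margins(frame, bounding_box(map_vertices))
print(margins)  # {'left': 0.0, 'bottom': 0.0, 'right': 180.0, 'top': 0.0}
```

Any non-zero margin marks a side on which map content lies outside the frame and would therefore be read in page points.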


If you have left FME Data Inspector open throughout this tutorial, switch between the View tabs at the top to see the differences between the different parameters and datasets. 


Data Attribution

The data used here originates from open data made available by the United States Census Bureau and the Berkeley School of Information.
 

 
 
 
