Tutorial: Getting Started with PDF Reading

Liz Sanderson

FME Version

  • FME 2023.0

Introduction

FME’s Adobe Geospatial PDF Reader can extract a wide range of content from PDF documents, including imagery, rasters, vector data, text, spatial information, and attributes.

However, extracting information from a PDF can be complex because PDF is a document format, and document contents vary greatly. A PDF may spread information over many pages, contain maps (often just an embedded picture), or hold a CAD drawing made up of many lines. It is therefore hard to know how to read a PDF document before you have seen it and decided what you need to extract from it. Sometimes the position of information on the page matters; other times you simply want the content, and its location is irrelevant.


A PDF document in FME Data Inspector (left); the same PDF document in Adobe PDF Reader (right)

PDF Reader Options

The PDF Reader has many options for extracting data. Your PDF may contain:

  • Vector or Raster map data
  • Pages and pages of Text
  • Headers, Footers, Tables and more

The main choice is whether to read the PDF as spatial or non-spatial (tabular). In other words, does the location of each feature on the page matter, or are you only concerned with the page as a whole? It is also possible to select both the Spatial and Non-Spatial (tabular) reader options at the same time.

Spatial Parameters

Detailed information about the Spatial parameter options can be found in the documentation.

The Spatial section applies when content in the PDF document has a particular location on the page. That page location may translate to a specific location on the earth if one or more coordinate systems are defined for the PDF document; a single page can contain multiple coordinate systems.
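
As a rough illustration of how a page location can correspond to a location on the earth, the Python sketch below linearly maps a point inside a map frame to ground coordinates. It is not FME's implementation: the Extent class, the page_to_geo function, and all of the numbers are hypothetical, and it assumes an axis-aligned, unrotated map frame in a projected coordinate system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Axis-aligned rectangle given by its lower-left and upper-right corners."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def page_to_geo(px, py, frame, ground):
    """Linearly map a page point inside `frame` to the matching `ground` position."""
    # Fractional position of the point within the map frame (0.0 to 1.0).
    fx = (px - frame.min_x) / (frame.max_x - frame.min_x)
    fy = (py - frame.min_y) / (frame.max_y - frame.min_y)
    # Apply the same fractions to the ground extent.
    gx = ground.min_x + fx * (ground.max_x - ground.min_x)
    gy = ground.min_y + fy * (ground.max_y - ground.min_y)
    return gx, gy


# Hypothetical map frame on a 612 x 792 point page, in page points.
frame = Extent(72.0, 72.0, 540.0, 720.0)
# Hypothetical ground extent covered by that frame, in projected metres.
ground = Extent(490000.0, 5450000.0, 494680.0, 5456480.0)

# The centre of the frame maps to the centre of the ground extent.
center = page_to_geo(306.0, 396.0, frame, ground)
print(center)
```

A PDF with several map frames on one page would need one such frame and ground pair per frame, which is why a single page can carry more than one coordinate system.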

To display PDF data in the FME Data Inspector with a background map, set Coordinate Units to Geospatial. This is only possible if a coordinate system exists for the PDF data.

Non-Spatial Parameters

Detailed information about the Non-Spatial parameter options can be found in the documentation.

If your PDF document contains tabular data, it is possible to extract metadata and text, and even to rasterize the entire PDF page. The Non-Spatial Metadata parameter is useful for extracting attributes or information about the document, such as its creation date.

PDF Reading Articles

Reading Simple PDF and Map Content

This article covers how to read a simple PDF containing content commonly seen in reports, how to inspect and extract the content of PDF map frames, and how to correctly read features within a frame that can be described with both page points and geospatial coordinates.

Creating PDF Cartographic Output

Learn how to read, style, and sort feature types, and then set up a page layout for output to a PDF file.

Data Attribution

The data used here originates from OpenStreetMap.
