PDFInfo Explained: Retrieve Author, Dates, and Page Count Fast

PDFInfo: Quick Guide to Extracting Metadata from PDFs

PDF metadata—title, author, creation date, page count, and more—helps you organize, audit, and automate workflows that involve PDF files. pdfinfo, a lightweight command-line tool from the Poppler (or Xpdf) suite, quickly exposes this metadata so you can inspect PDFs without opening them in a GUI. This guide covers installation, common commands, useful flags, output parsing, and automation tips.

What pdfinfo shows

pdfinfo reports common metadata and file details such as:

  • Title, Author, Subject, Keywords
  • Creator (software that generated the PDF)
  • Producer (PDF library that produced the file)
  • CreationDate / ModDate
  • Tagged (whether the PDF includes accessibility tagging)
  • Encrypted (encryption status)
  • Page count, Page size, and PDF version
  • File size and linearization / fast web view indicators

Install pdfinfo

  • macOS: brew install poppler
  • Debian/Ubuntu: sudo apt-get install poppler-utils
  • Fedora: sudo dnf install poppler-utils
  • Windows: install Poppler binaries (add to PATH) or use WSL and follow Linux steps.

Basic usage

Run pdfinfo against a PDF file:

Code

pdfinfo file.pdf

Typical output is a line-by-line list of metadata fields and values.
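
Because each line has the form "Key: value", the output is easy to load into a dictionary. The following is a minimal Python sketch, not part of pdfinfo itself: the helper names parse_pdfinfo and run_pdfinfo are made up for this example, and the sample text is hand-written to resemble typical output rather than copied from a real file.

```python
import subprocess


def parse_pdfinfo(text: str) -> dict[str, str]:
    """Turn pdfinfo's "Key: value" lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        # Split at the first colon only, so values such as times stay intact.
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            info[key.strip()] = value.strip()
    return info


def run_pdfinfo(path: str) -> dict[str, str]:
    """Run pdfinfo on one file (assumes pdfinfo is on PATH)."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo(result.stdout)


# Hand-written sample, shaped like typical pdfinfo output.
SAMPLE = """\
Title:          Quarterly Report
Author:         Jane Doe
CreationDate:   Thu Mar  3 12:00:00 2022 EST
Pages:          12
Page size:      612 x 792 pts (letter)
"""

info = parse_pdfinfo(SAMPLE)
print(info["Pages"], info["Title"])  # 12 Quarterly Report
```

With pdfinfo installed, run_pdfinfo("file.pdf") would return the same kind of dictionary for a real file.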

Useful flags

  • -meta
    Prints XML metadata block (XMP) if present: pdfinfo -meta file.pdf
  • -box
    Shows page box sizes (MediaBox, CropBox, BleedBox, TrimBox, ArtBox): pdfinfo -box file.pdf
  • -f -l
    Set the first and last page to examine; pdfinfo then prints the size of each page in that range instead of only the first page: pdfinfo -f 1 -l 5 file.pdf
  • -rawdates
    Show raw date strings from the PDF (no post-processing): pdfinfo -rawdates file.pdf
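  • -enc
    Sets the text encoding used for output (not encryption details): pdfinfo -enc UTF-8 file.pdf
    Encryption status is already reported in the Encrypted field of the default output.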

Check pdfinfo -help for the full list on your system.
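
The XMP block printed by -meta is XML, so it can be read with a standard XML parser. The sketch below is a hedged illustration in Python: xmp_title is a hypothetical helper, the sample packet is hand-written, and it only looks up dc:title in the common rdf:Alt layout. Real packets vary and may need more handling.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespaces commonly used in XMP packets (RDF and Dublin Core).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def xmp_title(xmp: str) -> Optional[str]:
    """Return the first dc:title entry from an XMP packet, or None."""
    xmp = xmp.strip()
    if not xmp:  # pdfinfo -meta prints nothing when there is no XMP
        return None
    root = ET.fromstring(xmp)
    item = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    return item.text if item is not None else None


# Hand-written sample packet, for illustration only.
SAMPLE_XMP = """
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Quarterly Report</rdf:li></rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""

print(xmp_title(SAMPLE_XMP))  # Quarterly Report
```

In practice the input string would be the captured output of pdfinfo -meta file.pdf.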

Parsing pdfinfo output in scripts

pdfinfo output is plain text; use standard CLI tools to extract fields.

  • Extract page count (bash):

Code

pages=$(pdfinfo file.pdf | awk '/^Pages:/ {print $2}')

  • Get title or fall back to filename:

Code

title=$(pdfinfo file.pdf | sed -n 's/^Title:[ ]*//p')
[ -z "$title" ] && title="$(basename file.pdf)"

  • Extract creation date and convert to ISO (example using GNU date):

Code

raw=$(pdfinfo -rawdates file.pdf | sed -n 's/^CreationDate:[ ]*//p')
# raw might look like D:20220303120000-05'00'
# convert with custom parsing or use a library in higher-level languages

For robust parsing, prefer using a scripting language (Python, Node.js) and a PDF library that reads XMP or Info dictionaries directly.
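
As one way to do the custom date parsing mentioned above, the following Python sketch converts a raw PDF date string into ISO 8601. The function name parse_pdf_date and the regular expression are this example's own; missing fields default to the start of the period, which is an assumption of the sketch. Recent Poppler builds also list an -isodates flag that prints ISO dates directly, so check pdfinfo -help before writing your own parser.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional offset such as -05'00' or Z.
PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(\d{2})?'?)?)?$"
)


def parse_pdf_date(raw: str) -> datetime:
    """Parse a raw PDF date string; raises ValueError if it does not match."""
    match = PDF_DATE.match(raw.strip())
    if not match:
        raise ValueError(f"not a PDF date string: {raw!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = match.groups()

    tz = None  # stays None (naive datetime) when no offset is given
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign in ("+", "-"):
        delta = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(-delta if sign == "-" else delta)

    # Fields that are absent default to the first month/day and to 00:00:00.
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz,
    )


print(parse_pdf_date("D:20220303120000-05'00'").isoformat())
# 2022-03-03T12:00:00-05:00
```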

Examples in Python

Using PyPDF2 to read basic metadata:

python

from PyPDF2 import PdfReader

reader = PdfReader("file.pdf")
info = reader.metadata  # None when the PDF has no Info dictionary
if info is not None:
    print(info.title, info.author, info.get("/CreationDate"))

Note: reader.metadata reads the document Info dictionary only. XMP metadata is a separate XML packet; PyPDF2 exposes it through reader.xmp_metadata, or you can capture it with pdfinfo -meta and parse the XML directly.

Automation tips

  • Batch-check PDFs for missing metadata:
    • Loop through files, call pdfinfo, and log missing Title/Author fields.
  • Integrate into CI: fail builds if PDFs lack required metadata or are encrypted.
  • Combine with exiftool or custom scripts to update metadata (some tools allow editing; pdfinfo is read-only).
  • Normalize dates and author names using a mapping file in scripts.
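
The batch check and the CI idea above could be sketched in Python as follows. This is one possible shape, not a definitive implementation: REQUIRED, find_problems, and check_directory are names invented for this example, pdfinfo is assumed to be on PATH, and the Encrypted test simply looks for a value starting with "yes".

```python
import subprocess
from pathlib import Path

REQUIRED = ("Title", "Author")  # fields this example treats as mandatory


def parse_pdfinfo(text: str) -> dict[str, str]:
    """Turn pdfinfo's "Key: value" lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def find_problems(info: dict[str, str]) -> list[str]:
    """List missing required fields and flag encrypted files."""
    problems = [f"missing {name}" for name in REQUIRED if not info.get(name)]
    if info.get("Encrypted", "no").lower().startswith("yes"):
        problems.append("encrypted")
    return problems


def check_directory(directory: str) -> int:
    """Print problems per PDF and return how many files had any."""
    failures = 0
    for pdf in sorted(Path(directory).glob("*.pdf")):
        result = subprocess.run(
            ["pdfinfo", str(pdf)], capture_output=True, text=True
        )
        if result.returncode != 0:
            problems = ["pdfinfo failed"]
        else:
            problems = find_problems(parse_pdfinfo(result.stdout))
        if problems:
            failures += 1
            print(f"{pdf}: {', '.join(problems)}")
    return failures
```

In a CI job, a wrapper script could call check_directory on the folder that holds the PDFs and exit with a non-zero status when it returns anything other than 0.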

Troubleshooting

  • No metadata shown: PDF may lack an Info dictionary or XMP block; consider extracting XMP via pdfinfo -meta or using a PDF library.
  • Dates look odd: PDF dates use the "D:YYYYMMDDHHmmSSOHH'mm'" format, where O is +, -, or Z and HH'mm' is the offset from UTC; use parsing utilities or libraries to normalize.
  • Encrypted PDFs: pdfinfo reports encryption in the Encrypted field. If the file needs a password to open, supply it with -upw (user password) or -opw (owner password), provided you are permitted to.

Security and permissions

  • pdfinfo reads files locally—ensure you have permission to access the files.
  • Treat untrusted PDFs with care: pdfinfo only parses the file, but malformed input can trigger parser bugs, so keep Poppler up to date and run it in a sandbox if the content is suspicious.

Quick checklist

  • Install poppler/poppler-utils.
  • Run pdfinfo file.pdf for a quick view.
  • Use -meta for XMP, -box for page boxes, -rawdates for raw timestamps.
  • Script parsing with awk/sed or use PyPDF2 for programmatic access.
  • Automate checks and integrate into CI for consistency.

Use pdfinfo whenever you need a fast, scriptable way to inspect PDF metadata without opening a viewer.
