lateshift/pdfscan

pdfscan

Technical PDF security scanner for identifying potentially risky structures in PDF documents.

"Since when is the internet about robbing people of their privacy?" — Fry

Overview

pdfscan parses a PDF with lopdf and reports security-relevant indicators such as:

  • JavaScript actions
  • automatic actions (/OpenAction, /AA)
  • AcroForm and XFA form content
  • embedded files and file attachments
  • launch actions
  • form submission / import actions
  • URI targets (strict mode)
  • rich media and multimedia-related structures
  • selected indicators based on /Type and /Subtype values
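
The list above can be read as a name-matching pass over each PDF dictionary. The following is a minimal, standard-library-only sketch of that idea, not pdfscan's actual code: the function name `match_indicators`, the two name tables, and the choice of names beyond /OpenAction, /AA, /AcroForm and /URI are assumptions made for illustration. It shows how /URI could be reported only in strict mode.

```rust
/// Names reported in every mode. Only /OpenAction, /AA and /AcroForm are
/// named in this README; the rest are standard PDF names and their use
/// here is an assumption.
const ALWAYS: &[&str] = &[
    "OpenAction", "AA", "AcroForm", "XFA", "JavaScript", "JS",
    "EmbeddedFiles", "Launch", "SubmitForm", "ImportData", "RichMedia",
];

/// Names reported only when strict mode is enabled.
const STRICT_ONLY: &[&str] = &["URI"];

/// Return the indicator names found in `names`, preserving input order.
/// `names` stands for the keys and name values seen in one dictionary.
fn match_indicators<'a>(names: &[&'a str], strict: bool) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| ALWAYS.contains(n) || (strict && STRICT_ONLY.contains(n)))
        .collect()
}

fn main() {
    let names = ["Type", "OpenAction", "URI", "Pages"];
    println!("default: {:?}", match_indicators(&names, false));
    println!("strict:  {:?}", match_indicators(&names, true));
}
```

In a real scanner the names would come from the parsed document (here, lopdf objects) rather than a hard-coded slice.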

It produces:

  • a terminal report with severity tables
  • a per-finding detail section
  • an AcroForm field/page inventory
  • optional JSON output for automation

Build

cargo build --release

Usage

cargo run -- <file.pdf>
cargo run -- <file.pdf> --strict
cargo run -- <file.pdf> --verbose
cargo run -- <file.pdf> --json

CLI flags

  • --strict — enable stricter checks, including /URI detection and additional file-related heuristics
  • --verbose — include extra diagnostic output, including extracted JavaScript payloads when available
  • --json — emit structured JSON instead of terminal tables

Output model

Each finding includes:

  • object reference
  • indicator key or derived type/subtype signal
  • severity
  • summary
  • resolved page numbers when available
  • detail notes
  • extracted JavaScript payloads when applicable
  • an AcroForm field inventory for /AcroForm findings
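
As a rough illustration of the fields listed above, a finding could be modelled along these lines. This is a sketch under assumptions: the struct name `Finding`, its field names and types, and the `describe` helper are hypothetical and are not taken from the pdfscan source or its JSON schema.

```rust
/// Hypothetical shape of one finding; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    /// PDF object reference as (object number, generation number).
    object_ref: (u32, u16),
    /// Indicator key or derived type/subtype signal, e.g. "/OpenAction".
    indicator: String,
    /// Severity label, e.g. "HIGH".
    severity: String,
    summary: String,
    /// Resolved page numbers; empty when attribution was not possible.
    pages: Vec<u32>,
    notes: Vec<String>,
    /// Extracted JavaScript payloads, if any.
    javascript: Vec<String>,
}

/// Render a one-line description of a finding.
fn describe(f: &Finding) -> String {
    let pages = if f.pages.is_empty() {
        "unresolved".to_string()
    } else {
        f.pages.iter().map(|p| p.to_string()).collect::<Vec<_>>().join(", ")
    };
    format!(
        "{} {} R [{}] {}: {} (pages: {})",
        f.object_ref.0, f.object_ref.1, f.severity, f.indicator, f.summary, pages
    )
}

fn main() {
    let finding = Finding {
        object_ref: (12, 0),
        indicator: "/OpenAction".to_string(),
        severity: "HIGH".to_string(),
        summary: "runs on open".to_string(),
        pages: vec![1, 3],
        notes: vec![],
        javascript: vec![],
    };
    println!("{}", describe(&finding));
}
```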

Severity levels:

  • INFO
  • LOW
  • MEDIUM
  • HIGH
  • CRITICAL

Final verdicts:

  • CLEAN
  • SUSPICIOUS
  • HIGH RISK
  • CRITICAL
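
This README does not state how severities map to a final verdict. One plausible scheme, shown here purely as an assumption, is to take the highest severity among all findings and map it to a verdict. The enum names, the `verdict` function, and the thresholds are hypothetical.

```rust
/// Severity levels in ascending order; the derived `Ord` follows
/// declaration order, so `Info < Low < Medium < High < Critical`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Clean,
    Suspicious,
    HighRisk,
    Critical,
}

/// Derive a verdict from the highest severity seen.
/// The thresholds below are an assumption, not pdfscan's documented rule.
fn verdict(severities: &[Severity]) -> Verdict {
    match severities.iter().copied().max() {
        None | Some(Severity::Info) => Verdict::Clean,
        Some(Severity::Low) | Some(Severity::Medium) => Verdict::Suspicious,
        Some(Severity::High) => Verdict::HighRisk,
        Some(Severity::Critical) => Verdict::Critical,
    }
}

fn main() {
    let seen = [Severity::Info, Severity::Medium, Severity::Low];
    println!("{:?}", verdict(&seen));
}
```

Deriving `Ord` on the enum keeps the comparison logic in one place, so adding a level only requires inserting it at the right position in the declaration.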

Notes

  • AcroForm page resolution is inferred from widget /P references and page /Annots membership.
  • Some malformed or highly non-standard PDFs may not expose enough structure for complete page attribution.
  • This tool is intended for triage and inspection, not full PDF sandboxing or exploit verification.
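
The page-resolution note can be sketched as follows: a widget is attributed to a page if its /P entry points at that page or if the page's /Annots array lists the widget. This is a simplified, standard-library-only illustration; the `Page` struct, the `ObjId` alias, and `resolve_pages` are hypothetical names, and the real tool works on lopdf objects instead.

```rust
use std::collections::BTreeSet;

/// PDF object identifier as (object number, generation number).
type ObjId = (u32, u16);

/// Minimal stand-in for a page: its object id, its 1-based page number,
/// and the object ids listed in its /Annots array.
struct Page {
    id: ObjId,
    number: u32,
    annots: Vec<ObjId>,
}

/// Collect the page numbers a widget belongs to, using both signals:
/// the widget's own /P reference and membership in a page's /Annots.
/// An empty result means the page could not be attributed.
fn resolve_pages(widget: ObjId, widget_p: Option<ObjId>, pages: &[Page]) -> BTreeSet<u32> {
    let mut found = BTreeSet::new();
    for page in pages {
        let via_p = widget_p == Some(page.id);
        let via_annots = page.annots.contains(&widget);
        if via_p || via_annots {
            found.insert(page.number);
        }
    }
    found
}

fn main() {
    let pages = vec![
        Page { id: (3, 0), number: 1, annots: vec![] },
        Page { id: (4, 0), number: 2, annots: vec![(10, 0)] },
    ];
    // Widget 10 0 R has no /P entry but appears in page 2's /Annots.
    println!("{:?}", resolve_pages((10, 0), None, &pages));
}
```

Using both signals matters for the malformed files mentioned above, where a widget may lack /P or a page may omit it from /Annots.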

Dependencies

  • lopdf
  • clap
  • comfy-table
  • serde
  • serde_json
