CAPESandbox/peepdf

peepdf is a Python tool for exploring PDF files to determine whether they may be harmful. Its aim is to provide every component a security researcher needs for PDF analysis, without resorting to three or four separate tools. peepdf can show all the objects in a document and highlight the suspicious elements, supports the most commonly used filters and encodings, and can parse different versions of a file, object streams, and encrypted files. With PyV8 and Pylibemu installed, it also provides Javascript and shellcode analysis wrappers. In addition, it can create new PDF files and modify or obfuscate existing ones.

The main functionalities of peepdf are the following:

Installation: Here's what I did to make the extra libraries work

  • Note: This installs peepdf system wide.
  • This repo:
    git clone https://github.com/harakan/peepdf
    cd peepdf && sudo python3 setup.py install
  • Infamous PyV8 Library. This uses the new stpyv8 fork:
    git clone git@github.com:area1/stpyv8.git
    cd stpyv8
    sudo bash install-ubuntu.sh
    sudo python3 setup.py install
  • Install the libemu Python wrapper (pylibemu):
    pip3 install pylibemu

... and hopefully that works!

Analysis:

  • Decodings: hexadecimal, octal, name objects
  • Most used filters
  • References in objects and where an object is referenced
  • Strings search (including streams)
  • Physical structure (offsets)
  • Logical tree structure
  • Metadata
  • Modifications between versions (changelog)
  • Compressed objects (object streams)
  • Analysis and modification of Javascript (PyV8): unescape, replace, join
  • Shellcode analysis (Libemu python wrapper, pylibemu)
  • Variables (set command)
  • Extraction of old versions of the document
  • Easy extraction of objects, Javascript code, shellcodes (>, >>, $>, $>>)
  • Checking hashes on VirusTotal
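
To make the "Decodings" item concrete, here is a minimal sketch of the three encodings as the PDF format defines them: `#XX` hex escapes in name objects, hexadecimal strings in angle brackets, and `\ddd` octal escapes in literal strings. This is not peepdf's own code; the function names are hypothetical and chosen only for illustration.

```python
import re


def decode_name(name: str) -> str:
    """Decode #XX hex escapes in a PDF name object, e.g. /J#61vaScript."""
    return re.sub(
        r"#([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        name,
    )


def decode_hex_string(hex_string: str) -> bytes:
    """Decode a PDF hexadecimal string such as <4A53>; whitespace is ignored."""
    digits = re.sub(r"\s+", "", hex_string.strip("<>"))
    if len(digits) % 2:
        # An odd final digit is treated as if it were followed by 0.
        digits += "0"
    return bytes.fromhex(digits)


def decode_octal_escapes(literal: str) -> str:
    """Decode \\ddd octal escapes (one to three digits) in a literal string."""
    return re.sub(
        r"\\([0-7]{1,3})",
        # Values above 255 overflow a byte, so only the low 8 bits are kept.
        lambda m: chr(int(m.group(1), 8) & 0xFF),
        literal,
    )


if __name__ == "__main__":
    print(decode_name("/J#61vaScript"))
    print(decode_hex_string("<4A 53>"))
    print(decode_octal_escapes(r"\112\123"))
```

Obfuscated names such as `/J#61vaScript` are a common reason to normalize names before searching a document for suspicious elements.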

Creation/Modification:

  • Basic PDF creation
  • Creation of PDF with Javascript executed when the document is opened
  • Creation of object streams to compress objects
  • Embedded PDFs
  • Strings and names obfuscation
  • Malformed PDF output: without endobj, garbage in the header, bad header...
  • Filters modification
  • Objects modification
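
As an illustration of what "Filters modification" means at the byte level, the sketch below re-encodes stream data from FlateDecode (zlib) to ASCIIHexDecode using only the Python standard library. It is not peepdf's implementation and the helper names are hypothetical; a real tool would also have to update the stream dictionary's /Filter and /Length entries, which this sketch leaves out.

```python
import binascii
import zlib


def flate_decode(data: bytes) -> bytes:
    """Undo FlateDecode: the stream body is zlib-compressed data."""
    return zlib.decompress(data)


def ascii_hex_encode(data: bytes) -> bytes:
    """Apply ASCIIHexDecode encoding: hex digits terminated by '>'."""
    return binascii.hexlify(data).upper() + b">"


def ascii_hex_decode(data: bytes) -> bytes:
    """Undo ASCIIHexDecode: read up to '>', skipping whitespace."""
    body = data.split(b">", 1)[0]
    digits = bytes(c for c in body if not chr(c).isspace())
    if len(digits) % 2:
        # An odd final digit is treated as if it were followed by 0.
        digits += b"0"
    return binascii.unhexlify(digits)


def swap_flate_for_asciihex(stream: bytes) -> bytes:
    """Re-encode a FlateDecode stream body as an ASCIIHexDecode body."""
    return ascii_hex_encode(flate_decode(stream))


if __name__ == "__main__":
    original = b"app.alert(1);"
    swapped = swap_flate_for_asciihex(zlib.compress(original))
    print(swapped)
    print(ascii_hex_decode(swapped) == original)
```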

Execution modes:

  • Simple command line execution
  • Powerful interactive console (colorized or not)
  • Batch mode

TODO:

  • Embedded PDFs analysis
  • Improving automatic Javascript analysis
  • GUI

You are free to contribute feedback, bug reports, patches, etc. Any help is welcome. Also, if you really enjoy using peepdf, think it is worth it, and feel really generous today, you can donate some bucks to the project ;) Thanks!
