This repository has been archived by the owner on Nov 10, 2022. It is now read-only.

Initial commit
cetfor committed Mar 6, 2018
0 parents commit 9d79d12
Showing 24 changed files with 1,767 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
# Auto detect text files and perform LF normalization
* text=auto
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
*.pyc
.DS_Store
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2017 Battelle Memorial Institute

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
62 changes: 62 additions & 0 deletions README.md
@@ -0,0 +1,62 @@
____ __ __ _ _
| _ \ __ _ _ __ ___ _ __ | \/ | __ _ ___| |__ ___| |_ ___ ________
| |_) / _` | '_ \/ _ \ '__| | |\/| |/ _` |/ __| '_ \ / _ \ __/ _ \ /_______/
| __/ (_| | |_)| __/ | | | | | (_| | (__| | | | __/ || __/ \_______\
|_| \__,_| .__/\___|_| |_| |_|\__,_|\___|_| |_|\___|\__\___| /_______/
|_| @==|;;;;;;>

## About
Paper Machete (PM) orchestrates [Binary Ninja](https://binary.ninja) and [Grakn.ai](https://grakn.ai) to aid static binary analysis for the purpose of finding bugs in software. PM leverages the Binary Ninja MLIL SSA to extract semantic meaning about individual instructions, operations, register/variable state, and overall control flow.

PM migrates this data into Grakn - a knowledge graph that gives us the ability to define domain-specific ontologies for data and write powerful inference rules to form relationships between data we don't want to (or can't) explicitly store. [Heeh, how neat is that](https://www.youtube.com/watch?v=Hm3JodBR-vs)?

This project was released in conjunction with a DerbyCon 2017 talk titled "Aiding Static Analysis: Discovering Vulnerabilities in Binary Targets through Knowledge Graph Inferences." You can watch that talk [here](http://www.irongeek.com/i.php?page=videos/derbycon7/t116-aiding-static-analysis-discovering-vulnerabilities-in-binary-targets-through-knowledge-graph-inferences-john-toterhi).

Paper Machete's initial prototype and public codebase were developed by security researchers at the [Battelle Memorial Institute](https://www.battelle.org/government-offerings/national-security/cyber/mission-focused-tools). As this project matures, we hope that you will find it useful in your own research and consider contributing to the project.

## Why BNIL?
The BNIL suite of ILs is easy to work with, pleasantly verbose, and human-readable. At any point we can decide to leverage other levels and forms of the IL with little development effort on our part. When you add to that the ability to [lift multiple architectures](https://binary.ninja/faq/) and [write custom lifters](https://github.com/joshwatson/binaryninja-msp430), we have little reason not to use BNIL.

## Why Grakn?
Grakn's query language (Graql) is easy to learn and intuitive, which is extremely important in the early stages of this research while we're still hand-writing queries to model the patterns vulnerability researchers look for when performing static analysis.

The ability to write our own domain-specific ontologies lets us quickly experiment with new query ideas and ways of making our queries less complex. When we run into a case where we think "gee, if I just had access to the relationship between..." we can modify our ontology and inference rules to get that information.

While the end game for PM is to eliminate the need for human-written queries, the fact is we're starting from square one. That means hand-jamming a lot of queries to model the patterns human vulnerability researchers look for when bug hunting.

## Dependencies
Paper Machete requires [BinaryNinja v1.1](https://binary.ninja), [Grakn v1.0.0](https://github.com/graknlabs/grakn/releases/tag/v1.0.0), the [Grakn Python Driver](http://github.com/graknlabs/grakn-python), and the [Java JRE](http://www.oracle.com/technetwork/java/javase/downloads/index.html).


## Query Scripts
We've included some basic queries to get you started if you want to play around with PM. As you can imagine, there is no "silver bullet" query that will find all manifestations of a specific vulnerability class. Because of this, each CWE query script carries a version number. As we add new methods of finding the same CWE, we'll add scripts with incremented version numbers to differentiate them.

`cwe_120_v1.py` - Tests for use of unsafe 'gets()' function ([CWE-120](https://cwe.mitre.org/data/definitions/120.html))

`cwe_121_v1.py` - Tests for buffer overflows ([CWE-121](https://cwe.mitre.org/data/definitions/121.html))

`cwe_129_v1.py` - Tests for missing bounds checks ([CWE-129](https://cwe.mitre.org/data/definitions/129.html))

`cwe_134_v1.py` - Tests for format string vulnerabilities ([CWE-134](https://cwe.mitre.org/data/definitions/134.html))

`cwe_788_v1.py` - Tests for missing bounds check on array indexes ([CWE-788](https://cwe.mitre.org/data/definitions/788.html))

## How Do I Use It?

For basic use, run the `paper_machete.py` script and follow the prompts. For more advanced use, please [read the wiki](https://github.com/cetfor/PaperMachete/wiki).

Typically you'll start with option `[1]` and work your way down to option `[3]`. If you run into any issues with Grakn, use option `[4]` to reset Grakn to a clean state and try again.
```
... banner ...
[1] Analyze a binary file
[2] Migrate a JSON file into Grakn
[3] Run all CWE queries
[4] Clean and restart Grakn
[5] Quit
```

Option `[1]` lists all executable files in the `/analysis` directory, so place any executables you want to analyze there. This option runs `pmanalyze.py` and generates a JSON file in the `/analysis` directory.

Once you've analyzed files with `[1]` and produced resulting JSON files, they will appear as a choice in option `[2]`. Selecting a JSON file in option `[2]` will migrate the data into Grakn.

Now that you have data in Grakn, you can use option `[3]`. This will kick off all scripts in `/queries` against the keyspace of your choice. If you write your own query patterns, just throw them in `/queries` and option `[3]` will run them too.
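
As a rough illustration of what a custom script in `/queries` has to accept: `paper_machete.py` launches each script with `python3.6` and passes the chosen keyspace as the only command-line argument. The skeleton below is a sketch under that assumption. The names `parse_keyspace`, `run_query`, and `my_query_v1.py` are hypothetical, and the Grakn driver calls are deliberately left as a placeholder rather than guessed at.

```python
import sys


def parse_keyspace(argv):
    """Return the keyspace passed as the first CLI argument, or None if missing."""
    if len(argv) < 2 or not argv[1].strip():
        return None
    return argv[1].strip()


def run_query(keyspace):
    # Placeholder (assumption): a real script would open a Grakn session on
    # `keyspace` here and execute its Graql pattern. That part is omitted.
    return "would run query against keyspace '{}'".format(keyspace)


def main(argv):
    keyspace = parse_keyspace(argv)
    if keyspace is None:
        print("usage: my_query_v1.py <keyspace>")
        return 1
    print(run_query(keyspace))
    return 0

# In a real script, finish with: sys.exit(main(sys.argv))
```

Keeping the argument handling separate from the query itself makes it easy to add a `_v2` variant of a script that only swaps out `run_query`.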
9 changes: 9 additions & 0 deletions analysis/about_this_folder
@@ -0,0 +1,9 @@
This folder serves two purposes:
1. It's where you put the binaries or Binary Ninja databases you want to analyze (PE, ELF, Mach-O, .bndb)
2. It's where analysis files (JSON) are stored after being processed by Paper Machete.

The Paper Machete CLI `paper_machete.py` enumerates this folder when presenting you with analysis/migration options.

FAQ:
Q: What if my target isn't a PE/ELF/Mach-O executable? It's a binary blob!
A: Analyze it with Binary Ninja and save your analysis as a .bndb file in this folder.
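
The accepted input types listed above (PE, ELF, Mach-O, .bndb) can be told apart by their leading bytes. The sketch below is one possible way to do that and is not how `paper_machete.py` filters files (it shells out to the `file` utility). The helper name `classify` and the `MAGIC` table are assumptions made for illustration, and the `MZ` check is a coarse heuristic that also matches plain DOS executables.

```python
# Leading byte signatures for the supported executable formats.
MAGIC = [
    (b"\x7fELF", "elf"),
    (b"MZ", "pe"),                    # DOS/PE stub header (heuristic)
    (b"\xfe\xed\xfa\xce", "mach-o"),  # 32-bit, big-endian
    (b"\xfe\xed\xfa\xcf", "mach-o"),  # 64-bit, big-endian
    (b"\xce\xfa\xed\xfe", "mach-o"),  # 32-bit, little-endian
    (b"\xcf\xfa\xed\xfe", "mach-o"),  # 64-bit, little-endian
    (b"\xca\xfe\xba\xbe", "mach-o"),  # universal binary (same magic as Java class files)
]


def classify(filename, header):
    """Return 'bndb', 'elf', 'pe', 'mach-o', or None for an unsupported file.

    `header` is the first few bytes of the file; `filename` is only used to
    recognise Binary Ninja databases by extension.
    """
    if filename.lower().endswith(".bndb"):
        return "bndb"
    for magic, kind in MAGIC:
        if header.startswith(magic):
            return kind
    return None
```

A caller would read the first four bytes of each file in the folder and keep the entries for which `classify` returns something other than `None`.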
2 changes: 2 additions & 0 deletions config
@@ -0,0 +1,2 @@
[PATHS]
GRAKN=/home/user/PaperMachete/grakn-dist-1.0.0
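
For reference, this two-line file is an INI-style config with a single `PATHS` section. The sketch below shows how such a file can be read. It uses the Python 3 module name `configparser` (the script in this commit imports the Python 2 `ConfigParser`), and the helper name `read_grakn_path` and the inline `SAMPLE` string are assumptions for illustration only.

```python
from configparser import RawConfigParser  # Python 3 name of the module

# Inline copy of the config file so the example is self-contained.
SAMPLE = """\
[PATHS]
GRAKN=/home/user/PaperMachete/grakn-dist-1.0.0
"""


def read_grakn_path(text):
    """Parse INI text and return the GRAKN entry of the PATHS section."""
    parser = RawConfigParser()
    parser.read_string(text)
    return parser.get("PATHS", "GRAKN")


print(read_grakn_path(SAMPLE))  # prints /home/user/PaperMachete/grakn-dist-1.0.0
```

Option names are matched case-insensitively by default, so `GRAKN` and `grakn` refer to the same entry.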
Binary file added img/grakn-start.png
Binary file added img/grakn_crash.png
Binary file added img/grakn_crash_2.png
284 changes: 284 additions & 0 deletions paper_machete.py
@@ -0,0 +1,284 @@
import sys
import subprocess
from os import listdir
from os.path import abspath, isdir, isfile, join, splitext
from ConfigParser import RawConfigParser
from mimetypes import guess_type
from urllib2 import urlopen
from ast import literal_eval
import pmanalyze

ENTER = '\nPress ENTER to continue'
MACHETE = abspath('.')
query_path = join(MACHETE, "queries")
configParser = RawConfigParser()
configParser.read('config')
GRAKN = configParser.get('PATHS', 'GRAKN')
ANALYSIS = join(MACHETE, "analysis")

MAX_ACTIVE = 25 # migration knob: max number of migration workers running at once
MAX_BATCHES = 1000000000 # migration knob: max number of rows to execute in one transaction

MENU1 = "[1] Analyze a binary file"
MENU2 = "[2] Migrate a JSON file into Grakn"
MENU3 = "[3] Run all CWE queries"
MENU4 = "[4] Clean and restart Grakn"
MENU5 = "[5] Quit"

TEMPLATE_DESC = [
    '', # n/a
    'Migrating functions.', # template 1
    'Migrating basic-blocks.', # template 2
    'Linking basic-blocks to their functions.', # template 3
    'Migrating instructions.', # template 4
    'Linking instructions to their basic-blocks.', # template 5
    'Migrating all AST nodes.', # template 6
    'Linking AST nodes.' # template 7
]

def print_banner(title=""):
    subprocess.call("clear")
    print("""
____ __ __ _ _
| _ \ __ _ _ __ ___ _ __ | \/ | __ _ ___| |__ ___| |_ ___ ________
| |_) / _` | '_ \/ _ \ '__| | |\/| |/ _` |/ __| '_ \ / _ \ __/ _ \ /_______/
| __/ (_| | |_)| __/ | | | | | (_| | (__| | | | __/ || __/ \_______\\
|_| \__,_| .__/\___|_| |_| |_|\__,_|\___|_| |_|\___|\__\___| /_______/
|_| @==|;;;;;;>
""")
    total_len = 80
    if title:
        padding = total_len - len(title) - 4
        print("== {} {}\n".format(title, "=" * padding))
    else:
        print("{}\n".format("=" * total_len))

def run_script(query_path, query, keyspace):
    # Query scripts run under Python 3.6 because the Grakn Python driver requires it
    try:
        subprocess.call(["python3.6", join(query_path, query), keyspace])
    except OSError:
        print("It looks like you don't have Python3.6 installed. " \
              "The Grakn Python driver requires it.")
        return -1
    return 0

def run_queries(query, keyspace):
    if query == 'all_queries':
        print("Running all CWE queries against the '{}' keyspace...".format(keyspace))
        queries = [f for f in listdir(query_path) if isfile(join(query_path, f))]
        for query in queries:
            # only run .py scripts (a substring test would also match .pyc files)
            if not query.endswith(".py"): continue
            if run_script(query_path, query, keyspace): return
            print("Script " + query + " complete.")
        print("All queries complete.")
    else:
        if isfile(join(query_path, query)):
            if run_script(query_path, query, keyspace): return
        else:
            print("Could not find the python script " + query)
            print("Please make sure it is located in " + query_path)
    return


def get_file_selection(types):
    file_list = listdir(ANALYSIS)
    filtered = []
    for file in file_list:
        if types == "json" and guess_type(join(ANALYSIS, file))[0] == "application/json":
            filtered.append(file)
        elif types == "bin":
            filecmd = (subprocess.check_output(["file", join(ANALYSIS, file)])).lower()
            filecmd = filecmd.split(": ")[1] # remove file path returned by 'file' utility
            # 'file' reports PE images as "PE32"/"PE32+"; a bare "pe" would also match words like "type"
            if "elf" in filecmd or "mach-o" in filecmd or "pe32" in filecmd or ".bndb" in file.lower():
                filtered.append(file)
        else:
            pass # not json or executable binary

    # print file choices
    if len(filtered) == 0:
        if types == "json":
            print("No json files were found in {}".format(ANALYSIS))
        elif types == "bin":
            print("No executable files were found in {}".format(ANALYSIS))
        raw_input(ENTER)
        return "quit"
    else:
        for i, file in enumerate(filtered):
            print("[{}] {}".format(i, file))

    index = raw_input("\nSelect a file number to analyze ([q]uit): ").lower()
    if index == "q" or index == "quit":
        return "quit"

    try:
        index = int(index)
        if index in range(0, len(filtered)):
            return filtered[int(index)]
    except ValueError:
        pass

    if index != "":
        print("\nThat is not a valid file selection. Try again.")
        raw_input(ENTER)
        if types == "bin":
            print_banner(MENU1)
        elif types == "json":
            print_banner(MENU2)
        else:
            print_banner()

    return False # no valid selection; the caller loops and asks again


def main():
    menu = True
    while menu:
        print_banner()

        # check directories
        if not isdir(GRAKN):
            if GRAKN == '':
                print("Please set the path to your Grakn installation in the config file.\n")
                print("Open the file called 'config' in your paper machete folder, and set")
                print("the variable 'GRAKN' to the full file path of your Grakn installation.")
            else:
                print("Grakn directory not found\n")
                print("Please ensure grakn is located in {}".format(GRAKN))
            sys.exit()

        if not isdir(MACHETE):
            print("Paper Machete directory not found")
            print("Please ensure Paper Machete is located in {}".format(MACHETE))
            sys.exit()

        if not isdir(ANALYSIS):
            print("Creating directory '{}'".format(ANALYSIS))
            subprocess.call(["mkdir", "analysis"])

        menu_option = raw_input("{}\n{}\n{}\n{}\n{}\n\n>> ".format(MENU1,MENU2,MENU3,MENU4,MENU5))

        try:
            menu_option = int(menu_option)
        except ValueError:
            if menu_option != "":
                print("'{}' is not a valid option.".format(menu_option))
                raw_input(ENTER)
            continue

        # analyze a binary file
        if menu_option == 1:

            # display supported binary files in ./analysis
            binary = False
            while binary == False:
                print_banner(MENU1)
                binary = get_file_selection("bin")
                if binary == "quit":
                    break
            if binary == "quit":
                continue

            # check to see if the file exists, if it does, process it
            if not isfile(join(ANALYSIS, binary)):
                print("File '{}' not found.".format(binary))
            else:
                functions = str(raw_input('Specify a list of functions to examine, separated by spaces (ENTER for all): ')).split()
                if len(functions) == 0:
                    pmanalyze.main(join(ANALYSIS, binary))
                else:
                    print(functions)
                    pmanalyze.main(join(ANALYSIS, binary), functions)
            raw_input(ENTER)

        # migrate a json file into Grakn
        elif menu_option == 2:

            # display JSON analysis files in ./analysis
            json = False
            while json == False:
                print_banner(MENU2)
                json = get_file_selection("json")
                if json == "quit":
                    break
            if json == "quit":
                continue

            # check to see if the keyspace already exists for this file
            keyspace = json.lower().replace('.json', '')
            try:
                # same response shape that option [3] relies on: {'keyspaces': [{'name': ...}, ...]}
                keyspaces = literal_eval(urlopen('http://127.0.0.1:4567/kb').read())['keyspaces']
                names = [ks['name'] for ks in keyspaces]

                # append a _# suffix to the base name until the name is not in use
                base = keyspace
                inc = 1
                while keyspace in names:
                    inc += 1
                    keyspace = "{}_{}".format(base, inc)
            except:
                print("Unable to query keyspace names. Is Grakn running?\nContinuing assuming keyspace '{}' is OK to use.".format(keyspace))

            try:
                # insert the ontology
                print("Inserting ontology into the '{}' keyspace...".format(keyspace))
                subprocess.call([join(GRAKN,"graql"),"console", "-f", join(MACHETE, "templates", "binja_mlil_ssa.gql"), "-k", keyspace])

                # migrate data into Grakn
                print("\nMigrating data from '{}' into the '{}' keyspace...".format(json, keyspace))

                # loop over all 7 templates
                for num in range(1,8):
                    print(">> Migration step {} of 7: {}".format(num, TEMPLATE_DESC[num]))
                    subprocess.call([join(GRAKN, "graql"), "migrate", "json", "--template", join(MACHETE, "templates", "binja_mlil_ssa_{}.tpl".format(num)), "--input", join(ANALYSIS, json), "--keyspace", keyspace])

                print("Data successfully migrated into Grakn. You can now run CWE query scripts against '{}' to check for vulnerabilities".format(keyspace))
                raw_input(ENTER)
            except:
                print("Upload failed... please try again.")
                raw_input(ENTER)

        # run CWE queries
        elif menu_option == 3:
            keyspace = None
            keyspaces = literal_eval(urlopen('http://127.0.0.1:4567/kb').read())['keyspaces']

            print_banner(MENU3)

            for i, ks in enumerate(keyspaces):
                print("[{}] {}".format(i, ks['name']))

            index = raw_input("\nSelect a keyspace to run all queries against ([q]uit): ").lower()
            if index == "q" or index == "quit":
                continue

            try:
                index = int(index)
                if index in range(0, len(keyspaces)):
                    keyspace = keyspaces[int(index)]['name']
            except ValueError:
                continue

            # an out-of-range number leaves keyspace unset; go back to the menu
            if keyspace is None:
                continue

            run_queries('all_queries', keyspace)
            raw_input(ENTER)

        # clean and restart Grakn
        elif menu_option == 4:
            print("Restarting Grakn. Press \"Y\" when prompted.\nWait until you see the Grakn banner before continuing!")
            raw_input(ENTER)

            subprocess.call([join(GRAKN, "grakn"), "server", "stop"])
            subprocess.call([join(GRAKN, "grakn"), "server", "clean"])
            subprocess.call([join(GRAKN, "grakn"), "server", "start"])

        # quit
        elif menu_option == 5:
            menu = False

        else:
            print("Invalid option!\n")
            raw_input(ENTER)

if __name__ == "__main__":
    main()
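
Option `[2]` derives a keyspace name from the JSON filename and adds a numeric `_#` suffix when that name is already taken. The helper below is a standalone sketch of that naming idea, handy for checking the suffix logic without a running Grakn server. The function name `unique_keyspace` is hypothetical and is not part of `paper_machete.py`.

```python
def unique_keyspace(base, existing):
    """Return `base` if unused, else the first of base_2, base_3, ... not in `existing`."""
    taken = set(existing)
    if base not in taken:
        return base
    n = 2
    # Always build the candidate from the original base so suffixes do not stack up.
    while "{}_{}".format(base, n) in taken:
        n += 1
    return "{}_{}".format(base, n)
```

Building each candidate from the unmodified base name keeps repeated migrations of the same file at `name_2`, `name_3`, and so on, rather than growing a chain of suffixes.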