Initial commit

cetfor · Mar 6, 2018 · 9d79d12
commit 9d79d12
Showing 24 changed files with 1,767 additions and 0 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -0,0 +1,2 @@
+# Auto detect text files and perform LF normalization
+* text=auto
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,2 @@
+*.pyc
+.DS_Store
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2017 Battelle Memorial Institute
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,62 @@
+     ____                        __  __            _          _           
+    |  _ \ __ _ _ __  ___ _ __  |  \/  | __ _  ___| |__   ___| |_ ___     ________
+    | |_) / _` | '_ \/ _ \ '__| | |\/| |/ _` |/ __| '_ \ / _ \ __/ _ \   /_______/
+    |  __/ (_| | |_)|  __/ |    | |  | | (_| | (__| | | |  __/ ||  __/   \_______\
+    |_|   \__,_| .__/\___|_|    |_|  |_|\__,_|\___|_| |_|\___|\__\___|   /_______/
+               |_|                                                      @==|;;;;;;>
+
+## About
+Paper Machete (PM) orchestrates [Binary Ninja](https://binary.ninja) and [Grakn.ai](https://grakn.ai) to aid static binary analysis for the purpose of finding bugs in software. PM leverages the Binary Ninja MLIL SSA to extract semantic meaning about individual instructions, operations, register/variable state, and overall control flow.
+
+PM migrates this data into Grakn - a knowledge graph that gives us the ability to define domain-specific ontologies for data and write powerful inference rules to form relationships between data we don't want to (or can't) explicitly store. [Heeh, how neat is that](https://www.youtube.com/watch?v=Hm3JodBR-vs)?
+
+This project was released in conjunction with a DerbyCon 2017 talk titled "Aiding Static Analysis: Discovering Vulnerabilities in Binary Targets through Knowledge Graph Inferences." You can watch that talk [here](http://www.irongeek.com/i.php?page=videos/derbycon7/t116-aiding-static-analysis-discovering-vulnerabilities-in-binary-targets-through-knowledge-graph-inferences-john-toterhi). 
+
+Paper Machete's initial prototype and public codebase were developed by security researchers at the [Battelle Memorial Institute](https://www.battelle.org/government-offerings/national-security/cyber/mission-focused-tools). As this project matures, we hope that you will find it useful in your own research and consider contributing to the project.
+
+## Why BNIL?
+The BNIL suite of ILs is easy to work with, pleasantly verbose, and human-readable. At any point we can decide to leverage other levels and forms of the IL with little development effort on our part. When you add to that the ability to [lift multiple architectures](https://binary.ninja/faq/) and [write custom lifters](https://github.com/joshwatson/binaryninja-msp430), we have little reason not to use BNIL.
+
+## Why Grakn?
+Grakn's query language (Graql) is easy to learn and intuitive, which is extremely important in the early stages of this research while we're still hand-writing queries to model the patterns vulnerability researchers look for when performing static analysis. 
+
+The ability to write our own domain-specific ontologies lets us quickly experiment with new query ideas and ways of making our queries less complex. When we run into a case where we think "gee, if I just had access to the relationship between..." we can modify our ontology and inference rules to get that information.
+
+While the end game for PM is to eliminate the need for human-written queries, the fact is we're starting from square one, which means hand-jamming a lot of queries to model the patterns human vulnerability researchers look for when bug hunting.
+
+## Dependencies
+Paper Machete requires [Binary Ninja v1.1](https://binary.ninja), [Grakn v1.0.0](https://github.com/graknlabs/grakn/releases/tag/v1.0.0), the [Grakn Python Driver](http://github.com/graknlabs/grakn-python), and the [Java JRE](http://www.oracle.com/technetwork/java/javase/downloads/index.html).
+
+
+## Query Scripts
+We've included some basic queries to get you started if you want to play around with PM. As you can imagine, there is no "silver bullet" query that will find all manifestations of a specific vulnerability class. Because of this, each CWE query script carries a version number. As we add new methods of finding the same CWE, we'll add scripts with incremented version numbers to differentiate them.
+
+`cwe_120_v1.py` - Tests for use of unsafe 'gets()' function ([CWE-120](https://cwe.mitre.org/data/definitions/120.html))
+
+`cwe_121_v1.py` - Tests for buffer overflows ([CWE-121](https://cwe.mitre.org/data/definitions/121.html))
+
+`cwe_129_v1.py` - Tests for missing bounds checks ([CWE-129](https://cwe.mitre.org/data/definitions/129.html))
+
+`cwe_134_v1.py` - Tests for format string vulnerabilities ([CWE-134](https://cwe.mitre.org/data/definitions/134.html))
+
+`cwe_788_v1.py` - Tests for missing bounds check on array indexes ([CWE-788](https://cwe.mitre.org/data/definitions/788.html))
+
+## How Do I Use It?
+
+For basic use, run the `paper_machete.py` script and follow the prompts. For more advanced use, please [read the wiki](https://github.com/cetfor/PaperMachete/wiki).
+
+Typically you'll start with option `[1]` and work your way down to option `[3]`. If you run into any issues with Grakn, use option `[4]` to reset Grakn to a clean state and try again.
+```
+... banner ...
+[1] Analyze a binary file
+[2] Migrate a JSON file into Grakn
+[3] Run all CWE queries
+[4] Clean and restart Grakn
+[5] Quit
+```
+
+Option `[1]` lists all executable files in the `/analysis` directory, so place any executables you want to analyze in `/analysis`. This option will run `pmanalyze.py` and generate a JSON file in the `/analysis` directory.
+
+Once you've analyzed files with `[1]` and produced resulting JSON files, they will appear as a choice in option `[2]`. Selecting a JSON file in option `[2]` will migrate the data into Grakn.
+
+Now that you have data in Grakn, you can use option `[3]`. This will kick off all scripts in `/queries` against the keyspace of your choice. If you write your own query patterns, just throw them in `/queries` and option `[3]` will run them too.
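Note on option `[3]`: the README says every script in `/queries` is run against the keyspace you choose. A minimal sketch of that discovery step is shown here. It is written for Python 3 and is not taken from the commit; the helper name `build_query_commands` and its `interpreter` parameter are hypothetical. It only builds the command lines and leaves execution to the caller.

```python
import sys
from pathlib import Path


def build_query_commands(query_dir, keyspace, interpreter=sys.executable):
    """Build one command line per *.py script found directly in query_dir.

    Hypothetical helper for illustration; it does not run anything.
    """
    scripts = sorted(
        p for p in Path(query_dir).iterdir()
        if p.is_file() and p.suffix == ".py"  # skips .pyc and other files
    )
    # Each script receives the keyspace name as its only argument.
    return [[interpreter, str(p), keyspace] for p in scripts]
```

Each returned list can be handed to `subprocess.call`, which is how `paper_machete.py` launches query scripts in this commit.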
diff --git a/analysis/about_this_folder b/analysis/about_this_folder
@@ -0,0 +1,9 @@
+This folder serves two purposes:
+	1. It's where you put the binaries or Binary Ninja databases you want to analyze (PE, ELF, Mach-O, .bndb)
+	2. It's where analysis files (JSON) are stored after being processed by Paper Machete.
+
+The Paper Machete CLI `paper_machete.py` enumerates this folder when presenting you with analysis/migration options.
+
+FAQ:
+Q: What if my target isn't a PE/ELF/Mach-O executable? It's a binary blob!
+A: Analyze it with Binary Ninja and save your analysis as a .bndb file in this folder.
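Note on the supported formats: in this commit `paper_machete.py` decides what counts as a target by calling the `file` utility. As an alternative sketch (an assumption, not what the commit does), the same four categories can be recognized from the file extension and the leading magic bytes. `MAGIC_PREFIXES` and `classify_target` are hypothetical names, and the code targets Python 3.

```python
# Leading magic bytes for the container formats named above.
MAGIC_PREFIXES = [
    (b"\x7fELF", "elf"),
    (b"MZ", "pe"),                    # DOS/PE stub; also matches plain DOS executables
    (b"\xfe\xed\xfa\xce", "mach-o"),  # 32-bit, big-endian byte order
    (b"\xfe\xed\xfa\xcf", "mach-o"),  # 64-bit, big-endian byte order
    (b"\xce\xfa\xed\xfe", "mach-o"),  # 32-bit, little-endian byte order
    (b"\xcf\xfa\xed\xfe", "mach-o"),  # 64-bit, little-endian byte order
    (b"\xca\xfe\xba\xbe", "mach-o"),  # fat binary; Java class files share this magic
]


def classify_target(path):
    """Return 'bndb', 'elf', 'pe', 'mach-o', or None for anything else."""
    if path.lower().endswith(".bndb"):
        return "bndb"
    with open(path, "rb") as handle:
        head = handle.read(4)
    for prefix, kind in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return kind
    return None
```

A prefix check like this is only a first-pass filter: it avoids the dependency on an external tool, but it does not validate the rest of the header.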
diff --git a/config b/config
@@ -0,0 +1,2 @@
+[PATHS]
+GRAKN=/home/user/PaperMachete/grakn-dist-1.0.0
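Note on the `config` file: it is INI-style, and `paper_machete.py` reads it with `RawConfigParser` from Python 2's `ConfigParser` module. A rough Python 3 equivalent is sketched here, assuming the same `[PATHS]` section and `GRAKN` key. `SAMPLE_CONFIG` and `read_grakn_path` are hypothetical names used only for illustration.

```python
from configparser import RawConfigParser

# Same content as the config file added in this commit.
SAMPLE_CONFIG = """[PATHS]
GRAKN=/home/user/PaperMachete/grakn-dist-1.0.0
"""


def read_grakn_path(text):
    """Return the GRAKN path from INI text, or '' if it is not set."""
    parser = RawConfigParser()
    parser.read_string(text)
    # fallback covers both a missing [PATHS] section and a missing GRAKN key
    return parser.get("PATHS", "GRAKN", fallback="")


print(read_grakn_path(SAMPLE_CONFIG))
```

Returning an empty string for a missing value lines up with the `GRAKN == ''` check that `main()` uses to print its setup instructions.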
diff --git a/img/grakn-start.png b/img/grakn-start.png
diff --git a/img/grakn_crash.png b/img/grakn_crash.png
diff --git a/img/grakn_crash_2.png b/img/grakn_crash_2.png
diff --git a/paper_machete.py b/paper_machete.py
@@ -0,0 +1,284 @@
+import sys
+import subprocess
+from os import listdir
+from os.path import abspath, isdir, isfile, join, splitext
+from ConfigParser import RawConfigParser
+from mimetypes import guess_type
+from urllib2 import urlopen
+from ast import literal_eval
+import pmanalyze
+
+ENTER = '\nPress ENTER to continue'
+MACHETE = abspath('.')
+query_path = join(MACHETE, "queries")
+configParser = RawConfigParser()
+configParser.read('config')
+GRAKN = configParser.get('PATHS', 'GRAKN') 
+ANALYSIS = join(MACHETE, "analysis")
+
+MAX_ACTIVE = 25     # migration knob: max number of migration workers running at once
+MAX_BATCHES = 1000000000   # migration knob: max number of rows to execute in one transaction
+
+MENU1 = "[1] Analyze a binary file"
+MENU2 = "[2] Migrate a JSON file into Grakn"
+MENU3 = "[3] Run all CWE queries"
+MENU4 = "[4] Clean and restart Grakn"
+MENU5 = "[5] Quit"
+
+TEMPLATE_DESC = [
+    '', # n/a
+    'Migrating functions.',                         # template 1
+    'Migrating basic-blocks.',                      # template 2
+    'Linking basic-blocks to their functions.',     # template 3
+    'Migrating instructions.',                      # template 4
+    'Linking instructions to their basic-blocks.',  # template 5
+    'Migrating all AST nodes.',                     # template 6
+    'Linking AST nodes.'                            # template 7
+]
+
+def print_banner(title=""):
+    subprocess.call("clear")
+    print("""
+ ____                        __  __            _          _           
+|  _ \ __ _ _ __  ___ _ __  |  \/  | __ _  ___| |__   ___| |_ ___    ________
+| |_) / _` | '_ \/ _ \ '__| | |\/| |/ _` |/ __| '_ \ / _ \ __/ _ \  /_______/
+|  __/ (_| | |_)|  __/ |    | |  | | (_| | (__| | | |  __/ ||  __/  \_______\\
+|_|   \__,_| .__/\___|_|    |_|  |_|\__,_|\___|_| |_|\___|\__\___|  /_______/
+           |_|                                                     @==|;;;;;;>
+""")
+    total_len = 80
+    if title:
+        padding = total_len - len(title) - 4
+        print("== {} {}\n".format(title, "=" * padding))
+    else:
+        print("{}\n".format("=" * total_len))
+
+def run_script(query_path, query, keyspace):
+    try:
+        subprocess.call(["python3.6", join(query_path, query), keyspace])
+    except OSError:
+        print("It looks like you don't have Python3.6 installed. " \
+            "The Grakn Python driver requires it.")
+        return -1
+    return 0
+
+def run_queries(query, keyspace):
+    if query == 'all_queries':
+        print("Running all CWE queries against the '{}' keyspace...".format(keyspace))
+        queries = [f for f in listdir(query_path) if isfile(join(query_path, f))]
+        for query in queries:
+            if not query.endswith(".py"): continue # skip .pyc and other non-script files
+            if run_script(query_path, query, keyspace): return
+            print("Script " + query + " complete.")
+        print("All queries complete.")
+    else:
+        if isfile(join(query_path, query)):
+            if run_script(query_path, query, keyspace): return
+        else:
+            print("Could not find the python script " + query)
+            print("Please make sure it is located in " + query_path)
+        return
+
+
+def get_file_selection(types):
+    file_list = listdir(ANALYSIS)
+    filtered = []
+    for file in file_list:
+        if types == "json" and guess_type(join(ANALYSIS, file))[0] == "application/json":
+            filtered.append(file)
+        elif types == "bin":
+            filecmd = (subprocess.check_output(["file", join(ANALYSIS, file)])).lower()
+            filecmd = filecmd.split(": ")[1] # remove file path returned by 'file' utility
+            if "elf" in filecmd or "mach-o" in filecmd or "pe32" in filecmd or ".bndb" in file.lower(): # a bare "pe" also matched "jpeg", "type", etc.
+                filtered.append(file)
+        else:
+            pass # not json or executable binary
+
+    # print file choices
+    if len(filtered) == 0:
+        if types == "json":
+            print("No json files were found in {}".format(ANALYSIS))
+        elif types == "bin":
+            print("No executable files were found in {}".format(ANALYSIS))
+        raw_input(ENTER)
+        return "quit"
+    else:
+        for i, file in enumerate(filtered):
+            print("[{}] {}".format(i, file))
+
+    index = raw_input("\nSelect a file number to analyze ([q]uit): ").lower()
+    if index == "q" or index == "quit":
+        return "quit"
+
+    try:
+        index = int(index)
+        if index in range(0, len(filtered)):
+            return filtered[int(index)]
+    except ValueError:
+        pass
+
+    if index != "":
+        print("\nThat is not a valid file selection. Try again.")
+        raw_input(ENTER)
+    if types == "bin":
+        print_banner(MENU1)
+    elif types == "json":
+        print_banner(MENU2)
+    else:
+        print_banner()
+
+    return False
+
+
+def main():
+    menu = True
+    while menu:
+        print_banner()
+
+        # check directories	
+        if not isdir(GRAKN):
+            if GRAKN == '':
+                print("Please set the path to your Grakn installation in the config file.\n")
+                print("Open the file called 'config' in your paper machete folder, and set")
+                print("the variable 'GRAKN' to the full file path of your Grakn installation.")
+            else:
+                print("Grakn directory not found\n")
+                print("Please ensure grakn is located in {}".format(GRAKN))
+            sys.exit()
+
+        if not isdir(MACHETE):
+            print("Paper Machete directory not found")
+            print("Please ensure Paper Machete is located in {}".format(MACHETE))
+            sys.exit()
+
+        if not isdir(ANALYSIS):
+            print("Creating directory '{}'".format(ANALYSIS))
+            subprocess.call(["mkdir", "analysis"])
+
+        menu_option = raw_input("{}\n{}\n{}\n{}\n{}\n\n>> ".format(MENU1,MENU2,MENU3,MENU4,MENU5))
+
+        try:
+            menu_option = int(menu_option)
+        except ValueError:
+            if menu_option != "":
+                print("'{}' is not a valid option.".format(menu_option))
+                raw_input(ENTER)
+            continue
+
+        # analyze a binary file
+        if menu_option == 1:
+
+            # display supported binary files in ./analysis
+            binary = False
+            while binary == False:
+                print_banner(MENU1)
+                binary = get_file_selection("bin")
+                if binary == "quit":
+                    break
+            if binary == "quit":
+                continue
+
+            # check to see if the file exists, if it does, process it
+            if not isfile(join(ANALYSIS, binary)):
+                print("File '{}' not found.".format(binary))
+            else:
+                functions = str(raw_input('Specify a list of functions to examine, separated by spaces (ENTER for all): ')).split()
+                if len(functions) == 0:
+                    pmanalyze.main(join(ANALYSIS, binary))
+                else:
+                    print(functions)
+                    pmanalyze.main(join(ANALYSIS, binary), functions)
+            raw_input(ENTER)
+
+        # migrate a json file into Grakn
+        elif menu_option == 2:
+
+            # display JSON analysis files in ./analysis
+            json = False
+            while json == False:
+                print_banner(MENU2)
+                json = get_file_selection("json")
+                if json == "quit":
+                    break
+            if json == "quit":
+                continue
+
+            # check to see if the keyspace already exists for this file
+            try:
+                keyspace = json.lower().replace('.json', '')
+                # /kb is indexed the same way option [3] does: ['keyspaces'] holds dicts with a 'name'
+                keyspaces = literal_eval(urlopen('http://127.0.0.1:4567/kb').read())['keyspaces']
+                names = [ks['name'] for ks in keyspaces]
+
+                base = keyspace
+                inc = 1
+                while keyspace in names:
+                    inc += 1
+                    keyspace = "{}_{}".format(base, inc) # add a _# suffix to the base name and try again
+                # once the loop exits, the keyspace name is not in use
+            except Exception:
+                print("Unable to query keyspace names. Is Grakn running?\nContinuing assuming keyspace '{}' is OK to use.".format(keyspace))
+
+            try:
+                # insert the ontology
+                print("Inserting ontology into the '{}' keyspace...".format(keyspace))
+                subprocess.call([join(GRAKN,"graql"),"console", "-f", join(MACHETE, "templates", "binja_mlil_ssa.gql"), "-k", keyspace])
+
+
+                # migrate data into Grakn
+                print("\nMigrating data from '{}' into the '{}' keyspace...".format(json, keyspace))
+
+                # loop over all 7 templates
+                for num in range(1,8):
+                    print(">> Migration step {} of 7: {}".format(num, TEMPLATE_DESC[num]))
+                    subprocess.call([join(GRAKN, "graql"), "migrate", "json", "--template", join(MACHETE, "templates", "binja_mlil_ssa_{}.tpl".format(num)), "--input", join(ANALYSIS, json), "--keyspace", keyspace])
+
+                print("Data successfully migrated into Grakn. You can now run CWE query scripts against '{}' to check for vulnerabilities".format(keyspace))
+                raw_input(ENTER)
+            except:
+                print("Upload failed... please try again.")
+                raw_input(ENTER)
+
+        # run CWE queries
+        elif menu_option == 3:
+            keyspace = None
+            keyspaces = literal_eval(urlopen('http://127.0.0.1:4567/kb').read())['keyspaces']
+
+            print_banner(MENU3)
+
+            for i, ks in enumerate(keyspaces):
+                print("[{}] {}".format(i, ks['name']))
+
+            index = raw_input("\nSelect a keyspace to run all queries against ([q]uit): ").lower()
+            if index == "q" or index == "quit":
+                continue
+
+            try:
+                index = int(index)
+            except ValueError:
+                continue
+            if index not in range(0, len(keyspaces)): continue # out of range: back to the menu
+            keyspace = keyspaces[index]['name']
+
+            run_queries('all_queries', keyspace)
+            raw_input(ENTER)
+
+        # clean and restart Grakn
+        elif menu_option == 4:
+            print("Restarting Grakn. Press \"Y\" when prompted.\nWait until you see the Grakn banner before continuing!")
+            raw_input(ENTER)
+
+            subprocess.call([join(GRAKN, "grakn"), "server", "stop"])
+            subprocess.call([join(GRAKN, "grakn"), "server", "clean"])
+            subprocess.call([join(GRAKN, "grakn"), "server", "start"])
+
+        # quit
+        elif menu_option == 5:
+            menu = False
+
+        else:
+            print("Invalid option!\n")
+            raw_input(ENTER)
+
+if __name__ == "__main__":
+    main()