slides_2016-05-17_phdays_vi.src.org

Abstract from phdays.com:

Speeds in hash cracking grow. The number of hashing algorithms grows. The work needed to maintain a universal cracker grows too. The problem gave birth to john-devkit, an advanced code generator for the famous password cracker John the Ripper. More than 100 hash types are implemented within john-devkit. Its key aspects will be discussed: separation of algorithms, optimizations and output for different computing devices, simple intermediate representation of hashing algorithms, complexity of optimizations for humans and machines, bitslicing, comparison of speeds.

Slides below:

john-devkit: 100 Hash Types Later

Aleksey Cherepanov

john-devkit and John the Ripper (JtR)

  • john-devkit is a code generator for JtR
    • it is an experiment and is not used in practice
  • JtR is the famous hash cracker
    • primary purpose is to detect weak Unix passwords
    • supports 200+ hash formats (types) coded by hand
    • supports dynamic hash types described by formula at run-time
    • and it can utilize CPUs, GPUs and even FPGAs (for bcrypt only)

The problem

  • there are a lot of hash types supported
  • developers care about speed
    • even a 5% change is worth investigating for popular hash types
  • it is fun to implement an optimization once
  • but applying that optimization to every implemented format is hard routine work
  • improving all hash types is very time-consuming

john-devkit as a possible solution

  • the main desired ability was, and still is, to transform code programmatically
    • optimizations may be viewed as transformations of code
    • when we separate an implementation into base code and optimizations
      • the base code may be simpler
      • optimizations may be reused for other algorithms for free
      • it is possible to play with optimizations easily
  • to simplify everything, john-devkit uses its own intermediate representation (IR) of code
    • IR is not low level
    • IR is specific to cryptography
  • john-devkit uses a DSL on top of Python to populate the IR

Flow: describe algorithm

  • code example:

>>>>
from dk import *

code = []
with L.evaluate_to(code):
    L.Var.setup(4, 'le')
    a, b = L.input(), L.input()
    c = a + b
    print c
    L.output(c)

for instruction in code:
    print instruction
[…]
<<<<

Flow: describe algorithm, output

  • output from the example:

>>>>
<lang_main.Var object at 0x00007f88d07ec800>
['var_setup', '4', 'le']
['input', 'lib1']
['input', 'lib2']
['__add__', 'lib3', 'lib1', 'lib2']
['output', 'lib3']
<<<<

Flow: describe algorithm, comments

  • ‘print c’ is evaluated in Python and is not included in the IR
  • c in ‘print c’ is a DSL object, not a regular value
  • operators are overloaded to emit instructions and return new objects
  • john-devkit does not affect the AST or bytecode of Python and may be run on any implementation of Python (usually PyPy for speed)
  • from Python’s POV, the DSL is just a way to fill a list of instructions
    • the DSL may be used to describe a full program or a small part (it is used in optimizations)
  • from the POV of a program in the DSL, Python is a preprocessor
    • Python is fully evaluated before the IR is converted further
    • Python is a very powerful preprocessor
    • DSL does not see names of variables in Python

Flow: IR and transformations

  • 1 instruction: ['__add__', 'lib3', 'lib1', 'lib2']
  • IR is very simple and explicit
    • the whole program is just a list of lists of strings
    • each instruction has an operator name and a list of arguments
    • most instructions do not modify their arguments
      • they return new “objects” instead
      • so IR is close to Static Single Assignment (SSA) form
      • it is friendly to transformations
  • once the IR is obtained, transformations are applied
    • the programmer is free to do anything with this list of instructions
      • use an existing filter
      • create a custom filter
    • we’ll skip a code example of transformations
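
The slides skip a real transformation, so the following is only a rough sketch of what a custom filter over such a list of lists might look like. It removes instructions whose result is never used, relying on the SSA-like property that each name is assigned once. The function name `drop_unused` and the `__xor__` instruction are assumptions for this example, not part of john-devkit.

```python
def drop_unused(code):
    """Hypothetical filter: drop instructions whose result is never used.

    Assumes the first argument of a computing instruction is the name of
    its result and the remaining arguments are operand names.
    """
    used = set()
    kept = []
    # Walk backwards so that uses are seen before definitions.
    for instruction in reversed(code):
        op, args = instruction[0], instruction[1:]
        if op == 'output':
            used.update(args)
            kept.append(instruction)
        elif op in ('var_setup', 'input'):
            # Settings and inputs are always kept in this sketch.
            kept.append(instruction)
        elif args[0] in used:
            used.update(args[1:])
            kept.append(instruction)
    kept.reverse()
    return kept


code = [
    ['var_setup', '4', 'le'],
    ['input', 'lib1'],
    ['input', 'lib2'],
    ['__add__', 'lib3', 'lib1', 'lib2'],
    ['__xor__', 'lib4', 'lib1', 'lib2'],  # assumed instruction, never used
    ['output', 'lib3'],
]

for instruction in drop_unused(code):
    print(instruction)
```

Because the program is a plain list, the filter needs no parser or tree walker; it is an ordinary function from one list to another.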

Flow: output to C

  • code example:

>>>>
[…]
c_template = r'''
#include <stdio.h>
int main(void)
{
    $type out, in[2] = { 11, 22 };
#define dk_input(i) (in[(i)])
#define dk_output(v, i) (out = (v))
$code
    printf("from C: %d\n", out);
}
'''
O.gen(code, 't.c', c_template, {}, {})
<<<<

Flow: output code, output

  • the generated code:

>>>>
#include <stdio.h>
int main(void)
{
    unsigned int out, in[2] = { 11, 22 };
#define dk_input(i) (in[(i)])
#define dk_output(v, i) (out = (v))
#define lib1 (dk_input(0))
#define lib2 (dk_input(1))
    unsigned int lib3 ;
    lib3 = lib1 + lib2;
    dk_output(lib3, 0);
#undef lib1
#undef lib2
    printf("from C: %d\n", out);
}
<<<<

Flow: output code, comments

  • our final product is code in the target language
    • it is C
    • a PoC output to OpenCL exists
  • john-devkit uses a template to insert the code into
    • it is implemented with the standard string.Template class in Python
    • several variables are inserted into the template
    • the template has to define macros to connect the generated code with its environment
  • john-devkit produces code with a structure similar to the IR
    • the code is linear and noisy
    • it is possible to manually map generated code to source IR instructions for debugging without special tools
    • the generated code shown above is re-indented for readability
  • generated code may be compiled by a regular C compiler
    • produced formats for JtR are built just like other formats
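
How a template and a linear IR can combine into C source may be sketched with `string.Template` from the standard library. The helper `ir_to_c` below is a hypothetical stand-in written for this example; it handles only the four instructions from the earlier slide and is not how `O.gen` is actually implemented.

```python
from string import Template

C_TEMPLATE = Template(r'''#include <stdio.h>
int main(void)
{
    $type out, in[2] = { 11, 22 };
#define dk_input(i) (in[(i)])
#define dk_output(v, i) (out = (v))
$code
    printf("from C: %u\n", out);
}
''')


def ir_to_c(code, c_type):
    """Hypothetical helper: turn a tiny IR into one C line per instruction."""
    lines = []
    n_inputs = n_outputs = 0
    for instruction in code:
        op, args = instruction[0], instruction[1:]
        if op == 'var_setup':
            continue  # size and endianness are ignored in this sketch
        elif op == 'input':
            lines.append('#define %s (dk_input(%d))' % (args[0], n_inputs))
            n_inputs += 1
        elif op == '__add__':
            lines.append('    %s %s = %s + %s;'
                         % (c_type, args[0], args[1], args[2]))
        elif op == 'output':
            lines.append('    dk_output(%s, %d);' % (args[0], n_outputs))
            n_outputs += 1
        else:
            raise ValueError('unsupported instruction: %r' % (instruction,))
    return '\n'.join(lines)


code = [
    ['var_setup', '4', 'le'],
    ['input', 'lib1'],
    ['input', 'lib2'],
    ['__add__', 'lib3', 'lib1', 'lib2'],
    ['output', 'lib3'],
]

# The template supplies the environment; the IR supplies the body.
source = C_TEMPLATE.substitute(type='unsigned int',
                               code=ir_to_c(code, 'unsigned int'))
print(source)
```

Each IR instruction becomes one line of C, so a line in the output can be traced back to its instruction by position.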

Implemented formats

  • in the previous year, 7 formats were successfully implemented with a focus on performance
  • now the focus is on the number of hash types:
  • 9 iterated hash types:
    • pbkdf2-hmac-{md5,sha1,sha256,sha512}
    • hmac-{md5,sha1,sha256,sha512}
    • 1 variant of TrueCrypt: pbkdf2-hmac-sha512 + AES XTS
  • dynamic hash types, 102 were tested:
    • including 62 real world hash types, like
      • md5(md5($p).$s) (vBulletin)
      • md5(md5($s).md5($p)) (IPB)
    • including 40 synthetic hash types, like
      • sha512($s.sha512($p))
  • but speeds are still poor because optimizations have not been applied yet
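
To make the formulas above concrete, here is a reference-style sketch in plain Python using `hashlib`. It assumes that an inner hash is fed to the outer one as lowercase hex, which is how such formulas are commonly read; the function names are invented for this example, and the code is meant for checking values, not for cracking speed.

```python
import hashlib


def md5_hex(data):
    """MD5 of data, returned as lowercase hex bytes."""
    return hashlib.md5(data).hexdigest().encode('ascii')


def md5_md5p_s(password, salt):
    """md5(md5($p).$s), assuming the inner hash is hex-encoded."""
    return md5_hex(md5_hex(password) + salt)


def md5_md5s_md5p(password, salt):
    """md5(md5($s).md5($p)), assuming the inner hashes are hex-encoded."""
    return md5_hex(md5_hex(salt) + md5_hex(password))


def pbkdf2_hmac_sha256(password, salt, iterations, length=32):
    """pbkdf2-hmac-sha256 via the standard library."""
    return hashlib.pbkdf2_hmac('sha256', password, salt, iterations, length)


print(md5_md5p_s(b'password', b'abc'))
print(md5_md5s_md5p(b'password', b'abc'))
print(pbkdf2_hmac_sha256(b'password', b'salt', 1000).hex())
```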

Observed problems

  • the C template is a very time-consuming part
    • some optimizations like interleaving, vectorization and bitslicing need support in the template
    • some hash types need separate templates
      • the TrueCrypt format tries to decrypt a full block and check the crc32 of the data
      • it may be implemented in john-devkit later
  • it is possible to describe a new hash type by formula, as in JtR
  • it is possible to describe transformations for 1 format well
  • but good optimizations and mass production have not been combined yet
    • in the best cases, generated formats are slower than dynamic hash types in JtR by a factor of the SIMD vector size
  • john-devkit and hash types are being developed together
    • a hash type’s code is tweaked to better fit optimizations
    • new optimizations need new instructions in the IR and backend
Conclusions

  • john-devkit can produce good code
  • john-devkit can produce many hash types
  • but not together, it needs more work

Thank you!