Abstract from phdays.com:
Speeds in hash cracking grow. The number of hashing algorithms grows. Work needed to maintain universal cracker grows too. The problem gave birth to john-devkit, an advanced code generator for the famous password cracker John the Ripper. More than 100 hash types are implemented within john-devkit. Its key aspects will be discussed: separation of algorithms, optimizations and output for different computing devices, simple intermediate representation of hashing algorithms, complexity of optimizations for humans and machines, bitslicing, comparison of speeds.
Slides below: —
john-devkit: 100 Hash Types Later
Aleksey Cherepanov
—
john-devkit and John the Ripper (JtR)
- john-devkit is a code generator for JtR
- it is an experiment and is not used in practice
- JtR is the famous hash cracker
- primary purpose is to detect weak Unix passwords
- supports 200+ hash formats (types) coded by hand
- supports dynamic hash types described by formula at run-time
- and it can utilize CPUs, GPUs and even FPGAs (for bcrypt only)
—
The problem
- there are a lot of hash types supported
- developers care about speed
- even 5% change is worth of investigation in case of popular hash types
- it is fun to implement an optimization only once
- then it is hard routine work to apply the optimization to all implemented formats
- it is very time consuming to improve all hash types
—
john-devkit as a possible solution
- the main desired ability was/is to transform code by program
- optimizations may be viewed as transformations of code
- when we separate implementation into base code and optimizations
- the code may be easier
- optimizations may be reused for other algorithms again for free
- it is possible to play with optimizations easily
- to simplify everything, john-devkit uses its own intermediate
representation (IR) of code
- IR is not low level
- IR is specific for cryptography
- john-devkit uses DSL on top of Python to populate IR
—
Flow: describe algorithm
- code example:
>>>> from dk import *
code = [] with L.evaluate_to(code): L.Var.setup(4, ‘le’) a, b = L.input(), L.input() c = a + b print c L.output(c)
for instruction in code: print instruction […] <<<<
—
Flow: describe algorithm, output
- output from the example:
>>>> <lang_main.Var object at 0x00007f88d07ec800> [‘var_setup’, ‘4’, ‘le’] [‘input’, ‘lib1’] [‘input’, ‘lib2’] [‘__add__’, ‘lib3’, ‘lib1’, ‘lib2’] [‘output’, ‘lib3’] <<<<
—
Flow: describe algorithm, comments
- ‘print c’ is evaluated in Python and is not included into IR
- c in ‘print c’ was DSL’s object, not a regular value
- operators are overloaded to emit instruction and give new objects
- john-devkit does not affect AST or bytecode of Python and may be run on any implementation of Python (usually PyPy for speed)
- from Python’s POV, DSL is just a way to fill list of instructions
- DSL may be used to describe full program, or a small part (it is used in optimizations)
- from POV of a program in DSL, Python is a preprocessor
- Python is fully evaluated before IR is converted further
- Python is very mighty preprocessor
- DSL does not see names of variables in Python
—
Flow: IR and transformations
- 1 instruction: ‘\_\_add\_\_’, ‘lib3’, ‘lib1’, ‘lib2’
- IR is very simple and certain
- the whole program is just a list of lists of strings
- each instruction has operator name and list of arguments
- most instructions do not modify arguments
- they return new “objects” instead
- so IR is close to Static Single Assignment (SSA) form
- it is friendly to transformations
- when IR is obtained, transformations occur
- programmer is free to do anything on this list of instructions
- use existing filter
- create custom filter
- we’ll skip code example of transformations
- programmer is free to do anything on this list of instructions
—
Flow: output to C
- code example:
>>>> […] c_template = r”’ #include <stdio.h> int main(void) { $type out, in[2] = { 11, 22 }; #define dk_input(i) (in[(i)]) #define dk_output(v, i) (out = (v)) $code printf(“from C: %d\n”, out); }”’ O.gen(code, ‘t.c’, c_template, {}, {}) <<<<
—
Flow: output code, output
- the generated code:
>>>> #include <stdio.h> int main(void) { unsigned int out, in[2] = { 11, 22 }; #define dk_input(i) (in[(i)]) #define dk_output(v, i) (out = (v)) #define lib1 (dk_input(0)) #define lib2 (dk_input(1)) unsigned int lib3 ; lib3 = lib1 + lib2; dk_output(lib3, 0); #undef lib1 #undef lib2 printf(“from C: %d\n”, out); } <<<<
—
Flow: output code, comments
- our final product is code in target language
- it is C
- PoC output to OpenCL exists
- john-devkit uses a template to insert code into
- it is implemented with standard string.Template class in Python
- several variables are inserted into template
- template has to define macros to connect generated code with environment
- john-devkit produces code with structure similar to IR
- the code is linear and noisy
- it is possible to manually map generated code to source instructions of IR for debugging without special tools
- code below is re-indented for readability
- generated code may be compiled by a regular C compiler
- produced format for JtR are built just like other formats
—
Implemented formats
- in previous year, 7 formats were implemented with focus on performance successfully
- now the focus is on number of hash types:
- 9 iterated hash types:
- pbkdf2-hmac-{md5,sha1,sha256,sha512}
- hmac-{md5,sha1,sha256,sha512}
- 1 variant of TrueCrypt: pbkdf2-hmac-sha512 + AES XTS
- dynamic hash types, 102 were tested:
- including 62 real world hash types, like
- md5(md5($p).$s) (vBulletin)
- md5(md5($s).md5($p)) (IPB)
- including 40 synthetic hash types, like
- sha512($s.sha512($p))
- including 62 real world hash types, like
- but speeds are poor yet because optimizations were not applied
—
Observed problems
- C template is very time consuming part
- some optimizations like interleaving, vectorization and bitslicing need support in template
- some hash types need separate templates
- TrueCrypt format tries to decrypt full block and check crc32 of data
- it may be implemented in john-devkit later
- it is possible to describe new hash type by formula as in JtR
- it is possible to describe transformations for 1 format well
- but good optimizations and mass production were not combined
- in best cases, generated formats are slower than dynamic hash types in JtR by size of SIMD vector
- john-devkit and hash types are being developed together
- a hash type code is tweaked to better fit optimizations
- new optimizations need new instructions in IR and backend
—
Conclusions
- john-devkit can produce good code
- john-devkit can produce many hash types
- but not together, it needs more work
—
Thank you!
- Thank you!
- code: https://github.com/AlekseyCherepanov/john-devkit
- more technical detail will be on john-dev mailing list
- my email: [email protected]