
About Mercury

David McGrew edited this page Jul 22, 2020 · 3 revisions

Fingerprint strings

Network fingerprinting identifies a particular process, operating system (OS), or device type by observing the network traffic that it sends, extracting characteristic data features, and then analyzing them. These features often appear as particular values of parameters (such as the TCP initial window size) or as a particular sequence of optional parameters (such as the list of TLS cipher suites offered by a client). Each data feature can conveniently be represented and analyzed as a byte string. To ensure reversibility, however, mercury uses a fingerprint format that represents a parse tree: the network data is parsed, only the data elements that are characteristic of the sender are selected, and those elements are then normalized to eliminate any session-specific data. The resulting format is flexible, simple, and reversible; that is, there is a one-to-one mapping between a fingerprint and the byte strings in the network session that correspond to the characteristic data features.

The parse tree is represented in bracket notation (sometimes called balanced parentheses): each node of the tree is enclosed in a pair of parentheses. Terminal nodes, which correspond to byte strings appearing in the packet, are written as hexadecimal strings, like 474554. Interior nodes are represented by wrapping parentheses around zero or more terminal (or other interior) nodes. For instance, here is a short HTTP fingerprint:

(474554)(485454502f312e31)(486f7374)(4163636570742d456e636f64696e673a20677a69702c206465666c617465)

Decoded as ASCII, the terminal strings are 'GET', 'HTTP/1.1', 'Host', and 'Accept-Encoding: gzip, deflate'. Mercury uses a hexadecimal representation for all data to avoid encoding issues.
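The bracket notation and the HTTP example can be illustrated with a small Python sketch. It is not mercury's implementation: `encode`, `decode`, and `http_fingerprint` are hypothetical helpers, and the selection rule in `http_fingerprint` (drop the request target, keep only the name of most headers, keep the full line for headers listed in `headers_with_value`) is an assumption chosen so that the output reproduces the example fingerprint. Terminal nodes are modeled as `bytes` and interior nodes as Python lists. The sketch also assumes terminals are non-empty, since an empty terminal and an empty interior node would both serialize to `()`.

```python
import string
from typing import FrozenSet, List, Tuple, Union

# A terminal node is a byte string; an interior node is a list of child nodes.
Node = Union[bytes, List["Node"]]

HEX_DIGITS = set(string.hexdigits)


def encode(nodes: List[Node]) -> str:
    """Serialize nodes in bracket notation: terminals as (hex), interiors as (children)."""
    parts = []
    for node in nodes:
        if isinstance(node, bytes):
            parts.append("(" + node.hex() + ")")
        else:
            parts.append("(" + encode(node) + ")")
    return "".join(parts)


def _parse_seq(fp: str, pos: int) -> Tuple[List[Node], int]:
    """Parse consecutive '(...)' nodes starting at pos; return them and the next offset."""
    nodes: List[Node] = []
    while pos < len(fp) and fp[pos] == "(":
        pos += 1  # consume "("
        end = pos
        while end < len(fp) and fp[end] in HEX_DIGITS:
            end += 1
        if end > pos and end < len(fp) and fp[end] == ")":
            # A non-empty hex run closed immediately is a terminal node.
            nodes.append(bytes.fromhex(fp[pos:end]))
            pos = end + 1
        else:
            # Otherwise this is an interior node holding zero or more children.
            children, pos = _parse_seq(fp, pos)
            if pos >= len(fp) or fp[pos] != ")":
                raise ValueError(f"expected ')' at offset {pos}")
            nodes.append(children)
            pos += 1  # consume ")"
    return nodes, pos


def decode(fp: str) -> List[Node]:
    """Invert encode(): recover the tree of byte strings from a fingerprint."""
    nodes, pos = _parse_seq(fp, 0)
    if pos != len(fp):
        raise ValueError(f"unexpected character at offset {pos}")
    return nodes


def http_fingerprint(request: bytes,
                     headers_with_value: FrozenSet[bytes] = frozenset({b"accept-encoding"})) -> str:
    """Hypothetical selection/normalization rule for an HTTP/1.x request."""
    head = request.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    # Keep method and version; drop the request target (assumed session-specific here).
    method, _target, version = lines[0].split(b" ", 2)
    nodes: List[Node] = [method, version]
    for line in lines[1:]:
        name = line.partition(b":")[0]
        # Keep the full header line only for headers assumed to be characteristic.
        nodes.append(line if name.lower() in headers_with_value else name)
    return encode(nodes)


if __name__ == "__main__":
    req = (b"GET /index.html HTTP/1.1\r\n"
           b"Host: example.com\r\n"
           b"Accept-Encoding: gzip, deflate\r\n\r\n")
    fp = http_fingerprint(req)
    print(fp)
    print([n.decode("ascii") for n in decode(fp)])
```

Decoding the fingerprint returns exactly the four terminal byte strings that were selected from the request, which is the reversibility property described above: the fingerprint maps back to the characteristic bytes, not to the whole session.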

There are many distinct fingerprint strings, and the relationship between these strings and software processes, libraries, and operating systems is non-trivial. A single string may correspond to more than one application, and a single application may generate more than one fingerprint string. Nonetheless, these strings are quite informative, especially for tasks such as detecting obsolete software.
