Merge pull request #235 from network-intelligence/dev
Merging dev into trunk
davidmcgrew authored and GitHub Enterprise committed Mar 8, 2024
2 parents ce8d37e + 720fb22 commit 6ea8fa6
Showing 29 changed files with 1,817 additions and 209 deletions.
72 changes: 68 additions & 4 deletions doc/npf.md
@@ -130,7 +130,12 @@ where

## TLS

TLS fingerprints are formed from packets containing a TLS Client Hello message. There are two formats defined. The newer one, "tls/1", sorts the extensions into lexicographic order, to compensate for the randomization of those fields introduced by some TLS clients. The older one, "tls", does not sort those extensions into order.
TLS fingerprints are formed from packets containing a TLS Client Hello message. Three fingerprint formats are defined. The format "tls/1" sorts the extensions into lexicographic order, to compensate for the randomization of those fields introduced by some TLS clients. The more recent format "tls/2" sorts only the selected extensions into lexicographic order. The oldest format, "tls", does not sort the extensions.

The "tls/2" fingerprint format is
```
"tls/2" (TLS_Version) (TLS_Ciphersuite) [(selected_TLS_Extension)*]
```

The "tls/1" fingerprint format is

@@ -160,9 +165,8 @@ where
DEGREASE(extension[0:2]) otherwise.
```

`QUIC_extension` is as defined below.

The function DEGREASE takes as input a two-byte value and returns a two-byte value.
The function DEGREASE takes as input a two-byte value and returns a two-byte value.

```
DEGREASE(x) = 0x0a0a if x is in TLS_GREASE, and
@@ -184,13 +188,66 @@ TLS_EXT_FIXED = {
}.
```
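
As a rough illustration of DEGREASE, the following C++ sketch assumes that TLS_GREASE is the set of sixteen GREASE values from RFC 8701 (0x0a0a, 0x1a1a, ..., 0xfafa) and that any value outside that set is returned unchanged. The function name `degrease` is chosen for this example and is not taken from the mercury source.

```cpp
#include <cstdint>

// Sketch of DEGREASE, assuming TLS_GREASE = { 0x0a0a, 0x1a1a, ..., 0xfafa }.
// A GREASE value has two identical bytes whose low nibble is 0xa.
// Every GREASE value maps to 0x0a0a; any other value is returned as-is.
constexpr uint16_t degrease(uint16_t x) {
    const bool low_nibbles_are_a = (x & 0x0f0f) == 0x0a0a;
    const bool bytes_are_equal   = (x >> 8) == (x & 0x00ff);
    return (low_nibbles_are_a && bytes_are_equal) ? 0x0a0a : x;
}
```

Under these assumptions, `degrease(0x1a1a)` and `degrease(0xfafa)` both yield 0x0a0a, while a value such as 0x002b passes through unchanged.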

selected_TLS_Extension includes only a subset of the extensions in TLS_extension, and is defined as follows:

```
selected_TLS_Extension = extension if DEGREASE(extension[0:2]) is in TLS_EXT_FIXED,
ENCODE(extension[0:2]) if ENCODE(extension[0:2]) is in TLS_EXT_INCLUDE,
```

The function ENCODE is defined as follows:
```
ENCODE(x) = DEGREASE(x) if DEGREASE(x) is in TLS_EXT_INCLUDE
ENCODE_UNASSIGNED(x) if x is in TLS_UNASSIGNED
ENCODE_PRIVATE(x) if x is in TLS_PRIVATE
```

The functions ENCODE_UNASSIGNED and ENCODE_PRIVATE each take a two-byte value as input and return a two-byte value. They are defined as follows:

```
ENCODE_UNASSIGNED(x) = 0x003e if x is in TLS_UNASSIGNED,
x otherwise
```

```
ENCODE_PRIVATE(x) = 0xff00 if x is in TLS_PRIVATE,
x otherwise
```

The sets TLS_EXT_INCLUDE, TLS_UNASSIGNED, and TLS_PRIVATE are defined as

```
TLS_EXT_INCLUDE = {
0x0000, 0x0001, 0x0002, 0x0003, 0x0004, 0x0005, 0x0006, 0x0007,
0x0008, 0x0009, 0x000a, 0x000b, 0x000c, 0x000d, 0x000e, 0x000f,
0x0010, 0x0011, 0x0012, 0x0013, 0x0014, 0x0016, 0x0017, 0x0018,
0x0019, 0x001a, 0x001b, 0x001c, 0x001d, 0x001e, 0x001f, 0x0020,
0x0021, 0x0022, 0x0024, 0x0025, 0x0026, 0x0027, 0x0028, 0x002b,
0x002c, 0x002d, 0x002e, 0x002f, 0x0030, 0x0031, 0x0032, 0x0033,
0x0034, 0x0035, 0x0036, 0x0037, 0x0038, 0x0039, 0x003a, 0x003b,
0x003c, 0x003d, 0x003e, 0x0a0a, 0x3374, 0x5500, 0x754f, 0x7550,
0xfd00, 0xfe0d, 0xff00, 0xff01, 0xff03, 0xffce
}
TLS_UNASSIGNED = set of all extensions under the category "Unassigned" as per IANA
TLS_PRIVATE = set of all extensions under the category "Reserved for Private Use" as per IANA
```
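
The ENCODE rules can be sketched in C++ as shown here. This is only an illustration under stated assumptions: the `extension_policy` struct, the `encode` and `degrease` names, and the use of `std::nullopt` for a value that matches none of the three cases are all hypothetical, and the membership tests for TLS_UNASSIGNED and TLS_PRIVATE are passed in by the caller because the actual ranges come from the IANA registry.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_set>

// Hypothetical container for the three sets used by ENCODE.
struct extension_policy {
    std::unordered_set<uint16_t> include;   // stands in for TLS_EXT_INCLUDE
    bool (*is_unassigned)(uint16_t);        // membership test for TLS_UNASSIGNED
    bool (*is_private)(uint16_t);           // membership test for TLS_PRIVATE
};

// DEGREASE, assuming the RFC 8701 GREASE values 0x0a0a, 0x1a1a, ..., 0xfafa.
constexpr uint16_t degrease(uint16_t x) {
    return ((x & 0x0f0f) == 0x0a0a && (x >> 8) == (x & 0x00ff)) ? 0x0a0a : x;
}

// Sketch of ENCODE: returns the encoded type code, or std::nullopt when
// none of the three cases applies (an assumption of this example).
std::optional<uint16_t> encode(uint16_t x, const extension_policy &p) {
    const uint16_t d = degrease(x);
    if (p.include.count(d) != 0) {
        return d;                 // DEGREASE(x) is in TLS_EXT_INCLUDE
    }
    if (p.is_unassigned(x)) {
        return 0x003e;            // ENCODE_UNASSIGNED(x)
    }
    if (p.is_private(x)) {
        return 0xff00;            // ENCODE_PRIVATE(x)
    }
    return std::nullopt;          // not part of the fingerprint
}
```

Passing the two membership tests as function pointers keeps the sketch independent of any particular snapshot of the IANA registry.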



## QUIC

QUIC fingerprints are computed from the QUIC Initial Packet. To compute this fingerprint, it is necessary to remove header protection, decrypt the QUIC Frames, reassemble the CRYPTO Frame, and then process the TLS Client Hello in that frame. If there is no CRYPTO Frame in the packet, it is not possible to compute a fingerprint. The fingerprint format is
QUIC fingerprints are computed from the QUIC Initial Packet. To compute this fingerprint, it is necessary to remove header protection, decrypt the QUIC Frames, reassemble the CRYPTO Frame, and then process the TLS Client Hello in that frame. If there is no CRYPTO Frame in the packet, it is not possible to compute a fingerprint. There are two QUIC fingerprint formats. The newer format, "quic/1", sorts the selected extensions into lexicographic order, while the older format, "quic", sorts all extensions into lexicographic order.

The "quic/1" format is

```
"quic/1" (QUIC_Version) (TLS_Version) (TLS_Ciphersuites) [(Selected_QUIC_Extension)* ]
```

The older format "quic" is

```
"quic/" (QUIC_Version) (TLS_Version) (TLS_Ciphersuites) [ QUIC_Extension* ]
@@ -223,7 +280,13 @@ An example of a QUIC fingerprint is
```
quic/(ff00001d)(0303)(0a0a130113021303)[(0a0a)(0a0a)(0000)(000500050100000000)(000a000c000a0a0a001d001700180019)(000d0018001604030804040105030203080508050501080606010201)(0010000e000c0568332d32390568332d3237)(0012)(001b0003020001)(002b0005040a0a0304)(002d00020101)(0033)((ffa5)[(04)(05)(06)(07)(09)(0e)(0f)])]
```
- `Selected_QUIC_Extension` includes only a subset of QUIC extensions, and is defined as follows:

```
Selected_QUIC_Extension = extension if DEGREASE(extension[0:2]) is in TLS_EXT_FIXED,
QTP(extension) if extension[0:2] is in { 0x0039, 0xffa5 },
ENCODE(extension[0:2]) if ENCODE(extension[0:2]) is in TLS_EXT_INCLUDE,
```


## HTTP
@@ -344,3 +407,4 @@ A secondary goal is to handle fingerprints generated from truncated protocol mes
The JA3 fingerprinting system has a relatively compact representation, consisting of 32 hex characters, but it only applies to TLS, is not reversible, and does not utilize GREASE information. The original mercury fingerprinting system is reversible, utilizes GREASE, and applies to multiple protocols, but it does not contain an explicit indication of the protocol, and is not compact. Neither system allows to indicate versioning information that would enable the details of the fingerprinting scheme to adapt over time. This note defines a fingerprint naming scheme that aims to provide the benefits of both systems, along with explicit information about protocols and versions, drawing inspiration from the [Common Platform Enumeration](https://nvd.nist.gov/products/cpe) naming system.



167 changes: 167 additions & 0 deletions doc/tls_extn_sort_autogen.md
@@ -0,0 +1,167 @@
# Autogeneration of C++ helper code to sort TLS extensions

## Sorting TLS extensions in fingerprint string
The TLS fingerprint string consists of the TLS version, cipher suites, and TLS extensions. The TLS extensions are sorted in fingerprint format 1 and above. A general comparison sort takes O(n log n) time, but sorting can be done in linear time when the set of possible values is known beforehand. For TLS extensions, a modified version of counting sort is used.

### Algorithm for sorting tls extensions in linear time:

1. Prepare tls_include_list, the list of TLS extensions that are part of the fingerprint string.
2. Sort tls_include_list, and use a hash table to store the sorted index of each extension in tls_include_list.
3. Create an auxiliary array with the same size as tls_include_list to store each extension and its count, with every count initialized to zero.
4. For each extension in the input, look up its sorted index in the hash table, store the extension at that index of the auxiliary array, and increment the count.
5. Make a final pass over the auxiliary array. For each element, while its count is positive, print the element and decrement the count. The output is the sorted list of TLS extensions.
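
The list above can be sketched in C++ as follows. This is a minimal illustration, not the mercury implementation: the function name `sort_extensions` is hypothetical, the include list is assumed to contain no duplicates, and extensions that are not in the include list are simply dropped.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Counting-sort style ordering of TLS extension type codes.
// include_list is taken by value so that it can be sorted locally.
std::vector<uint16_t> sort_extensions(const std::vector<uint16_t> &input,
                                      std::vector<uint16_t> include_list) {
    // Sort the include list and record the sorted index of each entry.
    std::sort(include_list.begin(), include_list.end());
    std::unordered_map<uint16_t, std::size_t> sorted_index;
    for (std::size_t i = 0; i < include_list.size(); ++i) {
        sorted_index[include_list[i]] = i;
    }

    // Auxiliary array of counts, one slot per include-list entry.
    std::vector<std::size_t> count(include_list.size(), 0);

    // One pass over the input: a hash lookup and an increment per extension.
    for (uint16_t ext : input) {
        auto it = sorted_index.find(ext);
        if (it != sorted_index.end()) {
            ++count[it->second];
        }
    }

    // Final pass: emit each include-list entry as many times as it was seen.
    std::vector<uint16_t> sorted;
    for (std::size_t i = 0; i < count.size(); ++i) {
        for (std::size_t c = count[i]; c > 0; --c) {
            sorted.push_back(include_list[i]);
        }
    }
    return sorted;
}
```

In this sketch the include list is sorted on every call for simplicity. Sorting it once ahead of time and keeping the index table removes that cost, so that each call needs only one pass over the input and one pass over the auxiliary array.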

The hash table described above must be regenerated whenever the TLS extension include list is modified.

The mercury package provides the tls_csv utility, which generates the C++ classes and supporting APIs needed to sort TLS extensions with the algorithm described above. The autogeneration code for tls_csv is in the src/tables folder of the mercury repository, and its source code is in src/tables/tls_extension_generator.cc.

## How to use tls_csv utility to generate C++ classes

The tls_csv utility can be run as follows:

```
./tls_csv outfile=<of> include_extensions=<include_extensions_file> <infile.csv>:<classname> [ <infile.csv>:<classname> ... ]
```
`outfile` - writes the header file with the name `<of>`

`include_extensions` - reads the input text file `<include_extensions_file>`

`<infile.csv>` - input CSV file

`<classname>` - name of the class in the output header file `<of>`



### CSV files containing tls extension type code and name
The tls_csv utility reads one or more CSV files that contain the mapping between TLS extension type codes and names. The Makefile in src/tables includes rules to download the CSV files from the IANA site.

Sample csv file
```
Value,Extension Name,TLS 1.3,DTLS-Only,Recommended,Reference
0,server_name,"CH, EE, CR",N,Y,[RFC6066][RFC9261]
1,max_fragment_length,"CH, EE",N,N,[RFC6066][RFC8449]
2,client_certificate_url,-,N,Y,[RFC6066]
```
The tls_csv utility uses the values from the first two columns - Value and Extension Name.

### Text file with TLS extensions that need to be part of fingerprint string
The utility also reads a text file that contains a comma-separated list of the TLS extensions that need to be part of the fingerprint. An entry can also be a range of TLS extensions, written as start_range-end_range.

Example:

> 0-20,22-34,36-40,43-62,2570

The mercury package includes the input text file local_include_extension.txt in src/tables/source. This file must be edited whenever the TLS extension include list changes.
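
To illustrate the include-list format, here is a minimal C++ sketch of a parser for a line such as `0-20,22-34,36-40,43-62,2570`. The function name `parse_include_list` is hypothetical and the sketch does not reproduce the parsing code in tls_csv; it assumes well-formed input consisting of decimal numbers and ranges.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Expand a comma-separated list of values and ranges ("a-b") into the
// individual 16-bit extension type codes, in the order they appear.
std::vector<uint16_t> parse_include_list(const std::string &text) {
    std::vector<uint16_t> extensions;
    std::istringstream stream(text);
    std::string item;
    while (std::getline(stream, item, ',')) {
        if (item.empty()) {
            continue;   // tolerate empty fields, e.g. a trailing comma
        }
        const std::size_t dash = item.find('-');
        const unsigned long first = std::stoul(item.substr(0, dash));
        const unsigned long last = (dash == std::string::npos)
                                       ? first
                                       : std::stoul(item.substr(dash + 1));
        // Values above 0xffff cannot be extension type codes and are skipped.
        for (unsigned long v = first; v <= last && v <= 0xffff; ++v) {
            extensions.push_back(static_cast<uint16_t>(v));
        }
    }
    return extensions;
}
```

Applied to the example line above, this sketch yields 60 type codes, starting at 0 and ending at 2570.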

## Compiling tls_csv utility
The Makefile in src/folder includes the rules needed to compile the tls_csv utility.

To compile, do either of the following:

> make

This compiles both the csv and tls_csv utilities.

> make tls

This compiles only the tls_csv utility.

## Running tls_csv utility

The following command shows how to run the tls_csv utility; its output is explained afterwards.
```
./tls_csv outfile=tls_extensions.h verbose=true dir=source include_extensions=local_include_extension.txt tls-extensiontype-values-1.csv:tls_extensions_assign
```
Running tls_csv writes the header file tls_extensions.h, which defines a class named tls_extensions_assign.

The autogenerated code is explained here using sample output from the tls_csv utility.

```
// tls_extensions.h
//
// this file was autogenerated at 2024-02-14T10:17:23Z
// you should edit the source file(s) instead of this one
//
// source files:
// tls-extensiontype-values-1.csv
//
#ifndef TLS_EXTENSIONS_H
#define TLS_EXTENSIONS_H
#include <unordered_map>
class tls_extensions_assign{
static std::unordered_map<int32_t, int32_t>& get_mapping_index() {
static std::unordered_map<int32_t, int32_t> mapping_index = {
{ 0, 0},
{ 1, 1},
{ 2, 2},
{ 3, 3},
{ 4, 4},
{ 5, 5},
{ 6, 6},
....
....
{ 65037, 65},
{ 65280, 66},
{ 65281, 67},
{ 65283, 68},
{ 65486, 69},
};
return mapping_index;
}
public:
static constexpr uint16_t include_list_len = 70;
tls_extensions_assign() {}
static int32_t get_index(uint16_t type) {
static const std::unordered_map<int32_t, int32_t> &mapping_index = get_mapping_index();
auto it = mapping_index.find(type);
if (it != mapping_index.end()) {
return(it->second);
}
return -1;
}
static constexpr uint16_t smallest_private_extn = 65280;
static constexpr uint16_t smallest_unassigned_extn = 62;
};
#endif // TLS_EXTENSIONS_H
```

The following steps describe how the class is created from the input files.

#### Step 1
The text file local_include_extension.txt is read and parsed to create a list of TLS extensions, and the list is sorted. The sorted list is used to create the unordered map in the output file, with the type code as the key and its position in the sorted list as the value.

#### Step 2
A static variable include_list_len is initialized with the length of the TLS extension include list.

#### Step 3
The input CSV file tls-extensiontype-values-1.csv is parsed to find the smallest TLS extension values in the Unassigned and Reserved for Private Use ranges. These values are used to initialize the two static variables smallest_unassigned_extn and smallest_private_extn in the class.
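
One way to picture this step is the C++ sketch here, which scans CSV rows and returns the smallest value whose extension name matches a given string. The helper name `smallest_with_name` is hypothetical, and the sketch assumes that the Value column holds either a single decimal number or a range such as `62-2569`, and that the Extension Name column contains no commas. It is not the code used by tls_csv.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Return the smallest Value among rows whose second column equals `name`,
// or std::nullopt if no row matches.
std::optional<uint16_t> smallest_with_name(const std::vector<std::string> &rows,
                                           const std::string &name) {
    std::optional<uint16_t> smallest;
    for (const std::string &row : rows) {
        // Skip blank lines and the header row, which does not start with a digit.
        if (row.empty() || !std::isdigit(static_cast<unsigned char>(row[0]))) {
            continue;
        }
        const std::size_t c1 = row.find(',');
        if (c1 == std::string::npos) {
            continue;
        }
        const std::size_t c2 = row.find(',', c1 + 1);
        const std::string row_name =
            row.substr(c1 + 1, c2 == std::string::npos ? c2 : c2 - c1 - 1);
        if (row_name != name) {
            continue;
        }
        // std::stoul stops at the '-' of a range, so it yields the range start.
        const unsigned long start = std::stoul(row.substr(0, c1));
        if (start > 0xffff) {
            continue;
        }
        if (!smallest || start < *smallest) {
            smallest = static_cast<uint16_t>(start);
        }
    }
    return smallest;
}
```

Calling this helper once with "Unassigned" and once with "Reserved for Private Use" would give the two values needed for smallest_unassigned_extn and smallest_private_extn.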

#### Step 4
Add required helper functions to complete the class.

Member function get_index()

`Input Parameter` - TLS extension type code

`Output Parameters` - Returns the index of the sorted position of the TLS extension type code in the TLS extension include list. Returns -1 if the extension is not part of the TLS extension include list.

## References
Counting Sort - https://www.geeksforgeeks.org/counting-sort/





4 changes: 2 additions & 2 deletions src/intercept.cc
@@ -826,7 +826,7 @@ class intercept {
}

pkt_proc_ctx->analysis.fp.init();
std::visit(compute_fingerprint{pkt_proc_ctx->analysis.fp, pkt_proc_ctx->global_vars.tls_fingerprint_format}, (is_tcp ? tcp_proto : udp_proto));
std::visit(compute_fingerprint{pkt_proc_ctx->analysis.fp, pkt_proc_ctx->global_vars.fp_format}, (is_tcp ? tcp_proto : udp_proto));

if (pkt_proc_ctx->analysis.fp.get_type() != fingerprint_type_unknown)
{
@@ -916,7 +916,7 @@ class intercept {
}

pkt_proc_ctx->analysis.fp.init();
std::visit(compute_fingerprint{pkt_proc_ctx->analysis.fp, pkt_proc_ctx->global_vars.tls_fingerprint_format}, (is_tcp ? tcp_proto : udp_proto));
std::visit(compute_fingerprint{pkt_proc_ctx->analysis.fp, pkt_proc_ctx->global_vars.fp_format}, (is_tcp ? tcp_proto : udp_proto));
if (pkt_proc_ctx->analysis.fp.get_type() != fingerprint_type_unknown) {
pkt_proc_ctx->analysis.fp.write(record);
}
1 change: 1 addition & 0 deletions src/libmerc/Makefile.in
@@ -92,6 +92,7 @@ LIBMERC_H += smb1.h
LIBMERC_H += smb2.h
LIBMERC_H += bencode.h
LIBMERC_H += bittorrent.h
LIBMERC_H += tls_extensions.h

# asn1/oid.cc and asn1/oid.h are auto-built from ASN1 files in the
# asn1 subdirectory; this is a pattern target that builds both files
25 changes: 23 additions & 2 deletions src/libmerc/analysis.h
@@ -785,6 +785,7 @@ class classifier {

std::vector<fingerprint_type> fp_types;
size_t tls_fingerprint_format = 0;
size_t quic_fingerprint_format = 0;
bool first_line = true;

// the common object holds data that is common across all
@@ -810,6 +811,8 @@ class classifier {

size_t get_tls_fingerprint_format() const { return tls_fingerprint_format; }

size_t get_quic_fingerprint_format() const { return quic_fingerprint_format; }

static std::pair<fingerprint_type, size_t> get_fingerprint_type_and_version(const std::string &s) {
fingerprint_type type = fingerprint_type_unknown;
unsigned int version = 0;
@@ -918,7 +921,9 @@ class classifier {
std::pair<fingerprint_type, size_t> fingerprint_type_and_version = get_fingerprint_type_and_version(fp_string.c_str());

if (fp_type_code != fingerprint_type_and_version.first) {
printf_err(log_warning, "fingerprint type of str_repr '%s' does not match fp_type, ignoring JSON line\n", fp_string.c_str());
printf_err(log_warning,
"fingerprint type of str_repr '%s' does not match fp_type, ignoring JSON line\n",
fp_string.c_str());
return;
}

@@ -929,7 +934,23 @@ class classifier {
tls_fingerprint_format = fingerprint_type_and_version.second;
} else {
if (fingerprint_type_and_version.second != tls_fingerprint_format) {
printf_err(log_warning, "fingerprint version with inconsistent format, ignoring JSON line\n");
printf_err(log_warning,
"%s fingerprint version with inconsistent format, ignoring JSON line\n",
fp_type_string.c_str());
return;
}
}
first_line = false;
}

if (fingerprint_type_and_version.first == fingerprint_type_quic) {
if (first_line == true) {
quic_fingerprint_format = fingerprint_type_and_version.second;
} else {
if (fingerprint_type_and_version.second != quic_fingerprint_format) {
printf_err(log_warning,
"%s fingerprint version with inconsistent format, ignoring JSON line\n",
fp_type_string.c_str());
return;
}
}