perf(fe): optimize keyword hash table

Improve the keyword hash table's hash function based on advice from
youtube.com/@bunnyboss3707. The new hash function shrinks the table
from 512 entries to 256 entries, reducing dcache pressure. It is also
computationally simpler. It uses a 32-bit multiply instead of the top
half of a 64-bit multiply, and it folds the reduction to a table index
(the modulo operation) into the hash itself.
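Here is a minimal standalone sketch of the new step-2 hash, mirroring `mix_and_reduce()` from the diff below. The free function and the example seed `0x9E3779B9` are illustrative assumptions. The real function is a `Keyword_Lexer` member, and the real seed comes from the table generator. The sketch shows why no separate modulo is needed: keeping only the top `table_size_shift` bits of the 32-bit product already yields an index in [0, 1 << table_size_shift), which is [0, 256) for the new table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical free-function version of Keyword_Lexer::mix_and_reduce.
// 'selection' stands for the bits picked from the identifier in step 1;
// 'seed' stands for the constant chosen when the table is generated.
std::uint32_t mix_and_reduce(std::uint32_t selection, std::uint32_t seed,
                             std::uint32_t table_size_shift) {
  // Shifting a 32-bit value by 32 is undefined behavior, so the table must
  // have at least 2 entries (shift >= 1).
  assert(table_size_shift >= 1 && table_size_shift <= 32);
  // The multiply wraps modulo 2^32. Keeping only the top 'table_size_shift'
  // bits reduces the hash to [0, 1 << table_size_shift) directly, with no
  // separate '%' or '&' step.
  return static_cast<std::uint32_t>(selection * seed) >>
         (32 - table_size_shift);
}
```

Take the hypothetical seed `0x9E3779B9` and a 256-entry table (`table_size_shift = 8`). `mix_and_reduce(1, 0x9E3779B9, 8)` keeps the top byte `0x9E` (158). For `mix_and_reduce(2, 0x9E3779B9, 8)`, the product `0x13C6EF372` wraps to `0x3C6EF372`, and the function keeps `0x3C` (60).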

Overall, the new hash table improves jQuery parse performance by about
0.9% on my Apple M1 machine:

    Benchmark                                     Time             CPU      Time Old      Time New       CPU Old       CPU New
    --------------------------------------------------------------------------------------------------------------------------
    benchmark_parse_file_pvalue                 0.0000          0.0000      U Test, Repetitions: 20 vs 20
    benchmark_parse_file_mean                  -0.0094         -0.0094       1255343       1243571       1255046       1243307
    benchmark_parse_file_median                -0.0093         -0.0092       1254804       1243189       1254514       1243025
    benchmark_parse_file_stddev                -0.0571         -0.0800          3088          2912          2853          2625
    benchmark_parse_file_cv                    -0.0481         -0.0713             0             0             0             0
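The "Time" and "CPU" columns hold relative changes. The output appears to come from Google Benchmark's `compare.py`. If it follows that tool's (new - old) / |old| convention, the headline number can be checked from the mean row: (1243571 - 1255343) / 1255343 is about -0.0094, or roughly 0.9% less time. A small sketch of that arithmetic (the helper name is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Relative change between two measurements, assuming the
// (new - old) / |old| convention of Google Benchmark's compare.py.
// A negative result means the new code is faster.
double relative_change(double old_value, double new_value) {
  assert(old_value != 0.0);
  return (new_value - old_value) / std::fabs(old_value);
}
```

Applied to the median rows, the same formula gives about -0.0093 for time and -0.0092 for CPU, which matches the table.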
strager committed Dec 22, 2023
1 parent 8002d74 commit 3332f89
Showing 3 changed files with 112 additions and 360 deletions.
24 changes: 13 additions & 11 deletions src/quick-lint-js/fe/keyword-lexer.h
```diff
@@ -13,7 +13,7 @@ namespace quick_lint_js {
 struct Keyword_Lexer {
   using Selection_Type = std::uint32_t;
   using Hash_Type = std::uint32_t;
-  using Seed_Type = std::uint64_t;
+  using Seed_Type = std::uint32_t;
 
   static constexpr int padding_size = 17;
 
```
```diff
@@ -71,20 +71,22 @@ struct Keyword_Lexer {
 #endif
 
   // Step 2 of the hash function for Lexer::identifier_token_type().
-  static Hash_Type mix(Selection_Type selection, Seed_Type seed) {
+  //
+  // The table's size is (1 << table_size_shift).
+  //
+  // This function reduces the hash into an index into the hash table. In other
+  // words, this function returns a number between 0 (inclusive) and
+  // (1 << table_size_shift) (exclusive).
+  static Hash_Type mix_and_reduce(Selection_Type selection, Seed_Type seed,
+                                  Hash_Type table_size_shift) {
     // This hash function executes quickly, but might produce a lot of
     // collisions. Collisions are fine, though; collisions just slow down table
     // generation, not run-time.
 
-    // Pierre L’Ecuyer. 1999. Tables of linear congruential generators of
-    // different sizes and good lattice structure. Mathematics of Computation of
-    // the American Mathematical Society 68, 225 (1999), 249–260.
-    //
-    // https://www.ams.org/journals/mcom/1999-68-225/S0025-5718-99-00996-5/S0025-5718-99-00996-5.pdf
-    std::uint64_t magic = 4292484099903637661ULL;
-
-    std::uint64_t x = static_cast<std::uint64_t>(selection) ^ seed;
-    return static_cast<std::uint32_t>(multiply_u64_get_top_64(x, magic));
+    // This hash function was suggested by youtube.com/@bunnyboss3707:
+    // https://www.youtube.com/watch?v=DMQ_HcNSOAI&lc=UgxPDeWYyiAdMsUCV8V4AaABAg
+    return static_cast<std::uint32_t>(selection * seed) >>
+           (32 - table_size_shift);
   }
 
   // Compare two strings, 'a' and 'b', each with size 'size'.
```
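The comment about collisions implies that the seed is chosen offline. A generator can keep trying seeds until every keyword lands in its own slot, so a cheap hash that collides often only costs generation time. The sketch below shows one plausible shape for that search. It is not quick-lint-js's actual generator. The name `find_collision_free_seed`, the use of `std::mt19937` to propose candidates, and forcing seeds to be odd are all assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <vector>

// Same step-2 hash as in the diff above (standalone sketch).
std::uint32_t mix_and_reduce(std::uint32_t selection, std::uint32_t seed,
                             std::uint32_t table_size_shift) {
  assert(table_size_shift >= 1 && table_size_shift <= 32);
  return static_cast<std::uint32_t>(selection * seed) >>
         (32 - table_size_shift);
}

// Hypothetical table-generation step: look for a seed under which every
// keyword's selection maps to a distinct slot (a perfect hash for this set).
// Returns std::nullopt if no such seed is found within 'max_attempts'.
std::optional<std::uint32_t> find_collision_free_seed(
    const std::vector<std::uint32_t>& selections,
    std::uint32_t table_size_shift, int max_attempts) {
  std::mt19937 rng(12345);  // fixed seed so the search is reproducible
  std::vector<bool> occupied(std::size_t{1} << table_size_shift);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    // Odd multipliers are invertible modulo 2^32, so distinct selections stay
    // distinct after the multiply; only the shift can merge them.
    std::uint32_t seed = static_cast<std::uint32_t>(rng()) | 1u;
    std::fill(occupied.begin(), occupied.end(), false);
    bool collided = false;
    for (std::uint32_t selection : selections) {
      std::uint32_t index = mix_and_reduce(selection, seed, table_size_shift);
      if (occupied[index]) {
        collided = true;  // two keywords share a slot; try another seed
        break;
      }
      occupied[index] = true;
    }
    if (!collided) return seed;
  }
  return std::nullopt;
}
```

Halving the table from 512 to 256 entries raises the chance that a given seed collides, so the search may take more attempts. As the comment notes, that cost is paid once, at generation time, not at run time.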