jnordwick/zig-1brc

1 Billion Row Challenge in Zig

This is an attempt at the billion row challenge in Zig using std. I wanted to see if it is any better than it was six months ago. Many optimizations are still possible and would cut significant time from it.

The code is a little messy, but it was done in half a day and I’m tired.

The data files aren’t included, but the script to generate them is. You can get some of the base data files from the original GitHub repository:

https://github.com/gunnarmorling/1brc

Strategy

The file is mmapped and split into regions, one per thread. Each thread hashes the station names and keeps a running sum and count. All numbers are treated as fixed-precision integers (2 decimals, the second for rounding). The per-thread hash tables are then merged. The keys of the aggregated table are collected by iterating over it and then sorted. Finally, each key is looked up in order to fetch its stats, which are printed as ints (inserting the decimal point in the right place).
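
The steps above can be sketched in a few lines. This is an illustration in Python, not the repository's Zig code, and every function name in it is hypothetical. It works on an in-memory bytes object instead of an mmapped file, and it only tracks the sum and count mentioned above. It assumes each reading has exactly one decimal digit and that the mean is rounded half up; both are assumptions, not something this README states.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_fixed(text: bytes) -> int:
    """Parse b"-12.3" into hundredths (-1230). Assumes exactly one decimal digit."""
    negative = text.startswith(b"-")
    digits = text.lstrip(b"-").replace(b".", b"")
    value = int(digits) * 10  # the second decimal exists only for rounding the mean
    return -value if negative else value


def split_regions(data: bytes, parts: int) -> list[tuple[int, int]]:
    """Cut data into at most `parts` (start, end) regions that each end after a newline."""
    regions, start = [], 0
    for i in range(1, parts + 1):
        end = len(data) if i == parts else max(start, len(data) * i // parts)
        if end < len(data):
            # Push the cut forward so no line is split between two regions.
            newline = data.find(b"\n", end)
            end = len(data) if newline == -1 else newline + 1
        if end > start:
            regions.append((start, end))
        start = end
    return regions


def aggregate(data: bytes, region: tuple[int, int]) -> dict[bytes, list[int]]:
    """Build one worker's table: station name -> [sum, count]."""
    table: dict[bytes, list[int]] = {}
    start, end = region
    for line in data[start:end].splitlines():
        name, _, reading = line.partition(b";")
        entry = table.setdefault(name, [0, 0])
        entry[0] += parse_fixed(reading)
        entry[1] += 1
    return table


def merge(tables: list[dict[bytes, list[int]]]) -> dict[bytes, list[int]]:
    """Combine the per-worker tables into one aggregate table."""
    merged: dict[bytes, list[int]] = {}
    for table in tables:
        for name, (total, count) in table.items():
            entry = merged.setdefault(name, [0, 0])
            entry[0] += total
            entry[1] += count
    return merged


def format_fixed(hundredths: int) -> str:
    """Round hundredths to tenths (half up, an assumption) and insert the decimal point."""
    tenths = (hundredths + 5) // 10
    sign = "-" if tenths < 0 else ""
    return f"{sign}{abs(tenths) // 10}.{abs(tenths) % 10}"


def report(data: bytes, threads: int = 4) -> list[str]:
    """Split, aggregate per region, merge, sort the keys, and format the means."""
    regions = split_regions(data, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        tables = list(pool.map(lambda r: aggregate(data, r), regions))
    merged = merge(tables)
    lines = []
    for name in sorted(merged):
        total, count = merged[name]
        lines.append(f"{name.decode()}={format_fixed(total // count)}")
    return lines
```

Calling `report(b"Oslo;1.0\nLima;20.5\nOslo;-3.0\nLima;21.4\n", threads=2)` goes through all of the steps on a four-line input. Keeping everything in integers means the only place a decimal point appears is the final formatting step.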

Optimizations

  • [ ] The hash table is really slow
  • [ ] don't iterate over the hashtable and just grab the arrays out of it
  • [ ] sort in the original hashtable arrays
  • [ ] pairwise merge in the threads recursively
  • [ ] turn the station names into ints and direct hash them
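
As one possible reading of the pairwise merge item, the per-thread tables could be combined two at a time, level by level, instead of folding them all into a single table in one pass. The sketch below is in Python, not Zig, and is not taken from the repository. The function names and the `[sum, count]` table layout are assumptions made for illustration. It runs sequentially; in a threaded version each pair at a given level is independent and could be merged by its own thread.

```python
def merge_pair(left: dict[bytes, list[int]], right: dict[bytes, list[int]]) -> dict[bytes, list[int]]:
    """Fold right's [sum, count] entries into left and return left (left is mutated)."""
    for name, (total, count) in right.items():
        entry = left.setdefault(name, [0, 0])
        entry[0] += total
        entry[1] += count
    return left


def tree_merge(tables: list[dict[bytes, list[int]]]) -> dict[bytes, list[int]]:
    """Merge tables two at a time until a single table remains."""
    tables = list(tables)
    if not tables:
        return {}
    while len(tables) > 1:
        # Pairs (0,1), (2,3), ... do not share state, so a level could run in parallel.
        merged = [
            merge_pair(tables[i], tables[i + 1])
            for i in range(0, len(tables) - 1, 2)
        ]
        if len(tables) % 2:
            merged.append(tables[-1])  # odd table out waits for the next level
        tables = merged
    return tables[0]
```

With N tables this takes about log2(N) levels, so the final merge no longer has to be done by one thread over every table in turn.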
