Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

benchmark ideas #240

Open
StefanKarpinski opened this issue Nov 7, 2018 · 5 comments
Open

benchmark ideas #240

StefanKarpinski opened this issue Nov 7, 2018 · 5 comments

Comments

@StefanKarpinski
Copy link

I could have sworn I'd previously opened an issue here with an idea for a benchmark but now I can't find it. There was a discussion of some algorithms that are hard/impossible to do in a vectorized way, and a couple that came up were:

Please post and discuss more ideas here!

@KristofferC
Copy link
Contributor

KristofferC commented Nov 7, 2018

I could do a PDE solver (FEM) benchmark. A funny thing about PDEs is that one solving the simplest possible (regular grid with diffusion) is a couple of lines while a general framework can be arbitrarily big. But something that is kinda realistic should be possible in a few hundred lines.

@ChrisRackauckas
Copy link

A funny thing about PDEs is that one solving the simplest possible (regular grid with diffusion) is a couple of lines while a general framework can be arbitrarily big.

Very true. Though I wonder if we should just link over to @johnfgibson's https://github.com/johnfgibson/julia-pde-benchmark which is really well done already. Instead of a PDE, we could also do something like the 4th order Runge-Kutta method.

https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods#The_Runge%E2%80%93Kutta_method

A quick Julia code for using it to simulate the Lorenz equation is:

function f(du,u)
 du[1] = 10.0*(u[2]-u[1])
 du[2] = u[1]*(28.0-u[3]) - u[2]
 du[3] = u[1]*u[2] - (8/3)*u[3]
end

function rk4_solve(u,dt,n)
  k1 = similar(u); k2 = similar(u)
  k3 = similar(u); k4 = similar(u)
  tmp = similar(u)
  for i in 1:n
    f(k1,u)
    @. tmp = u + dt*k1/2
    f(k2,u)
    @. tmp = u + dt*k2/2
    f(k3,u)
    @. tmp = u + dt*k3
    f(k4,u)
    @. u = u + (k1 + 2k2 + 2k3 + k4)/6
  end
  u
end

u = [1.0,0.0,0.0]
dt = 1/2
n = 100
u = rk4_solve(u,dt,n)

Some things which are immediately highlighted in this example is:

@jakobnissen
Copy link

To give some context for the "counting substring problem": I work in bioinformatics, where we often have long sequences of DNA, 100's of thousands to millions of bases:
TAGTGATAGTGCTTCGGGAAAACC ...
A kmer is a DNA-sequence of length k. For small values of k, we can pack them into unsigned integers, which is often the only way to process sequences efficiently enough to handle millions of sequences. So over time, kmer analysis has become a very common procedure, almost a small subfield.
For example, a common thing to do is choose some small K (typically 4), and count the frequency of all 4^4 possible 4-mers. Any sequence, no matter the length, can then be represented by a 256-length vector of frequencies.
To make things worse, some of the characters in the sequence are undetermined (marked by "N"). Any kmer containing an N at any position is invalid and must be ignored.

So the challenge is: Given a string 250.000 characters long, tally up each 4-mer and count their frequency, skipping any 4-mer that contains an invalid character.
In BioJulia, the heavy lifting is done by this generic kmer iterator protocol, and finishes in ~450 microseconds.

@StefanKarpinski
Copy link
Author

StefanKarpinski commented Nov 8, 2018

So I guess the microbenchmark version of the k-mer counting code would be to fix a specific length like 4-mers and then count the frequency in a long, random fake DNA sequence, and have some simpler specialized code for that. Or maybe an example DNA sequence, although for microbenchmarks we generally haven't had data files, but there's no reason we couldn't.

@KristofferC
Copy link
Contributor

We have some data files in this repo https://github.com/JuliaCI/BaseBenchmarks.jl/tree/master/src/problem/data.

Keno pushed a commit that referenced this issue Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants