I wrote this up a while back and wanted to share it for posterity (in case anybody is feeling ambitious). I think there are two related avenues for optimizing the temperature calculation in tabulated mode. This is definitely not worth pursuing unless we know that the temperature calculation is a performance bottleneck!
This discussion ignores the impact of metals on the mean molecular weight, since that contribution is always added after the fact.
Background
In tabulated mode, Grackle provides 2D and 3D grids to compute the mean molecular weight (mmw) via bilinear and trilinear interpolation, respectively. We might write these functions as mmw = f(log(T), log(nH)) and mmw = f(log(T), log(1+z), log(nH)). A relevant detail for later: the T grid has constant spacing in log-space.
To compute T, we take rho (the total mass density), eint (the specific internal energy), the assumed HydrogenMassFraction, the assumed gamma (adiabatic index), and (if applicable) the redshift z, and we solve iteratively for T.
In more detail:
- From rho and the assumed HydrogenMassFraction, we get nH = HydrogenMassFraction * rho / mH.
- For a calorically perfect ideal gas, we know:
  - eint = p / (rho * (gamma - 1))
  - p = rho * kboltz * T / (mmw * mH)
- In other words: mmw = kboltz * T / (mH * (gamma - 1) * eint)
Essentially, we need to solve the following equation for T:
f(log(T), log(1+z), log(nH)) - kboltz * T / (mH * (gamma - 1) * eint) = 0
In practice, this isn't terrible to solve because, for a given cell, T is the only unknown.
For simplicity, we rewrite this as f(log(T), log(1+z), log(nH)) - c * T = 0, where c = kboltz / (mH * (gamma - 1) * eint) is a constant for a given cell.
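To make this setup concrete, here is a minimal Python sketch of the per-cell quantities, with c = kboltz / (mH * (gamma - 1) * eint). It assumes cgs units and a caller-supplied function `mmw_table(log10T)` standing in for the table lookup at the cell's fixed log(1+z) and log(nH); that function, the helper names, and the constant values are illustrative assumptions, not Grackle's API.

```python
import math

# Assumed cgs constants (illustrative; check against the values the real code uses)
KBOLTZ = 1.380649e-16   # Boltzmann constant [erg/K]
M_H = 1.67262171e-24    # assumed value of mH [g]


def hydrogen_number_density(rho, hydrogen_mass_fraction):
    """nH = HydrogenMassFraction * rho / mH, in cm^-3 (hypothetical helper)."""
    return hydrogen_mass_fraction * rho / M_H


def slope_c(eint, gamma):
    """c = kboltz / (mH * (gamma - 1) * eint), so that the root satisfies mmw = c * T."""
    return KBOLTZ / (M_H * (gamma - 1.0) * eint)


def residual(T, mmw_table, c):
    """g(T) = f(log10(T), ...) - c * T at fixed log(1+z) and log(nH).

    mmw_table is a hypothetical callable returning the interpolated mmw
    for a given log10(T); the temperature we want is the root of g.
    """
    return mmw_table(math.log10(T)) - c * T
```

For a toy table that always returns 0.6, the root is simply T = 0.6 / c.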
Idea 1: Exploit structure of the interpolation grid
The premise of the first idea is simple: rather than directly solving for the root, break the calculation into two parts.
1. First, identify the pair of adjacent temperatures on the interpolation grid that bound the root (and compute the mean molecular weight at both).
2. Then, separately solve for the root between them.
Details about Part 1: Solving for the Tgrid values bounding the root
We can do this efficiently with a few basic tricks, starting from an initial guess for the mean molecular weight, fguess(log(T), log(1+z), log(nH)).
We just need to adopt a very simplistic choice:
- To start, it's useful to know that on the existing grids, mmw has a minimum value of 0.612 (all hydrogen and helium is fully ionized) and a maximum of 1.281 (there are no free electrons).[1]
- So let's simply define fguess(log(T), log(1+z), log(nH)) as a function that always returns 0.8854 (the geometric mean of mmw's extrema).
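Using this choice for fguess ensures that the true mean molecular weight, mmw_true, always satisfies mmw_guess/1.45 <= mmw_true <= 1.45*mmw_guess. Since the root satisfies mmw = c * T, temperature is proportional to mmw at the solution, so defining T_guess = mmw_guess / c gives:
T_guess/1.45 <= T_true <= 1.45*T_guess.
In other words, abs(log10(T_true) - log10(T_guess)) is at most log10(1.281/0.8854) ≈ 0.1604 (this uses the unrounded factor of about 1.4468 rather than 1.45).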
This matters because the temperature grid is uniformly spaced in log10(T): adjacent grid values always differ by exactly 0.1.
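The bound above is simple arithmetic on the (rounded) extrema, and a few lines of Python reproduce it. The computed values agree with the rounded numbers quoted above: a guess of about 0.8854, a factor of about 1.447, and a maximum offset of about 0.1604 in log10(T), which is less than two 0.1-dex grid spacings.

```python
import math

MMW_MIN = 0.612  # all hydrogen and helium fully ionized (rounded)
MMW_MAX = 1.281  # no free electrons (rounded)

# Constant guess: the geometric mean of the extrema
mmw_guess = math.sqrt(MMW_MIN * MMW_MAX)

# Worst-case ratio between the guess and the true value (same on both sides)
factor = MMW_MAX / mmw_guess

# T is proportional to mmw at the root, so this bounds |log10(T_true) - log10(T_guess)|
max_dlog10T = math.log10(factor)

print(f"mmw_guess = {mmw_guess:.4f}, factor = {factor:.4f}, max |dlog10 T| = {max_dlog10T:.4f}")
print(f"offset in units of the 0.1 grid spacing: {max_dlog10T / 0.1:.3f}")
```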
Putting this together, in the first part of the calculation we would:
1. Solve for T_guess = mmw_guess / c.
2. Identify the 6 temperature grid points nearest to T_guess in log-space (3 on either side) and compute mmw at each. Because the maximum offset of 0.1604 is less than two grid spacings (0.2), these temperatures are GUARANTEED to bound the true value.
3. Use these mmw values to identify the adjacent pair of these T values that bound the root.
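A minimal Python sketch of Part 1, under stated assumptions: the temperature grid starts at a hypothetical `log10T_min` with 0.1-dex spacing and `n_grid` nodes, and `mmw_at_index(k)` is a hypothetical callable returning mmw at grid node k, already interpolated in log(1+z) and log(nH) for this cell. It relies on g(T) = mmw - c * T being positive below the root and negative above it, and returns the first adjacent pair of nodes where g changes sign.

```python
import math

DLOG10T = 0.1  # grid spacing in log10(T)


def bracket_root(c, log10T_min, n_grid, mmw_at_index, mmw_guess=0.8854):
    """Return (k_left, T_left, T_right, mmw_left, mmw_right) bracketing the root.

    Hypothetical helper: nodes k_left and k_left + 1 bound the root of
    g(T) = mmw(T) - c * T.
    """
    log10T_guess = math.log10(mmw_guess / c)
    # Index of the grid node just below T_guess
    k = math.floor((log10T_guess - log10T_min) / DLOG10T)
    # The 6 nearest nodes: 3 below and 3 above T_guess (clamped to the table)
    lo = max(k - 2, 0)
    hi = min(k + 3, n_grid - 1)
    ks = range(lo, hi + 1)
    mmw = [mmw_at_index(j) for j in ks]  # contiguous table reads
    for j in range(len(mmw) - 1):
        T_left = 10.0 ** (log10T_min + ks[j] * DLOG10T)
        T_right = 10.0 ** (log10T_min + ks[j + 1] * DLOG10T)
        # First sign change of g from positive to non-positive
        if mmw[j] - c * T_left >= 0.0 and mmw[j + 1] - c * T_right <= 0.0:
            return ks[j], T_left, T_right, mmw[j], mmw[j + 1]
    raise ValueError("root not bracketed (is T outside the table range?)")
```

With a toy constant table of mmw = 1.0 and c = 10**-4.05, the root sits at log10(T) = 4.05; on a grid starting at log10(T) = 1.0, the sketch returns the nodes at log10(T) = 4.0 and 4.1.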
Details about Part 2: Perform the rest of the solve
Let's call this pair of temperatures T_left and T_right (and their associated mmw values mmw_left and mmw_right).
Because we use bilinear and trilinear interpolation, and log(1+z) and log(nH) are fixed for a given cell, the interpolant between T_left and T_right can be written analytically as mmw = f(log(T), ...) = a * log10(T) + b, where a = (mmw_right - mmw_left) / (log10(T_right) - log10(T_left)) and b = mmw_left - a * log10(T_left).
Thus we end up solving a * log10(T) + b - c * T = 0
Unfortunately, this equation has no closed-form solution in elementary functions, so we still need to solve it numerically. But at least we can write out the derivative analytically: d/dT [a * log10(T) + b - c * T] = a / (T * ln(10)) - c.
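For Part 2, here is a minimal sketch of a safeguarded Newton iteration on a * log10(T) + b - c * T = 0, using the analytic derivative a / (T * ln(10)) - c. The function name and the choice of Newton's method with a bisection fallback are assumptions for illustration, not the existing Grackle solver; a and b are rebuilt from the bracketing pair (T_left, mmw_left) and (T_right, mmw_right).

```python
import math

LN10 = math.log(10.0)


def solve_in_cell(c, T_left, T_right, mmw_left, mmw_right, rtol=1e-12, max_iter=50):
    """Solve a*log10(T) + b - c*T = 0 for T in [T_left, T_right] (hypothetical helper).

    Assumes the bracket is valid, i.e. g(T_left) >= 0 >= g(T_right).
    """
    x_left, x_right = math.log10(T_left), math.log10(T_right)
    # Linear-in-log10(T) interpolant through the two grid nodes
    a = (mmw_right - mmw_left) / (x_right - x_left)
    b = mmw_left - a * x_left

    def g(T):
        return a * math.log10(T) + b - c * T

    def dg(T):
        # Analytic derivative of g with respect to T
        return a / (T * LN10) - c

    lo, hi = T_left, T_right
    T = 0.5 * (lo + hi)
    for _ in range(max_iter):
        gT = g(T)
        if gT == 0.0:
            return T
        # Keep g(lo) >= 0 >= g(hi) while shrinking the bracket
        if gT > 0.0:
            lo = T
        else:
            hi = T
        d = dg(T)
        T_new = T - gT / d if d != 0.0 else 0.5 * (lo + hi)
        if not (lo < T_new < hi):
            # Newton step left the bracket: fall back to bisection
            T_new = 0.5 * (lo + hi)
        if abs(T_new - T) <= rtol * T:
            return T_new
        T = T_new
    return T
```

Each iteration only evaluates one logarithm and a few arithmetic operations; no table lookups are needed once the bracketing pair is known.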
Performance considerations
It's a little unclear whether this idea alone is enough to improve performance. But it's worth noting that:
- Part 1 will probably be at least as fast as 6 iterations of the existing solver. It could easily be faster, since the memory-access pattern into the interpolation table is much more favorable. On CPUs in particular, these 6 evaluations could be vectorized very efficiently, since we know ahead of time that we are loading contiguous values from the tables.
- A slightly better, yet still inexpensive, fguess would reduce the cost of Part 1.[2]
- Each iteration in Part 2 will be much faster than an iteration in the existing implementation.
Idea 2: Modifying the interpolation properties
This is only worth considering alongside Idea 1. Essentially, the idea would be to change the interpolation details such that either:
- the entire interpolation grid interpolates log(mmw) rather than mmw, or
- mmw is interpolated with respect to T rather than log(T) (we would not change anything about the grid; it would still have constant spacing in log-space).
If we adopted either of these changes, then once we know the pair of grid temperatures bounding the root (T_left and T_right), there is an analytic solution for T (in the first case the equation reduces to a power law in T, in the second to a linear equation in T).
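A minimal sketch of the closed-form solutions under the two hypothetical interpolation schemes, given the bracketing pair. With log10(mmw) linear in log10(T), the equation becomes 10^b * T^a = c * T, so log10(T) = (log10(c) - b) / (a - 1); with mmw linear in T, it becomes alpha * T + beta = c * T, so T = beta / (c - alpha). The function names are hypothetical, and neither sketch handles the degenerate cases a = 1 or alpha = c.

```python
import math


def solve_logmmw_interp(c, T_left, T_right, mmw_left, mmw_right):
    """Option 1 (hypothetical): log10(mmw) interpolated linearly in log10(T)."""
    x_left, x_right = math.log10(T_left), math.log10(T_right)
    a = (math.log10(mmw_right) - math.log10(mmw_left)) / (x_right - x_left)
    b = math.log10(mmw_left) - a * x_left
    # 10**(a*x + b) = c * 10**x  =>  (a - 1) * x = log10(c) - b, with x = log10(T)
    return 10.0 ** ((math.log10(c) - b) / (a - 1.0))  # assumes a != 1


def solve_linear_T_interp(c, T_left, T_right, mmw_left, mmw_right):
    """Option 2 (hypothetical): mmw interpolated linearly in T.

    The grid nodes themselves stay log-spaced; only the interpolant changes.
    """
    alpha = (mmw_right - mmw_left) / (T_right - T_left)
    beta = mmw_left - alpha * T_left
    # alpha * T + beta = c * T  =>  T = beta / (c - alpha)
    return beta / (c - alpha)  # assumes c != alpha
```

Both reduce Part 2 to a fixed handful of arithmetic operations per cell, with no iteration.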
Conclusions
By adopting both Ideas 1 and 2, the calculation would no longer be iterative. Instead, its cost would be constant and it would involve no branching. I'm fairly confident it would be faster than the existing solution.
The main disadvantage is that the calculation might require custom code for different cooling tables. But I don't think that's necessarily a deal-breaker, especially if this provides a significant performance boost.
Footnotes
[1] Numbers are slightly rounded here for demonstration purposes. While we would want to use the actual values in an implementation, they won't change the story.
[2] If fguess provides a guess within a factor of 10^0.15, then we would only need to consider the nearest 5 grid values. Likewise, a guess within a factor of 10^0.1 or 10^0.05 would only require the nearest 4 or 3 grid values, respectively.