on pip: solver remark #690 (label: Using scikit-fem)
Comments
@gdmcbain weren't you using Windows at some point? Are you qualified to comment on this? Is there anything we can do here? |
Just as a side note, why does conda-forge still have version |
I believe it's caused by this line: https://github.com/conda-forge/scikit-fem-feedstock/blob/master/recipe/meta.yaml#L39
|
Yes, I use Ubuntu 20.04.2 LTS, Pop!_OS 21.04, and MS-Windows 10 regularly, all with Python installed from Miniconda and scikit-fem then pip-installed from master or other branch of this repo or sometimes PyPI. I do have the anecdotal finding that when I rewrote the unsteady Navier–Stokes tutorial example from FEniCS in scikit-fem in gdmcbain/fenics-tuto-in-skfem#5, the timings were roughly: FEniCS Linux half an hour, scikit-fem Linux twelve minutes, scikit-fem MS-Windows five minutes, but the Windows and Linux were different machines, so really not a very useful comparison. I find that I can |
Trying to fix conda-forge in a PR: conda-forge/scikit-fem-feedstock#17 Edit: Oops, used wrong branch. |
Perfect!! Thanks! I have been meaning to switch to conda for quite some time, but have resisted so far since another package |
Thanks, I think conda uses |
Well, I tried this out on my System 76 laptop running Pop!_OS 21.04, not on that unsteady Navier–Stokes solver (that's all preconditioned Krylov) but on the steady Navier–Stokes solver in ex27, which (currently still) needs direct solvers to track the steady solution branch past the supercritical Hopf bifurcation, and got the opposite result: the default SuperLU was significantly quicker than pypardiso! (The latter change, touching only ex27, is in gdmcbain@bd2553b.) Here's how SnakeViz saw it, run as, e.g.,

python -m cProfile -o ex27.pardiso.prof -s cumtime docs/examples/ex27.py
snakeviz ex27.pardiso.prof

Master (SuperLU): [SnakeViz screenshot]
pypardiso: [SnakeViz screenshot] |
Hmm, interesting, and the opposite of what I've read so far. I can try running |
Yes, please do. Maybe ex27 is too small? I did reduce the number of elements so that it wouldn't take so long every time |
To that effect, here is the snakeviz from running ex36 with the two solvers.

PyPardiso: [SnakeViz screenshot]
SciPy: [SnakeViz screenshot]

The assembly time is also large for my case, but the difference in solving is clear. Now on to ex27.

Code:
import numpy as np
from scipy.sparse import bmat
from skfem.helpers import grad, transpose, det, inv, identity
from skfem import *

mu, lmbda = 1., 1.e3  # material parameters


def F1(w):
    u = w["disp"]
    p = w["press"]
    F = grad(u) + identity(u)
    J = det(F)
    Finv = inv(F)
    return p * J * transpose(Finv) + mu * F


def F2(w):
    u = w["disp"]
    p = w["press"].value
    F = grad(u) + identity(u)
    J = det(F)
    Js = .5 * (lmbda + p + 2. * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2)) / lmbda
    dJsdp = ((.25 * lmbda + .25 * p + .5 * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2))
             / (lmbda * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2)))
    return J - (Js + (p + mu / Js - lmbda * (Js - 1)) * dJsdp)


def A11(w):
    u = w["disp"]
    p = w["press"]
    eye = identity(u)
    F = grad(u) + eye
    J = det(F)
    Finv = inv(F)
    L = (p * J * np.einsum("lk...,ji...->ijkl...", Finv, Finv)
         - p * J * np.einsum("jk...,li...->ijkl...", Finv, Finv)
         + mu * np.einsum("ik...,jl...->ijkl...", eye, eye))
    return L


def A12(w):
    u = w["disp"]
    F = grad(u) + identity(u)
    J = det(F)
    Finv = inv(F)
    return J * transpose(Finv)


def A22(w):
    u = w["disp"]
    p = w["press"].value
    Js = .5 * (lmbda + p + 2. * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2)) / lmbda
    dJsdp = ((.25 * lmbda + .25 * p + .5 * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2))
             / (lmbda * np.sqrt(lmbda * mu + .25 * (lmbda + p) ** 2)))
    d2Jdp2 = .25 * mu / (lmbda * mu + .25 * (lmbda + p) ** 2) ** (3/2)
    L = (-2. * dJsdp - p * d2Jdp2 + mu / Js ** 2 * dJsdp ** 2 - mu / Js * d2Jdp2
         + lmbda * (Js - 1.) * d2Jdp2 + lmbda * dJsdp ** 2)
    return L


mesh = MeshTet().refined(3)
uelem = ElementVectorH1(ElementTetP2())
pelem = ElementTetP1()
elems = {
    "u": uelem,
    "p": pelem
}
basis = {
    field: Basis(mesh, e, intorder=2)
    for field, e in elems.items()
}

du = basis["u"].zeros()
dp = basis["p"].zeros()
stretch_ = 1.

# Dirichlet conditions on the constrained faces
ddofs = [
    basis["u"].find_dofs(
        {"left": mesh.facets_satisfying(lambda x: x[0] < 1.e-6)},
        skip=["u^2", "u^3"]
    ),
    basis["u"].find_dofs(
        {"bottom": mesh.facets_satisfying(lambda x: x[1] < 1.e-6)},
        skip=["u^1", "u^3"]
    ),
    basis["u"].find_dofs(
        {"back": mesh.facets_satisfying(lambda x: x[2] < 1.e-6)},
        skip=["u^1", "u^2"]
    ),
    basis["u"].find_dofs(
        {"front": mesh.facets_satisfying(lambda x: np.abs(x[2] - 1.) < 1e-6)},
        skip=["u^1", "u^2"]
    )
]
dofs = {}
for dof in ddofs:
    dofs.update(dof)

du[dofs["left"].all()] = 0.
du[dofs["bottom"].all()] = 0.
du[dofs["back"].all()] = 0.
du[dofs["front"].all()] = stretch_

# free DOFs: unconstrained displacement DOFs followed by all pressure DOFs
I = np.hstack((
    basis["u"].complement_dofs(dofs),
    basis["u"].N + np.arange(basis["p"].N)
))


@LinearForm(nthreads=8)
def a1(v, w):
    return np.einsum("ij...,ij...", F1(w), grad(v))


@LinearForm(nthreads=8)
def a2(v, w):
    return F2(w) * v


@BilinearForm(nthreads=8)
def b11(u, v, w):
    return np.einsum("ijkl...,ij...,kl...", A11(w), grad(u), grad(v))


@BilinearForm(nthreads=8)
def b12(u, v, w):
    return np.einsum("ij...,ij...", A12(w), grad(v)) * u


@BilinearForm(nthreads=8)
def b22(u, v, w):
    return A22(w) * u * v


from time import time

t0 = time()
for itr in range(12):  # Newton iterations
    uv = basis["u"].interpolate(du)
    pv = basis["p"].interpolate(dp)
    K11 = asm(b11, basis["u"], basis["u"], disp=uv, press=pv)
    K12 = asm(b12, basis["p"], basis["u"], disp=uv, press=pv)
    K22 = asm(b22, basis["p"], basis["p"], disp=uv, press=pv)
    f = np.concatenate((
        asm(a1, basis["u"], disp=uv, press=pv),
        asm(a2, basis["p"], disp=uv, press=pv)
    ))
    K = bmat(
        [[K11, K12],
         [K12.T, K22]], "csr"
    )
    t1 = time()
    uvp = solve(*condense(K, -f, I=I), solver="pardiso")
    print(f"Solve time: {time() - t1}")
    delu, delp = np.split(uvp, [du.shape[0]])
    du += delu
    dp += delp
    normu = np.linalg.norm(delu)
    normp = np.linalg.norm(delp)
    print(f"{itr+1}, norm_du: {normu}, norm_dp: {normp}")
    if normu < 1.e-8 and normp < 1.e-8:
        break
print(f"Total time: {time() - t0}")

if __name__ == "__main__":
    mesh.save(
        "example36_results.xdmf",
        {"u": du[basis["u"].nodal_dofs].T, "p": dp[basis["p"].nodal_dofs[0]]},
    )
|
A weird thing that I notice is that when I try changing scikit-fem/docs/examples/ex27.py, line 109 (at c07ff9f),
I get a smaller time for the solve in SuperLU whereas Pardiso is constant. Is this how it is supposed to be? Or am I doing something wrong here? (49.69 s for SuperLU vs 68.75 s for Pardiso.) |
Ah, I have previously, but since pygmsh switched first its licence and then its API, the code referring to it was removed and replaced by a fixed mesh (scikit-fem/docs/examples/ex27.py, line 109 at c07ff9f).
Unfortunately appending a refinement there fails,
i.e. boundaries are not preserved during refinement. Let me see if I can dig out the old mesh-generating function and |
Oh, that worked for you? |
Ah, hang on: what is your changed line? Note that refinement is not in-place, so if the changed line just called self.mesh.refined(n) it would not have done anything; that would be consistent with
|
See #536 for discussion of the change from |
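To illustrate the in-place distinction being discussed, here is a minimal sketch of the standard scikit-fem API (not the actual changed line from ex27):

from skfem import MeshTri

m = MeshTri()
m.refined(3)      # has no effect on m: refined() returns a new mesh object
m = m.refined(3)  # the refined mesh has to be rebound to a name to be used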
You're right, I seem to have confounded the two. And there is the error you posted. |
But then, if I naively do the following:
In [1]: from scipy.sparse import random as rand, identity
In [2]: import numpy as np
In [3]: import pypardiso
In [4]: from scipy.sparse.linalg import spsolve as spl
In [5]: A = rand(10000, 10000, density=0.01, format="csr", dtype= np.float64)
In [6]: A += identity(A.shape[0])
In [7]: b = np.random.randn(A.shape[0])
In [8]: %timeit pypardiso.spsolve(A,b)
45.6 ms ± 8.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [9]: %timeit spl(A, b)
In [12]: A = rand(1000, 1000, density=0.01, format="csr", dtype=np.float64) + identity(1000)
In [13]: b = np.random.randn(A.shape[0])
In [14]: %timeit spl(A, b)
286 ms ± 3.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [15]: %timeit pypardiso.spsolve(A, b)
330 µs ± 2.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Any thoughts @gdmcbain ? |
Well, I'm not an expert, but my intuition is that the sparse matrices assembled from finite element problems do have some kind of structure compared to randomly generated ones. How do we go with simple Laplace + Dirichlet, as in ex01? |
You could try reordering the rows and the columns, e.g., using |
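A minimal sketch of that reordering suggestion, assuming SciPy's reverse_cuthill_mckee (the exact routine isn't visible above), applied to the same kind of random test matrix used earlier:

import numpy as np
from scipy.sparse import random as rand, identity
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import spsolve

A = (rand(10000, 10000, density=0.01, format="csr", dtype=np.float64) + identity(10000)).tocsr()
b = np.random.randn(A.shape[0])

perm = reverse_cuthill_mckee(A, symmetric_mode=False)  # bandwidth-reducing permutation
A_perm = A[perm][:, perm].tocsr()                      # reorder rows and columns
x_perm = spsolve(A_perm, b[perm])                      # solve the permuted system

x = np.empty_like(x_perm)
x[perm] = x_perm                                       # map the solution back to the original ordering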
Ah, yes, reordering #168. |
I don't use |
0.41 s (pardiso) vs 1.495 s (SuperLU) on my laptop. Will check

from skfem import *
from skfem.helpers import dot, grad
# create the mesh
m = MeshTri().refined(8)
# or, with your own points and cells:
# m = MeshTri(points, cells)
e = ElementTriP1()
basis = Basis(m, e) # shorthand for CellBasis
# this method could also be imported from skfem.models.laplace
@BilinearForm
def laplace(u, v, _):
    return dot(grad(u), grad(v))
# this method could also be imported from skfem.models.unit_load
@LinearForm
def rhs(v, _):
    return 1.0 * v
A = asm(laplace, basis)
b = asm(rhs, basis)
# or:
# A = laplace.assemble(basis)
# b = rhs.assemble(basis)
# enforce Dirichlet boundary conditions
A, b = enforce(A, b, D=m.boundary_nodes())
# solve -- can be anything that takes a sparse matrix and a right-hand side
x = solve(A, b) # solve(A, b, solver="pardiso")
# plot the solution
from skfem.visuals.matplotlib import plot, savefig
plot(m, x, shading='gouraud', colorbar=True)
savefig('solution.png') |
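For reference, one way such a timing comparison could be reproduced on the system assembled above (a sketch; the thread doesn't show the actual timing code, and it assumes pypardiso is installed):

from time import perf_counter

import pypardiso
from scipy.sparse.linalg import spsolve

for name, solver in [("SuperLU", spsolve), ("pardiso", pypardiso.spsolve)]:
    t0 = perf_counter()
    solver(A.tocsr(), b)  # A, b as returned by enforce() above; pypardiso wants CSR
    print(f"{name}: {perf_counter() - t0:.3f} s")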
Thanks! I have been meaning to do this for some time. Back then I did not have a good understanding of package management and virtual environments. |
This is correct. These days a |
O. K., with the proposed fix #693 to #691, I can now read MSH 4.1 (or rather VTU recoded & saved from MSH 4.1) in ex27 and so can generate meshes of different fineness to see how pardiso scales compared to the default solver. Halving the mesh-size (see #693 for the original generation instructions) with gmsh -2 -clscale 0.1 backward-facing_step.geo and recoding & saving with |
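The recoding tool is cut off above; a guess, assuming meshio (which scikit-fem wraps for mesh I/O) and hypothetical file names, would be:

import meshio

m = meshio.read("backward-facing_step.msh")   # MSH 4.1 written by the gmsh command above
meshio.write("backward-facing_step.vtu", m)   # VTU that ex27 can then load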
Halving again (
Hitherto I had not noticed any difference in the solutions obtained by pardiso & scipy, but I think this says that there must be. It's true that this is a fairly challenging nonlinear problem. |
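One quick way to quantify such a discrepancy, assuming the assembled matrix K and right-hand side f from a single Newton step are at hand (hypothetical variable names):

import numpy as np
import pypardiso
from scipy.sparse.linalg import spsolve

x_scipy = spsolve(K.tocsr(), f)
x_pardiso = pypardiso.spsolve(K.tocsr(), f)
print(np.linalg.norm(x_pardiso - x_scipy) / np.linalg.norm(x_scipy))  # relative difference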
Hmm, the failure isn't deterministic either; here it was rerun (without profiling); same
|
Another rerun kept going, but the continuation really wasn't getting anywhere:
|
Thanks for the detailed analysis @gdmcbain. Indeed it's a bit heterogeneous as to when Pardiso outperforms SuperLU and vice versa. I will check things on my laptop too this week. The significant speed-up in incompressible hyperelasticity led me to believe that it should carry over to other nonlinear problems as well, but it's not that simple I guess. |
Yes, so far Pardiso looks better on everything except ex27 (lucky me), so I do think it's worth persisting with, particularly as it's so easy to use in scikit-fem via pypardiso. I'd say it was well worth including in the suite of integrations #474. I don't know why ex27 was problematic, or why Pardiso gave different results from SciPy and different results from run to run, but I would be interested in learning more if this is expected or other evidence is found. |
Although this is tangentially related to scikit-fem, since the primary objective of scikit-fem is assembly and not solving the resulting systems, it may be important to note that when working with a default pip installation on Windows there is a significant performance degradation in solving, since the default sparse solver is SuperLU, which is super slow. I think the conda installation may not suffer from this since the default solver is UMFPACK (?), which can be faster. There may be further improvements possible when numpy is compiled against an optimized BLAS, but that is something most users wouldn't care for.

I managed to get Pardiso working on Windows, and with pypardiso things are significantly faster. See the timings on example 36 with nthreads=8 (and Mesh.refined(3)) and a slight addition at scikit-fem/skfem/utils.py, line 102 (at c07ff9f), namely simply calling pypardiso.spsolve (a sketch of such a wrapper follows below the timings).

Timings

Pardiso: [timing output]
SuperLU: [timing output]
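A rough sketch of what that addition could look like, assuming scikit-fem's convention of accepting a solver callable (the actual one-line patch at utils.py line 102 is not reproduced here; in the thread it is reached via solver="pardiso"):

import pypardiso


def solver_direct_pardiso(**kwargs):
    """Return a linear solver callable backed by pypardiso.spsolve."""
    def solver(A, b):
        return pypardiso.spsolve(A.tocsr(), b)
    return solver


# hypothetical usage with skfem's solve/condense:
# x = solve(*condense(K, f, D=D), solver=solver_direct_pardiso())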