Matrix-matrix solver on GPU #1381
Hi Baptiste, thanks for trying Ginkgo and for your report! To get a bit more information: what is the size of your A matrix in this case? The MultiVector is simply a rewrite of the current batch code (which currently uses Dense). We are trying to merge the entire batch functionality into our code.
Hi @blegouix, could you also specify how you are executing strategy 1? Our solvers can handle dense matrices as the right-hand side, so you can just call the solver's apply() once with the whole dense right-hand side.
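A language-agnostic sketch of what such a single multi-RHS solve amounts to (plain Python for illustration, not Ginkgo's C++ API; the `solve_multi_rhs` helper and the matrices are made up): factorize A once and eliminate/substitute all right-hand-side columns together, instead of looping over the columns.

```python
def solve_multi_rhs(A, B):
    """Gaussian elimination with partial pivoting; B holds one column per RHS."""
    n, m = len(A), len(B[0])
    # augment A with all RHS columns at once, so one elimination serves them all
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # partial pivoting
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + m):
                M[i][j] -= f * M[k][j]
    # back substitution, again for every RHS column of the augmented part
    X = [[0.0] * m for _ in range(n)]
    for c in range(m):
        for i in reversed(range(n)):
            s = M[i][n + c] - sum(M[i][j] * X[j][c] for j in range(i + 1, n))
            X[i][c] = s / M[i][i]
    return X

A = [[4.0, 1.0], [1.0, 3.0]]
B = [[1.0, 2.0], [2.0, 1.0]]   # two right-hand sides, stored as columns
X = solve_multi_rhs(A, B)      # one call solves both columns
```

The same idea carries over to iterative solvers: the per-iteration work is applied to the whole block of columns instead of once per column.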
A's size is 10x10; B and X are 10x1000. Here is the code for strategy 1:
1000 individual kernels are launched sequentially:
I think strategy 1 is the right choice for this problem, as you have only one matrix. I assume n=10 and n_batch=1000 in the code above? And you are calling the solver only once? If you want to solve it with strategy 2, then you will have to duplicate the matrix, because the batched solvers expect a separate matrix for every batch entry.
Yes, I call it only once, as in the provided code, with n=10 and n_batch=1000. Is that not the expected behaviour?
Yes, that is the expected behaviour, so it is a bit weird that you are seeing 1000 separate kernel calls. Our monolithic solvers support multiple RHS, so there should be only one solver call. There might be multiple kernel calls inside that single solver call, though (dot, norm, axpy, spmv, etc.), and these kernels are launched every iteration; if you have 50 iterations with CG, for example, then you might see something around 7*50 kernel calls. Essentially, you perform these solver operations for all RHS in parallel.
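To make the kernel-count point concrete, here is a toy multi-RHS CG in plain Python (illustrative only; the kernel names and the count per iteration are approximations, not Ginkgo internals): each iteration issues a handful of vector "kernels", and each kernel already covers all RHS columns at once.

```python
def spmv(A, X):
    """One 'kernel': matrix times a block of column vectors."""
    return [[sum(A[i][k] * X[k][c] for k in range(len(X)))
             for c in range(len(X[0]))] for i in range(len(A))]

def dots(U, V):
    """One 'kernel': columnwise dot products."""
    return [sum(u[c] * v[c] for u, v in zip(U, V)) for c in range(len(U[0]))]

def axpy(alpha, X, Y):
    """One 'kernel': per column c, Y_c + alpha_c * X_c."""
    return [[y[c] + alpha[c] * x[c] for c in range(len(alpha))]
            for x, y in zip(X, Y)]

def cg(A, B, max_iters=25):
    n, m = len(B), len(B[0])
    kernels = 0
    X = [[0.0] * m for _ in range(n)]
    R = [row[:] for row in B]        # residual for the zero initial guess
    P = [row[:] for row in R]
    rho = dots(R, R); kernels += 1
    for _ in range(max_iters):
        if max(rho) < 1e-20:         # every column has converged
            break
        AP = spmv(A, P); kernels += 1
        d = dots(P, AP); kernels += 1
        alpha = [r / dv for r, dv in zip(rho, d)]
        X = axpy(alpha, P, X); kernels += 1
        R = axpy([-a for a in alpha], AP, R); kernels += 1
        rho_new = dots(R, R); kernels += 1
        beta = [rn / r for rn, r in zip(rho_new, rho)]
        P = axpy(beta, P, R); kernels += 1   # P = R + beta * P
        rho = rho_new
    return X, kernels

A = [[4.0, 1.0], [1.0, 3.0]]   # small SPD test matrix
B = [[1.0, 2.0], [2.0, 1.0]]   # two RHS columns, solved in one solver call
X, kernels = cg(A, B)          # several kernels per iteration, one solve
```

So the number of kernel launches scales with the iteration count, not with the number of right-hand sides.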
Ok, great, so I have to figure out what is causing that. Could it be the solver/preconditioner choice? I would be happy to hear your ideas.
You could add a Convergence logger to check how many iterations the solver actually performs.
I will add the Convergence logger, but I can already confirm that the solver converges (I tried with A = Identity and B = ones, or simple matrices like that, and I obtain the right result; the computed error norm(B-AX) also checks out). And I confirm that I have n_batch individual kernels launched sequentially (1000 is not the number of iterations: I tried with n_batch=10 and n_batch=100 and counted the kernels; the count matches n_batch, not the number of solver iterations).
Ok, this is related to the size of the matrix (10x10) and the use of a block-Jacobi preconditioner (which is obviously not the right choice here). What I see is not the actual Bicgstab iteration but something related to the preconditioning: since 10 is smaller than 32, block-Jacobi is enough to invert the matrix by itself and solves the systems directly. Thanks a lot,
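The diagnosis above can be illustrated in plain Python (a toy sketch, not Ginkgo's block-Jacobi implementation; the diagonally dominant matrix below is made up): when the block size limit (32) exceeds the matrix size (10), the single Jacobi "block" is the whole matrix, so the preconditioner equals A^{-1} and one application already solves every right-hand side.

```python
def invert(A):
    """Gauss-Jordan inverse of one small dense block."""
    n = len(A)
    M = [list(A[i]) + [float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # partial pivoting
        M[k], M[p] = M[p], M[k]
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]
        for i in range(n):
            if i != k:
                f = M[i][k]
                M[i] = [v - f * w for v, w in zip(M[i], M[k])]
    return [row[n:] for row in M]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

n, m = 10, 3
# one Jacobi block covers the whole matrix, since n = 10 <= 32
A = [[11.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
B = [[float(i + c + 1) for c in range(m)] for i in range(n)]
X = matmul(invert(A), B)   # a single "preconditioner" application
R = matmul(A, X)
residual = max(abs(R[i][c] - B[i][c]) for i in range(n) for c in range(m))
# residual is already at rounding level: nothing is left for Bicgstab to do
```

With a larger matrix or a smaller max block size, the preconditioner would only be approximate and the outer Bicgstab iterations would appear in the profile as expected.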
Hello,
I want to implement a matrix-matrix solver AX=B, with A sparse and X and B dense matrices. To do so, I tried two strategies:
I obtain the following behaviours:
The first solution is not satisfactory because the columns of B and X are processed sequentially in n_batch one-block kernels (no parallelization across the individual problems). The second solution is not satisfactory either, because it involves a huge DtoD memory copy and fills the memory with duplicated information.
Is there already a way to do this, or are the developments related to the new MultiVector class appropriate for this problem, and should we expect a solver any time soon?
Is there any known workaround with a for loop or Kokkos::parallel_for?
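For scale, a back-of-envelope memory count for the duplication in strategy 2, using the sizes from this thread (n = 10, n_batch = 1000, double precision, A stored densely for a rough bound):

```python
n, n_batch, sizeof_double = 10, 1000, 8

one_matrix = n * n * sizeof_double        # the single 10x10 A: 800 B
duplicated = n_batch * one_matrix         # 1000 copies of A: 800 kB
rhs_block = n * n_batch * sizeof_double   # B (and likewise X): 80 kB each
print(one_matrix, duplicated, rhs_block)  # 800 800000 80000
```

So the duplicated system matrices dominate the footprint by an order of magnitude over the data that actually differs between the problems.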
Regards,