
Add the L1 Jacobi #1310

Open · yhmtsai wants to merge 6 commits into develop from l1_jacobi

Conversation

@yhmtsai (Member) commented Mar 23, 2023

This adds the L1 variant to Jacobi. The details are given in https://epubs.siam.org/doi/abs/10.1137/100798806
The original preconditioner is built on $M$ (the diagonal of $A$ for scalar Jacobi, or the block diagonal of $A$ for block Jacobi).
The L1 smoother is built on $M' = M + D^{\ell_1}$, where $D^{\ell_1}$ is the diagonal matrix with

$$D_{ii}^{\ell_1} = \sum_{j \notin \Omega_i} |A_{ij}|,$$

where $\Omega_i$ is the column set of the diagonal block containing row $i$, i.e. the sum runs over the off-diagonal-block entries of row $i$.

I am not sure whether the theorems in the paper still hold for negative diagonal entries.

@yhmtsai yhmtsai added the 1:ST:ready-for-review This PR is ready for review label Mar 23, 2023
@yhmtsai yhmtsai self-assigned this Mar 23, 2023
@ginkgo-bot ginkgo-bot added reg:testing This is related to testing. mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:reference This is related to the reference module. type:preconditioner This is related to the preconditioners mod:hip This is related to the HIP module. labels Mar 23, 2023
@MarcelKoch (Member)

I'm a bit confused. The paper discusses the L1 Jacobi only in a distributed setting, i.e. the off-diagonal entries you mention are the entries of our non-local matrix in the distributed matrix. But your implementation is for non-distributed matrices. Can you perhaps provide another reference for the non-distributed case?

codecov bot commented Mar 23, 2023

Codecov Report

Attention: Patch coverage is 96.71053% with 5 lines in your changes missing coverage. Please review.

Project coverage is 90.82%. Comparing base (bf09bd2) to head (259f96f).
Report is 3 commits behind head on develop.

Current head 259f96f differs from pull request most recent head 14d1a98

Please upload reports for the commit 14d1a98 to get more accurate results.

Files Patch % Lines
core/device_hooks/common_kernels.inc.cpp 0.00% 2 Missing ⚠️
reference/test/preconditioner/jacobi_kernels.cpp 95.12% 2 Missing ⚠️
common/unified/preconditioner/jacobi_kernels.cpp 83.33% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1310      +/-   ##
===========================================
+ Coverage    90.03%   90.82%   +0.79%     
===========================================
  Files          759      577     -182     
  Lines        61167    48872   -12295     
===========================================
- Hits         55074    44390   -10684     
+ Misses        6093     4482    -1611     


@yhmtsai (Member, Author) commented Mar 23, 2023

It's only based on a non-overlapping partition.
We could also add the L1 treatment to Schwarz as the distributed local preconditioner.
Section 6.2 in the paper mentions the L1 scalar Jacobi.

@MarcelKoch (Member)

I think sec. 6.2 means using the preconditioner from sec. 6, $M = M_H + D^{\ell_1}$ with $M_H = D$. The $D^{\ell_1}$ is still built from the non-local columns. I'm not saying it still works for non-distributed matrices, just that the paper's description applies to distributed matrices. (The L1 smoother for non-distributed matrices would just be the standard Jacobi.)

@yhmtsai (Member, Author) commented Mar 23, 2023

If the partition is only defined by MPI (let me use "MPI" here to distinguish), L1 is indeed normal Jacobi, i.e. using Schwarz with L1 will give this behavior.
But the second paragraph of section 5 of the paper makes me think it works for any distribution (any block partition); designing the partition by MPI rank is just one example.

@yhmtsai (Member, Author) commented Mar 23, 2023

There's one reference from Precond: they use the L1 scalar Jacobi, but they also take the absolute value of the diagonal part.
https://etna.math.kent.edu/vol.55.2022/pp687-705.dir/pp687-705.pdf

@MarcelKoch (Member)

The L1 smoother from sec. 6 is an improvement on the hybrid smoother from sec. 5. The hybrid smoother applies a local smoother only on the local matrix block of a distributed matrix and ignores the non-local part. The L1 smoother then takes the non-local part into account through the $D^{\ell_1}$ matrix.
Again, this might also work for non-distributed matrices, but it might not be the L1 smoother people would expect. Judging from the other paper you linked, though, people do not seem to be very careful about this definition.

@MarcelKoch (Member)

Perhaps naming it "lumped Jacobi" would be better suited to differentiate between those two variants. And I guess the L1 from the first paper could be implemented as another distributed preconditioner.
Also, the paper you mentioned does a full row sum, so it also includes the diagonal value.

@yhmtsai (Member, Author) commented Mar 23, 2023

The first paper assumes the diagonal entries are positive, so the two are equivalent there.
To me, distributing over MPI and distributing over blocks are the same thing: both are ways to cut the matrix.
Applying L1 to the LocalPreconditioner and to Schwarz is enough to distinguish the variants.
With "lumped", we would still need to indicate in the comments that it is L1.

@MarcelKoch (Member)

My argument is that what is implemented in this PR is not the L1 from the paper, so we should not call it that. But honestly, I don't have a great overview of how the term L1 smoother is usually interpreted.

@MarcelKoch (Member) commented Mar 23, 2023

And in your interpretation, the diagonal entries are still missing from the sum; the preconditioner is $D + D^{\ell_1}$. Sorry, I missed the part that initializes the array with the (block) diagonal.

@yhmtsai (Member, Author) commented Mar 23, 2023

It should theoretically be the L1 smoother; otherwise I have just added some arbitrary Jacobi variant.
(I need to use [] rather than {} to represent the row index set, because I cannot display {} with GitHub LaTeX.)
scalar Jacobi: $\Omega_k = [k]$
block Jacobi: $\Omega_k = [\text{row} \in \text{block}_k]$

@MarcelKoch (Member)

The L1 smoother (Jacobi or GS) explicitly references distributed matrices. I still think that if we want to follow the naming of the paper, the variant implemented here should not be called L1. Instead, we should add a new distributed preconditioner which consists of a local solver (Jacobi, or in the future perhaps GS) and a diagonal matrix with the row-wise sum of the non-local part.

@MarcelKoch MarcelKoch self-requested a review April 6, 2023 15:53
@upsj upsj requested a review from pratikvn April 6, 2023 15:54
@pratikvn (Member) commented Apr 6, 2023

Maybe I am misunderstanding something, but in the non-distributed case this is basically diagonal relaxation with weights equal to row sums, right? So I think it definitely makes sense to have it as a smoother, as an alternative to the scalar Jacobi smoother.

I think the name L1 is well suited, because you are taking the L1 norm of the row-space vectors: $\|v\|_1 = \sum_{i}|v_i|$.

I am not sure if there is a mathematical basis for this yet; we could also try whether other p-norms make sense, particularly $p=\infty$ and $p=2$.

@MarcelKoch (Member)

If we go with @pratikvn's suggestion and base this version on the L1 row norm, then we need to take the absolute value of the diagonal.

@yhmtsai (Member, Author) commented May 25, 2023

I am not sure whether to take the absolute value of the diagonal.
It might make sense for the scalar version, but not for the block version.
What would the absolute value of a diagonal block even mean, compared to the absolute value of a diagonal entry?

@MarcelKoch (Member)

Taking the absolute value is the only way for this preconditioner to make sense. For blocks, I would just take the absolute value component-wise; I think it then makes the most sense to add these up block-wise. Currently, you seem to only update the diagonal values of the diagonal block, or maybe I'm mistaken.

@yhmtsai (Member, Author) commented May 30, 2023

Yes, in the block version it only updates the diagonal values.
I think "l1" mostly refers to taking the sum of absolute values over the non-local block's part of each row, not the entire row (the l1 norm of the non-local-block rows); L1 does not apply to the local block.
From the formula, it does not change the other values in the block.
In the smoother paper, they only consider positive diagonal values (see 5.5), although they assume more properties than just positive diagonals.
In the other paper, they consider SPD cases, so the diagonal values are also positive.
When the diagonal value is positive, the result is the same whether or not we take its absolute value.
Another treatment would be $A_{ii} + \mathrm{sign}(A_{ii}) D^{\ell_1}_{ii}$, covering the case that $-A$ is SPD for scalar Jacobi.

From the paper/formula perspective, I prefer only adding values to the diagonal, without taking the absolute value of the diagonal first.
For negative diagonal values, we currently do not know from the papers whether it works.

@yhmtsai yhmtsai force-pushed the l1_jacobi branch 2 times, most recently from 803333e to 5fc569a Compare July 1, 2024 09:47
* If it is true, it generates the preconditioner on A + Diag(sum_{k in
* off diagonal block of i} A_ik) not A.
*/
bool GKO_FACTORY_PARAMETER_SCALAR(l1, false);
Member

I'm still not convinced that l1 is a good name. Perhaps aggregate, or aggregate_l1.
One issue with just l1 is that we will also have the distributed L1 preconditioner, as mentioned in the original paper. But that is quite different from the one here, so we need better naming to prevent confusion.

Member Author

I think they are the same thing. They only use different ways to define the block (by node, by diagonal block, or by value), but they all add the L1 norm of the off-block-diagonal part of each row to the diagonal.

Member

They are not the same. Using the l1 defined here will not give you a distributed preconditioner. Also, the distributed version can be used with both a local Jacobi and a local Gauss-Seidel preconditioner.

Member Author

I do not get it.
If you have the block list, with diagonal blocks { $B_i$ } and off-diagonal blocks { $O_i$ }, then L1 performs the diagonal update: each diagonal value $d_k$ in $B_i$ is updated by $d_k$ += sum abs(values in the same row as $d_k$ in $O_i$).

For sure, the L1 in this PR is specific to Jacobi, because we only know the blocks of block Jacobi at generation time.
Scalar L1 Jacobi takes each diagonal value as the block;
block L1 Jacobi takes the generated or given blocks as the blocks;
distributed L1 takes the local matrix (assumed square) as the block.

The diagonal treatment is the same in all three cases.

Member

The point is that this is a variation of the sequential Jacobi. In the distributed case, we would have a separate L1 preconditioner (or one with a different name) that is not specific to Jacobi.
Having both under the same name would be confusing.

Member Author

I see

sonarcloud bot commented Jul 3, 2024

@MarcelKoch MarcelKoch self-requested a review July 11, 2024 13:16
@MarcelKoch MarcelKoch added this to the Ginkgo 1.9.0 milestone Aug 26, 2024
yhmtsai and others added 6 commits September 12, 2024 22:42
nvhpc 23.3 may optimize the formula for scaled_value in make_diag_dominant differently, which leads to slightly different updated values
@pratikvn (Member) left a comment

Some doc suggestions, otherwise LGTM!

Comment on lines +377 to +378
* If it is true, it generates the preconditioner on A + Diag(sum_{k in
* off diagonal block of i} A_ik) not A.
Member

I think two things are missing:

  1. You should mention that you take the absolute values of the elements.
  2. Maybe also add a note for the scalar case: "When block_size=1, this will be equivalent to taking the inverse of the row-sums as the preconditioner."

@MarcelKoch (Member) left a comment

Thanks for changing the name, for me the name is fine now.
I do have some questions about the block version though.

Comment on lines +375 to +376
 * Use L1 Jacobi, which is introduced in the paper A. H. Baker et al.
 * "Multigrid smoothers for ultraparallel computing."
Member

This reference is not helpful, since it only mentions the parallel use case, which is not implemented here.

Comment on lines +377 to +378
* If it is true, it generates the preconditioner on A + Diag(sum_{k in
* off diagonal block of i} A_ik) not A.
Member

IMO, you should also take the absolute value of the diagonal, otherwise referencing the L1 norm doesn't make sense.

}
off_diag += abs(vals[i]);
}
diag[row] += off_diag;
Member

Suggested change
diag[row] += off_diag;
diag[row] = abs(diag[row]) + off_diag;

Comment on lines +377 to +378
* If it is true, it generates the preconditioner on A + Diag(sum_{k in
* off diagonal block of i} A_ik) not A.
Member

You also need to explain what happens if max_block_size > 1. TBH I still don't know how to define the aggregate in this case (i.e. should it be block-wise, or just added to the diagonal, is the absolute value taken of the full diagonal block, or just the diagonal?)

}
off_diag += abs(csr->get_const_values()[i]);
}
diag->get_values()[row] += off_diag;
Member

Suggested change
diag->get_values()[row] += off_diag;
diag->get_values()[row] = abs(diag->get_values()[row]) + off_diag;

Member Author

I think we discussed this a bit before, but did not arrive at a reasonable choice for when the diagonal values are negative.
L1 is originally designed for positive diagonal values.

using Mtx = typename TestFixture::Mtx;
using index_type = typename TestFixture::index_type;
using value_type = typename TestFixture::value_type;
auto mtx = Mtx::create(this->exec, gko::dim<2>{4}, 8);
Member

Suggested change
auto mtx = Mtx::create(this->exec, gko::dim<2>{4}, 8);
auto mtx = Mtx::create(this->exec, gko::dim<2>{4, 4}, 8);

IMO the single parameter dim constructor is a bit confusing.

Comment on lines +1146 to +1149
1 1
2 2
-5 1
1 -1 1
Member

Suggested change
1 1
2 2
-5 1
1 -1 1
1 1 0 0
2 2 0 0
-5 0 1 0
1 0 -1 1

using Mtx = typename TestFixture::Mtx;
using index_type = typename TestFixture::index_type;
using value_type = typename TestFixture::value_type;
auto mtx = Mtx::create(this->exec, gko::dim<2>{4}, 9);
Member

Suggested change
auto mtx = Mtx::create(this->exec, gko::dim<2>{4}, 9);
auto mtx = Mtx::create(this->exec, gko::dim<2>{4, 4}, 9);

Comment on lines +1166 to +1169
l({{2.0, 1.0, 0.0, 0.0},
{2.0, 4.0, 0.0, 0.0},
{-5.0, 0.0, 6.0, 0.0},
{1.0, 0.0, -1.0, 2.0}}),
Member

This doesn't seem right. Are the off-diagonal values of the diagonal block also added to the diagonal of the diagonal block? IMO that shouldn't be the case, only values from the off-diagonal blocks should be used (at least according to the original reference).

Member Author

I think it is the case, right?
The blocks here are (0, 0), (1, 1), and (2:4, 2:4).

@@ -252,6 +257,66 @@ TEST_F(Jacobi,
}


TEST_F(Jacobi, ScalarInLargeMatrixEquivalentToRef)
Member

Suggested change
TEST_F(Jacobi, ScalarInLargeMatrixEquivalentToRef)
TEST_F(Jacobi, ScalarL1LargeMatrixEquivalentToRef)

Labels
1:ST:ready-for-review This PR is ready for review mod:core This is related to the core module. mod:cuda This is related to the CUDA module. mod:hip This is related to the HIP module. mod:reference This is related to the reference module. reg:testing This is related to testing. type:preconditioner This is related to the preconditioners
4 participants