Add the L1 jacobi #1310
base: develop
Conversation
I'm a bit confused. The paper talks about the L1 Jacobi only in a distributed setting, i.e. the off-diagonal entries you mentioned are the entries of our non-local matrix in the distributed case. But your implementation is for non-distributed matrices. Can you perhaps provide another reference for the non-distributed case?
Codecov Report — Attention: patch coverage is
Additional details and impacted files:
@@             Coverage Diff              @@
##           develop    #1310      +/-   ##
===========================================
+ Coverage    90.03%   90.82%    +0.79%
===========================================
  Files          759      577      -182
  Lines        61167    48872    -12295
===========================================
- Hits         55074    44390    -10684
+ Misses        6093     4482     -1611
View full report in Codecov by Sentry.
It's only based on the non-overlapping partition.
I think sec. 6.2 means using the preconditioner from sec. 6.
If the partition is only defined by the MPI distribution (let me use "MPI" here to distinguish), L1 is indeed normal Jacobi.
There's one reference from precond. They use L1-scalar Jacobi.
The L1 smoother from sec. 6 is an improvement on the hybrid smoother from sec. 5. The hybrid smoother applies a local smoother only on the local matrix block of a distributed matrix and ignores the non-local part. The L1 smoother then takes the non-local part into account through the row-wise sums of the non-local part added to the diagonal.
Perhaps naming it 'lumped-Jacobi' would be better suited to differentiate between those two variants. And I guess the L1 from the first paper could be implemented as another distributed preconditioner.
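To make the "lumped" reading of the distributed variant concrete, here is a minimal standalone sketch (names and structure are mine, not Ginkgo's API): the local smoother sees only the square local block, and the non-local part enters solely through the row-wise sums of its absolute values added to the local diagonal.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the distributed "lumped" L1 idea: each rank holds a
// square local block plus a non-local part. Only the row-wise L1 norms of the
// non-local part are folded into the local diagonal before smoothing.
std::vector<double> lumped_local_diag(
    const std::vector<std::vector<double>>& local,      // square local block
    const std::vector<std::vector<double>>& non_local)  // same number of rows
{
    std::vector<double> diag(local.size());
    for (std::size_t i = 0; i < local.size(); ++i) {
        diag[i] = local[i][i];
        for (double v : non_local[i]) {
            diag[i] += std::abs(v);  // L1 norm of the non-local row part
        }
    }
    return diag;
}
```

A local block-Jacobi or Gauss-Seidel sweep would then use this augmented diagonal in place of the plain one, which is what makes the variant independent of the particular local smoother.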
The first paper assumes the diagonal entries are positive, so they are equivalent.
My argument is that what is implemented in this PR is not the L1 from the paper, so we should not call it that. But honestly, I don't have a great overview of how the term L1 smoother is usually interpreted.
|
It should theoretically be an L1 smoother; otherwise, I'm just adding some random Jacobi variant.
The L1 smoother (Jacobi or GS) explicitly references distributed matrices. I still think that if we want to follow the naming of the paper, the variant implemented here should not be called L1. Instead, we should add a new distributed preconditioner which consists of a local solver (Jacobi, or in the future perhaps GS) and a diagonal matrix with the row-wise sum of the non-local part.
Maybe I am misunderstanding something, but in the non-distributed case, this is basically diagonal relaxation with weights equal to row sums, right? So I think it definitely makes sense to have it as a smoother, as an alternative to the scalar Jacobi smoother. I think the name L1 is well suited because you are taking the L1 norm of the row-space vectors? Not sure if there is some mathematical basis for this yet; we could also try seeing if any other p-norms also make sense, particularly with
If we go with @pratikvn's suggestion and base this version on the L1 row norm, then we need to take the absolute value of the diagonal.
I am not sure whether to take the absolute value of the diagonal value.
Taking the absolute value is the only way for this preconditioner to make sense. And for blocks, I would just go with taking the absolute value component-wise. I think then it makes the most sense to add these up block-wise. Currently, you seem to only update the diagonal values of the diagonal block, or maybe I'm mistaken.
Yes, in the block version it only updates the diagonal values. From the paper or formula perspective, I prefer only adding values to the diagonal without taking the absolute value of the diagonal first.
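For reference, the scalar (block_size = 1) variant being debated boils down to a very small kernel. This is a standalone sketch with hypothetical names, not Ginkgo's actual kernel; it keeps the diagonal's sign as-is, which is exactly the point of contention above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical sketch of scalar L1 Jacobi on a CSR matrix: for each row i,
// the preconditioner entry is 1 / (a_ii + sum_{j != i} |a_ij|), i.e. the
// plain Jacobi diagonal augmented by the L1 norm of the off-diagonal part.
// Note: the diagonal is NOT wrapped in abs() here (the debated choice).
std::vector<double> scalar_l1_jacobi(const std::vector<int>& row_ptrs,
                                     const std::vector<int>& col_idxs,
                                     const std::vector<double>& vals,
                                     int num_rows)
{
    std::vector<double> inv_diag(num_rows, 0.0);
    for (int row = 0; row < num_rows; ++row) {
        double diag = 0.0;
        double off_diag = 0.0;
        for (int i = row_ptrs[row]; i < row_ptrs[row + 1]; ++i) {
            if (col_idxs[i] == row) {
                diag = vals[i];
            } else {
                off_diag += std::abs(vals[i]);
            }
        }
        inv_diag[row] = 1.0 / (diag + off_diag);
    }
    return inv_diag;
}
```

With a positive diagonal this equals the inverse of the row-wise L1 norm; with a negative diagonal the two definitions diverge, which is the open question in this thread.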
Force-pushed from 803333e to 5fc569a.
 * If it is true, it generates the preconditioner on A + Diag(sum_{k in
 * off diagonal block of i} A_ik) not A.
 */
bool GKO_FACTORY_PARAMETER_SCALAR(l1, false);
I'm still not convinced that `l1` is a good name. Perhaps `aggregate`, or `aggregate_l1`.
One issue with just `l1` is that we will also have the distributed L1 preconditioner, as mentioned in the original paper. But that is quite different from the one here, so we need better naming to prevent confusion.
I think they are the same thing. They only use different ways to define the blocks (by node, diagonal block, or value), but they all add the L1 norm of the off-block-diagonal row entries to the diagonal.
They are not the same. Using the `l1` defined here will not give you a distributed preconditioner. Also, the distributed version can be used with both local Jacobi and Gauss-Seidel preconditioners.
I do not get it. If you have the block list, the L1 update modifies the diagonal values of each diagonal block.
For sure, L1 in this PR is specific to Jacobi, because we can only know the blocks of block Jacobi during generation.
- scalar L1 Jacobi takes each diagonal value as the block
- block L1 Jacobi takes the blocks generated or given as the block
- distributed L1 takes the local matrix (assumed square) as the block

The diagonal treatment is the same in all three cases.
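The claim that only the partition differs between the three variants can be sketched in a few lines (a dense toy illustration with hypothetical names, not Ginkgo code): a block labelling decides which entries count as "off-block", and everything outside the row's own block contributes its absolute value to the diagonal.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch: block_of[i] names the block of row/column i.
// Entries outside the row's block are summed (in absolute value) onto the
// diagonal. Scalar, block, and distributed L1 differ only in block_of.
std::vector<double> l1_diag(const std::vector<std::vector<double>>& a,
                            const std::vector<int>& block_of)
{
    std::vector<double> diag(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        diag[i] = a[i][i];
        for (std::size_t j = 0; j < a.size(); ++j) {
            if (block_of[j] != block_of[i]) {
                diag[i] += std::abs(a[i][j]);  // off-block contribution
            }
        }
    }
    return diag;
}
```

With blocks of size one this reduces to the scalar variant; with one block per MPI rank it matches the distributed reading, where the "matrix" is the concatenation of local and non-local parts.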
The point is that this is a variation of the sequential Jacobi. In the distributed case, we would have a separate L1 preconditioner (or one with a different name) that is not specific to Jacobi.
Having both with the same name would be confusing.
I see
Quality Gate passed.
nvhpc 23.3 may optimize the formula for scaled_value in make_diag_dominant in a different way, which leads to slightly different updated values.
Co-authored-by: Marcel Koch <[email protected]>
Some doc suggestions, otherwise LGTM!
 * If it is true, it generates the preconditioner on A + Diag(sum_{k in
 * off diagonal block of i} A_ik) not A.
I think two things are missing:
- You should mention that you take the absolute values of the elements.
- Maybe also add a note for the scalar case: "When `block_size=1`, this will be equivalent to taking the inverse of the row-sums as the preconditioner."
Thanks for changing the name, for me the name is fine now.
I do have some questions about the block version though.
 * Use L1 Jacobi, which is introduced in the paper A. H. Baker et al.
 * "Multigrid smoothers for ultraparallel computing."
This reference is not helpful, since it only mentions the parallel use case, which is not implemented here.
 * If it is true, it generates the preconditioner on A + Diag(sum_{k in
 * off diagonal block of i} A_ik) not A.
IMO, you should also take the absolute value of the diagonal, otherwise referencing the L1 norm doesn't make sense.
    }
    off_diag += abs(vals[i]);
}
diag[row] += off_diag;
Suggested change:
- diag[row] += off_diag;
+ diag[row] = abs(diag[row]) + off_diag;
 * If it is true, it generates the preconditioner on A + Diag(sum_{k in
 * off diagonal block of i} A_ik) not A.
You also need to explain what happens if `max_block_size > 1`. TBH I still don't know how to define the aggregate in this case (i.e. should it be block-wise, or just added to the diagonal; is the absolute value taken of the full diagonal block, or just the diagonal?)
    }
    off_diag += abs(csr->get_const_values()[i]);
}
diag->get_values()[row] += off_diag;
Suggested change:
- diag->get_values()[row] += off_diag;
+ diag->get_values()[row] = abs(diag->get_values()[row]) + off_diag;
I think we discussed this a little bit before, but did not reach a reasonable choice for when the diagonal values are negative.
L1 is originally designed for positive diagonal values.
using Mtx = typename TestFixture::Mtx;
using index_type = typename TestFixture::index_type;
using value_type = typename TestFixture::value_type;
auto mtx = Mtx::create(this->exec, gko::dim<2>{4}, 8);
Suggested change:
- auto mtx = Mtx::create(this->exec, gko::dim<2>{4}, 8);
+ auto mtx = Mtx::create(this->exec, gko::dim<2>{4, 4}, 8);

IMO the single-parameter `dim` constructor is a bit confusing.
 1  1
 2  2
-5  1
 1 -1  1
Suggested change:
-  1  1
-  2  2
- -5  1
-  1 -1  1
+  1  1  0  0
+  2  2  0  0
+ -5  0  1  0
+  1  0 -1  1
using Mtx = typename TestFixture::Mtx;
using index_type = typename TestFixture::index_type;
using value_type = typename TestFixture::value_type;
auto mtx = Mtx::create(this->exec, gko::dim<2>{4}, 9);
Suggested change:
- auto mtx = Mtx::create(this->exec, gko::dim<2>{4}, 9);
+ auto mtx = Mtx::create(this->exec, gko::dim<2>{4, 4}, 9);
l({{2.0, 1.0, 0.0, 0.0},
   {2.0, 4.0, 0.0, 0.0},
   {-5.0, 0.0, 6.0, 0.0},
   {1.0, 0.0, -1.0, 2.0}}),
This doesn't seem right. Are the off-diagonal values of the diagonal block also added to the diagonal of the diagonal block? IMO that shouldn't be the case, only values from the off-diagonal blocks should be used (at least according to the original reference).
I think that is the case, right?
The blocks here are (0, 0), (1, 1), and (2:4, 2:4).
@@ -252,6 +257,66 @@ TEST_F(Jacobi,
 }

 TEST_F(Jacobi, ScalarInLargeMatrixEquivalentToRef)
Suggested change:
- TEST_F(Jacobi, ScalarInLargeMatrixEquivalentToRef)
+ TEST_F(Jacobi, ScalarL1LargeMatrixEquivalentToRef)
This adds the L1 option to Jacobi. The details are shown in https://epubs.siam.org/doi/abs/10.1137/100798806
The original preconditioner is on $M$ (which is the diagonal of A for scalar Jacobi or the diagonal block of A for block Jacobi).
The L1 smoother is on $M' = M + D^{\ell_1}$, where $D^{\ell_1}$ is a diagonal matrix with

$$D_{ii}^{\ell_1} = \sum_{j \in \text{off-diagonal block of the corresponding block of } i} |A_{ij}|$$
I am not sure whether the theorems in the paper still work for negative diagonal entries.
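The sign concern can be made concrete with a toy row (my own numbers, not from the paper): when the diagonal is negative, the unsigned definition can even produce a zero pivot, while the absolute-value definition stays safely positive.

```latex
% Row with a_{ii} = -2 and off-diagonal absolute row sum 2:
\[
  a_{ii} + \sum_{j \ne i} |a_{ij}| = -2 + 2 = 0
  \quad\text{(singular pivot)},
  \qquad
  |a_{ii}| + \sum_{j \ne i} |a_{ij}| = 2 + 2 = 4.
\]
```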