
Add zero-checks to axpy-like operations #1573

Merged
merged 4 commits into from
Feb 20, 2025

Conversation

upsj
Member

@upsj upsj commented Mar 18, 2024

Operations like $\alpha A x + 0 y$ may propagate NaNs from y to the output despite the 0 coefficient. This can be avoided by checking the beta scaling factors for zero explicitly.

TODO:

  • add reference tests for matrices
  • add generic tests for Dense

@upsj upsj self-assigned this Mar 18, 2024
@ginkgo-bot ginkgo-bot added reg:testing This is related to testing. mod:cuda This is related to the CUDA module. mod:openmp This is related to the OpenMP module. mod:reference This is related to the reference module. type:matrix-format This is related to the Matrix formats mod:hip This is related to the HIP module. mod:dpcpp This is related to the DPC++ module. labels Mar 18, 2024
Collaborator

@fritzgoebel fritzgoebel left a comment


LGTM

@MarcelKoch MarcelKoch self-requested a review April 5, 2024 09:27
@yhmtsai yhmtsai self-requested a review April 5, 2024 09:33
@MarcelKoch MarcelKoch added this to the Ginkgo 1.8.0 milestone Apr 5, 2024
Member

@MarcelKoch MarcelKoch left a comment


lgtm in general. Maybe some static_casts are still missing to make it compile.

Comment on lines 383 to 385
[&beta_val](const type& x) {
return is_zero(beta_val) ? zero(beta_val) : beta_val * x;
});
Member


minor performance comment: you could try switching the lambda and the zero check, i.e.

is_zero(beta_val)
  ? [&beta_val](const type& x) { return zero(beta_val); }
  : [&beta_val](const type& x) { return beta_val * x; }

But this might not work, since the two branches of the ?: operator have different types. It might also increase compile times, since it would compile the kernel twice.

- arithmetic_type result = c->at(row, j);
- result *= beta_val;
+ arithmetic_type result =
+     is_zero(beta_val) ? zero(beta_val) : beta_val * c->at(row, j);
Member


Suggested change
is_zero(beta_val) ? zero(beta_val) : beta_val * c->at(row, j);
is_zero(beta_val) ? zero(beta_val) : beta_val * static_cast<arithmetic_type>(c->at(row, j));

to make it compile

@yhmtsai
Member

yhmtsai commented Apr 15, 2024

Is there any reason to avoid propagation of NaN? If there's no performance penalty, I think propagating NaN makes it easier to know that the algorithm did not work out due to some arithmetic error.

@MarcelKoch
Member

@yhmtsai if you are computing, for example, y = 0.5 * A * x + 0.0 * y, then propagating NaNs from y is unnecessary, since it's mathematically equivalent to just leaving y out. This can easily happen if y is not initialized before that computation.

@yhmtsai
Member

yhmtsai commented Apr 15, 2024

No, 0 * NaN should be NaN, not zero, so leaving y out is not a mathematical equality.
Yes, it might happen with uninitialized memory, but I would say it should be properly initialized, or a call should be used that does not touch the uninitialized part. (For us, we could probably provide A->apply(alpha, x, y) for y = alpha * A * x.)

@yhmtsai
Member

yhmtsai commented Apr 15, 2024

I know current vendor libraries usually treat 0 as "do not touch", following BLAS.
I am not sure the other routines follow the same rule.

@upsj
Member Author

upsj commented Apr 15, 2024

0 * NaN should be NaN not zero

That makes calculations more fragile; we already do similar things (special-casing zeros) inside our solver kernels.

@MarcelKoch
Member

@yhmtsai we treat 0 * NaN = 0 only in the case of axpy-like operations. So there will still be NaN propagation in normal SpMV, dot products, etc. But for these axpy-like operations, users will not care about the IEEE standard. For them, initializing x by setting it to 0 and multiplying it by 0 should be equivalent.

@tcojean tcojean modified the milestones: Ginkgo 1.8.0, Ginkgo 1.9.0 May 31, 2024
@MarcelKoch MarcelKoch modified the milestones: Ginkgo 1.9.0, Ginkgo 1.10.0 Dec 9, 2024
@upsj upsj marked this pull request as ready for review February 18, 2025 12:45
@upsj upsj force-pushed the nan_spmv branch 2 times, most recently from 25975e4 to f732df9 on February 18, 2025 12:51
@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Feb 18, 2025
Member


Just a mental note for me: the load-balanced SpMV is also covered, since it uses the Dense::scale function if beta != nullptr.

Member


It is inconsistent in SpMV, which should be the same up to rounding error, IMO.

@upsj upsj added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Feb 19, 2025
@upsj upsj merged commit 25b59ec into develop Feb 20, 2025
10 of 11 checks passed
@upsj upsj deleted the nan_spmv branch February 20, 2025 07:15
@yhmtsai
Member

yhmtsai commented Feb 20, 2025

In AXPBY SpMV, it takes (nnz + 3n) ValueType + (nnz + n + 1) IndexType.
I think the condition is the same throughout the kernel, so we do not get divergent branches, but it might change kernel register usage or optimization.
It only changes from (nnz + 3n) ValueType + (nnz + n + 1) IndexType to (nnz + 2n) ValueType + (nnz + n + 1) IndexType.
If the right-hand side is not able to stay in cache, we might still need nnz for the right-hand-side vector;
then we only change from (2nnz + 2n) ValueType + (nnz + n + 1) IndexType to (2nnz + n) ValueType + (nnz + n + 1) IndexType.
I do not think this benefit is worth users possibly not noticing that some NaNs turned up.
To me, the case we discussed in the meeting is a user setting beta to zero to compute $\alpha A x$, not to do $Ax$ and then scale, and we should provide apply(A, x, y) for that.
I know BLAS treats zero as a no-op (without documentation though?) and many vendors follow that.
To be clear, there is no doubt that we should still aim for performance over the NaN propagation issue.
However, if it costs no extra work, we should try to keep propagation, since it can help users know something is wrong instead of luckily getting a solution.
In this PR, we do more work to avoid the propagation, which seems to be the opposite direction.
That's why I am not in favor of this PR.

@upsj
Member Author

upsj commented Feb 20, 2025

The behavior seems well-documented, e.g. with GEMV

When BETA is supplied as zero then Y need not be set on input.

This is also not meant as a performance optimization, but rather as a safety feature. The argument is mainly that it doesn't have a negative performance impact.

What sources of NaNs can we imagine? Either they come from uninitialized memory or from divergent operations; are there any others I am missing? Regardless of the source, in an application, if I multiply something by an explicit zero, I as a programmer am saying that I don't care what those values were.

I'm having a hard time imagining where inside Ginkgo's solvers we would end up with a zero scalar that could cause problematic NaNs to disappear. As soon as a NaN appears in a vector, it infects scalars through reductions and other vectors via updates; I really don't see how a scalar could become zero after a NaN was introduced in a vector, which is where this issue would pop up. Can you think of a case? I have the impression that we are discussing something that will not happen in practice.

@upsj
Member Author

upsj commented Feb 20, 2025

Here is the paper Pratik referred to in today's meeting: Proposed Consistent Exception Handling for the BLAS and LAPACK

@yhmtsai
Member

yhmtsai commented Feb 21, 2025

We seem to be taking opposite views of this problem: user-specified zeros vs. numerical error during the application.
In my case, it comes more from the divergent case.
It is also a precision range issue: quite often some vector goes to inf, and then some scalar is divided by a norm (= zero).
That may give zero and inf in the first place, but later these start to introduce NaN (such as 0 * inf).
Zero and inf come before NaN; NaN is usually introduced by a combination of zero and inf.
It concerns the whole application. It would be good if we could catch it whenever it happens, but adding a check to each operation affects performance, and we definitely should not go in that direction.
To me, a user specifying zero is an intentional action, so they should be able to call another function to perform the desired operation rather than set a scalar to zero.
Currently it mixes the user-defined (explicit) zero and the normal (implicit, during the application) zero.
Good to know they have described the behavior in the documentation, which reduces my concern a bit.

@MarcelKoch
Member

To me, a user specifying zero is an intentional action, so they should be able to call another function to perform the desired operation rather than set a scalar to zero.

Just a note: in some cases users don't have any other option. For example, if you want to store the result of $y = \alpha M x$, then users can (and should) only use the advanced apply with $\beta = 0$. We don't need an extra function for that, we just need the advanced apply to do what is mathematically expected.
I can also imagine that users from a more mathematical background will assume that y.scale(0) and y.fill(0) behave the same.

@yhmtsai
Member

yhmtsai commented Feb 21, 2025

That's the reason why I said we should provide apply(alpha, b, x) for this behavior.
From the function call, it is written as $y = \alpha M x + 0 \cdot y$, not $\alpha M x$.
If they assume numbers can always be represented in the computer and do not know anything about initialization, then that might be a plausible expectation.
With some programming knowledge, they probably know a value can be anything if they do not initialize it before operating on it.
