
Add zero-checks to axpy-like operations #1573

Merged
merged 4 commits into from
Feb 20, 2025

Conversation

upsj
Member

@upsj upsj commented Mar 18, 2024

Operations like $\alpha A x + 0 y$ may propagate NaNs from y to the output despite the 0 coefficient. This can be avoided by checking the beta scaling factors for zero explicitly.

TODO:

  • add reference tests for matrices
  • add generic tests for Dense

@upsj upsj self-assigned this Mar 18, 2024
@ginkgo-bot ginkgo-bot added reg:testing This is related to testing. mod:cuda This is related to the CUDA module. mod:openmp This is related to the OpenMP module. mod:reference This is related to the reference module. type:matrix-format This is related to the Matrix formats mod:hip This is related to the HIP module. mod:dpcpp This is related to the DPC++ module. labels Mar 18, 2024
Collaborator

@fritzgoebel fritzgoebel left a comment


LGTM

@MarcelKoch MarcelKoch self-requested a review April 5, 2024 09:27
@yhmtsai yhmtsai self-requested a review April 5, 2024 09:33
@MarcelKoch MarcelKoch added this to the Ginkgo 1.8.0 milestone Apr 5, 2024
Member

@MarcelKoch MarcelKoch left a comment


lgtm in general. Maybe some static_casts are still missing to make it compile.

Comment on lines 383 to 385
[&beta_val](const type& x) {
return is_zero(beta_val) ? zero(beta_val) : beta_val * x;
});
Member


minor performance comment: you could try switching the lambda and the zero check, i.e.

is_zero(beta_val)
  ? [&beta_val](const type& x) { return zero(beta_val); }
  : [&beta_val](const type& x) { return beta_val * x; }

But this might not work, since the two branches of the ?: operator have different types. It might also increase compile times, since it would compile the kernel twice.

- arithmetic_type result = c->at(row, j);
- result *= beta_val;
+ arithmetic_type result =
+     is_zero(beta_val) ? zero(beta_val) : beta_val * c->at(row, j);
Member


Suggested change
is_zero(beta_val) ? zero(beta_val) : beta_val * c->at(row, j);
is_zero(beta_val) ? zero(beta_val) : beta_val * static_cast<arithmetic_type>(c->at(row, j));

to make it compile

@yhmtsai
Member

yhmtsai commented Apr 15, 2024

Is there any reason to avoid propagation of NaN? If there's no performance penalty, I think propagating NaN makes it easier to know that the algorithm did not work out due to some arithmetic error.

@MarcelKoch
Member

@yhmtsai if you are computing, for example, y = 0.5 * A * x + 0.0 * y, then propagating NaNs from y is unnecessary, since it's mathematically equivalent to just leaving y out. This can easily happen if y is not initialized before that computation.

@yhmtsai
Member

yhmtsai commented Apr 15, 2024

No, 0 * NaN should be NaN, not zero, so leaving y out is not a mathematical equality.
Yes, it might happen with uninitialized memory, but I would say it should be properly initialized, or a call should be used that does not touch the uninitialized part. (For us, we could probably provide A->apply(alpha, x, y) for y = alpha * A * x.)

@yhmtsai
Member

yhmtsai commented Apr 15, 2024

I know current vendor libraries usually treat 0 as "do not touch", following BLAS.
I am not sure the other routines follow the same rule.

@upsj
Member Author

upsj commented Apr 15, 2024

0 * NaN should be NaN not zero

That makes calculations more fragile; we already do similar things (special-casing zeros) inside our solver kernels.

@MarcelKoch
Member

@yhmtsai we treat 0 * NaN = 0 only in the case of axpy-like operations. So there will still be NaN propagation in normal SpMV, dot products, etc. But for these axpy-like operations, users will not care about the IEEE standard. For them, initializing x by setting it to 0 and multiplying it by 0 should be equivalent.

@tcojean tcojean modified the milestones: Ginkgo 1.8.0, Ginkgo 1.9.0 May 31, 2024
@MarcelKoch MarcelKoch modified the milestones: Ginkgo 1.9.0, Ginkgo 1.10.0 Dec 9, 2024
@upsj upsj marked this pull request as ready for review February 18, 2025 12:45
@upsj upsj force-pushed the nan_spmv branch 2 times, most recently from 25975e4 to f732df9 on February 18, 2025 12:51
@upsj upsj added the 1:ST:ready-for-review This PR is ready for review label Feb 18, 2025
Member


Just a mental note for me: the load-balanced SpMV is also covered, since it uses the Dense::scale function if beta != nullptr.

Member


It is inconsistent in SpMV, which should be the same up to rounding error, IMO.

@upsj upsj added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Feb 19, 2025
@upsj upsj merged commit 25b59ec into develop Feb 20, 2025
10 of 11 checks passed
@upsj upsj deleted the nan_spmv branch February 20, 2025 07:15
@yhmtsai
Member

yhmtsai commented Feb 20, 2025

In AXPBY SpMV, it takes (nnz + 3n) ValueType + (nnz + n + 1) IndexType.
I think the condition is the same throughout the kernel, so we do not get divergent branches, but it might change kernel register usage or optimization.
It only changes from (nnz + 3n) ValueType + (nnz + n + 1) IndexType to (nnz + 2n) ValueType + (nnz + n + 1) IndexType.
If the right-hand side is not able to stay in cache, we might still need nnz for the right-hand-side vector;
then we only change from (2nnz + 2n) ValueType + (nnz + n + 1) IndexType to (2nnz + n) ValueType + (nnz + n + 1) IndexType.
I do not think this benefit is worth users possibly not noticing that some NaNs turned up.
To me, the case we discussed in the meeting is a user setting beta to zero to compute $\alpha A x$, not to do $Ax$ and then scale, and we should provide apply(A, x, y) for that.
I know BLAS treats zero as a no-op (without documentation though?) and many vendors follow that.
To be clear, there is no doubt that we should still aim for performance over the NaN propagation issue.
However, if it costs no extra work, we should try to keep propagation, since it can help users know something is wrong instead of luckily getting a solution.
In this PR, we do more work to avoid the propagation, which seems to be the opposite direction.
That's why I am not in favor of this PR.

@upsj
Member Author

upsj commented Feb 20, 2025

The behavior seems well-documented, e.g. with GEMV

When BETA is supplied as zero then Y need not be set on input.

This is also not meant as a performance optimization, but rather as a safety feature. The argument is mainly that it doesn't have a negative performance impact.

What sources of NaNs can we imagine? Either they come from uninitialized memory or from divergent operations; are there any others I am missing? Regardless of the source, in an application, if I multiply something by an explicit zero, I as a programmer am saying that I don't care what those values were.

I'm having a hard time imagining where inside Ginkgo's solvers we would end up with a zero scalar that could cause problematic NaNs to disappear. As soon as a NaN appears in a vector, it infects scalars through reductions and other vectors via updates; I really don't see how a scalar could become zero after a NaN was introduced in a vector, which is where this issue would pop up. Can you think of a case? I have the impression that we are discussing something that will not happen in practice.

@upsj
Member Author

upsj commented Feb 20, 2025

Here is the paper Pratik referred to in today's meeting: Proposed Consistent Exception Handling for the BLAS and LAPACK

@yhmtsai
Member

yhmtsai commented Feb 21, 2025

We seem to be taking opposite views of this problem: user-specified zeros vs. numerical error during the application.
In my case, it comes more from the divergent case.
It is also a precision range issue: quite often some vector goes to inf, and then some scalar is divided by a norm (= zero).
That may give zero and inf in the first place, but later these start to introduce NaN (such as 0 * inf).
Zero and inf come before NaN; NaN is usually introduced by a combination of zero and inf.
It concerns the whole application. It would be good if we could catch it whenever it happens, but adding a check to each operation affects performance, and we definitely should not go in that direction.
To me, a user specifying zero is an intentional action, so they should be able to call another function to perform the desired operation rather than set a scalar to zero.
Currently it mixes the user-defined (explicit) zero and the normal (implicit, during the application) zero.
Good to know they have described the behavior in the documentation, which reduces my concern a bit.

@MarcelKoch
Member

To me, a user specifying zero is an intentional action, so they should be able to call another function to perform the desired operation rather than set a scalar to zero.

Just a note: in some cases users don't have any other option. For example, if you want to store the result of $y = \alpha M x$, then users can (and should) only use the advanced apply with $\beta = 0$. We don't need an extra function for that, we just need the advanced apply to do what is mathematically expected.
I can also imagine that users from a more mathematical background will assume that y.scale(0) and y.fill(0) behave the same.

@yhmtsai
Member

yhmtsai commented Feb 21, 2025

That's the reason why I said we should provide apply(alpha, b, x) for this behavior.
From the function call, it is written as $y = \alpha M x + 0 \cdot y$, not $\alpha M x$.
If they assume numbers can always be represented in the computer and do not know anything about initialization, then that might be a plausible expectation.
With some programming knowledge, they probably know a value can be anything if they do not initialize it before operating on it.
